# revert: Bitbucket OAuth 1.0 → 2.0 migration (PR #772) #773
Status: Closed
Commits (11):
- d45d8bb feat(bitbucket): Migrate to OAuth 2.0 (sentry[bot])
- 63d11ff fix(bitbucket): add OAuth 2.0 state param to prevent CSRF/account tak… (drazisil-codecov)
- b879dc9 test(bitbucket): update VCR cassettes for OAuth 2.0 Bearer token auth (drazisil-codecov)
- 1d8cf9f test(bitbucket): add missing VCR cassette for test_list_repos_generator (drazisil-codecov)
- 68e59cf fix(bitbucket): add OAuth 2.0 token refresh on 401 (drazisil-codecov)
- 6e40505 fix(bitbucket): restore BITBUCKET_SERVER refresh guard; handle refres… (drazisil-codecov)
- ef14c54 test(bitbucket): update tests for enabled token refresh callback (drazisil-codecov)
- 308161b fix(bitbucket): add secure flag to state cookie; remove unused param (drazisil-codecov)
- 433f3a9 fix(bitbucket): add samesite and max_age to state cookie (drazisil-codecov)
- 7c1c15f refactor(bitbucket): replace while loop with explicit two-attempt retry (drazisil-codecov)
- 711a3a9 revert: Bitbucket OAuth 1.0 → 2.0 migration (PR #772) (drazisil-codecov)
**File:** `docs/adr-or-notes/owner-deletion-chunked-in-filter.md` (new file, 81 lines)
# Owner deletion: chunked IN filter

## Summary

When deleting an owner with tens of thousands of child rows (uploads, commits, repositories, etc.), the cleanup service was generating `DELETE ... WHERE id IN (...)` SQL statements with the full list of IDs embedded directly in the query text. For large owners this could mean hundreds of kilobytes of SQL in a single statement, which caused connection drops and broke the cleanup job mid-run.

**Fix:** [PR #731](https://github.com/codecov/umbrella/pull/731) splits those materialized ID lists into chunks (default: 10,000 IDs each) so each individual DELETE statement stays well within safe size limits.

---

## Background: how cleanup works

Deleting an `Owner` requires also deleting everything associated with it across many tables — repositories, commits, uploads, reports, etc. The cleanup service builds a dependency graph of all related models and issues DELETE statements in the correct order (children before parents).
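The children-before-parents ordering can be sketched with a toy parent map. The table names and the linear chain below are purely illustrative; the real service derives a full dependency graph from the Django model relations:

```python
# Toy parent map: each table points at the table it references.
# (Illustrative only; the real graph comes from model relations.)
parents = {
    "uploads": "commits",
    "commits": "repositories",
    "repositories": "owners",
    "owners": None,
}

def depth(table: str) -> int:
    """Number of parent hops from `table` up to the root table."""
    hops = 0
    while parents[table] is not None:
        table = parents[table]
        hops += 1
    return hops

# Deeper tables are children, so they must be deleted first.
delete_order = sorted(parents, key=depth, reverse=True)
```

Sorting by depth guarantees no DELETE ever targets a row that still has live children referencing it.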
To know *which rows* to delete in each child table, it needs to pass along the IDs from the parent query. There are two ways to do this:

**Subquery** — the DB resolves IDs at query time:
```sql
DELETE FROM repositories WHERE owner_id IN (SELECT id FROM owners WHERE ...)
```

**Materialized list** — Python fetches the IDs first, then embeds them directly:
```sql
DELETE FROM repositories WHERE owner_id IN (1, 2, 3, 42, ...)
```

The materialized approach (`simplified_lookup`) was added previously because it produces better query plans — the DB knows exactly how many rows it's dealing with instead of having to estimate. For typical owners this works well.
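A minimal, self-contained illustration of the two strategies, using an in-memory SQLite database. The table shapes and data here are invented for the example; the real service issues its queries through the Django ORM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE repositories (id INTEGER PRIMARY KEY, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO repositories VALUES (10, 1), (11, 1), (12, 2);
""")

# Subquery: the database resolves the owner IDs at execution time.
conn.execute(
    "DELETE FROM repositories WHERE owner_id IN "
    "(SELECT id FROM owners WHERE name = 'alpha')"
)

# Materialized list: fetch the IDs first, then embed them in the SQL text.
ids = [row[0] for row in conn.execute("SELECT id FROM owners WHERE name = 'beta'")]
id_list = ", ".join(str(i) for i in ids)
conn.execute(f"DELETE FROM repositories WHERE owner_id IN ({id_list})")

remaining = conn.execute("SELECT COUNT(*) FROM repositories").fetchone()[0]
```

Both paths delete the same rows; the difference is only *where* the parent IDs get resolved, which is exactly what matters for statement size.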
---

## The problem

For large owners, `simplified_lookup` can materialize tens of thousands of IDs. Those IDs all end up as a literal list inside the SQL string:

```sql
DELETE FROM uploads WHERE id IN (1, 2, 3, 4, ... 50000 values ...)
```

A statement with 50,000 IDs can be ~250–450 KB of text (each integer ID plus separators). Statements that large caused the database connection to drop, leaving cleanup in a partially-completed state.
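That size estimate is easy to check: rendering 50,000 sequential IDs into an IN list (the table name is just for illustration) lands in exactly that range:

```python
# Build the literal IN list the way the materialized path would.
ids = range(1, 50_001)
id_list = ", ".join(str(i) for i in ids)
sql = f"DELETE FROM uploads WHERE id IN ({id_list})"

size_kb = len(sql) / 1024
# Sequential 1..50000 gives roughly 330 KB; real IDs are often larger
# integers, pushing the statement toward the upper end of the range.
```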
---

## The fix

PR #731 adds a chunking step (`_chunked_in_filter`) between materializing the ID list and building the DELETE querysets.

**Before:** one queryset (and therefore one DELETE statement) per field/filter combination, regardless of list size.

**After:** if the materialized list exceeds `delete_chunk_size`, it's split into multiple querysets — one per chunk:

```sql
-- chunk 1
DELETE FROM uploads WHERE id IN (1, ..., 10000)
-- chunk 2
DELETE FROM uploads WHERE id IN (10001, ..., 20000)
-- etc.
```

The chunk size defaults to 10,000 and is configurable via `cleanup.delete_chunk_size` in `codecov.yml` without requiring a code change.
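Assuming the nesting implied by the dotted key (this exact layout is an inference, not confirmed by the PR), the override would look something like:

```yaml
# Hypothetical codecov.yml fragment; key nesting inferred from
# "cleanup.delete_chunk_size" and not verified against the schema.
cleanup:
  delete_chunk_size: 10000
```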
Subquery-based filters are passed through unchanged — chunking only applies to the materialized list path.
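The split-or-passthrough behavior can be sketched as a plain-Python helper. This is a hypothetical reconstruction of what `_chunked_in_filter` does, not the actual code from `relations.py`:

```python
def chunked_in_filter(ids: list[int], chunk_size: int = 10_000) -> list[list[int]]:
    """Split a materialized ID list into IN-filter-sized chunks.

    Lists at or under the chunk size pass through as a single chunk,
    mirroring the passthrough behavior described above.
    """
    if len(ids) <= chunk_size:
        return [ids]
    return [ids[i : i + chunk_size] for i in range(0, len(ids), chunk_size)]
```

Each returned chunk would then back one `id IN (...)` queryset, so a 50,000-ID list yields five DELETE statements instead of one.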
**Key files:**
- `apps/worker/services/cleanup/relations.py` — `_chunked_in_filter`, `_get_delete_chunk_size`, and the updated `build_relation_graph` call site
- `apps/worker/services/cleanup/tests/test_relations.py` — new tests for passthrough, empty-list, and split behavior, plus integration coverage

---

## Trade-offs

**Pros**
- Eliminates connection drops on large owner deletions
- Chunk size is tunable via config
- Only affects the materialized list path; the subquery path is unchanged
- Emits a debug log when a split occurs, so the behavior is visible in logs

**Cons**
- More database round-trips per cleanup run — 50,000 rows that were one DELETE are now five. For very large owners this makes cleanup slower.
- Deletes within a single model are no longer atomic — if the process crashes between chunks, some rows will be deleted and others won't. (This was already a property of the broader cleanup design, not a new regression.)

The trade-off is acceptable: cleanup is a background job where slower but reliable is preferable to fast but broken.
---

**Review comment (low severity): Unrelated documentation included in Bitbucket OAuth revert PR**

The file `docs/adr-or-notes/owner-deletion-chunked-in-filter.md` documents PR #731 (chunked IN filter for owner deletion), which is entirely unrelated to the Bitbucket OAuth 1.0 → 2.0 revert (PR #772). This documentation file appears to have been accidentally included in the revert PR and would be confusing in the commit history when trying to understand the revert.