perf: persist seniority_level in DB to avoid recomputation on every GET /leads #1
Open
DeryFerd wants to merge 1 commit into
…ET /leads

Previously, classify_job_seniority() was called for every lead on every GET /api/v1/leads request because the result was never stored. This adds a seniority_level column to the leads table and persists the value at save time, with a backfill helper for existing leads.

Changes:
- Add seniority_level TEXT column to leads table (auto-migration)
- Add seniority_level param to save_lead() and persist it on INSERT
- Update _annotate_job_lead to read from DB column first, only recompute via classify_job_seniority when value is missing
- Add backfill_seniority_levels() helper for existing data
- Update all save_lead callers (scout, x_scout, free_scout, manual) to pass seniority_level from source_meta
- Add TestSeniorityPersistence with 5 test cases covering the priority chain: lead column > source_meta > recompute
da6092b to
2edbb48
What's the problem?
Every time GET /api/v1/leads gets called, _annotate_job_lead() runs classify_job_seniority() on every single lead in the database. That function parses the title, description, and experience years out of the lead text just to figure out if it's a junior/mid/senior role, and then throws the result away. Next request? Same work all over again.

With a growing leads table this becomes a noticeable overhead, especially since the seniority classification result is deterministic for a given lead. There's no reason to recompute it on every read.
What this PR does
Adds a seniority_level column to the leads table so the classification result gets persisted at write time instead of being recomputed at read time.

Specific changes:
- db/client.py: New seniority_level TEXT DEFAULT '' column added to the auto-migration list. The column is now part of _LEAD_SELECT_COLUMNS and _lead_row_dict(). save_lead() accepts a new seniority_level parameter and stores it on INSERT. Also added a backfill_seniority_levels() helper that finds leads with an empty seniority_level, classifies them, and writes the result back, which is useful for upgrading existing databases.
- main.py: _annotate_job_lead() now checks lead.get("seniority_level") (the DB column value) first, then falls back to source_meta.seniority_level, and only calls classify_job_seniority() if neither has a valid value. The manual lead endpoint now passes seniority_level through to save_lead().
- agents/scout.py, agents/x_scout.py, agents/free_scout.py: All three save_lead() callers now pass seniority_level from their source_meta dict so the value gets written to the DB at ingestion time.
- tests/test_regressions.py: New TestSeniorityPersistence class with 5 test cases covering the full priority chain: stored column → source_meta fallback → recomputation fallback, plus edge cases like empty strings and column-vs-meta conflicts.
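The read-path priority chain described above (stored column, then source_meta, then recomputation) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: resolve_seniority, VALID_LEVELS, and the keyword-based classify_job_seniority body are hypothetical stand-ins, and the set of valid labels is assumed from the junior/mid/senior wording in this description.

```python
from typing import Any, Mapping

# Assumption: the only valid labels are the three named in the PR description.
VALID_LEVELS = {"junior", "mid", "senior"}


def classify_job_seniority(title: str, description: str = "") -> str:
    """Hypothetical stand-in for the real classifier (keyword match only)."""
    text = f"{title} {description}".lower()
    if "senior" in text:
        return "senior"
    if "junior" in text:
        return "junior"
    return "mid"


def resolve_seniority(lead: Mapping[str, Any]) -> str:
    """Pick a seniority level using: DB column > source_meta > recompute."""
    # 1. Value persisted in the leads table column, if it is a valid label.
    stored = (lead.get("seniority_level") or "").strip().lower()
    if stored in VALID_LEVELS:
        return stored

    # 2. Value carried in source_meta by the scout that ingested the lead.
    meta = lead.get("source_meta") or {}
    from_meta = (meta.get("seniority_level") or "").strip().lower()
    if from_meta in VALID_LEVELS:
        return from_meta

    # 3. Last resort: recompute from the lead text.
    return classify_job_seniority(
        lead.get("title") or "", lead.get("description") or ""
    )
```

With this ordering, an empty string in the column is treated the same as a missing value, so leads that predate the migration still fall through to source_meta or recomputation.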
Migration notes
The seniority_level column is added via the existing ALTER TABLE ADD COLUMN auto-migration, so existing databases will pick it up automatically on next startup. Existing leads will have an empty seniority_level until they are backfilled: GET /leads still works for them because the fallback recomputation runs, but the read path does not persist the result, so you'd need to call backfill_seniority_levels() once.

New leads ingested through any scout or manual creation path will have seniority_level stored immediately.

Files changed
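- db/client.py: add the seniority_level column, the save_lead() parameter, and backfill_seniority_levels()
- main.py: read seniority_level from the DB column first; pass seniority_level to save_lead
- agents/scout.py: pass seniority_level to save_lead
- agents/x_scout.py: pass seniority_level to save_lead
- agents/free_scout.py: pass seniority_level to save_lead
- tests/test_regressions.py: add TestSeniorityPersistence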
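For the upgrade path in the migration notes, the column migration and one-off backfill might look something like the sketch below. It assumes a SQLite database accessed through the standard sqlite3 module and a leads table with id and title columns; neither is confirmed by this PR. ensure_seniority_column and the simplified classify_job_seniority are hypothetical, and the real backfill_seniority_levels() in db/client.py may have a different signature.

```python
import sqlite3


def classify_job_seniority(title: str) -> str:
    """Hypothetical stand-in for the real classifier (keyword match only)."""
    lowered = title.lower()
    if "senior" in lowered:
        return "senior"
    if "junior" in lowered:
        return "junior"
    return "mid"


def ensure_seniority_column(conn: sqlite3.Connection) -> None:
    """Add seniority_level if it is missing; safe to call on every startup."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(leads)")}
    if "seniority_level" not in columns:
        conn.execute("ALTER TABLE leads ADD COLUMN seniority_level TEXT DEFAULT ''")
        conn.commit()


def backfill_seniority_levels(conn: sqlite3.Connection) -> int:
    """Classify leads with an empty seniority_level; return how many were updated."""
    rows = conn.execute(
        "SELECT id, title FROM leads "
        "WHERE seniority_level IS NULL OR seniority_level = ''"
    ).fetchall()
    conn.executemany(
        "UPDATE leads SET seniority_level = ? WHERE id = ?",
        [(classify_job_seniority(title or ""), lead_id) for lead_id, title in rows],
    )
    conn.commit()
    return len(rows)
```

Running the backfill a second time should touch no rows, since it only selects leads whose seniority_level is still empty.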