perf: persist seniority_level in DB to avoid recomputation on every GET /leads #1

Open
DeryFerd wants to merge 1 commit into vasu-devs:main from DeryFerd:perf/persist-seniority-level

@DeryFerd DeryFerd commented May 6, 2026

What's the problem?

Every time GET /api/v1/leads gets called, _annotate_job_lead() runs classify_job_seniority() on every single lead in the database. That function parses the title, description, and experience years out of the lead text just to figure out if it's a junior/mid/senior role — and then throws the result away. Next request? Same work all over again.

With a growing leads table this becomes a noticeable overhead, especially since the seniority classification result is deterministic for a given lead. There's no reason to recompute it on every read.

What this PR does

Adds a seniority_level column to the leads table so the classification result gets persisted at write time instead of being recomputed at read time.

Specific changes:

  • db/client.py — New seniority_level TEXT DEFAULT '' column added to the auto-migration list. The column is now part of _LEAD_SELECT_COLUMNS and _lead_row_dict(). save_lead() accepts a new seniority_level parameter and stores it on INSERT. Also added a backfill_seniority_levels() helper that finds leads with an empty seniority_level, classifies them, and writes the result back — useful for upgrading existing databases.

  • main.py — _annotate_job_lead() now checks lead.get("seniority_level") (the DB column value) first, then falls back to source_meta.seniority_level, and only calls classify_job_seniority() if neither has a valid value. The manual lead endpoint now passes seniority_level through to save_lead().

  • agents/scout.py, agents/x_scout.py, agents/free_scout.py — All three save_lead() callers now pass seniority_level from their source_meta dict so the value gets written to the DB at ingestion time.

  • tests/test_regressions.py — New TestSeniorityPersistence class with 5 test cases covering the full priority chain: stored column → source_meta fallback → recomputation fallback, plus edge cases like empty strings and column-vs-meta conflicts.
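
The priority chain described for _annotate_job_lead() can be sketched roughly as below. This is an illustration, not the code in main.py: the helper name resolve_seniority, the injected classify callable, and the rule that any non-empty string counts as a "valid value" are all assumptions made for the example.

```python
from typing import Any, Callable, Dict


def _clean(value: Any) -> str:
    """Return a stripped string, or '' for None and non-string values."""
    return value.strip() if isinstance(value, str) else ""


def resolve_seniority(
    lead: Dict[str, Any],
    classify: Callable[[Dict[str, Any]], str],
) -> str:
    """Hypothetical helper: stored column > source_meta > recompute."""
    # 1. Value persisted in the leads table (the new DB column).
    stored = _clean(lead.get("seniority_level"))
    if stored:
        return stored

    # 2. Value carried in source_meta from ingestion time.
    meta = lead.get("source_meta")
    from_meta = _clean(meta.get("seniority_level")) if isinstance(meta, dict) else ""
    if from_meta:
        return from_meta

    # 3. Last resort: recompute. `classify` stands in for
    #    classify_job_seniority(), whose real signature is not shown here.
    return classify(lead)
```

With this ordering the classifier is only called for leads that have neither a stored column value nor a source_meta value, which is the case the backfill is meant to eliminate.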

Migration notes

The seniority_level column is added via the existing ALTER TABLE ADD COLUMN auto-migration, so existing databases will pick it up automatically on next startup. Existing leads will have an empty seniority_level until you run backfill_seniority_levels() once. Until then:

  1. Fetching them via GET /leads still returns a seniority level, because the fallback recomputation still works. The read path just doesn't persist the result.
  2. Running backfill_seniority_levels() bulk-fills all existing leads, after which reads use the stored value.

New leads ingested through any scout or manual creation path will have seniority_level stored immediately.
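
For readers who want to see the shape of the migration and backfill, here is a minimal sketch. It assumes a SQLite database accessed through Python's sqlite3 module and a leads table with id, title, and description columns; none of that is confirmed by this PR. The function names ensure_seniority_column, backfill_empty_seniority, and toy_classify are hypothetical stand-ins, not the helpers in db/client.py.

```python
import sqlite3
from typing import Callable


def ensure_seniority_column(conn: sqlite3.Connection) -> None:
    """Add seniority_level if it is missing (safe to call on every startup)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(leads)")}
    if "seniority_level" not in columns:
        conn.execute("ALTER TABLE leads ADD COLUMN seniority_level TEXT DEFAULT ''")
        conn.commit()


def backfill_empty_seniority(
    conn: sqlite3.Connection,
    classify: Callable[[str, str], str],
) -> int:
    """Classify leads whose seniority_level is empty; return how many were updated."""
    rows = conn.execute(
        "SELECT id, title, description FROM leads "
        "WHERE seniority_level IS NULL OR seniority_level = ''"
    ).fetchall()
    for lead_id, title, description in rows:
        level = classify(title or "", description or "")
        conn.execute(
            "UPDATE leads SET seniority_level = ? WHERE id = ?",
            (level, lead_id),
        )
    conn.commit()
    return len(rows)


def toy_classify(title: str, description: str) -> str:
    """Stand-in for classify_job_seniority(); the real logic is not shown here."""
    return "senior" if "senior" in title.lower() else "mid"
```

Because the backfill only selects rows with an empty value, running it a second time should find nothing to do as long as the classifier returns a non-empty label.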

Files changed

| File | What |
| --- | --- |
| backend/db/client.py | Column migration, select/dict updates, save_lead param, backfill_seniority_levels() |
| backend/main.py | Priority chain fix in _annotate_job_lead, manual lead passes seniority_level |
| backend/agents/scout.py | Passes seniority_level to save_lead |
| backend/agents/x_scout.py | Passes seniority_level to save_lead |
| backend/agents/free_scout.py | Passes seniority_level to save_lead |
| backend/tests/test_regressions.py | 5 new tests in TestSeniorityPersistence |

…ET /leads

Previously, classify_job_seniority() was called for every lead on every
GET /api/v1/leads request because the result was never stored. This
adds a seniority_level column to the leads table and persists the
value at save time, with a backfill helper for existing leads.

Changes:
- Add seniority_level TEXT column to leads table (auto-migration)
- Add seniority_level param to save_lead() and persist it on INSERT
- Update _annotate_job_lead to read from DB column first, only
  recompute via classify_job_seniority when value is missing
- Add backfill_seniority_levels() helper for existing data
- Update all save_lead callers (scout, x_scout, free_scout, manual)
  to pass seniority_level from source_meta
- Add TestSeniorityPersistence with 5 test cases covering the
  priority chain: lead column > source_meta > recompute
@DeryFerd DeryFerd force-pushed the perf/persist-seniority-level branch from da6092b to 2edbb48 Compare May 6, 2026 20:24
@vasu-devs vasu-devs self-requested a review as a code owner May 14, 2026 20:13