
feat(tames) ongoing case scraper #6982

Merged
chaco-fl merged 36 commits into main from tames/ongoing-scraper on Apr 10, 2026

Conversation

Brennan-Chesley-FLP (Contributor) commented Feb 18, 2026

Summary

This PR adds a polling command for new cases on TAMES. It looks at the first page of search results (going backwards in time from the current date), and if that page doesn't match the last cached results, it backfills a configurable number of days or cases. After that, it will merge the scraped dockets and subscribe to the new cases.
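The change-detection and backfill step described in the summary can be sketched roughly like this. It is a minimal sketch under stated assumptions, not the PR's code: a plain dict stands in for the Redis cache, search hits are modeled as (case number, filed date) tuples ordered newest first, and `fetch_page`, `poll_once`, `CACHE_KEY`, `max_days`, and `max_cases` are all hypothetical names.

```python
from datetime import date, timedelta
from typing import Callable, Iterator

# Hypothetical search hit: (case_number, filed_date), newest first.
Case = tuple[str, date]
CACHE_KEY = "tames:first_page"  # hypothetical cache key


def first_page_key(cases: list[Case]) -> str:
    """Fingerprint the first results page for comparison with the cached copy."""
    return "|".join(number for number, _ in cases)


def iter_results(
    fetch_page: Callable[[int], list[Case]], first_page: list[Case]
) -> Iterator[Case]:
    """Yield hits page by page, fetching the next page only when needed."""
    page_index, page = 0, first_page
    while page:
        yield from page
        page_index += 1
        page = fetch_page(page_index)


def poll_once(
    fetch_page: Callable[[int], list[Case]],
    cache: dict[str, str],  # stand-in for Redis
    today: date,
    max_days: int = 7,
    max_cases: int = 100,
) -> list[Case]:
    """Return cases to backfill, or [] when the first page is unchanged."""
    first_page = fetch_page(0)
    key = first_page_key(first_page)
    if cache.get(CACHE_KEY) == key:
        return []  # nothing new since the last poll
    cutoff = today - timedelta(days=max_days)
    backfill: list[Case] = []
    for number, filed in iter_results(fetch_page, first_page):
        # Stop at whichever limit is reached first: age or count.
        if filed < cutoff or len(backfill) >= max_cases:
            break
        backfill.append((number, filed))
    cache[CACHE_KEY] = key
    return backfill
```

Stopping at whichever limit is hit first keeps a long outage from turning into an unbounded backfill; the real command's limits and cache layout may well differ.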

Deployment

This PR should:

  • skip-deploy (skips everything below)
    • skip-web-deploy
    • skip-celery-deploy
    • skip-cronjob-deploy
    • skip-daemon-deploy

Brennan-Chesley-FLP (Contributor, Author):

Ran this overnight to verify basic polling behavior (polling every 15 minutes). I'm not sure how often it happens, but it looks like the website was updated around midnight, while email notifications weren't sent out until after 9:30. Might be an interesting thing to track as we add more sites/states.

try:
    self._poll_cycle(options, redis, courts, TAMESScraper)
except Exception:
    logger.exception("Error during poll cycle")
Contributor:

We should log the exception as well, especially if we're using an except clause this broad.

Author:

logger.exception does this automatically (and should work with Sentry).
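For reference, Python's `Logger.exception` logs at ERROR level and attaches the traceback of the exception currently being handled, so it belongs inside an `except` block. The loop below is a minimal sketch of a poller built on that pattern, not the PR's command: `run_poller`, its `max_cycles` and injectable `sleep` parameters, and the logger name are hypothetical, and the 15-minute default only mirrors the interval mentioned earlier in this thread.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("tames_poller")  # hypothetical logger name


def run_poller(
    poll_cycle: Callable[[], None],
    interval_seconds: float = 15 * 60,
    max_cycles: Optional[int] = None,  # None means run forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run poll_cycle repeatedly; a failing cycle is logged, not fatal.

    Returns the number of cycles that raised.
    """
    failures = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        try:
            poll_cycle()
        except Exception:
            # Logs at ERROR level and records the traceback (exc_info).
            logger.exception("Error during poll cycle")
            failures += 1
        cycle += 1
        sleep(interval_seconds)
    return failures
```

Injecting `sleep` keeps the loop testable without waiting in real time.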

options: dict[str, Any],
redis,
courts: list[str] | None,
scraper_class: type,
Contributor:

Are we planning to make this command more extensible or can we just use TAMESScraper and remove this parameter?

Author:

I think I was probably fantasizing about this being reused, so you're right that this is a bit premature.

RateLimitedRequestManager(**search_rm_args) as search_request_manager,
RateLimitedRequestManager(**case_rm_args) as case_request_manager,
Contributor:

I think it's clearer to pass the kwargs directly here, since we're not doing anything with the dictionaries.

Author:

True!
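The suggestion is to pass keyword arguments straight into each constructor instead of building `search_rm_args` and `case_rm_args` dictionaries first. A minimal sketch of that shape, using a stand-in class: the real `RateLimitedRequestManager` constructor is not shown in this thread, so `name` and `requests_per_minute` are hypothetical parameters. The parenthesized multi-manager `with` statement requires Python 3.10+.

```python
class RateLimitedRequestManager:
    """Hypothetical stand-in; the real class's constructor is not shown here."""

    def __init__(self, name: str, requests_per_minute: int) -> None:
        self.name = name
        self.requests_per_minute = requests_per_minute
        self.entered = False
        self.closed = False

    def __enter__(self) -> "RateLimitedRequestManager":
        self.entered = True
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Real code would release sessions or other resources here.
        self.closed = True


def open_request_managers() -> tuple[RateLimitedRequestManager, RateLimitedRequestManager]:
    # Keyword arguments go straight into each constructor: no intermediate dicts.
    with (
        RateLimitedRequestManager(name="search", requests_per_minute=30) as search_request_manager,
        RateLimitedRequestManager(name="case", requests_per_minute=60) as case_request_manager,
    ):
        pass  # real code would run the poll cycle with both managers here
    return search_request_manager, case_request_manager
```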

Brennan-Chesley-FLP force-pushed the tames/ongoing-scraper branch 3 times, most recently from e90f154 to 72778e4 on March 23, 2026 21:11
Brennan-Chesley-FLP changed the base branch from main to feat/tames-subscription-task on March 23, 2026 21:12
Brennan-Chesley-FLP (Contributor, Author):

Changed this to target #6978, which it depends on for case subscription.


Brennan-Chesley-FLP force-pushed the feat/tames-subscription-task branch 2 times, most recently from 324c035 to 48a1fa7 on March 25, 2026 17:54
Brennan-Chesley-FLP force-pushed the tames/ongoing-scraper branch 2 times, most recently from 99f001f to 937d210 on March 25, 2026 18:48
albertisfu (Contributor) left a comment:

Also, an integration test for tames_poller would be helpful here, please, covering the entire workflow: when new cases are found, when none are found, and the ingestion of new cases into the DB.

I'll do a full review once tests are in place.
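The three scenarios requested above could be organized roughly as follows. This is a self-contained sketch only: `FakeSearch`, `ingest_new_cases`, and the dict and set standing in for the cache and database are hypothetical, and the real `tames_poller` test would exercise the actual command, cache, and models through the project's own test setup.

```python
import unittest


class FakeSearch:
    """Hypothetical scraper stub that returns canned first-page results."""

    def __init__(self, pages):
        self.pages = pages

    def first_page(self):
        return self.pages.pop(0) if self.pages else []


def ingest_new_cases(search, cache, db):
    """Hypothetical poll step: skip an unchanged first page, else store unseen cases."""
    page = search.first_page()
    if cache.get("first_page") == page:
        return 0
    cache["first_page"] = page
    new = [case for case in page if case not in db]
    db.update(new)
    return len(new)


class TamesPollerWorkflowTest(unittest.TestCase):
    def setUp(self):
        self.cache = {}  # stand-in for Redis
        self.db = set()  # stand-in for stored dockets

    def test_new_cases_are_ingested(self):
        search = FakeSearch([["case-2", "case-1"]])
        self.assertEqual(ingest_new_cases(search, self.cache, self.db), 2)
        self.assertEqual(self.db, {"case-1", "case-2"})

    def test_unchanged_first_page_ingests_nothing(self):
        search = FakeSearch([["case-1"], ["case-1"]])
        ingest_new_cases(search, self.cache, self.db)
        self.assertEqual(ingest_new_cases(search, self.cache, self.db), 0)
        self.assertEqual(self.db, {"case-1"})

    def test_only_unseen_cases_are_added(self):
        search = FakeSearch([["case-1"], ["case-2", "case-1"]])
        ingest_new_cases(search, self.cache, self.db)
        self.assertEqual(ingest_new_cases(search, self.cache, self.db), 1)
        self.assertEqual(self.db, {"case-1", "case-2"})
```

These can be run with `python -m unittest`, assuming the file follows the `test*.py` naming that test discovery expects.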

Brennan-Chesley-FLP force-pushed the feat/tames-subscription-task branch from 48a1fa7 to 7074eaa on March 26, 2026 14:16

albertisfu moved this from To Do to In progress in Sprint (Web Team) on Apr 8, 2026

albertisfu (Contributor) left a comment:

Thanks, @Brennan-Chesley-FLP. Just a few additional suggestions, please.

albertisfu moved this from In progress to To Do in Sprint (Web Team) on Apr 8, 2026
Brennan-Chesley-FLP marked this pull request as ready for review on April 9, 2026 20:52

albertisfu (Contributor) left a comment:

Thanks for all the work here, @Brennan-Chesley-FLP. This is ready to go. I’ll notify the infra team to merge the PR and apply the migration when they’re ready.

chaco-fl merged commit 0508058 into main on Apr 10, 2026
10 checks passed
github-project-automation bot moved this from To Do to Done in Sprint (Web Team) on Apr 10, 2026
chaco-fl deleted the tames/ongoing-scraper branch on April 10, 2026 00:56

Projects

Status: Done


5 participants