I'm pulling SCOTUS opinions by justice using the API (/api/rest/v4/opinions/?author={person_id}). I noticed that opinions from roughly OT2008-OT2015 have author_id=null, with author attribution only in author_str. Pre-2008 and post-2016 opinions are properly linked to person records.
This affects all justices, not just the one I was querying (Clarence Thomas, person 3200). For example:
- McDonald v. City of Chicago (2010), Thomas concurrence (opinion 9435862): author_str="Thomas", author_id=null
- A full scan of 2012 opinions found 16 Thomas opinions, all with author_id=null
The pattern suggests this data came from a bulk import (possibly the Harvard Caselaw Access Project, based on date_created timestamps) where sub-opinions were stored with text-only author attribution and never linked to CourtListener person records.
The author_str text field is populated correctly, so the data is there; it's just not normalized to the person FK. This means any API query filtering by author misses a roughly 8-year window of opinions.
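In case it helps, here is a rough sketch of the client-side workaround I have in mind: treat an opinion as a match when it is linked to the person record, and fall back to the text attribution only when the link is missing. It assumes each opinion has already been fetched as a dict with `author_id` and `author_str` keys (the real API payload may be shaped differently), and `matches_author`, `opinions_by_author`, and the sample records are hypothetical names and data for illustration only.

```python
from typing import Iterable


def matches_author(opinion: dict, person_id: int, surname: str) -> bool:
    """Return True if the opinion is linked to person_id, or is unlinked
    (author_id is None) but its author_str equals the given surname."""
    author_id = opinion.get("author_id")
    if author_id is not None:
        # A populated FK is authoritative; ignore the text field.
        return author_id == person_id
    # Fallback for unlinked records: case-insensitive surname comparison.
    author_str = (opinion.get("author_str") or "").strip().lower()
    return author_str == surname.strip().lower()


def opinions_by_author(opinions: Iterable[dict], person_id: int, surname: str) -> list:
    """Filter already-fetched opinion records for one author."""
    return [op for op in opinions if matches_author(op, person_id, surname)]


# Made-up records (assumed shape); only opinion 9435862 and person 3200
# come from the examples above.
sample = [
    {"id": 1, "author_id": 3200, "author_str": "Thomas"},
    {"id": 9435862, "author_id": None, "author_str": "Thomas"},
    {"id": 3, "author_id": None, "author_str": "Scalia"},
    {"id": 4, "author_id": 9999, "author_str": "Thomas"},
]

matched_ids = [op["id"] for op in opinions_by_author(sample, 3200, "Thomas")]
```

A bare surname match is ambiguous whenever two authors share a last name, so a real normalization pass would probably also need to check the court and the opinion date against each person's tenure.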
Not urgent, just flagging in case it's useful for a future normalization pass. Happy to share specific opinion IDs I've identified if helpful.