The StudyAssessmentAgent evaluates biomedical research publications to assess their quality, methodological rigor, and trustworthiness. This agent provides structured assessments that help researchers and clinicians:
- Understand the reliability of evidence
- Make informed decisions about study quality
- Conduct systematic reviews and meta-analyses
- Grade evidence for clinical practice guidelines
- Identify methodological strengths and limitations
Key features:
- Study Type Classification: Identifies the research design (RCT, cohort, case-control, meta-analysis, etc.)
- Design Characteristics: Detects prospective/retrospective, blinded, randomized, and multi-center studies
- Quality Scoring: Provides a 0-10 quality score based on methodological rigor
- Bias Risk Assessment: Evaluates 5 types of bias (selection, performance, detection, attrition, reporting)
- Evidence Level Grading: Assigns an evidence hierarchy level (Level 1-5)
- Structured Output: Returns assessments in JSON format for integration with other tools
- Batch Processing: Assesses multiple studies efficiently
The StudyAssessmentAgent is included in BMLibrarian's core agents module.
from bmlibrarian.agents import StudyAssessmentAgent, StudyAssessment
Requirements:
- Ollama server running (default: http://localhost:11434)
- Recommended model: gpt-oss:20b (default)
- Python >=3.12
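Before creating the agent, it can help to confirm that the requirements above are met. The following is a minimal sketch, not part of BMLibrarian: the helper name `list_ollama_models` is hypothetical, and it assumes the server exposes Ollama's standard `/api/tags` endpoint at the default address.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=5.0):
    """Return model names reported by an Ollama server, or None if unreachable.

    Hypothetical helper for illustration; it is not a BMLibrarian function.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama server is not reachable")
elif "gpt-oss:20b" not in models:
    print("Server is up, but gpt-oss:20b is not installed")
else:
    print("Ollama and gpt-oss:20b are available")
```

Returning `None` instead of raising keeps the check usable as a simple pre-flight guard in scripts.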
from bmlibrarian.agents import StudyAssessmentAgent
# Initialize agent
agent = StudyAssessmentAgent(
    model="gpt-oss:20b",
    temperature=0.1  # Low temperature for consistent assessments
)
# Prepare document (from your database or file)
document = {
    'id': 12345,
    'title': 'Efficacy of Drug X in Treating Condition Y: A Randomized Trial',
    'abstract': """
    Background: Condition Y affects millions worldwide...
    Methods: We conducted a randomized, double-blind, placebo-controlled trial...
    Results: 150 patients were randomized. Treatment group showed...
    Conclusions: Drug X significantly improved outcomes...
    """,
    'pmid': '98765432',
    'doi': '10.1234/example.2023.001'
}
# Assess study quality
assessment = agent.assess_study(document, min_confidence=0.4)
if assessment:
    print(f"Study Type: {assessment.study_type}")
    print(f"Quality Score: {assessment.quality_score}/10")
    print(f"Evidence Level: {assessment.evidence_level}")
    print(f"Overall Confidence: {assessment.overall_confidence:.2%}")
    print(f"\nStrengths:")
    for strength in assessment.strengths:
        print(f"  - {strength}")
    print(f"\nLimitations:")
    for limitation in assessment.limitations:
        print(f"  - {limitation}")
# Get human-readable summary
summary = agent.format_assessment_summary(assessment)
print(summary)
Example Output:
================================================================================
STUDY QUALITY ASSESSMENT: Efficacy of Drug X in Treating Condition Y: A Randomized Trial
================================================================================
Document ID: 12345
PMID: 98765432
DOI: 10.1234/example.2023.001
--- STUDY CLASSIFICATION ---
Study Type: Randomized Controlled Trial (RCT)
Study Design: Prospective, randomized, double-blinded, placebo-controlled
Evidence Level: Level 1 (high)
Sample Size: N=150 patients
Follow-up: 6 months
Characteristics: Prospective, Randomized, Controlled, Double-blinded
--- QUALITY ASSESSMENT ---
Quality Score: 8.5/10
Overall Confidence: 85.00%
Confidence Explanation: High-quality RCT with appropriate randomization, blinding, and adequate sample size
--- STRENGTHS ---
1. Randomized allocation with proper concealment
2. Double-blinding of participants and assessors
3. Adequate sample size with power calculation
4. Low dropout rate (5%)
5. Validated outcome measures
--- LIMITATIONS ---
1. Single-center study limiting generalizability
2. Short follow-up period (6 months)
3. Predominantly white participants (87%)
--- BIAS RISK ASSESSMENT ---
Selection: low
Performance: low
Detection: low
Attrition: low
Reporting: moderate
================================================================================
# Assess multiple studies
documents = [
    {'id': 1, 'title': 'Study 1', 'abstract': '...'},
    {'id': 2, 'title': 'Study 2', 'abstract': '...'},
    {'id': 3, 'title': 'Study 3', 'abstract': '...'}
]
# Progress callback (optional)
def progress_callback(current, total, doc_title):
    print(f"[{current}/{total}] Assessing: {doc_title}")
# Assess batch
assessments = agent.assess_batch(
    documents=documents,
    min_confidence=0.4,
    progress_callback=progress_callback
)
print(f"\nAssessed {len(assessments)} studies successfully")
Common classifications:
- Randomized Controlled Trial (RCT): Gold standard for intervention studies
- Cohort study: Observational follow-up of exposed/unexposed groups
- Case-control study: Retrospective comparison of cases and controls
- Cross-sectional study: Single time-point observation
- Case report/series: Descriptive study of individual patients
- Meta-analysis: Statistical synthesis of multiple studies
- Systematic review: Comprehensive literature review with methodology
Quality score interpretation:
- 9-10: Exceptional quality, minimal bias risk
- 7-8: High quality, reliable findings
- 5-6: Moderate quality, some limitations
- 3-4: Low quality, significant concerns
- 0-2: Very poor quality, unreliable
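The score bands above can be sketched as a small lookup. This is an illustration only: the function name `quality_category` is hypothetical, and the treatment of fractional scores (for example, 8.5 falling into the 7-8 band) is an assumption, since the bands are listed for whole numbers.

```python
from collections import Counter


def quality_category(score: float) -> str:
    """Map a 0-10 quality score to a band label.

    Assumption: a fractional score belongs to the band whose lower bound
    it meets or exceeds, so 8.5 counts as "high" rather than "exceptional".
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "exceptional"
    if score >= 7:
        return "high"
    if score >= 5:
        return "moderate"
    if score >= 3:
        return "low"
    return "very_low"


# Tally a handful of illustrative scores by band
scores = [8.5, 6.0, 9.2, 3.1, 8.0]
counts = Counter(quality_category(s) for s in scores)
for label, count in counts.items():
    print(f"{label}: {count}")
```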
Evidence levels are based on the Oxford Centre for Evidence-Based Medicine hierarchy:
- Level 1 (high): Systematic reviews of RCTs, high-quality RCTs
- Level 2 (moderate-high): Individual RCTs, systematic reviews of cohorts
- Level 3 (moderate): Cohort studies, case-control studies
- Level 4 (low-moderate): Case series, poor-quality cohort/case-control
- Level 5 (low): Expert opinion, case reports, mechanism-based reasoning
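When evidence levels need to be sorted or compared, it may be convenient to extract the numeric level from labels such as "Level 1 (high)". The sketch below assumes labels follow the "Level N (...)" pattern shown above; `evidence_level_number` is a hypothetical helper, not a BMLibrarian function.

```python
import re

# Matches "Level 1" through "Level 5", case-insensitively
_LEVEL_PATTERN = re.compile(r"\bLevel\s+([1-5])\b", re.IGNORECASE)


def evidence_level_number(label: str):
    """Return the numeric evidence level (1-5) from a label, or None if absent."""
    match = _LEVEL_PATTERN.search(label)
    return int(match.group(1)) if match else None


labels = ["Level 3 (moderate)", "Level 1 (high)", "Level 4 (low-moderate)"]
# Strongest evidence first; labels without a recognizable level sort last
ranked = sorted(labels, key=lambda s: evidence_level_number(s) or 99)
print(ranked)
```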
Bias risk types:
- Selection bias: Systematic differences between comparison groups
- Performance bias: Differences in care/interventions received
- Detection bias: Differences in outcome measurement
- Attrition bias: Systematic differences in withdrawals/dropouts
- Reporting bias: Selective reporting of outcomes
Each type is rated as low, moderate, high, or unclear.
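One way to condense the five ratings into a single headline figure is to report the most severe rating and the domains that carry it. This is a sketch under stated assumptions: the helper `worst_bias_risk` is hypothetical, and ranking "unclear" between low and moderate is a choice made here, not something the agent defines.

```python
# Assumed severity ordering; "unclear" is placed between low and moderate.
RISK_ORDER = {"low": 0, "unclear": 1, "moderate": 2, "high": 3}


def worst_bias_risk(ratings):
    """Return (most severe rating, list of bias domains with that rating)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    normalized = {}
    for domain, rating in ratings.items():
        key = rating.strip().lower()
        if key not in RISK_ORDER:
            raise ValueError(f"unknown rating {rating!r} for {domain}")
        normalized[domain] = key
    worst = max(normalized.values(), key=RISK_ORDER.__getitem__)
    return worst, [d for d, r in normalized.items() if r == worst]


# Ratings taken from the example output earlier in this guide
example = {
    "selection": "low",
    "performance": "low",
    "detection": "low",
    "attrition": "low",
    "reporting": "moderate",
}
print(worst_bias_risk(example))
```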
Overall confidence is the agent's confidence in the accuracy of its own assessment:
- 0.9-1.0: Very high confidence, clear methodology description
- 0.7-0.8: High confidence, most details present
- 0.5-0.6: Moderate confidence, some ambiguity
- 0.3-0.4: Low confidence, limited information
- 0.0-0.2: Very low confidence, unclear or insufficient text
# Export for programmatic use
agent.export_to_json(assessments, 'study_assessments.json')
JSON Structure:
{
  "assessments": [
    {
      "document_id": "12345",
      "document_title": "...",
      "study_type": "Randomized Controlled Trial (RCT)",
      "study_design": "Prospective, randomized, double-blinded",
      "quality_score": 8.5,
      "overall_confidence": 0.85,
      "evidence_level": "Level 1 (high)",
      "strengths": ["..."],
      "limitations": ["..."],
      "is_randomized": true,
      "is_double_blinded": true,
      "selection_bias_risk": "low",
      ...
    }
  ],
  "metadata": {
    "total_assessments": 50,
    "assessment_date": "2023-10-15T14:30:00Z",
    "agent_model": "gpt-oss:20b",
    "statistics": {
      "success_rate": 0.96
    }
  }
}
# Export for spreadsheet analysis
agent.export_to_csv(assessments, 'study_assessments.csv')
CSV Columns:
- document_id, document_title, pmid, doi
- study_type, study_design, evidence_level
- quality_score, overall_confidence
- Design flags (is_randomized, is_blinded, is_multi_center, etc.)
- Bias risk scores
- strengths, limitations (semicolon-separated)
stats = agent.get_assessment_stats()
print(f"Total assessments: {stats['total_assessments']}")
print(f"Successful: {stats['successful_assessments']}")
print(f"Success rate: {stats['success_rate']:.2%}")
print(f"Low confidence: {stats['low_confidence_assessments']}")
print(f"Parse failures: {stats['parse_failures']}")
distribution = agent.get_quality_distribution(assessments)
for category, count in distribution.items():
    print(f"{category}: {count} studies")
Example Output:
exceptional (9-10): 5 studies
high (7-8): 23 studies
moderate (5-6): 15 studies
low (3-4): 6 studies
very_low (0-2): 1 studies
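Raw counts are easier to compare across batches when expressed as shares. A minimal sketch, assuming the distribution is a plain dictionary mapping category labels to counts; the exact keys returned by `get_quality_distribution` are not confirmed here, and `distribution_shares` is a hypothetical helper.

```python
def distribution_shares(distribution):
    """Convert a {label: count} mapping into {label: fraction of total}."""
    total = sum(distribution.values())
    if total == 0:
        return {label: 0.0 for label in distribution}
    return {label: count / total for label, count in distribution.items()}


# Counts copied from the example output above; key names are assumed
example_distribution = {
    "exceptional (9-10)": 5,
    "high (7-8)": 23,
    "moderate (5-6)": 15,
    "low (3-4)": 6,
    "very_low (0-2)": 1,
}
shares = distribution_shares(example_distribution)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```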
evidence_dist = agent.get_evidence_level_distribution(assessments)
for level, count in evidence_dist.items():
    print(f"{level}: {count} studies")
# Use faster model for preliminary screening
agent = StudyAssessmentAgent(
    model="medgemma4B_it_q8:latest",  # Faster, less detailed
    temperature=0.1,
    max_tokens=2000
)
# Use more powerful model for final assessment
agent = StudyAssessmentAgent(
    model="gpt-oss:20b",  # More thorough, slower
    temperature=0.05,  # Even more deterministic
    max_tokens=3000
)
def detailed_progress(current, total, doc_title):
    """Detailed progress reporting"""
    percent = (current / total) * 100
    print(f"[{percent:.1f}%] ({current}/{total}) Assessing: {doc_title[:60]}...")
assessments = agent.assess_batch(
    documents=documents,
    progress_callback=detailed_progress
)
# Get only high-quality studies (score >= 7)
high_quality = [a for a in assessments if a.quality_score >= 7.0]
print(f"High-quality studies: {len(high_quality)}/{len(assessments)}")
# Get only RCTs
rcts = [a for a in assessments if 'RCT' in a.study_type or 'randomized' in a.study_type.lower()]
print(f"RCTs found: {len(rcts)}")
# Get only Level 1 evidence
level1 = [a for a in assessments if 'Level 1' in a.evidence_level]
print(f"Level 1 evidence: {len(level1)}")
# Screen studies for inclusion in systematic review
def meets_quality_criteria(assessment):
    """Check if study meets minimum quality standards"""
    return (
        assessment.quality_score >= 6.0 and
        assessment.overall_confidence >= 0.6 and
        # Parentheses are required: `and` binds more tightly than `or`
        ('Level 1' in assessment.evidence_level or 'Level 2' in assessment.evidence_level)
    )
eligible_studies = [a for a in assessments if meets_quality_criteria(a)]
print(f"Eligible for systematic review: {len(eligible_studies)}/{len(assessments)}")
# Grade evidence for clinical practice guidelines
def grade_evidence(assessment):
    """Assign a simplified GRADE-style quality rating"""
    if assessment.quality_score >= 8 and 'Level 1' in assessment.evidence_level:
        return 'HIGH'
    elif assessment.quality_score >= 6 and 'Level 2' in assessment.evidence_level:
        return 'MODERATE'
    elif assessment.quality_score >= 4:
        return 'LOW'
    else:
        return 'VERY LOW'
for assessment in assessments:
    grade = grade_evidence(assessment)
    print(f"{assessment.document_title[:50]}... - GRADE: {grade}")
# Find studies with high impact but methodological concerns
def needs_replication(assessment):
    """Identify studies that should be replicated"""
    high_bias_count = sum([
        1 for risk in [
            assessment.selection_bias_risk,
            assessment.performance_bias_risk,
            assessment.detection_bias_risk
        ] if risk in ['high', 'moderate']
    ])
    return (
        assessment.quality_score >= 5.0 and  # Interesting but not definitive
        high_bias_count >= 2 and  # Multiple bias concerns
        not assessment.is_multi_center  # Single center
    )
replication_candidates = [a for a in assessments if needs_replication(a)]
print(f"Studies needing replication: {len(replication_candidates)}")
- Use Full Text When Available: Full text provides more complete assessment than abstracts alone
    document = {
        'id': 123,
        'title': '...',
        'abstract': '...',
        'full_text': '...'  # Agent will prefer this
    }
- Set Appropriate Confidence Thresholds: Lower thresholds (0.3-0.4) for screening, higher (0.6-0.7) for final inclusion
    # Screening phase
    screening_assessments = agent.assess_batch(documents, min_confidence=0.3)
    # Final assessment
    final_assessments = agent.assess_batch(selected_docs, min_confidence=0.6)
- Verify Critical Assessments: For high-stakes decisions, manually verify agent assessments
    if assessment.quality_score >= 8 and 'Level 1' in assessment.evidence_level:
        print("HIGH-QUALITY STUDY - RECOMMEND MANUAL VERIFICATION")
        print(agent.format_assessment_summary(assessment))
- Track and Report Statistics: Monitor assessment quality and success rates
    stats = agent.get_assessment_stats()
    if stats['success_rate'] < 0.9:
        print(f"WARNING: Success rate {stats['success_rate']:.2%} is below expected")
- Export for Review: Save assessments for collaborative review and auditing
    from datetime import datetime
    agent.export_to_json(assessments, f'assessments_{datetime.now().date()}.json')
    agent.export_to_csv(assessments, f'assessments_{datetime.now().date()}.csv')
- Check Ollama server connectivity
- Verify the model is available: ollama list
- Try with a longer timeout or more retries
- Ensure documents have sufficient text content
- Decrease temperature (e.g., 0.05) for more deterministic output
- Use a more powerful model (gpt-oss:20b instead of smaller models)
- Provide full text instead of abstract only
- The agent automatically retries JSON parsing
- Check Ollama logs for model errors
- Try a different model if the problem persists
- Normal for case reports and expert opinions (limited methodological detail)
- Consider using abstracts + full text for better context
- Some study types inherently have less structured reporting
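If transient failures persist, one option is to wrap individual calls in a small retry helper with exponential backoff. This is a generic sketch rather than a BMLibrarian feature: `call_with_retries` is a hypothetical helper, and it assumes a failed assessment either raises an exception or returns `None`.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func until it returns a non-None result or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries.
    Re-raises the last exception if the final try raised one.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            result = func(*args, **kwargs)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
        else:
            if result is not None:
                return result
            last_error = None
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    if last_error is not None:
        raise last_error
    return None


# Hypothetical usage with the agent from this guide:
# assessment = call_with_retries(agent.assess_study, document, min_confidence=0.4)
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.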
from bmlibrarian.agents import DocumentScoringAgent, StudyAssessmentAgent
# First, score documents for relevance
scoring_agent = DocumentScoringAgent()
relevant_docs = [
    doc for doc in documents
    if scoring_agent.evaluate_document(question, doc) >= 3
]
# Then assess quality of relevant studies
assessment_agent = StudyAssessmentAgent()
assessments = assessment_agent.assess_batch(relevant_docs)
# Filter for high-quality, relevant evidence
high_quality_evidence = [
    a for a in assessments
    if a.quality_score >= 7.0
]
from bmlibrarian.agents import CitationFinderAgent, StudyAssessmentAgent
# Assess studies first
assessment_agent = StudyAssessmentAgent()
assessments = assessment_agent.assess_batch(documents)
# Extract citations only from high-quality studies
citation_agent = CitationFinderAgent()
high_quality_docs = [
    doc for doc in documents
    if any(a.document_id == str(doc['id']) and a.quality_score >= 7
           for a in assessments)
]
citations = citation_agent.process_scored_documents_for_citations(
    user_question=question,
    scored_documents=[(doc, 5) for doc in high_quality_docs],
    score_threshold=3.0
)
- PICOAgent User Guide - Extract study components
- Citation System Guide - Extract evidence from studies
- Reporting System Guide - Synthesize evidence into reports
- Developer Documentation - Technical details
For issues or questions:
- Check the troubleshooting section above
- Review examples in examples/study_assessment_demo.py (when available)
- Try the interactive laboratory: uv run python study_assessment_lab.py (when available)
- Report bugs at: https://github.com/hherb/bmlibrarian/issues