CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

BMLibrarian is a comprehensive Python library providing AI-powered access to biomedical literature databases. It features a multi-agent architecture with specialized agents for query processing, document scoring, citation extraction, report generation, and counterfactual analysis, all coordinated through an advanced task queue orchestration system.

The project includes a modular CLI (bmlibrarian_cli.py) that exposes the full multi-agent workflow.

Dependencies and Environment

Python: Requires Python >=3.12
Database: PostgreSQL with pgvector extension for semantic search
AI/LLM: Ollama for local language model inference
Main dependencies:
- psycopg >=3.2.9 for PostgreSQL connectivity (via DatabaseManager)
- ollama - Python library for Ollama LLM communication (never use raw HTTP requests)
- PySide6 for GUI applications
Package manager: Uses uv for dependency management (uv.lock present)

Configuration

Configuration file locations (OS agnostic):
- Primary: ~/.bmlibrarian/config.json (recommended)
- Legacy fallback: bmlibrarian_config.json in current directory
- GUI default: Always saves to ~/.bmlibrarian/config.json
Environment variables are configured in a .env file
Database connection parameters:
- POSTGRES_DB: Database name (default: "knowledgebase")
- POSTGRES_USER: Database user
- POSTGRES_PASSWORD: Database password
- POSTGRES_HOST: Database host (default: "localhost")
- POSTGRES_PORT: Database port (default: "5432")
File system:
- PDF_BASE_DIR: Base directory for PDF files (default: "~/knowledgebase/pdf")
AI/LLM configuration:
- Ollama service typically runs on http://localhost:11434
- Models used: gpt-oss:20b (default for complex tasks), medgemma4B_it_q8:latest (fast processing)
OpenAthens proxy authentication (optional):
- Enable institutional access to paywalled PDFs via OpenAthens proxy
- Supports 2FA authentication with persistent sessions (24 hours default)
- Configure in config.json under "openathens" section
- Requires Playwright: uv add playwright && uv run python -m playwright install chromium
- See: doc/OPENATHENS_QUICKSTART.md and doc/users/openathens_guide.md
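The file and environment lookups described above can be sketched as follows. This is a minimal illustration, not BMLibrarian's actual loader. The names find_config_file, db_params_from_env and pdf_base_dir are hypothetical. The sketch assumes the .env file has already been loaded into the process environment (for example with python-dotenv). POSTGRES_USER and POSTGRES_PASSWORD default to None because no default is documented.

```python
from pathlib import Path
from typing import Mapping, Optional

# Documented locations: the primary file is checked first, the legacy file second.
PRIMARY_CONFIG = Path("~/.bmlibrarian/config.json").expanduser()
LEGACY_CONFIG_NAME = "bmlibrarian_config.json"


def find_config_file(cwd: Path) -> Optional[Path]:
    """Return the first config file that exists, or None if neither does."""
    for candidate in (PRIMARY_CONFIG, cwd / LEGACY_CONFIG_NAME):
        if candidate.is_file():
            return candidate
    return None


def db_params_from_env(env: Mapping[str, str]) -> dict:
    """Collect PostgreSQL connection parameters, applying the documented defaults."""
    return {
        "dbname": env.get("POSTGRES_DB", "knowledgebase"),
        "user": env.get("POSTGRES_USER"),          # no documented default
        "password": env.get("POSTGRES_PASSWORD"),  # no documented default
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": env.get("POSTGRES_PORT", "5432"),
    }


def pdf_base_dir(env: Mapping[str, str]) -> Path:
    """Resolve PDF_BASE_DIR, expanding '~' to the user's home directory."""
    return Path(env.get("PDF_BASE_DIR", "~/knowledgebase/pdf")).expanduser()
```

In practice you would call `db_params_from_env(os.environ)`. Real code should still connect through DatabaseManager rather than opening connections directly.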

Database-Backed User Settings

BMLibrarian supports database-backed settings for authenticated users:

User Authentication: Login system with username/password stored in public.users table
Per-User Settings: Each user has personalized settings in bmlsettings.user_settings
Default Settings: System-wide defaults in bmlsettings.default_settings
Session Management: Session tokens for persistent authentication

Resolution Priority Order:

1. User's database settings (when authenticated; highest priority)
2. Default database settings (if DB connected)
3. JSON file settings (~/.bmlibrarian/config.json)
4. Hardcoded DEFAULT_CONFIG (lowest priority, always available)

Valid Settings Categories: models, ollama, agents, database, search, query_generation, gui, openathens, pdf, general
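The resolution order above amounts to a first-match lookup across layers. The sketch below assumes each layer is a plain nested dict of category -> key -> value. `resolve_setting` is a hypothetical helper, and the real BMLibrarianConfig may merge whole sections rather than look up keys one at a time.

```python
from typing import Any, Mapping, Optional, Sequence

# Categories listed above as valid for database-backed settings.
VALID_CATEGORIES = frozenset({
    "models", "ollama", "agents", "database", "search",
    "query_generation", "gui", "openathens", "pdf", "general",
})


def resolve_setting(
    category: str,
    key: str,
    layers: Sequence[Optional[Mapping[str, Mapping[str, Any]]]],
) -> Any:
    """Return the value from the first layer that defines category/key.

    ``layers`` is ordered highest priority first, e.g.
    [user_db_settings, default_db_settings, json_settings, DEFAULT_CONFIG].
    A layer is None when it is unavailable (not logged in, no database).
    """
    if category not in VALID_CATEGORIES:
        raise ValueError(f"Unknown settings category: {category!r}")
    for layer in layers:
        if layer is None:
            continue  # skip unavailable sources
        section = layer.get(category)
        if section is not None and key in section:
            return section[key]
    raise KeyError(f"{category}.{key} is not defined in any layer")
```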

Key Components:

UserService: User registration, authentication, session management
UserSettingsManager: Per-user settings CRUD operations
BMLibrarianConfig.set_user_context(): Enables database-backed settings
migrate_config_to_db.py: CLI tool for migrating JSON configs to database

Documentation:

User guide: doc/users/settings_migration_guide.md
Architecture: doc/developers/db_settings_architecture.md
Planning: doc/planning/db_settings_refactor_plan.md

Development Commands

Since this project uses uv for package management:

uv sync - Install/sync dependencies
uv run python -m [module] - Run Python modules in the virtual environment
Testing: uv run python -m pytest tests/ - Run comprehensive test suite
Database Setup & Battle Testing:
- uv run python initial_setup_and_download.py test_database.env - Complete database setup and import testing
- uv run python initial_setup_and_download.py test.env --skip-medrxiv --skip-pubmed - Schema setup only
- uv run python initial_setup_and_download.py test.env --medrxiv-days 1 --pubmed-max-results 10 - Quick validation test
- See SETUP_GUIDE.md for comprehensive documentation
CLI Applications:
- uv run python bmlibrarian_cli.py - Interactive medical research CLI with full multi-agent workflow
- uv run python fact_checker_cli.py statements.json - Batch fact-checker for biomedical statements (stores in PostgreSQL factcheck schema)
- uv run python fact_checker_cli.py statements.json --incremental - Incremental mode (resume processing, skip already-evaluated statements)
- uv run python medrxiv_import_cli.py update --download-pdfs - Import medRxiv preprints with multi-format full-text extraction
- uv run python medrxiv_import_cli.py update --extraction-strategy auto - Import with priority-based extraction (text → HTML → XML → PDF)
- uv run python medrxiv_import_cli.py extract-text --missing-only --limit 100 - Re-extract full text for papers without it
- uv run python medrxiv_import_cli.py fetch-pdfs --limit 100 - Download missing PDFs for existing records
- uv run python medrxiv_import_cli.py status - Show medRxiv import statistics with extraction strategy info
- uv run python medrxiv_meca_cli.py list --limit 10 - List available MECA packages from AWS S3 (requires boto3)
- uv run python medrxiv_meca_cli.py download --limit 10 --output-dir ~/medrxiv_meca - Download MECA packages
- uv run python medrxiv_meca_cli.py sync --output-dir ~/medrxiv_meca --limit 100 - Full MECA workflow (download + import)
- uv run python pubmed_import_cli.py search "COVID-19 vaccine" --max-results 100 - Import PubMed articles by search query (targeted import)
- uv run python pubmed_import_cli.py pmids 12345678 23456789 - Import PubMed articles by PMID list
- uv run python pubmed_import_cli.py status - Show PubMed import statistics
- uv run python pubmed_bulk_cli.py download-baseline - Download complete PubMed baseline (~38M articles, ~400GB, for offline mirroring)
- uv run python pubmed_bulk_cli.py download-updates - Download PubMed daily update files (new articles + metadata updates)
- uv run python pubmed_bulk_cli.py import --type baseline - Import downloaded baseline files into database (with Markdown abstract formatting)
- uv run python pubmed_bulk_cli.py sync --updates-only - Download and import PubMed updates (incremental sync)
- uv run python pubmed_bulk_cli.py status - Show PubMed bulk download/import status
- uv run python pubmed_repair_cli.py scan - Scan downloaded PubMed files for gzip corruption
- uv run python pubmed_repair_cli.py scan --type update - Scan only update files for corruption
- uv run python pubmed_repair_cli.py repair --reimport -y - Re-download corrupted files and re-import to database
- Note: PubMed bulk importer now preserves abstract structure and formatting as Markdown (section labels, subscripts, superscripts, emphasis)
- uv run python pmc_bulk_cli.py list --license oa_comm - List available PMC Open Access packages
- uv run python pmc_bulk_cli.py download --license oa_comm - Download PMC baseline packages (with PDF + full-text NXML)
- uv run python pmc_bulk_cli.py download --delay 300 - Download with 5-minute delay (polite mode)
- uv run python pmc_bulk_cli.py download --range PMC001xxxxxx - Download specific PMCID range only
- uv run python pmc_bulk_cli.py extract - Extract downloaded tar.gz packages
- uv run python pmc_bulk_cli.py import - Import extracted articles to database
- uv run python pmc_bulk_cli.py sync --license oa_comm - Full workflow: download + extract + import
- uv run python pmc_bulk_cli.py status - Show PMC bulk download progress
- uv run python pmc_bulk_cli.py estimate - Estimate download time and storage requirements
- Note: PMC bulk importer is designed for offline work with configurable rate limiting (default 2 min between files)
- uv run python europe_pmc_bulk_cli.py list - List available Europe PMC Open Access packages (~1000+ files, ~100 articles each)
- uv run python europe_pmc_bulk_cli.py download --output-dir ~/europepmc - Download Europe PMC full-text XML with resumable progress
- uv run python europe_pmc_bulk_cli.py download --delay 120 --limit 10 - Download with 2-minute delay, limited to 10 packages
- uv run python europe_pmc_bulk_cli.py download --range 1-1000000 - Download specific PMCID range only
- uv run python europe_pmc_bulk_cli.py verify --output-dir ~/europepmc - Verify gzip integrity of all downloaded files
- uv run python europe_pmc_bulk_cli.py status - Show Europe PMC download progress
- uv run python europe_pmc_bulk_cli.py estimate - Estimate remaining download time
- uv run python europe_pmc_bulk_cli.py import --output-dir ~/europepmc - Import downloaded packages to database with Markdown full-text
- uv run python europe_pmc_bulk_cli.py import --limit 5 --batch-size 50 - Import with limits
- uv run python europe_pmc_bulk_cli.py import-status - Show import progress and statistics
- uv run python europe_pmc_bulk_cli.py verify-import --package PMC13900_PMC17829.xml.gz - Verify package can be parsed before import
- Note: Europe PMC importer converts JATS XML to Markdown with proper headers, figure placeholders, and emphasis formatting
- uv run python europe_pmc_pdf_cli.py list - List available Europe PMC PDF packages
- uv run python europe_pmc_pdf_cli.py download --output-dir ~/europepmc_pdf - Download Europe PMC PDFs with resumable progress
- uv run python europe_pmc_pdf_cli.py download --delay 120 --limit 10 - Download with 2-minute delay, limited to 10 packages
- uv run python europe_pmc_pdf_cli.py download --range 1-1000000 - Download specific PMCID range only
- uv run python europe_pmc_pdf_cli.py download --no-extract - Download packages without extracting PDFs
- uv run python europe_pmc_pdf_cli.py verify --output-dir ~/europepmc_pdf - Verify integrity of downloaded packages
- uv run python europe_pmc_pdf_cli.py extract --output-dir ~/europepmc_pdf - Extract PDFs from downloaded packages
- uv run python europe_pmc_pdf_cli.py status - Show Europe PMC PDF download progress
- uv run python europe_pmc_pdf_cli.py estimate - Estimate remaining download time
- uv run python europe_pmc_pdf_cli.py find --pmcid PMC123456 - Find a specific PDF by PMCID in local storage
- Note: Europe PMC PDF downloader extracts PDFs to year-based subdirectories for organization
- uv run python embed_documents_cli.py embed --source medrxiv --limit 100 - Generate embeddings for medRxiv abstracts
- uv run python embed_documents_cli.py count --source medrxiv - Count documents needing embeddings
- uv run python embed_documents_cli.py status - Show embedding statistics
- uv run python mesh_import_cli.py import --year 2025 - Download and import MeSH vocabulary (~400MB with SCRs)
- uv run python mesh_import_cli.py import --year 2025 --no-supplementary - Import MeSH without SCRs (~180MB, faster)
- uv run python mesh_import_cli.py status - Show MeSH database statistics and local DB availability
- uv run python mesh_import_cli.py lookup "heart attack" - Look up a MeSH term (uses local DB if available)
- uv run python mesh_import_cli.py search "cardio" - Search MeSH by partial match
- uv run python mesh_import_cli.py expand "MI" - Expand term to all synonyms/entry terms
- uv run python mesh_import_cli.py history - Show MeSH import history
- uv run python pdf_import_cli.py file /path/to/paper.pdf - Import single PDF with LLM-based metadata extraction and database matching
- uv run python pdf_import_cli.py directory /path/to/pdfs/ - Import directory of PDFs with intelligent matching
- uv run python pdf_import_cli.py directory /pdfs/ --recursive - Import PDFs recursively from subdirectories
- uv run python pdf_import_cli.py file paper.pdf --dry-run - Preview import without making changes
- uv run python pdf_import_cli.py status - Show PDF import statistics and coverage
- uv run python fact_checker_cli.py statements.json -o results.json - Export results to JSON file (PostgreSQL is always used)
- uv run python fact_checker_stats.py - Generate comprehensive statistical analysis report (console output)
- uv run python fact_checker_stats.py --export-csv stats_output/ - Export statistics to CSV files
- uv run python fact_checker_stats.py --export-csv stats_output/ --plot - Create visualization plots
- uv run python paper_checker_cli.py abstracts.json - PaperChecker: fact-check medical abstracts against literature
- uv run python paper_checker_cli.py abstracts.json -o results.json - Export PaperChecker results to JSON
- uv run python paper_checker_cli.py abstracts.json --export-markdown reports/ - Export markdown reports per abstract
- uv run python paper_checker_cli.py --pmid 12345678 23456789 - Check abstracts by PMID from database
- uv run python paper_checker_cli.py abstracts.json --quick - Quick test mode (max 5 abstracts)
- uv run python paper_checker_cli.py abstracts.json --continue-on-error - Continue processing on failures
- uv run python model_benchmark_cli.py benchmark "research question" --models gpt-oss:20b medgemma4B_it_q8:latest - Benchmark document scoring models
- uv run python model_benchmark_cli.py benchmark "question" --models gpt-oss:20b --authoritative gpt-oss:120b -o results.json - Benchmark with custom authoritative model and export
- uv run python model_benchmark_cli.py history - View benchmark run history
- uv run python model_benchmark_cli.py show --run-id 5 - View specific benchmark run details
- uv run python model_benchmark_cli.py compare --run-id 5 - Compare score distributions between models
- uv run python migrate_config_to_db.py --interactive - Interactive settings migration wizard
- uv run python migrate_config_to_db.py --user alice --config ~/.bmlibrarian/config.json - Migrate JSON config to user database settings
- uv run python migrate_config_to_db.py --defaults --config config.json - Set default settings (admin)
- uv run python migrate_config_to_db.py --export --user alice -o backup.json - Export user settings to JSON
- uv run python export_to_pdf.py report.md -o report.pdf - Export markdown to PDF with default settings
- uv run python export_to_pdf.py report.md -o report.pdf --title "Research" --author "Dr. Smith" - Export with custom metadata
- uv run python export_to_pdf.py report.md -o report.pdf --research-report --citation-count 45 - Export as BMLibrarian research report
- uv run python export_to_pdf.py report.md -o report.pdf --letter --font-size 12 - Export with US Letter format and custom font size
GUI Applications:
- uv run python setup_wizard.py - PySide6 setup wizard for initial database configuration and data import
- uv run python bmlibrarian_research_gui.py - Desktop research application with visual workflow progress and report preview
- uv run python bmlibrarian_config_gui.py - Graphical configuration interface for agents and settings
- uv run python fact_checker_review_gui.py - Human review and annotation interface for fact-checking results (PostgreSQL-based)
- uv run python fact_checker_review_gui.py --user alice - Launch review GUI with username (skip login dialog)
- uv run python fact_checker_review_gui.py --user alice --incremental - Incremental mode (only show unannotated statements)
- uv run python fact_checker_review_gui.py --user bob --blind - Blind mode (hide AI/original annotations for unbiased review)
- uv run python fact_checker_review_gui.py --user alice --db-file review_package.db - Review with SQLite package (no PostgreSQL needed)
- uv run python audit_validation_gui.py - Audit trail validation GUI for human review of automated evaluations
- uv run python audit_validation_gui.py --user alice - Launch with specified reviewer name
- uv run python audit_validation_gui.py --user alice --incremental - Show only unvalidated items
- uv run python systematic_review_gui.py - Checkpoint-based systematic review GUI with resume capability
- uv run python systematic_review_gui.py --review-dir ~/my_reviews - Start with specific review directory
- uv run python systematic_review_gui.py --debug - Enable debug logging
BMLibrarian Lite (lightweight version without PostgreSQL):
- Available as a separate project: https://github.com/hherb/bmlibrarian_lite
- Features: ChromaDB + SQLite storage, FastEmbed for local embeddings, Anthropic Claude API, PubMed search
Fact-Checker Distribution Tools (for inter-rater reliability analysis):
- uv run python scripts/export_review_package.py --output review_package.db --exported-by username - Export self-contained SQLite review package
- uv run python scripts/export_human_evaluations.py --db-file review.db --annotator alice -o alice.json - Export human annotations to JSON
- uv run python scripts/import_human_evaluations.py alice.json bob.json charlie.json - Re-import human evaluations to PostgreSQL
Laboratory Tools:
- uv run python scripts/query_lab.py - Interactive QueryAgent laboratory for experimenting with natural language to PostgreSQL query conversion
- uv run python scripts/pico_lab.py - Interactive PICO laboratory for extracting Population, Intervention, Comparison, and Outcome components from documents
- uv run python scripts/study_assessment_lab.py - Interactive Study Assessment laboratory for evaluating research quality and trustworthiness
- uv run python scripts/prisma2020_lab.py - Interactive PRISMA 2020 laboratory for assessing systematic review compliance with PRISMA reporting guidelines
- uv run python scripts/paper_weight_lab.py - Interactive Paper Weight Assessment laboratory (PySide6/Qt) for evaluating evidential weight of research papers
- uv run python scripts/paper_checker_lab.py - Interactive PaperChecker laboratory (PySide6/Qt) for medical abstract fact-checking with step-by-step visualization
- uv run python scripts/paper_reviewer_lab.py - Interactive Paper Reviewer laboratory (PySide6/Qt) for comprehensive paper assessment with DOI/PMID/PDF/text input
- uv run python scripts/pubmed_search_lab.py - Interactive PubMed Search laboratory (PySide6/Qt) for searching PubMed API without local database storage
- uv run python scripts/transparency_lab.py - Interactive Transparency Assessment laboratory (PySide6/Qt) for detecting undisclosed bias risk in biomedical papers
Transparency Assessment Tools:
- uv run python transparency_analyzer_cli.py assess --doc-id 12345 - Assess transparency of a single document by ID
- uv run python transparency_analyzer_cli.py assess --query "cardiovascular exercise" --limit 50 - Assess documents matching a search query
- uv run python transparency_analyzer_cli.py assess --has-fulltext --limit 100 - Assess documents with full text available
- uv run python transparency_analyzer_cli.py stats - Show transparency assessment statistics
- uv run python transparency_analyzer_cli.py show --doc-id 12345 - Show detailed assessment for a document
- uv run python transparency_analyzer_cli.py export --output results.json - Export assessments to JSON
- uv run python transparency_analyzer_cli.py export --output results.csv - Export assessments to CSV
- uv run python clinicaltrials_import_cli.py download --output-dir ~/clinicaltrials - Download ClinicalTrials.gov bulk data (~10GB)
- uv run python clinicaltrials_import_cli.py import --input-dir ~/clinicaltrials - Import ClinicalTrials.gov trials to database
- uv run python clinicaltrials_import_cli.py status - Show ClinicalTrials.gov import statistics
- uv run python retraction_watch_cli.py import --file retraction_watch.csv - Import Retraction Watch CSV data
- uv run python retraction_watch_cli.py lookup --doi 10.1234/example - Look up retraction status by DOI
- uv run python retraction_watch_cli.py status - Show Retraction Watch import statistics
- Note: Transparency assessment works fully offline using local Ollama models and documents in the database
PDF Processing Tools:
- uv run python examples/pdf_processor_demo.py - PySide6 demo application for PDF section segmentation (biomedical publications)
- uv run python tests/test_pdf_processor.py paper.pdf - Command-line test script for PDF processor library
Demonstrations:
- uv run python examples/agent_demo.py - Multi-agent workflow demonstration
- uv run python examples/citation_demo.py - Citation extraction examples
- uv run python examples/reporting_demo.py - Report generation examples
- uv run python examples/counterfactual_demo.py - Counterfactual analysis demonstration

OpenAthens Authentication

BMLibrarian includes secure OpenAthens authentication for accessing institutional journal subscriptions:

Key Features

Secure Session Management: JSON-based storage with 600 file permissions (no pickle vulnerability)
Browser Automation: Interactive login via Playwright
Cookie-Based Authentication: Automatic cookie injection for authenticated downloads
Session Validation Caching: Performance optimization with configurable TTL
HTTPS Enforcement: All institutional URLs must use HTTPS
Network Connectivity Checks: Pre-authentication validation
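Session validation caching, as listed above, can be illustrated with a small time-to-live cache around an expensive check. This is a hypothetical helper, not OpenAthensAuth's internals. Here the TTL is assumed to be in seconds; the unit of `session_cache_ttl` is not stated in this document.

```python
import time
from typing import Callable


class SessionValidationCache:
    """Reuse the result of an expensive session check for ``ttl_seconds``."""

    def __init__(self, validate: Callable[[], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._validate = validate   # the real (slow) validity check
        self._ttl = ttl_seconds
        self._clock = clock         # injectable for testing
        self._checked_at: float | None = None
        self._last_result = False

    def is_valid(self) -> bool:
        now = self._clock()
        if self._checked_at is not None and now - self._checked_at < self._ttl:
            return self._last_result  # cached result is still fresh
        self._last_result = self._validate()  # e.g. a network round-trip
        self._checked_at = now
        return self._last_result

    def invalidate(self) -> None:
        """Force the next is_valid() call to re-run the real check."""
        self._checked_at = None
```

This sketch caches failures as well as successes. An implementation could instead re-check immediately after a failure, trading extra requests for faster recovery.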

Security Improvements Implemented

JSON Serialization: Replaced pickle to eliminate code execution vulnerability
File Permissions: Session files stored with 600 permissions (owner read/write only)
Cookie Pattern Matching: Specific regex patterns for OpenAthens/SAML/Shibboleth cookies
Configurable Parameters: No magic numbers, all timeouts/intervals configurable
URL Validation: HTTPS requirement and format validation
Browser Crash Handling: Graceful cleanup on browser failures
Session Cache TTL: Reduces validation overhead during batch downloads

Usage

import asyncio

from bmlibrarian.utils.openathens_auth import OpenAthensConfig, OpenAthensAuth
from bmlibrarian.utils.pdf_manager import PDFManager

# Configure OpenAthens
config = OpenAthensConfig(
    institution_url='https://institution.openathens.net/login',
    session_max_age_hours=24,
    session_cache_ttl=60
)

# Authenticate (interactive browser login; login_interactive() is a coroutine)
auth = OpenAthensAuth(config)
asyncio.run(auth.login_interactive())

# Use with PDFManager for authenticated downloads
# (document is a document record, e.g. a dict loaded from the database)
pdf_manager = PDFManager(openathens_auth=auth)
pdf_path = pdf_manager.download_pdf(document)

Documentation

User Guide: doc/users/openathens_guide.md - Complete usage guide with examples
Security Documentation: doc/developers/openathens_security.md - Security architecture and best practices
Unit Tests: tests/test_openathens_auth.py - Comprehensive test suite

PDF Discovery and Download

BMLibrarian includes an intelligent PDF retrieval system that discovers and downloads full-text PDFs using multiple strategies:

Workflow

Discovery: Finds available PDF sources via PMC, Unpaywall, DOI, and direct URLs
Direct HTTP Download: Attempts fast HTTP downloads from discovered sources (prioritizes open access)
Browser Fallback: Uses Playwright browser automation for Cloudflare-protected or anti-bot protected sites
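The three-stage workflow can be sketched as a fallback chain. The ordering shown is one plausible reading: try fast HTTP on every discovered URL first, then the browser. It assumes discovery returns URLs best-first, and each downloader is a hypothetical callable that returns the saved path or None. The real `download_pdf_for_document` may order attempts differently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence


@dataclass
class DownloadAttempt:
    """Outcome of the chain (hypothetical stand-in for the real result type)."""
    success: bool
    file_path: Optional[Path] = None
    method: Optional[str] = None  # "http" or "browser"


def download_with_fallback(
    urls: Sequence[str],
    http_get: Callable[[str], Optional[Path]],
    browser_get: Optional[Callable[[str], Optional[Path]]] = None,
) -> DownloadAttempt:
    """Try direct HTTP for each discovered URL, then browser automation."""
    for url in urls:
        path = http_get(url)
        if path is not None:
            return DownloadAttempt(True, path, "http")
    if browser_get is not None:  # e.g. Cloudflare or anti-bot protected pages
        for url in urls:
            path = browser_get(url)
            if path is not None:
                return DownloadAttempt(True, path, "browser")
    return DownloadAttempt(False)
```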

Key Features

Multi-Source Discovery: Searches PMC, Unpaywall, CrossRef, and DOI.org
Priority-Based Selection: Automatically selects best source (open access preferred)
Browser Fallback: Handles Cloudflare verification, embedded PDF viewers, and anti-bot protections
Year-Based Organization: PDFs stored in YYYY/filename.pdf structure
Database Integration: Automatically updates document records with PDF paths
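Priority-based selection and year-based storage can be illustrated as below. `CandidateSource`, its `priority` field (lower is better) and the "unknown" folder for documents without a year are assumptions for this sketch, not the discovery module's real types.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass(frozen=True)
class CandidateSource:
    url: str
    source_type: str      # e.g. "pmc", "unpaywall", "doi", "direct"
    is_open_access: bool
    priority: int         # lower number = preferred (assumed convention)


def select_best_source(sources: Sequence[CandidateSource],
                       prefer_open_access: bool = True) -> Optional[CandidateSource]:
    """Pick the preferred source: open access first (if requested), then priority."""
    if not sources:
        return None

    def rank(s: CandidateSource) -> tuple[int, int]:
        oa_rank = 0 if (s.is_open_access or not prefer_open_access) else 1
        return (oa_rank, s.priority)

    return min(sources, key=rank)


def pdf_storage_path(base_dir: Path, year: Optional[int], filename: str) -> Path:
    """Build the YYYY/filename.pdf layout; 'unknown' is a hypothetical fallback."""
    folder = str(year) if year else "unknown"
    return base_dir / folder / filename
```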

Usage

from bmlibrarian.discovery import download_pdf_for_document
from pathlib import Path

# Download PDF with discovery workflow
result = download_pdf_for_document(
    document={'doi': '10.1038/nature12373', 'id': 123},
    output_dir=Path('~/pdfs').expanduser(),
    unpaywall_email='user@example.com',  # Recommended
    use_browser_fallback=True  # Falls back to browser if HTTP fails
)

if result.success:
    print(f"Downloaded: {result.file_path}")
    print(f"Source: {result.source.source_type.value}")

Configuration

{
  "unpaywall_email": "user@example.com",
  "discovery": {
    "timeout": 30,
    "prefer_open_access": true,
    "use_browser_fallback": true,
    "browser_headless": true,
    "browser_timeout": 60000
  }
}

Documentation

User Guide: doc/users/pdf_download_guide.md - Complete usage guide
Browser Downloader: doc/users/BROWSER_DOWNLOADER.md - Browser-based download details

PDF Export System

BMLibrarian includes a professional PDF export system for converting markdown-formatted research reports to publication-quality PDF documents.

Key Features

Pure Python: Uses ReportLab (BSD-licensed, free for redistribution)
Cross-Platform: Works on all major operating systems without system dependencies
Professional Quality: Publication-ready documents with proper formatting
Full Markdown Support: Headings, lists, tables, code blocks, links, emphasis
Custom Styling: Configurable fonts, colors, page sizes, and margins
Page Management: Automatic page numbers, headers, footers, and timestamps

Quick Start

from pathlib import Path
from bmlibrarian.exporters import PDFExporter

# Markdown-formatted report text (here read from disk; it can also come from the ReportingAgent)
final_report = Path("report.md").read_text(encoding="utf-8")

# Create exporter with default settings
exporter = PDFExporter()

# Export markdown report to PDF
exporter.export_report(
    report_content=final_report,
    output_path=Path("research_report.pdf"),
    research_question="What are the cardiovascular benefits of exercise?",
    citation_count=45,
    document_count=128
)

Command-Line Tool

# Basic export (uses A4 paper size by default - international standard)
uv run python export_to_pdf.py report.md -o report.pdf

# With metadata and US Letter format
uv run python export_to_pdf.py report.md -o report.pdf \
    --title "Research Report" \
    --author "Dr. Smith" \
    --letter --font-size 12

# Research report format with metadata
uv run python export_to_pdf.py report.md -o report.pdf \
    --research-report \
    --citation-count 45 \
    --document-count 128

Configuration Options

from bmlibrarian.exporters import PDFExportConfig, PDFExporter
from reportlab.lib.pagesizes import A4

config = PDFExportConfig(
    page_size=A4,  # A4 is the default; use reportlab.lib.pagesizes.letter for US Letter
    base_font_size=12,
    heading_color=(0.1, 0.1, 0.3),  # RGB components in the 0-1 range
    include_page_numbers=True,
    include_timestamp=True,
    include_header=True
)

exporter = PDFExporter(config)

Dependencies

reportlab (BSD License): Core PDF generation
markdown (BSD License): Markdown parsing
Pygments (BSD License): Syntax highlighting

All dependencies are free for commercial use and redistribution with no licensing fees.

Documentation

User Guide: doc/users/pdf_export_guide.md - Complete usage guide with examples
API Reference: See src/bmlibrarian/exporters/pdf_exporter.py for full API documentation

Architecture

BMLibrarian uses a sophisticated multi-agent architecture with enum-based workflow orchestration:

Core Components

Multi-Agent System: Specialized AI agents for different literature analysis tasks
Enum-Based Workflow: Flexible step orchestration with meaningful names and repeatable steps
Task Queue Orchestration: SQLite-based queue system for memory-efficient processing
Database Backend: PostgreSQL with pgvector extension for semantic search
Local LLM Integration: Ollama service for privacy-preserving AI inference

Agent Types

QueryAgent: Natural language to PostgreSQL query conversion
DocumentScoringAgent: Relevance scoring for user questions (1-5 scale)
CitationFinderAgent: Extracts relevant passages from high-scoring documents
ReportingAgent: Synthesizes citations into medical publication-style reports
CounterfactualAgent: Analyzes documents to generate research questions for finding contradictory evidence
EditorAgent: Creates balanced comprehensive reports integrating all evidence
FactCheckerAgent: Evaluates biomedical statements (yes/no/maybe) with literature evidence for training data auditing
PICOAgent: Extracts Population, Intervention, Comparison, and Outcome components from research papers for systematic reviews
StudyAssessmentAgent: Evaluates research quality, study design, methodological rigor, bias risk, and trustworthiness of biomedical evidence
PRISMA2020Agent: Assesses systematic reviews and meta-analyses against PRISMA 2020 reporting guidelines (27-item checklist with suitability pre-screening)
PaperCheckerAgent: Validates medical abstract claims against contradictory literature using multi-strategy search (semantic + HyDE + keyword) with evidence-based verdicts (supports/contradicts/undecided)
TransparencyAgent: Detects undisclosed bias risk by assessing funding disclosure, conflict of interest, data availability, trial registration, and author contributions using offline LLM analysis with bulk metadata enrichment (PubMed grants, ClinicalTrials.gov sponsors, Retraction Watch)
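The hand-off from DocumentScoringAgent to CitationFinderAgent ("high-scoring documents") is essentially a threshold filter on the 1-5 relevance scale. The sketch below is a hypothetical illustration. `ScoredDocument` and `select_for_citation` are not part of the agents' API, and the threshold is left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScoredDocument:
    doc_id: int
    score: float  # relevance on the 1-5 scale


def select_for_citation(docs: Iterable[ScoredDocument],
                        threshold: float) -> list[ScoredDocument]:
    """Keep documents scoring at or above ``threshold``, best first."""
    if not 1 <= threshold <= 5:
        raise ValueError("threshold must be on the 1-5 scale")
    kept = [d for d in docs if d.score >= threshold]
    # Highest score first; doc_id breaks ties so the order is deterministic.
    return sorted(kept, key=lambda d: (-d.score, d.doc_id))
```

Raising the threshold yields fewer, more relevant passages; lowering it is one way to realise the "threshold adjustment" step in the workflow.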

Document Card Factory System

BMLibrarian uses a factory pattern for creating document cards with consistent functionality in Qt/PySide6.

Key Features:

Three-State PDF Buttons: VIEW (local PDF), FETCH (download from URL), UPLOAD (manual upload)
Consistent Styling: Unified appearance using centralized stylesheet system
Context-Aware Rendering: Different card styles for literature, scoring, citations, etc.
Extensible Design: Easy to add new card variations or contexts

Factory Classes:

DocumentCardFactoryBase: Abstract base class with common utilities
QtDocumentCardFactory: Qt-specific implementation with QFrame cards and integrated PDF buttons

Usage Example:

from bmlibrarian.gui.qt.qt_document_card_factory import QtDocumentCardFactory
from bmlibrarian.gui.qt.document_card_factory_base import DocumentCardData, CardContext

# parent_widget: the existing QWidget that will own the card
factory = QtDocumentCardFactory(parent=parent_widget)
card_data = DocumentCardData(
    doc_id=12345,
    title="Example Study",
    authors=["Smith J", "Johnson A"],
    year=2023,
    relevance_score=4.5,
    pdf_url="https://example.com/paper.pdf",
    context=CardContext.LITERATURE,
    show_pdf_button=True
)
card = factory.create_card(card_data)

PDF Button States:

VIEW (Blue): Local PDF exists → Opens in viewer
FETCH (Orange): URL available → Downloads then transitions to VIEW
UPLOAD (Green): No PDF → File picker then transitions to VIEW
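The button states above follow a simple precedence: a local file wins, then a remote URL, otherwise upload. A minimal sketch with hypothetical names (the real QtDocumentCardFactory logic may check more conditions):

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class PDFButtonState(Enum):
    VIEW = "view"      # blue: local PDF exists
    FETCH = "fetch"    # orange: remote URL available
    UPLOAD = "upload"  # green: nothing available


def pdf_button_state(local_path: Optional[Path],
                     pdf_url: Optional[str]) -> PDFButtonState:
    """Choose the button state using the precedence listed above."""
    if local_path is not None and local_path.is_file():
        return PDFButtonState.VIEW
    if pdf_url:
        return PDFButtonState.FETCH
    return PDFButtonState.UPLOAD
```

Re-evaluating the state after a successful fetch or upload yields VIEW, because the file now exists locally. This matches the transitions described above.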

See Documentation:

Developer guide: doc/developers/document_card_factory_system.md
Demo: examples/document_card_factory_demo.py

Multi-Model Query Generation

BMLibrarian supports using multiple AI models to generate diverse database queries for improved document retrieval. This feature leverages the strengths of different models to create query variations that often find more relevant literature than single-model approaches.

Key Features:

Query Diversity: Generate 1-3 queries per model using up to 3 different models
Improved Coverage: Typically finds 20-40% more relevant documents
Serial Execution: Simple serial processing optimized for local Ollama + PostgreSQL instances
Automatic De-duplication: Query and document ID de-duplication handled automatically
Backward Compatible: Controlled by the multi_model_enabled feature flag (disabled by default)

Configuration (~/.bmlibrarian/config.json):

{
  "query_generation": {
    "multi_model_enabled": true,
    "models": [
      "medgemma-27b-text-it-Q8_0:latest",
      "gpt-oss:20b",
      "medgemma4B_it_q8:latest"
    ],
    "queries_per_model": 1,
    "execution_mode": "serial",
    "deduplicate_results": true,
    "show_all_queries_to_user": true,
    "allow_query_selection": true
  }
}

Architecture Highlights:

Serial Execution: Simple for-loops (not parallel) prevent resource bottlenecks with local instances
ID-Only Queries: Fast document ID retrieval (~10x faster) followed by a single bulk document fetch
Type-Safe Results: Dataclasses (QueryGenerationResult, MultiModelQueryResult) for all query results
Error Resilience: Model failures handled gracefully, system continues with available models
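The highlights above (serial loop, skip failed models, ID-only searches, de-duplication) can be sketched in a few lines. `MultiModelRun` is a simplified, hypothetical stand-in for QueryGenerationResult/MultiModelQueryResult, whose real fields may differ. The model and search calls are injected callables, and queries are treated as duplicates when they differ only in whitespace (an assumption).

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class MultiModelRun:
    queries: list[str] = field(default_factory=list)
    document_ids: list[int] = field(default_factory=list)
    failed_models: list[str] = field(default_factory=list)


def run_multi_model_search(
    models: Sequence[str],
    generate_queries: Callable[[str], list[str]],  # model name -> queries
    search_ids: Callable[[str], list[int]],        # query -> document IDs only
) -> MultiModelRun:
    result = MultiModelRun()
    seen_queries: set[str] = set()
    seen_ids: set[int] = set()
    for model in models:  # serial on purpose: one local model at a time
        try:
            queries = generate_queries(model)
        except Exception:  # a failing model is recorded and skipped
            result.failed_models.append(model)
            continue
        for query in queries:
            key = " ".join(query.split())  # normalise whitespace
            if key in seen_queries:
                continue
            seen_queries.add(key)
            result.queries.append(query)
            for doc_id in search_ids(query):
                if doc_id not in seen_ids:  # keep first-seen order
                    seen_ids.add(doc_id)
                    result.document_ids.append(doc_id)
    return result
```

The de-duplicated `document_ids` would then feed one bulk fetch of full documents.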

Performance:

Overhead: ~2-3x slower than single-model (typically 5-15 seconds vs 2-5 seconds)
Benefit: 20-40% more relevant documents with 2-3 models
Recommended: Start with 2 models, 1 query each for best balance

See Documentation:

User guide: doc/users/multi_model_query_guide.md
Technical docs: doc/developers/multi_model_architecture.md

Workflow Orchestration System

The new enum-based workflow system (workflow_steps.py) provides:

WorkflowStep Enum: Meaningful step names instead of brittle numbering
Repeatable Steps: Query refinement, threshold adjustment, citation requests
Branching Logic: Conditional step execution and error recovery
Context Management: State preservation across step executions
Auto Mode Support: Graceful handling of non-interactive execution

Workflow Steps

COLLECT_RESEARCH_QUESTION → GENERATE_AND_EDIT_QUERY → SEARCH_DOCUMENTS → 
REVIEW_SEARCH_RESULTS → SCORE_DOCUMENTS → EXTRACT_CITATIONS → 
GENERATE_REPORT → PERFORM_COUNTERFACTUAL_ANALYSIS → 
SEARCH_CONTRADICTORY_EVIDENCE → EDIT_COMPREHENSIVE_REPORT → 
REVIEW_AND_REVISE_REPORT → EXPORT_REPORT
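To show why named steps are less brittle than numbers, here is a hypothetical stand-in for `WorkflowStep` built on Python's `enum` module. The step names come from the sequence above. The `REPEATABLE` set and the `branch_to_step` context key (borrowed from the orchestration example later in this file) are assumptions for the sketch, not the contents of `workflow_steps.py`.

```python
from __future__ import annotations

from enum import Enum, auto


class Step(Enum):
    """Hypothetical stand-in for WorkflowStep, using the documented names."""
    COLLECT_RESEARCH_QUESTION = auto()
    GENERATE_AND_EDIT_QUERY = auto()
    SEARCH_DOCUMENTS = auto()
    REVIEW_SEARCH_RESULTS = auto()
    SCORE_DOCUMENTS = auto()
    EXTRACT_CITATIONS = auto()
    GENERATE_REPORT = auto()
    PERFORM_COUNTERFACTUAL_ANALYSIS = auto()
    SEARCH_CONTRADICTORY_EVIDENCE = auto()
    EDIT_COMPREHENSIVE_REPORT = auto()
    REVIEW_AND_REVISE_REPORT = auto()
    EXPORT_REPORT = auto()


# Default order is the definition order; names, not numbers, identify steps.
DEFAULT_ORDER: list[Step] = list(Step)

# Assumed set of steps that may be revisited (query refinement, threshold
# adjustment for citations, report revision).
REPEATABLE: frozenset[Step] = frozenset({
    Step.GENERATE_AND_EDIT_QUERY,
    Step.EXTRACT_CITATIONS,
    Step.REVIEW_AND_REVISE_REPORT,
})


def next_step(current: Step, context: dict) -> Step | None:
    """Return the following step, or branch back to a repeatable one."""
    target = context.pop("branch_to_step", None)
    if target is not None:
        if target not in REPEATABLE:
            raise ValueError(f"{target.name} is not repeatable")
        return target
    idx = DEFAULT_ORDER.index(current)
    return DEFAULT_ORDER[idx + 1] if idx + 1 < len(DEFAULT_ORDER) else None
```

Inserting a new member between two existing ones changes nothing for code that refers to steps by name, which is the advantage over hard-coded step numbers.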

Iterative Capabilities

Query Refinement: When search results are insufficient
Threshold Adjustment: For better citation extraction
Citation Requests: Agents can request more evidence during report generation
Report Revision: Iterative improvement of generated reports
Evidence Enhancement: Counterfactual analysis for finding contradictory studies

Queue System

QueueManager: SQLite-based persistent task queuing
AgentOrchestrator: Coordinates multi-agent workflows
WorkflowExecutor: Manages step execution with context tracking
Task Priorities: HIGH, NORMAL, LOW priority levels
Batch Processing: Memory-efficient handling of large document sets
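The queue concepts above can be sketched with the standard-library `sqlite3` module. This sketch does not reproduce the real `QueueManager` schema or API. In it, tasks carry a priority, and `next_batch` claims a bounded number of pending tasks (highest priority first, then oldest). That bound keeps memory use flat for large document sets.

```python
import json
import sqlite3
from enum import IntEnum


class Priority(IntEnum):
    """Lower value is served first (assumed ordering)."""
    HIGH = 0
    NORMAL = 1
    LOW = 2


class TaskQueue:
    """Minimal persistent queue sketch; not the real QueueManager API."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " agent TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " priority INTEGER NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'pending')"
        )

    def add(self, agent: str, payload: dict, priority: Priority = Priority.NORMAL) -> int:
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT INTO tasks (agent, payload, priority) VALUES (?, ?, ?)",
                (agent, json.dumps(payload), int(priority)),
            )
        return cur.lastrowid

    def next_batch(self, agent: str, size: int) -> list[tuple[int, dict]]:
        """Claim up to `size` pending tasks: highest priority, then oldest."""
        with self.conn:
            rows = self.conn.execute(
                "SELECT id, payload FROM tasks WHERE agent = ? AND status = 'pending'"
                " ORDER BY priority, id LIMIT ?",
                (agent, size),
            ).fetchall()
            self.conn.executemany(
                "UPDATE tasks SET status = 'processing' WHERE id = ?",
                [(task_id,) for task_id, _ in rows],
            )
        return [(task_id, json.loads(payload)) for task_id, payload in rows]
```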

Project Structure

bmlibrarian/
├── src/bmlibrarian/           # Main source code
│   ├── agents/                # Multi-agent system
│   │   ├── __init__.py        # Agent module exports
│   │   ├── base.py            # BaseAgent foundation class
│   │   ├── query_agent.py     # Natural language query processing
│   │   ├── scoring_agent.py   # Document relevance scoring
│   │   ├── citation_agent.py  # Citation extraction from documents
│   │   ├── reporting_agent.py # Report synthesis and formatting
│   │   ├── counterfactual_agent.py # Counterfactual analysis for contradictory evidence
│   │   ├── editor_agent.py    # Comprehensive report editing and integration
│   │   ├── queue_manager.py   # SQLite-based task queue system
│   │   ├── orchestrator.py    # Multi-agent workflow coordination
│   │   ├── transparency_data.py # Transparency assessment data models and constants
│   │   ├── transparency_agent.py # TransparencyAgent for undisclosed bias risk detection
│   │   └── query_generation/  # Multi-model query generation system
│   │       ├── __init__.py    # Query generation module exports
│   │       ├── data_types.py  # Type-safe dataclasses for query results
│   │       └── generator.py   # Multi-model query generator
│   ├── importers/             # External data source importers
│   │   ├── __init__.py        # Importer module exports
│   │   ├── medrxiv_importer.py # MedRxiv preprint importer (multi-format extraction)
│   │   ├── medrxiv_content_extractor.py # Multi-format content extractor (text/HTML/XML/PDF)
│   │   ├── medrxiv_meca_importer.py # MedRxiv MECA bulk importer (AWS S3)
│   │   ├── pubmed_importer.py # PubMed E-utilities importer (targeted imports)
│   │   ├── pubmed_bulk_importer.py # PubMed FTP bulk importer (complete mirror)
│   │   ├── pdf_matcher.py     # LLM-based PDF matching and import (DOI/PMID/title matching)
│   │   ├── clinicaltrials_importer.py # ClinicalTrials.gov bulk importer (sponsor classification)
│   │   ├── retraction_watch_importer.py # Retraction Watch CSV importer (retraction detection)
│   │   └── README.md          # Importer documentation
│   ├── embeddings/            # Document embedding generation
│   │   ├── __init__.py        # Embeddings module exports
│   │   └── document_embedder.py # Document embedder (uses Ollama)
│   ├── exporters/             # Export functionality (PDF, HTML, etc.)
│   │   ├── __init__.py        # Exporters module exports
│   │   └── pdf_exporter.py    # Markdown to PDF exporter using ReportLab
│   ├── pdf_processor/         # PDF processing and segmentation for biomedical publications
│   │   ├── __init__.py        # PDF processor module exports
│   │   ├── models.py          # Data models (TextBlock, Section, Document, SectionType)
│   │   ├── extractor.py       # PDF text extraction with layout analysis (PyMuPDF)
│   │   ├── segmenter.py       # Section segmentation using NLP and heuristics
│   │   └── README.md          # PDF processor documentation
│   ├── discovery/             # Full-text PDF discovery system
│   │   ├── __init__.py        # Discovery module exports
│   │   ├── data_types.py      # Type-safe dataclasses (PDFSource, DiscoveryResult, etc.)
│   │   ├── resolvers.py       # Source resolvers (PMC, Unpaywall, DOI, OpenAthens)
│   │   └── full_text_finder.py # Discovery orchestrator
│   ├── benchmarking/          # Model benchmarking system
│   │   ├── __init__.py        # Benchmarking module exports
│   │   ├── data_types.py      # Type-safe dataclasses (BenchmarkRun, AlignmentMetrics, etc.)
│   │   ├── database.py        # Database operations (benchmarking schema)
│   │   └── runner.py          # BenchmarkRunner orchestration
│   ├── cli/                   # Modular CLI architecture
│   │   ├── __init__.py        # CLI module exports
│   │   ├── config.py          # Configuration management
│   │   ├── ui.py              # User interface components
│   │   ├── query_processing.py # Query editing and search
│   │   ├── formatting.py      # Report formatting and export
│   │   ├── workflow.py        # Workflow orchestration
│   │   └── workflow_steps.py  # Enum-based workflow step definitions
│   ├── gui/                   # Graphical user interfaces (PySide6/Qt)
│   │   ├── __init__.py        # GUI module exports
│   │   └── qt/                # Qt/PySide6-based GUI components
│   │       ├── __init__.py    # Qt module entry point
│   │       ├── core/          # Core application infrastructure
│   │       ├── plugins/       # Plugin system (research, fact_checker, etc.)
│   │       ├── widgets/       # Reusable Qt widgets
│   │       ├── resources/     # Resources and styling (dpi_scale, stylesheets)
│   │       └── qt_document_card_factory.py  # Qt document card factory
│   ├── lab/                   # Experimental tools and interfaces
│   │   ├── __init__.py        # Lab module exports
│   │   ├── query_lab.py       # QueryAgent experimental GUI
│   │   ├── pico_lab.py        # PICOAgent experimental GUI for PICO component extraction
│   │   ├── study_assessment_lab.py # StudyAssessmentAgent experimental GUI for study quality evaluation
│   │   ├── prisma2020_lab.py  # PRISMA2020Agent experimental GUI for PRISMA 2020 compliance assessment
│   │   └── transparency_lab.py # TransparencyAgent experimental GUI for undisclosed bias risk assessment
│   ├── factchecker/           # Fact-checker module (PostgreSQL-based)
│   │   ├── __init__.py        # Fact-checker module exports
│   │   ├── agent/             # Fact-checker agent
│   │   │   ├── __init__.py
│   │   │   └── fact_checker_agent.py  # FactCheckerAgent (orchestrates multi-agent workflow)
│   │   ├── db/                # Database operations
│   │   │   ├── __init__.py
│   │   │   └── database.py    # FactCheckerDB (PostgreSQL factcheck schema)
│   │   ├── cli/               # CLI application
│   │   │   ├── __init__.py
│   │   │   ├── app.py         # Main CLI entry point
│   │   │   ├── commands.py    # Command handlers
│   │   │   └── formatters.py  # Output formatting
│   │   └── gui/               # Review GUI application
│   │       ├── __init__.py
│   │       ├── review_app.py  # Main review application
│   │       ├── data_manager.py    # Database queries
│   │       ├── annotation_manager.py  # Annotation logic
│   │       ├── statement_display.py   # Statement UI
│   │       ├── citation_display.py    # Citation cards
│   │       └── dialogs.py     # Login/export dialogs
│   └── paperchecker/          # PaperChecker module (abstract fact-checking)
│       ├── __init__.py        # PaperChecker module exports
│       ├── data_models.py     # Type-safe dataclasses (Statement, Verdict, etc.)
│       ├── agent.py           # PaperCheckerAgent (main orchestrator)
│       ├── database.py        # PaperCheckDB (PostgreSQL papercheck schema)
│       ├── components/        # Sub-components
│       │   ├── statement_extractor.py   # Extract claims from abstracts
│       │   ├── counter_statement_generator.py  # Generate semantic negations
│       │   ├── hyde_generator.py        # HyDE abstract and keyword generation
│       │   ├── search_coordinator.py    # Multi-strategy search orchestration
│       │   └── verdict_analyzer.py      # Evidence analysis and verdict generation
│       └── cli/               # CLI application
│           ├── app.py         # Main CLI entry point
│           ├── commands.py    # Command handlers
│           └── formatters.py  # Output formatting
├── tests/                     # Comprehensive test suite
│   ├── test_query_agent.py    # Query processing tests
│   ├── test_scoring_agent.py  # Document scoring tests
│   ├── test_citation_agent.py # Citation extraction tests
│   ├── test_reporting_agent.py # Report generation tests
│   ├── test_counterfactual_agent.py # Counterfactual analysis tests
│   ├── test_transparency_agent.py # Transparency assessment tests (43 tests)
│   └── paperchecker/          # PaperChecker test suite
│       ├── test_statement_extractor.py   # Statement extraction tests
│       ├── test_counter_generator.py     # Counter-statement tests
│       ├── test_hyde_generator.py        # HyDE generation tests
│       ├── test_search_coordinator.py    # Search tests
│       ├── test_verdict_analyzer.py      # Verdict tests
│       └── test_end_to_end.py            # Full workflow tests
├── examples/                  # Demonstration scripts
│   ├── agent_demo.py          # Multi-agent workflow examples
│   ├── citation_demo.py       # Citation extraction demonstrations
│   ├── reporting_demo.py      # Report generation examples
│   └── counterfactual_demo.py # Counterfactual analysis demonstrations
├── doc/                       # Comprehensive documentation
│   ├── users/                 # End-user guides
│   │   ├── query_agent_guide.md
│   │   ├── citation_guide.md
│   │   ├── reporting_guide.md
│   │   ├── counterfactual_guide.md
│   │   ├── fact_checker_guide.md
│   │   ├── fact_checker_review_guide.md  # Fact-checker review GUI guide
│   │   ├── medrxiv_import_guide.md  # MedRxiv import guide
│   │   ├── document_embedding_guide.md  # Document embedding guide
│   │   ├── document_interrogation_guide.md  # Document interrogation tab guide
│   │   ├── pdf_import_guide.md  # PDF import and matching guide
│   │   ├── study_assessment_guide.md  # Study quality assessment guide
│   │   ├── prisma2020_guide.md  # PRISMA 2020 compliance assessment guide
│   │   ├── multi_model_query_guide.md  # Multi-model query generation guide
│   │   ├── paper_checker_guide.md  # PaperChecker overview and quick start
│   │   ├── paper_checker_cli_guide.md  # PaperChecker CLI reference
│   │   ├── paper_checker_lab_guide.md  # PaperChecker laboratory guide
│   │   ├── paper_reviewer_lab_guide.md  # Paper Reviewer laboratory guide
│   │   ├── full_text_discovery_guide.md  # Full-text PDF discovery guide
│   │   ├── transparency_assessment_guide.md  # Transparency assessment user guide
│   │   ├── clinicaltrials_import_guide.md  # ClinicalTrials.gov import guide
│   │   └── retraction_watch_guide.md  # Retraction Watch import guide
│   └── developers/            # Technical documentation
│       ├── agent_module.md
│       ├── citation_system.md
│       ├── reporting_system.md
│       ├── counterfactual_system.md
│       ├── fact_checker_system.md
│       ├── study_assessment_system.md  # Study quality assessment system
│       ├── prisma2020_system.md  # PRISMA 2020 compliance assessment system
│       ├── document_interrogation_ui_spec.md  # Document interrogation UI specification
│       ├── multi_model_architecture.md  # Multi-model architecture docs
│       ├── paper_checker_architecture.md  # PaperChecker system design and architecture
│       ├── full_text_discovery_system.md  # Full-text PDF discovery architecture
│       └── transparency_assessment_system.md  # Transparency assessment system architecture
├── scripts/                   # Utility scripts and laboratory tools
│   ├── query_lab.py           # QueryAgent experimental laboratory GUI
│   ├── pico_lab.py            # PICOAgent experimental laboratory GUI
│   ├── study_assessment_lab.py # StudyAssessmentAgent laboratory GUI
│   ├── prisma2020_lab.py      # PRISMA2020Agent laboratory GUI
│   ├── paper_weight_lab.py    # PaperWeightAssessmentAgent laboratory GUI
│   ├── paper_checker_lab.py   # PaperChecker laboratory GUI
│   ├── paper_reviewer_lab.py  # Paper Reviewer laboratory GUI (comprehensive assessment)
│   ├── transparency_lab.py    # TransparencyAgent laboratory GUI (bias risk assessment)
│   ├── export_review_package.py # Export SQLite review packages
│   ├── export_human_evaluations.py # Export human annotations to JSON
│   ├── import_human_evaluations.py # Re-import human evaluations
│   ├── chunk_worker.py        # Background worker for semantic chunk queue
│   ├── rechunk_semantic_chunks.py # Re-chunk documents CLI
│   └── pdf_verification_gui.py # PDF verification review GUI
├── data/                      # Temporary test data (gitignored)
├── bmlibrarian_cli.py         # Interactive CLI application with full multi-agent workflow
├── bmlibrarian_qt.py          # Qt-based main entry point
├── bmlibrarian_research_gui.py # Desktop research GUI application (modular entry point)
├── fact_checker_cli.py        # Fact-checker CLI for training data auditing
├── fact_checker_review_gui.py # Human review and annotation GUI for fact-checking results
├── fact_checker_stats.py      # Comprehensive statistical analysis for fact-checker evaluations
├── paper_checker_cli.py       # PaperChecker CLI for fact-checking medical abstracts against literature
├── model_benchmark_cli.py     # Model benchmarking CLI for evaluating document scoring models
├── medrxiv_import_cli.py      # MedRxiv preprint import CLI (multi-format extraction)
├── medrxiv_meca_cli.py        # MedRxiv MECA bulk sync CLI (AWS S3)
├── pubmed_import_cli.py       # PubMed E-utilities import CLI (targeted imports)
├── pubmed_bulk_cli.py         # PubMed FTP bulk download/import CLI (complete mirror)
├── pubmed_repair_cli.py       # PubMed download repair CLI (scan/fix corrupted gzip files)
├── pmc_bulk_cli.py            # PMC Open Access bulk download/import CLI
├── europe_pmc_pdf_cli.py      # Europe PMC PDF bulk download CLI
├── pdf_import_cli.py          # PDF import CLI with LLM-based metadata extraction and matching
├── transparency_analyzer_cli.py # Transparency assessment CLI for detecting undisclosed bias risk
├── clinicaltrials_import_cli.py # ClinicalTrials.gov bulk download/import CLI
├── retraction_watch_cli.py    # Retraction Watch CSV import CLI
├── migrate_config_to_db.py    # Settings migration CLI
├── export_to_pdf.py           # Markdown to PDF export CLI tool
├── initial_setup_and_download.py  # Database setup and battle-testing script
├── setup_wizard.py            # PySide6 setup wizard
├── baseline_schema.sql        # Base PostgreSQL schema definition
├── migrations/                # Database migration scripts
├── test_database.env.example  # Example environment file for testing
├── SETUP_GUIDE.md             # Comprehensive setup and testing guide
├── pyproject.toml             # Project configuration and dependencies
├── uv.lock                    # Locked dependency versions
├── .env                       # Environment configuration
└── README.md                  # Project description

Development Notes

Project Maturity

Current State: Full multi-agent architecture implemented with comprehensive testing and documentation
Core Features: Query processing, document scoring, citation extraction, and report generation are fully functional
Production Ready: Complete system with queue orchestration, error handling, and quality control

Development Principles

Modern Python Standards: Uses pyproject.toml, type hints, and Python >=3.12
Enum-Based Architecture: Flexible workflow orchestration with meaningful step names
Comprehensive Testing: Unit tests for all agents with >95% coverage
Documentation First: Both developer and user documentation for all features
AI-Powered: Local LLM integration via Ollama for privacy-preserving processing
Scalable Architecture: Queue-based processing for memory-efficient large-scale operations
Iterative Workflows: Support for repeatable steps and agent-driven refinement

Database Safety

CRITICAL: Never modify or drop the production database "knowledgebase"
Development: Use "bmlibrarian_dev" database for testing/migration experiments
Production Access: Read-only access unless explicitly instructed otherwise
Data Integrity: All document IDs are programmatically verified to prevent hallucination
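A coarse guard for these rules might look like the sketch below. `assert_safe_statement` and `verify_document_ids` are hypothetical helpers. The first-keyword check is deliberately simple and would not catch, for example, a write hidden inside a `WITH` clause, so a read-only PostgreSQL role remains the real protection for "knowledgebase".

```python
PROTECTED_DATABASES = frozenset({"knowledgebase"})
WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE")


def assert_safe_statement(database: str, sql: str, allow_writes: bool = False) -> None:
    """Refuse write/DDL statements against the production database."""
    first_word = sql.lstrip().split(None, 1)[0].upper() if sql.strip() else ""
    if database in PROTECTED_DATABASES and first_word in WRITE_KEYWORDS and not allow_writes:
        raise PermissionError(f"refusing {first_word} on production database {database!r}")


def verify_document_ids(cited: list[int], existing: set[int]) -> list[int]:
    """Return cited IDs that are not in the database (possible hallucinations)."""
    return [doc_id for doc_id in cited if doc_id not in existing]
```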

Code Quality Standards

Testing: Write comprehensive unit tests for every new module
Documentation: Create both user guides (doc/users/) and developer docs (doc/developers/)
Type Safety: Use type hints throughout the codebase
Error Handling: Robust error recovery and logging
Model Standards: Only use approved models (gpt-oss:20b, medgemma4B_it_q8:latest)
Temporal Precision: Use specific years instead of vague temporal references (e.g., "In a 2023 study" NOT "In a recent study")

Agent Development Guidelines

BaseAgent Pattern: All agents inherit from BaseAgent with standardized interfaces
Configuration Integration: Agents must use get_model() and get_agent_config() from config system
Parameter Filtering: Filter agent config to only include supported parameters (temperature, top_p, etc.)
Queue Integration: New agents should support queue-based processing
Workflow Integration: Agents should work with enum-based workflow system
Connection Testing: All agents must implement connection testing methods
Progress Tracking: Support progress callbacks for long-running operations
Document ID Integrity: Always use real database IDs, never mock/fabricated references
Step Handler Methods: Implement appropriate workflow step handlers for agent actions
No Artificial Limits: Process ALL documents unless explicitly configured otherwise

Workflow Development Guidelines

WorkflowStep Enum: Use meaningful names for new workflow steps
Repeatable Steps: Mark steps as repeatable when they support iteration
Branching Logic: Implement conditional execution and error recovery
Context Management: Preserve state across step executions
Auto Mode Support: Ensure steps work in non-interactive mode

Usage Examples

Research GUI Application

BMLibrarian includes a comprehensive desktop research application built with PySide6/Qt:

# Start the research GUI application
uv run python bmlibrarian_research_gui.py

# Research GUI Features:
# - Multi-line text input for medical research questions
# - Interactive/automated workflow toggle
# - Visual workflow progress with collapsible step cards
# - Real-time agent execution with proper model configuration
# - Formatted markdown report preview with scrolling
# - Full integration with BMLibrarian's multi-agent system

# Command line options:
uv run python bmlibrarian_research_gui.py --auto "research question"  # Automated execution
uv run python bmlibrarian_research_gui.py --quick                    # Quick mode with limits
uv run python bmlibrarian_research_gui.py --max-results 100          # Custom search limits
uv run python bmlibrarian_research_gui.py --score-threshold 3.0      # Custom relevance threshold

The Research GUI provides:

Desktop Application: Native cross-platform desktop interface using PySide6
Visual Workflow: Collapsible cards showing real-time progress through 11 workflow steps
Agent Integration: Uses configured models from ~/.bmlibrarian/config.json
Document Processing: Scores ALL found documents by default (no artificial limits)
Citation Extraction: Processes ALL documents above relevance threshold
Report Generation: Full markdown rendering with GitHub-style formatting
Configuration Support: Respects agent models, parameters, and thresholds from config
Performance Modes: Normal (all documents) vs Quick (limited for speed)

Configuration GUI Application

BMLibrarian includes a modern graphical configuration interface built with PySide6/Qt:

# Start the desktop configuration GUI
uv run python bmlibrarian_config_gui.py

# GUI Features:
# - Native desktop application with tabbed interface
# - Document Interrogation tab: Interactive document viewer with AI chatbot for Q&A
# - Separate configuration tabs for each agent
# - Model selection with live refresh from Ollama server
# - Parameter adjustment with sliders and input fields
# - Configuration save/load functionality
# - Connection testing to verify Ollama availability
# - Reset to defaults option
# - Cross-platform compatibility

# Command line options:
uv run python bmlibrarian_config_gui.py --debug            # Enable debug mode

The GUI provides:

Native Desktop App: Cross-platform desktop application using PySide6
Document Interrogation: Interactive split-pane interface for document Q&A
- Left pane: Document viewer for PDF, Markdown, and text files (60% width)
- Right pane: Chat interface with dialogue bubbles (40% width)
- File selector and model dropdown in top bar
- Support for programmatic document loading from other plugins
- Message history with user/AI distinction
- Real-time chat with selected Ollama model
Agent Configuration: Individual tabs for Query, Scoring, Citation, Reporting, Counterfactual, and Editor agents
Model Management: Dropdown selection with live model refresh from Ollama
Parameter Tuning: Interactive sliders for temperature, top-p, and agent-specific settings
General Settings: Ollama server configuration, database settings, and CLI defaults
File Operations: Save/load configuration files with JSON format
Connection Testing: Verify Ollama server connectivity and list available models

Fact-Checker Review GUI Application

BMLibrarian includes a human review and annotation interface for fact-checking results:

# Start the Fact-Checker Review GUI
uv run python fact_checker_review_gui.py

# Load JSON file (auto-creates SQLite database for annotations)
uv run python fact_checker_review_gui.py --input-file results.json

# Load existing database directly
uv run python fact_checker_review_gui.py --input-file results.db

# Incremental mode: only show statements without AI evaluations
uv run python fact_checker_review_gui.py --input-file results.json --incremental

The Fact-Checker Review GUI provides:

Database Auto-Creation: Automatically creates SQLite database from JSON files (e.g., results.json → results.db)
Intelligent Merging: If database exists, imports new statements from JSON without overwriting existing annotations
CLI-Consistent Behavior: Same database workflow as the fact-checker CLI for seamless integration
Real-Time Persistence: All annotations saved directly to database as you review
Incremental Mode: Filter to show only unevaluated statements (consistent with CLI)
Multi-User Support: Track annotations by different reviewers with annotator metadata
Evidence Review: Examine supporting citations with expandable cards showing full abstracts
Annotation Comparison: View original, AI, and human annotations side-by-side

Database Workflow (matches CLI):

Load results.json: Checks if results.db exists
If DB exists: Merges new statements from JSON (skips existing with evaluations/annotations)
If DB doesn't exist: Creates new database and imports all JSON data
All annotations are saved to the database in real-time

This ensures that the GUI and CLI provide identical database management behavior, making it easy to switch between interfaces or use both for different tasks.
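The merge rule can be sketched with `sqlite3` as follows. The single-table schema and the `statement_id` / `statement_text` record keys are assumptions for illustration. The real review database has more tables and may treat previously evaluated statements more selectively. The key point is `ON CONFLICT ... DO NOTHING`: a re-import never overwrites rows already in the database, or any annotations attached to them.

```python
import sqlite3
from pathlib import Path


def db_path_for(json_path: str) -> Path:
    """results.json -> results.db, next to the JSON file."""
    return Path(json_path).with_suffix(".db")


def merge_statements(records: list[dict], conn: sqlite3.Connection) -> int:
    """Insert statements that are new; leave existing rows and annotations alone.

    Returns the number of newly inserted statements.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements ("
        " statement_id TEXT PRIMARY KEY,"
        " statement_text TEXT NOT NULL,"
        " human_annotation TEXT)"
    )
    inserted = 0
    with conn:  # one transaction for the whole merge
        for rec in records:
            cur = conn.execute(
                "INSERT INTO statements (statement_id, statement_text) VALUES (?, ?)"
                " ON CONFLICT(statement_id) DO NOTHING",
                (rec["statement_id"], rec["statement_text"]),
            )
            inserted += cur.rowcount  # 0 when the statement already existed
    return inserted
```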

Fact-Checker Distribution System for Inter-Rater Reliability

BMLibrarian includes a complete distribution system for sending fact-check results to external reviewers without requiring PostgreSQL installation. This enables inter-rater reliability analysis with multiple independent human annotators.

Complete Workflow:

Export Review Package (PostgreSQL → SQLite):
```
uv run python export_review_package.py --output review_package.db --exported-by username
```
- Creates self-contained SQLite database with:
  - All statements and AI evaluations
  - Evidence citations with full document abstracts
  - Document metadata (titles, PMIDs, DOIs)
  - NO human annotations from other reviewers
- Typical size: 100-500 MB for 1000 statements
- Ready for distribution via file sharing
Distribute to External Reviewers:
- Send .db file + fact_checker_review_gui.py to reviewers
- No PostgreSQL installation required
- Works offline with full functionality
Reviewer Annotation (SQLite):
```
uv run python fact_checker_review_gui.py --user alice --db-file review_package.db
```
- Read-write mode: Annotations saved to SQLite in real-time
- Full abstract display for all citations
- Same interface as PostgreSQL version
- Supports blind mode and incremental mode
Export Human Evaluations (SQLite → JSON):
```
uv run python export_human_evaluations.py --db-file review_package.db --annotator alice -o alice.json
```
- Lightweight JSON export (1-10 KB per statement)
- Contains: statement_id, statement_text, annotation, explanation
- Reviewer sends back only the small JSON file
Re-import to PostgreSQL (JSON → PostgreSQL):
```
uv run python import_human_evaluations.py alice.json bob.json charlie.json
```
- Creates/updates annotator records with username tagging
- Validates statements match by ID and text
- Inserts/updates annotations (one per annotator per statement)
- Reports statistics (inserted, updated, errors)
- Update/overwrite behavior for duplicate annotations

Analyze Inter-Rater Agreement:

-- PostgreSQL query
SELECT * FROM factcheck.calculate_inter_annotator_agreement();
SELECT * FROM factcheck.v_inter_annotator_agreement;
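The validation and upsert rules of the re-import step can be illustrated without a database. In this hypothetical sketch, `statements` stands in for the statement table and `annotations` for the annotation table. The record keys match the JSON fields listed for the export (`statement_id`, `statement_text`, `annotation`, `explanation`), but the function itself is not the `import_human_evaluations.py` implementation.

```python
from dataclasses import dataclass


@dataclass
class ImportStats:
    inserted: int = 0
    updated: int = 0
    errors: int = 0


def import_evaluations(
    records: list[dict],
    annotator: str,
    statements: dict[int, str],
    annotations: dict[tuple[int, str], dict],
) -> ImportStats:
    """Validate each record against known statements, then upsert it.

    statements:  statement_id -> statement_text already in the database
    annotations: (statement_id, annotator) -> annotation record (mutated)
    """
    stats = ImportStats()
    for rec in records:
        sid = rec.get("statement_id")
        # Both the ID and the exact text must match to guard against mismatches.
        if sid not in statements or statements[sid] != rec.get("statement_text"):
            stats.errors += 1
            continue
        key = (sid, annotator)  # one annotation per annotator per statement
        if key in annotations:
            stats.updated += 1  # re-import overwrites the earlier annotation
        else:
            stats.inserted += 1
        annotations[key] = {
            "annotation": rec.get("annotation"),
            "explanation": rec.get("explanation"),
        }
    return stats
```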

Key Features:

Database Abstraction: Unified interface supporting both PostgreSQL and SQLite backends
Self-Contained Packages: All data needed for review in single .db file
No Dependencies: Reviewers don't need PostgreSQL, just Python + PySide6
Validation: Statement text matching prevents mismatches during import
Multi-Reviewer Support: Track annotations by username for inter-rater analysis
Security: Audit trail via export_history, encrypted distribution recommended

Documentation:

Quick Start Guide: doc/users/FACT_CHECKER_DISTRIBUTION_QUICKSTART.md
Implementation Plan: doc/developers/FACT_CHECKER_DISTRIBUTION_PLAN.md
User Guide: doc/users/fact_checker_distribution_guide.md (if present)

Architecture:

src/bmlibrarian/factchecker/db/abstract_db.py: Abstract database interface
src/bmlibrarian/factchecker/db/sqlite_db.py: SQLite implementation
src/bmlibrarian/factchecker/db/postgresql_db.py: PostgreSQL wrapper
src/bmlibrarian/factchecker/db/sqlite_schema.sql: Complete SQLite schema
export_review_package.py: Review package export script
export_human_evaluations.py: Human annotations export script
import_human_evaluations.py: PostgreSQL import script
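Below is a minimal sketch of the abstraction idea, assuming a two-method interface. Both backends implement the same abstract methods, so the review GUI does not need to know which one it is talking to. `ReviewDatabase`, `SQLiteReviewDatabase` and the schema are hypothetical and much simpler than `abstract_db.py` / `sqlite_schema.sql`. A PostgreSQL subclass would implement the same methods with psycopg.

```python
import sqlite3
from abc import ABC, abstractmethod


class ReviewDatabase(ABC):
    """Backend-neutral interface the review GUI could code against."""

    @abstractmethod
    def list_statements(self) -> list[tuple[int, str]]: ...

    @abstractmethod
    def save_annotation(self, statement_id: int, annotator: str, annotation: str) -> None: ...


class SQLiteReviewDatabase(ReviewDatabase):
    """SQLite backend, as an external reviewer would use with a .db package."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            "CREATE TABLE IF NOT EXISTS statements (id INTEGER PRIMARY KEY, text TEXT NOT NULL);"
            "CREATE TABLE IF NOT EXISTS annotations ("
            " statement_id INTEGER NOT NULL REFERENCES statements(id),"
            " annotator TEXT NOT NULL, annotation TEXT NOT NULL,"
            " PRIMARY KEY (statement_id, annotator));"
        )

    def list_statements(self) -> list[tuple[int, str]]:
        return self.conn.execute("SELECT id, text FROM statements ORDER BY id").fetchall()

    def save_annotation(self, statement_id: int, annotator: str, annotation: str) -> None:
        with self.conn:  # committed immediately, mirroring real-time persistence
            self.conn.execute(
                "INSERT INTO annotations (statement_id, annotator, annotation) VALUES (?, ?, ?)"
                " ON CONFLICT(statement_id, annotator) DO UPDATE SET annotation = excluded.annotation",
                (statement_id, annotator, annotation),
            )
```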

Fact-Checker Statistical Analysis

BMLibrarian includes a comprehensive statistical analysis tool (fact_checker_stats.py) for evaluating fact-checker performance and inter-rater reliability. The tool calculates multiple metrics with proper statistical rigor.

Statistical Metrics Calculated:

Concordance rates: Agreement between AI evaluations and expected answers (or human annotations), with 95% confidence intervals from the Wilson score interval for binomial proportions
Cohen's kappa: Inter-rater reliability coefficient with standard errors and 95% confidence intervals
Confusion matrices: Cross-tabulation of evaluations with accuracy, precision, recall, and F1-scores
Confidence calibration: Relationship between AI confidence levels (low/medium/high) and actual accuracy
Chi-square tests: Statistical significance testing for categorical data (p < 0.05)
Category-specific transitions: Analysis of evaluation changes:
- Yes → No transitions: Percentage of statements where evaluations changed from "yes" to "no"
- No → Yes transitions: Percentage of statements where evaluations changed from "no" to "yes"
- Certainty changes: Percentage moving to "maybe" (increased uncertainty)
- Stability: Percentage with unchanged evaluations
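For reference, the Wilson score interval and the transition categories above can be computed as follows. This is an independent sketch, not code from `fact_checker_stats.py`. The category names (`yes_to_no`, `to_maybe`, etc.) and the `other` bucket for changes such as maybe → yes are assumptions.

```python
import math
from collections import Counter

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (e.g. a concordance rate)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def transition_percentages(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of (before, after) evaluation pairs in each transition category."""
    if not pairs:
        raise ValueError("no pairs to analyse")
    counts: Counter[str] = Counter()
    for before, after in pairs:
        if before == after:
            counts["stable"] += 1
        elif after == "maybe":
            counts["to_maybe"] += 1  # increased uncertainty
        elif (before, after) == ("yes", "no"):
            counts["yes_to_no"] += 1
        elif (before, after) == ("no", "yes"):
            counts["no_to_yes"] += 1
        else:
            counts["other"] += 1  # e.g. maybe -> yes
    total = len(pairs)
    keys = ("yes_to_no", "no_to_yes", "to_maybe", "stable", "other")
    return {k: 100.0 * counts[k] / total for k in keys}
```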

Usage:

# Console output only
uv run python fact_checker_stats.py

# Export to CSV files
uv run python fact_checker_stats.py --export-csv stats_output/

# Create visualization plots (confusion matrices, calibration curves, transition charts)
uv run python fact_checker_stats.py --export-csv stats_output/ --plot

Output Files:

ai_vs_expected.csv: Raw data for AI evaluations vs expected answers
ai_vs_human.csv: Raw data for AI evaluations vs human annotations
human_pairs.csv: Paired human annotations for inter-rater analysis
summary_statistics.json: Complete statistical results in JSON format
confusion_matrix_ai_vs_expected.png: Heatmap visualization
confidence_calibration.png: Calibration curve with error bars
transition_analysis.png: Bar charts showing category transitions

Key Features:

Rigorous Statistics: Uses Wilson score intervals for binomial proportions, Fleiss standard errors for kappa
Three Comparisons: AI vs Expected, AI vs Human, Human vs Human inter-rater agreement
Significance Testing: Chi-square tests for independence with p-value interpretation
Confidence Assessment: Evaluates whether AI confidence levels correlate with actual accuracy
Transition Analysis: Identifies patterns in evaluation changes for temporal validity studies
Publication-Ready: Generates formatted reports and high-resolution plots (300 DPI)

Statistical Methods:

Wilson score interval for concordance rate confidence intervals (better coverage than normal approximation)
Fleiss et al. (1969) formula for Cohen's kappa standard errors
Pearson chi-square test for categorical independence
Landis & Koch (1977) interpretation scale for kappa values
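The kappa calculations can be sketched as below. The sketch computes Cohen's kappa from a k×k agreement table, its large-sample standard error in the form usually attributed to Fleiss, Cohen and Everitt (1969), and the Landis & Koch (1977) labels. It is an illustrative implementation, not the one in `fact_checker_stats.py`, so check it against that tool before relying on the numbers. A 95% interval is then roughly `kappa ± 1.96 * se`.

```python
import math


def cohens_kappa(table: list[list[int]]) -> tuple[float, float]:
    """Cohen's kappa and its large-sample SE (Fleiss, Cohen & Everitt 1969 form).

    table[i][j] counts items rater A put in category i and rater B in category j.
    """
    n = sum(map(sum, table))
    k = len(table)
    p = [[cell / n for cell in row] for row in table]
    row = [sum(p[i]) for i in range(k)]                       # p_i.
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # p_.j
    po = sum(p[i][i] for i in range(k))                       # observed agreement
    pe = sum(row[i] * col[i] for i in range(k))               # chance agreement
    if pe >= 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (po - pe) / (1 - pe)
    a = sum(p[i][i] * (1 - (row[i] + col[i]) * (1 - kappa)) ** 2 for i in range(k))
    b = (1 - kappa) ** 2 * sum(
        p[i][j] * (col[i] + row[j]) ** 2
        for i in range(k) for j in range(k) if i != j
    )
    c = (kappa - pe * (1 - kappa)) ** 2
    se = math.sqrt(max(0.0, a + b - c) / (n * (1 - pe) ** 2))
    return kappa, se


def landis_koch(kappa: float) -> str:
    """Landis & Koch (1977) verbal label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```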

Documentation:

Complete guide: doc/users/FACT_CHECKER_STATS_GUIDE.md
Statistical methods and interpretation guidelines included
Example output with real-world interpretation

Enum-Based Workflow System

from bmlibrarian.agents import (
    QueryAgent, DocumentScoringAgent, CitationFinderAgent, 
    ReportingAgent, CounterfactualAgent, EditorAgent, AgentOrchestrator
)
from bmlibrarian.cli.workflow_steps import (
    WorkflowStep, WorkflowDefinition, WorkflowExecutor, 
    create_default_research_workflow, StepResult
)

# Initialize workflow system
workflow_definition = create_default_research_workflow()
workflow_executor = WorkflowExecutor(workflow_definition)

# Initialize orchestrator and agents
orchestrator = AgentOrchestrator(max_workers=4)
query_agent = QueryAgent(orchestrator=orchestrator)
scoring_agent = DocumentScoringAgent(orchestrator=orchestrator)
citation_agent = CitationFinderAgent(orchestrator=orchestrator)
reporting_agent = ReportingAgent(orchestrator=orchestrator)
counterfactual_agent = CounterfactualAgent(orchestrator=orchestrator)
editor_agent = EditorAgent(orchestrator=orchestrator)

# Set up workflow context
user_question = "What are the cardiovascular benefits of exercise?"
workflow_executor.add_context('research_question', user_question)

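# Execute workflow steps. step_handler is a callable you must supply that
# performs the work for each WorkflowStep (e.g. by dispatching to the agents above).
current_step = workflow_definition.steps[0]
while current_step:
    execution = workflow_executor.execute_step(current_step, step_handler)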
    workflow_executor.execution_history.append(execution)
    
    if execution.result == StepResult.SUCCESS:
        current_step = workflow_definition.get_next_step(current_step, workflow_executor.context)
    elif execution.result == StepResult.BRANCH:
        current_step = workflow_executor.get_context('branch_to_step')
    else:
        break

# Get final results from context
final_report = workflow_executor.get_context('comprehensive_report')
counterfactual_analysis = workflow_executor.get_context('counterfactual_analysis')

Basic Multi-Agent Workflow (Legacy)

# For direct agent usage without workflow orchestration
from bmlibrarian.agents import (
    QueryAgent, DocumentScoringAgent, CitationFinderAgent, 
    ReportingAgent, CounterfactualAgent, AgentOrchestrator
)

# Initialize orchestrator and agents
orchestrator = AgentOrchestrator(max_workers=4)
query_agent = QueryAgent(orchestrator=orchestrator)
scoring_agent = DocumentScoringAgent(orchestrator=orchestrator)
citation_agent = CitationFinderAgent(orchestrator=orchestrator)
reporting_agent = ReportingAgent(orchestrator=orchestrator)
counterfactual_agent = CounterfactualAgent(orchestrator=orchestrator)

# Manual workflow execution
user_question = "What are the cardiovascular benefits of exercise?"
documents = query_agent.search_documents(user_question)
scored_docs = []
for doc in documents:
    # Score each document once; calling evaluate_document twice doubles the LLM calls
    score = scoring_agent.evaluate_document(user_question, doc)
    if score:
        scored_docs.append((doc, score))
citations = citation_agent.process_scored_documents_for_citations(
    user_question=user_question, scored_documents=scored_docs, score_threshold=2.5)
report = reporting_agent.generate_citation_based_report(
    user_question=user_question, citations=citations, format_output=True)

Key Features Demonstrated

Enum-Based Workflow: Flexible step orchestration with meaningful names
Iterative Processing: Repeatable steps for query refinement and evidence enhancement
Natural Language Processing: Convert questions to database queries
Relevance Assessment: AI-powered document scoring (1-5 scale)
Citation Extraction: Extract specific passages that answer questions
Evidence Synthesis: Generate professional medical reports with proper references
Counterfactual Analysis: Generate research questions to find contradictory evidence
Comprehensive Editing: Balanced report integration with all evidence types
Quality Control: Document verification and evidence strength assessment
Confidence Assessment: Evaluate evidence reliability with contradictory evidence search
Agent-Driven Refinement: Agents can request more citations during report generation
Auto Mode Support: Non-interactive execution with graceful error handling
Scalable Processing: Queue-based batch processing for large datasets

Important Instructions and Reminders

When developing new agents or features:

Always inherit from BaseAgent for consistent interfaces
Use configuration system: Load models via get_model() and settings via get_agent_config()
Filter configuration parameters: Only pass supported parameters to agent constructors
Process ALL documents by default: No artificial limits unless explicitly configured
Implement comprehensive testing with realistic test data
Create both user and developer documentation for all new features
Never create or modify production database without explicit permission
Ensure document ID verification to prevent citation hallucination
Support queue-based processing for scalability
Include progress tracking for long-running operations
Use enum-based workflow system for new workflow steps (workflow_steps.py)
Use modular GUI architecture for new GUI features (see src/bmlibrarian/gui/)
Include counterfactual analysis capabilities where appropriate for evidence validation
Implement workflow step handlers for agent integration with orchestration system
Support auto mode execution with graceful fallbacks for interactive features
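As an illustration of filtering configuration parameters, the sketch below inspects a constructor's signature with the standard inspect module and drops keys the constructor does not accept. filter_supported_kwargs and DemoAgent are hypothetical; real agents inherit from BaseAgent and take their settings from get_agent_config(), whose exact output this sketch does not assume.

```python
import inspect
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Parameter kinds that can be supplied by keyword
_NAMED_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def filter_supported_kwargs(cls: type, config: dict[str, Any]) -> dict[str, Any]:
    """Return only the entries of config that cls's constructor accepts.

    Unsupported keys are logged and dropped rather than causing a
    TypeError. If the constructor takes **kwargs, config is passed through.
    """
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(config)
    accepted = {
        name for name, p in params.items() if name != "self" and p.kind in _NAMED_KINDS
    }
    dropped = sorted(set(config) - accepted)
    if dropped:
        logger.warning("Ignoring unsupported %s parameters: %s", cls.__name__, dropped)
    return {key: value for key, value in config.items() if key in accepted}


class DemoAgent:
    """Hypothetical stand-in for an agent class."""

    def __init__(self, model: str, temperature: float = 0.1) -> None:
        """Store the model name and sampling temperature."""
        self.model = model
        self.temperature = temperature


agent_config = {"model": "gpt-oss:20b", "temperature": 0.2, "max_tokens": 512}
agent = DemoAgent(**filter_supported_kwargs(DemoAgent, agent_config))
```

Here max_tokens is logged and discarded instead of crashing the constructor, while model and temperature are passed through.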

Testing and Quality Assurance:

Run full test suite: uv run python -m pytest tests/
Test CLI: uv run python bmlibrarian_cli.py --quick
Test Research GUI: uv run python bmlibrarian_research_gui.py --auto "test question" --quick
Test Configuration GUI: uv run python bmlibrarian_config_gui.py
Test agent demos: uv run python examples/agent_demo.py
Test counterfactual analysis: uv run python examples/counterfactual_demo.py
Verify Ollama connection before LLM operations
Validate all citations reference real database documents
Check evidence strength assessments are appropriate
Verify counterfactual analysis generates meaningful research questions
Ensure agents use configured models from config.json
Test document processing without artificial limits
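One way to check that citations reference real documents is to compare each citation's document ID against the IDs actually retrieved from the database for the question. The Citation dataclass and split_verified_citations below are hypothetical; bmlibrarian's own citation objects and the checks the CitationFinderAgent performs may differ.

```python
import logging
from collections.abc import Iterable
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal citation: the cited document and the quoted passage."""

    document_id: int
    passage: str


def split_verified_citations(
    citations: Iterable[Citation], known_document_ids: set[int]
) -> tuple[list[Citation], list[Citation]]:
    """Split citations into (verified, rejected) by checking document IDs.

    known_document_ids should hold the IDs of documents actually retrieved
    from the database, so any other ID indicates a hallucinated citation.
    """
    verified: list[Citation] = []
    rejected: list[Citation] = []
    for citation in citations:
        if citation.document_id in known_document_ids:
            verified.append(citation)
        else:
            rejected.append(citation)
    if rejected:
        logger.warning(
            "Rejected %d citation(s) with unknown document IDs: %s",
            len(rejected),
            sorted({c.document_id for c in rejected}),
        )
    return verified, rejected


citations = [
    Citation(101, "Placeholder passage from a retrieved document."),
    Citation(999, "Placeholder passage attributed to a document never retrieved."),
]
verified, rejected = split_verified_citations(citations, known_document_ids={101, 102})
```

The citation pointing at document 999 is rejected and logged because no retrieved document has that ID.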
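For verifying the Ollama connection, a lightweight probe is to ask the server for its model list before any LLM work starts. The sketch below accepts any client with a list() method, so it can be tested without a running server; in practice you would pass an ollama.Client, for example ollama.Client(host="http://localhost:11434"). The function name ollama_is_reachable and the ListsModels protocol are hypothetical.

```python
import logging
from typing import Any, Protocol

logger = logging.getLogger(__name__)


class ListsModels(Protocol):
    """Anything exposing list(), such as ollama.Client."""

    def list(self) -> Any:
        """Return the models available on the server."""
        ...


def ollama_is_reachable(client: ListsModels) -> bool:
    """Return True if the Ollama server answers a model-list request.

    Failures are logged rather than raised so callers can report a clear
    message (or fall back) before starting any LLM work.
    """
    try:
        client.list()
    except Exception as exc:  # any failure (refused connection, server error) counts as unreachable
        logger.error("Ollama is not reachable: %s", exc)
        return False
    return True
```

Catching a broad Exception is a deliberate choice here: the only question being asked is whether the server is usable, and the underlying error is still logged.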

The "golden rules" of programming for BMLibrarian

Never trust input from users, external data sources, the network, or files: Always validate and sanitize input. Never assume it will be in the expected format or contain the expected data.
No magic numbers: Always use constants or configuration for numbers. Never hardcode numbers. Always use named constants for numbers that are used in multiple places.
No hardcoded paths: Always use constants or configuration for paths. Never hardcode paths. Always use named constants for paths that are used in multiple places.
All model communication happens through the Python ollama library: Never use raw HTTP requests to communicate with Ollama. Always use the ollama library.
All PostgreSQL database communication happens through the DatabaseManager: Never use a psycopg connection directly, and never modify the database structure/schema without a proper migration.
All parameters must have type hints: No exceptions.
All functions, methods, and classes must have docstrings: No exceptions.
All errors must be handled, logged, and reported to the user: No exceptions.
No inline stylesheets: All stylesheets must be generated by the centralised styling system (stylesheet_generator.py).
No hardcoded pixel values: All dimensions must be calculated from font metrics or relative to other elements, generally using our DPI-aware font scaling system (dpi_scale.py).
We prefer reusable pure functions over more complex larger structures. Where possible, such pure functions should be factored out into generally useful libraries.
All modules need to be documented in markdown format in doc/users for the end user, and doc/developers for developers. Important information for the AI assistant goes into doc/llm.
All database migrations MUST be idempotent: Use CREATE TABLE IF NOT EXISTS, CREATE INDEX IF NOT EXISTS, DO $$ ... IF NOT EXISTS ... END $$ blocks for ALTER TABLE. Migrations must be safe to run multiple times without errors or data loss.
Migration files must NOT contain their own tracking code: The MigrationManager handles migration tracking via the bmlibrarian_migrations table. Never create or insert into public.schema_migrations or similar tables inside migration files.
NEVER modify production database without explicit permission: Use the development database for testing migrations. Production changes require team approval.
Use get_document_details() for fetching document metadata: When loading documents in widgets/components, always use the canonical get_document_details(document_id) function from bmlibrarian.database. This ensures consistent field names, pre-formatted authors, and proper PMID extraction across all UI components. Never write inline SQL for document fetching in UI code.
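A minimal sketch that applies several golden rules at once (validate untrusted input, type hints, docstrings, errors that are handled, logged and reported): parse_document_id, InvalidInputError and handle_user_request are hypothetical, and treating valid document IDs as positive integers is an assumption.

```python
import logging

logger = logging.getLogger(__name__)

# Smallest valid document ID (assumption: IDs are positive integers)
MIN_DOCUMENT_ID = 1


class InvalidInputError(ValueError):
    """Raised when untrusted input fails validation."""


def parse_document_id(raw: object) -> int:
    """Validate an untrusted document ID and return it as an int.

    Accepts ints and strings of decimal digits (surrounding whitespace is
    ignored); rejects booleans, floats, other types and values below
    MIN_DOCUMENT_ID.
    """
    if isinstance(raw, bool):  # bool is a subclass of int, so reject it explicitly
        raise InvalidInputError("document ID must not be a boolean")
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str) and raw.strip().isdecimal():
        value = int(raw.strip())
    else:
        raise InvalidInputError(f"document ID must be an integer, got {raw!r}")
    if value < MIN_DOCUMENT_ID:
        raise InvalidInputError(f"document ID must be >= {MIN_DOCUMENT_ID}, got {value}")
    return value


def handle_user_request(raw: str) -> str:
    """Validate user input and return a message suitable for display."""
    try:
        document_id = parse_document_id(raw)
    except InvalidInputError as exc:
        logger.warning("Rejected document ID from user: %s", exc)  # logged...
        return f"Invalid document ID: {exc}"  # ...and reported to the user
    return f"Loading document {document_id}"
```

parse_document_id stays a pure function that only raises; logging and user-facing reporting happen once, at the boundary in handle_user_request.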
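Part of the migration idempotency rule can be checked mechanically. The hypothetical find_idempotency_problems below uses regular expressions (a heuristic, not a SQL parser, so comments and string literals can cause false positives) to flag CREATE TABLE and CREATE INDEX statements that lack IF NOT EXISTS, plus any reference to schema_migrations, since tracking belongs to the MigrationManager. It does not inspect ALTER TABLE statements, which the rule says should be wrapped in DO $$ ... END $$ blocks.

```python
import re

# CREATE [UNIQUE] TABLE/INDEX not followed by [CONCURRENTLY] IF NOT EXISTS
_CREATE_PATTERN = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?(TABLE|INDEX)\b"
    r"(?!(?:\s+CONCURRENTLY)?\s+IF\s+NOT\s+EXISTS\b)",
    re.IGNORECASE,
)
# Migration files must not maintain their own tracking table
_TRACKING_PATTERN = re.compile(r"\bschema_migrations\b", re.IGNORECASE)


def find_idempotency_problems(sql: str) -> list[str]:
    """Return human-readable problems found in a migration's SQL text."""
    problems: list[str] = []
    for match in _CREATE_PATTERN.finditer(sql):
        line_no = sql.count("\n", 0, match.start()) + 1
        problems.append(
            f"line {line_no}: CREATE {match.group(1).upper()} without IF NOT EXISTS"
        )
    if _TRACKING_PATTERN.search(sql):
        problems.append(
            "references schema_migrations; tracking belongs to MigrationManager"
        )
    return problems
```

A check like this could run in the test suite over every migration file, so a non-idempotent statement is caught before it reaches any database.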
