CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

BMLibrarian is a comprehensive Python library providing AI-powered access to biomedical literature databases. It features a multi-agent architecture with specialized agents for query processing, document scoring, citation extraction, report generation, and counterfactual analysis, all coordinated through an advanced task queue orchestration system.

The project includes a modern modular CLI (bmlibrarian_cli.py) that provides full multi-agent workflow capabilities with enhanced maintainability and extensibility.

Dependencies and Environment

  • Python: Requires Python >=3.12
  • Database: PostgreSQL with pgvector extension for semantic search
  • AI/LLM: Ollama for local language model inference
  • Main dependencies:
    • psycopg >=3.2.9 for PostgreSQL connectivity (via DatabaseManager)
    • ollama - Python library for Ollama LLM communication (never use raw HTTP requests)
    • PySide6 for GUI applications
  • Package manager: Uses uv for dependency management (uv.lock present)

Configuration

  • Configuration file locations (OS agnostic):
    • Primary: ~/.bmlibrarian/config.json (recommended)
    • Legacy fallback: bmlibrarian_config.json in current directory
    • GUI default: Always saves to ~/.bmlibrarian/config.json
  • Environment variables are configured in a .env file
  • Database connection parameters:
    • POSTGRES_DB: Database name (default: "knowledgebase")
    • POSTGRES_USER: Database user
    • POSTGRES_PASSWORD: Database password
    • POSTGRES_HOST: Database host (default: "localhost")
    • POSTGRES_PORT: Database port (default: "5432")
  • File system:
    • PDF_BASE_DIR: Base directory for PDF files (default: "~/knowledgebase/pdf")
  • AI/LLM configuration:
    • Ollama service typically runs on http://localhost:11434
    • Models used: gpt-oss:20b (default for complex tasks), medgemma4B_it_q8:latest (fast processing)
  • OpenAthens proxy authentication (optional):
    • Enable institutional access to paywalled PDFs via OpenAthens proxy
    • Supports 2FA authentication with persistent sessions (24 hours default)
    • Configure in config.json under "openathens" section
    • Requires Playwright: uv add playwright && uv run python -m playwright install chromium
    • See: doc/OPENATHENS_QUICKSTART.md and doc/users/openathens_guide.md
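
The database connection parameters listed above can be gathered into one dictionary with their documented defaults. This is a minimal sketch, not BMLibrarian's own loader: `connection_params` is a hypothetical helper, and the dictionary keys are the standard libpq keyword names (`dbname`, `user`, `password`, `host`, `port`).

```python
import os
from typing import Mapping, Optional


def connection_params(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect PostgreSQL settings from the environment.

    Hypothetical helper for illustration; the defaults mirror the ones
    documented in this file.
    """
    env = os.environ if env is None else env
    return {
        "dbname": env.get("POSTGRES_DB", "knowledgebase"),
        "user": env.get("POSTGRES_USER"),          # no documented default
        "password": env.get("POSTGRES_PASSWORD"),  # no documented default
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": env.get("POSTGRES_PORT", "5432"),
    }
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to test. Inside the project itself, database access should still go through DatabaseManager.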

Database-Backed User Settings

BMLibrarian supports database-backed settings for authenticated users:

  • User Authentication: Login system with username/password stored in public.users table
  • Per-User Settings: Each user has personalized settings in bmlsettings.user_settings
  • Default Settings: System-wide defaults in bmlsettings.default_settings
  • Session Management: Session tokens for persistent authentication

Resolution Priority Order:

  1. User's database settings (when authenticated)
  2. Default database settings (if DB connected)
  3. JSON file settings (~/.bmlibrarian/config.json)
  4. Hardcoded DEFAULT_CONFIG
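
The priority order above amounts to a first-match lookup across four layers. The sketch below illustrates the idea only. `resolve_setting` and the contents of `DEFAULT_CONFIG` are assumptions made for the example, not the real BMLibrarianConfig implementation.

```python
from typing import Any, Mapping, Optional

Layer = Mapping[str, Mapping[str, Any]]

# Illustrative stand-in for the hardcoded defaults; the values are assumptions.
DEFAULT_CONFIG: dict = {
    "models": {"default": "gpt-oss:20b"},
    "search": {"max_results": 100},
}


def resolve_setting(
    category: str,
    key: str,
    user_db: Optional[Layer] = None,
    default_db: Optional[Layer] = None,
    json_file: Optional[Layer] = None,
) -> Any:
    """Return the first value found, walking the layers in priority order."""
    for layer in (user_db, default_db, json_file, DEFAULT_CONFIG):
        # A layer is skipped when it is unavailable (None) or lacks the key.
        if layer is not None and key in layer.get(category, {}):
            return layer[category][key]
    raise KeyError(f"{category}.{key}")
```

A layer that is missing (for example, no authenticated user or no database connection) is passed as `None`, so the lookup falls through to the next source.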

Valid Settings Categories: models, ollama, agents, database, search, query_generation, gui, openathens, pdf, general

Key Components:

  • UserService: User registration, authentication, session management
  • UserSettingsManager: Per-user settings CRUD operations
  • BMLibrarianConfig.set_user_context(): Enables database-backed settings
  • migrate_config_to_db.py: CLI tool for migrating JSON configs to database

Documentation:

  • User guide: doc/users/settings_migration_guide.md
  • Architecture: doc/developers/db_settings_architecture.md
  • Planning: doc/planning/db_settings_refactor_plan.md

Development Commands

Since this project uses uv for package management:

  • uv sync - Install/sync dependencies
  • uv run python -m [module] - Run Python modules in the virtual environment
  • Testing: uv run python -m pytest tests/ - Run comprehensive test suite
  • Database Setup & Battle Testing:
    • uv run python initial_setup_and_download.py test_database.env - Complete database setup and import testing
    • uv run python initial_setup_and_download.py test.env --skip-medrxiv --skip-pubmed - Schema setup only
    • uv run python initial_setup_and_download.py test.env --medrxiv-days 1 --pubmed-max-results 10 - Quick validation test
    • See SETUP_GUIDE.md for comprehensive documentation
  • CLI Applications:
    • uv run python bmlibrarian_cli.py - Interactive medical research CLI with full multi-agent workflow
    • uv run python fact_checker_cli.py statements.json - Batch fact-checker for biomedical statements (stores in PostgreSQL factcheck schema)
    • uv run python fact_checker_cli.py statements.json --incremental - Incremental mode (resume processing, skip already-evaluated statements)
    • uv run python medrxiv_import_cli.py update --download-pdfs - Import medRxiv preprints with multi-format full-text extraction
    • uv run python medrxiv_import_cli.py update --extraction-strategy auto - Import with priority-based extraction (text → HTML → XML → PDF)
    • uv run python medrxiv_import_cli.py extract-text --missing-only --limit 100 - Re-extract full text for papers without it
    • uv run python medrxiv_import_cli.py fetch-pdfs --limit 100 - Download missing PDFs for existing records
    • uv run python medrxiv_import_cli.py status - Show medRxiv import statistics with extraction strategy info
    • uv run python medrxiv_meca_cli.py list --limit 10 - List available MECA packages from AWS S3 (requires boto3)
    • uv run python medrxiv_meca_cli.py download --limit 10 --output-dir ~/medrxiv_meca - Download MECA packages
    • uv run python medrxiv_meca_cli.py sync --output-dir ~/medrxiv_meca --limit 100 - Full MECA workflow (download + import)
    • uv run python pubmed_import_cli.py search "COVID-19 vaccine" --max-results 100 - Import PubMed articles by search query (targeted import)
    • uv run python pubmed_import_cli.py pmids 12345678 23456789 - Import PubMed articles by PMID list
    • uv run python pubmed_import_cli.py status - Show PubMed import statistics
    • uv run python pubmed_bulk_cli.py download-baseline - Download complete PubMed baseline (~38M articles, ~400GB, for offline mirroring)
    • uv run python pubmed_bulk_cli.py download-updates - Download PubMed daily update files (new articles + metadata updates)
    • uv run python pubmed_bulk_cli.py import --type baseline - Import downloaded baseline files into database (with Markdown abstract formatting)
    • uv run python pubmed_bulk_cli.py sync --updates-only - Download and import PubMed updates (incremental sync)
    • uv run python pubmed_bulk_cli.py status - Show PubMed bulk download/import status
    • uv run python pubmed_repair_cli.py scan - Scan downloaded PubMed files for gzip corruption
    • uv run python pubmed_repair_cli.py scan --type update - Scan only update files for corruption
    • uv run python pubmed_repair_cli.py repair --reimport -y - Re-download corrupted files and re-import to database
    • Note: PubMed bulk importer now preserves abstract structure and formatting as Markdown (section labels, subscripts, superscripts, emphasis)
    • uv run python pmc_bulk_cli.py list --license oa_comm - List available PMC Open Access packages
    • uv run python pmc_bulk_cli.py download --license oa_comm - Download PMC baseline packages (with PDF + full-text NXML)
    • uv run python pmc_bulk_cli.py download --delay 300 - Download with 5-minute delay (polite mode)
    • uv run python pmc_bulk_cli.py download --range PMC001xxxxxx - Download specific PMCID range only
    • uv run python pmc_bulk_cli.py extract - Extract downloaded tar.gz packages
    • uv run python pmc_bulk_cli.py import - Import extracted articles to database
    • uv run python pmc_bulk_cli.py sync --license oa_comm - Full workflow: download + extract + import
    • uv run python pmc_bulk_cli.py status - Show PMC bulk download progress
    • uv run python pmc_bulk_cli.py estimate - Estimate download time and storage requirements
    • Note: PMC bulk importer is designed for offline work with configurable rate limiting (default 2 min between files)
    • uv run python europe_pmc_bulk_cli.py list - List available Europe PMC Open Access packages (~1000+ files, ~100 articles each)
    • uv run python europe_pmc_bulk_cli.py download --output-dir ~/europepmc - Download Europe PMC full-text XML with resumable progress
    • uv run python europe_pmc_bulk_cli.py download --delay 120 --limit 10 - Download with 2-minute delay, limited to 10 packages
    • uv run python europe_pmc_bulk_cli.py download --range 1-1000000 - Download specific PMCID range only
    • uv run python europe_pmc_bulk_cli.py verify --output-dir ~/europepmc - Verify gzip integrity of all downloaded files
    • uv run python europe_pmc_bulk_cli.py status - Show Europe PMC download progress
    • uv run python europe_pmc_bulk_cli.py estimate - Estimate remaining download time
    • uv run python europe_pmc_bulk_cli.py import --output-dir ~/europepmc - Import downloaded packages to database with Markdown full-text
    • uv run python europe_pmc_bulk_cli.py import --limit 5 --batch-size 50 - Import with limits
    • uv run python europe_pmc_bulk_cli.py import-status - Show import progress and statistics
    • uv run python europe_pmc_bulk_cli.py verify-import --package PMC13900_PMC17829.xml.gz - Verify package can be parsed before import
    • Note: Europe PMC importer converts JATS XML to Markdown with proper headers, figure placeholders, and emphasis formatting
    • uv run python europe_pmc_pdf_cli.py list - List available Europe PMC PDF packages
    • uv run python europe_pmc_pdf_cli.py download --output-dir ~/europepmc_pdf - Download Europe PMC PDFs with resumable progress
    • uv run python europe_pmc_pdf_cli.py download --delay 120 --limit 10 - Download with 2-minute delay, limited to 10 packages
    • uv run python europe_pmc_pdf_cli.py download --range 1-1000000 - Download specific PMCID range only
    • uv run python europe_pmc_pdf_cli.py download --no-extract - Download packages without extracting PDFs
    • uv run python europe_pmc_pdf_cli.py verify --output-dir ~/europepmc_pdf - Verify integrity of downloaded packages
    • uv run python europe_pmc_pdf_cli.py extract --output-dir ~/europepmc_pdf - Extract PDFs from downloaded packages
    • uv run python europe_pmc_pdf_cli.py status - Show Europe PMC PDF download progress
    • uv run python europe_pmc_pdf_cli.py estimate - Estimate remaining download time
    • uv run python europe_pmc_pdf_cli.py find --pmcid PMC123456 - Find a specific PDF by PMCID in local storage
    • Note: Europe PMC PDF downloader extracts PDFs to year-based subdirectories for organization
    • uv run python embed_documents_cli.py embed --source medrxiv --limit 100 - Generate embeddings for medRxiv abstracts
    • uv run python embed_documents_cli.py count --source medrxiv - Count documents needing embeddings
    • uv run python embed_documents_cli.py status - Show embedding statistics
    • uv run python mesh_import_cli.py import --year 2025 - Download and import MeSH vocabulary (~400MB with SCRs)
    • uv run python mesh_import_cli.py import --year 2025 --no-supplementary - Import MeSH without SCRs (~180MB, faster)
    • uv run python mesh_import_cli.py status - Show MeSH database statistics and local DB availability
    • uv run python mesh_import_cli.py lookup "heart attack" - Look up a MeSH term (uses local DB if available)
    • uv run python mesh_import_cli.py search "cardio" - Search MeSH by partial match
    • uv run python mesh_import_cli.py expand "MI" - Expand term to all synonyms/entry terms
    • uv run python mesh_import_cli.py history - Show MeSH import history
    • uv run python pdf_import_cli.py file /path/to/paper.pdf - Import single PDF with LLM-based metadata extraction and database matching
    • uv run python pdf_import_cli.py directory /path/to/pdfs/ - Import directory of PDFs with intelligent matching
    • uv run python pdf_import_cli.py directory /pdfs/ --recursive - Import PDFs recursively from subdirectories
    • uv run python pdf_import_cli.py file paper.pdf --dry-run - Preview import without making changes
    • uv run python pdf_import_cli.py status - Show PDF import statistics and coverage
    • uv run python fact_checker_cli.py statements.json -o results.json - Export results to JSON file (PostgreSQL is always used)
    • uv run python fact_checker_stats.py - Generate comprehensive statistical analysis report (console output)
    • uv run python fact_checker_stats.py --export-csv stats_output/ - Export statistics to CSV files
    • uv run python fact_checker_stats.py --export-csv stats_output/ --plot - Create visualization plots
    • uv run python paper_checker_cli.py abstracts.json - PaperChecker: fact-check medical abstracts against literature
    • uv run python paper_checker_cli.py abstracts.json -o results.json - Export PaperChecker results to JSON
    • uv run python paper_checker_cli.py abstracts.json --export-markdown reports/ - Export markdown reports per abstract
    • uv run python paper_checker_cli.py --pmid 12345678 23456789 - Check abstracts by PMID from database
    • uv run python paper_checker_cli.py abstracts.json --quick - Quick test mode (max 5 abstracts)
    • uv run python paper_checker_cli.py abstracts.json --continue-on-error - Continue processing on failures
    • uv run python model_benchmark_cli.py benchmark "research question" --models gpt-oss:20b medgemma4B_it_q8:latest - Benchmark document scoring models
    • uv run python model_benchmark_cli.py benchmark "question" --models gpt-oss:20b --authoritative gpt-oss:120B -o results.json - Benchmark with custom authoritative model and export
    • uv run python model_benchmark_cli.py history - View benchmark run history
    • uv run python model_benchmark_cli.py show --run-id 5 - View specific benchmark run details
    • uv run python model_benchmark_cli.py compare --run-id 5 - Compare score distributions between models
    • uv run python migrate_config_to_db.py --interactive - Interactive settings migration wizard
    • uv run python migrate_config_to_db.py --user alice --config ~/.bmlibrarian/config.json - Migrate JSON config to user database settings
    • uv run python migrate_config_to_db.py --defaults --config config.json - Set default settings (admin)
    • uv run python migrate_config_to_db.py --export --user alice -o backup.json - Export user settings to JSON
    • uv run python export_to_pdf.py report.md -o report.pdf - Export markdown to PDF with default settings
    • uv run python export_to_pdf.py report.md -o report.pdf --title "Research" --author "Dr. Smith" - Export with custom metadata
    • uv run python export_to_pdf.py report.md -o report.pdf --research-report --citation-count 45 - Export as BMLibrarian research report
    • uv run python export_to_pdf.py report.md -o report.pdf --letter --font-size 12 - Export with US Letter format and custom font size
  • GUI Applications:
    • uv run python setup_wizard.py - PySide6 setup wizard for initial database configuration and data import
    • uv run python bmlibrarian_research_gui.py - Desktop research application with visual workflow progress and report preview
    • uv run python bmlibrarian_config_gui.py - Graphical configuration interface for agents and settings
    • uv run python fact_checker_review_gui.py - Human review and annotation interface for fact-checking results (PostgreSQL-based)
    • uv run python fact_checker_review_gui.py --user alice - Launch review GUI with username (skip login dialog)
    • uv run python fact_checker_review_gui.py --user alice --incremental - Incremental mode (only show unannotated statements)
    • uv run python fact_checker_review_gui.py --user bob --blind - Blind mode (hide AI/original annotations for unbiased review)
    • uv run python fact_checker_review_gui.py --user alice --db-file review_package.db - Review with SQLite package (no PostgreSQL needed)
    • uv run python audit_validation_gui.py - Audit trail validation GUI for human review of automated evaluations
    • uv run python audit_validation_gui.py --user alice - Launch with specified reviewer name
    • uv run python audit_validation_gui.py --user alice --incremental - Show only unvalidated items
    • uv run python systematic_review_gui.py - Checkpoint-based systematic review GUI with resume capability
    • uv run python systematic_review_gui.py --review-dir ~/my_reviews - Start with specific review directory
    • uv run python systematic_review_gui.py --debug - Enable debug logging
  • BMLibrarian Lite (lightweight version without PostgreSQL):
  • Fact-Checker Distribution Tools (for inter-rater reliability analysis):
    • uv run python scripts/export_review_package.py --output review_package.db --exported-by username - Export self-contained SQLite review package
    • uv run python scripts/export_human_evaluations.py --db-file review.db --annotator alice -o alice.json - Export human annotations to JSON
    • uv run python scripts/import_human_evaluations.py alice.json bob.json charlie.json - Re-import human evaluations to PostgreSQL
  • Laboratory Tools:
    • uv run python scripts/query_lab.py - Interactive QueryAgent laboratory for experimenting with natural language to PostgreSQL query conversion
    • uv run python scripts/pico_lab.py - Interactive PICO laboratory for extracting Population, Intervention, Comparison, and Outcome components from documents
    • uv run python scripts/study_assessment_lab.py - Interactive Study Assessment laboratory for evaluating research quality and trustworthiness
    • uv run python scripts/prisma2020_lab.py - Interactive PRISMA 2020 laboratory for assessing systematic review compliance with PRISMA reporting guidelines
    • uv run python scripts/paper_weight_lab.py - Interactive Paper Weight Assessment laboratory (PySide6/Qt) for evaluating evidential weight of research papers
    • uv run python scripts/paper_checker_lab.py - Interactive PaperChecker laboratory (PySide6/Qt) for medical abstract fact-checking with step-by-step visualization
    • uv run python scripts/paper_reviewer_lab.py - Interactive Paper Reviewer laboratory (PySide6/Qt) for comprehensive paper assessment with DOI/PMID/PDF/text input
    • uv run python scripts/pubmed_search_lab.py - Interactive PubMed Search laboratory (PySide6/Qt) for searching PubMed API without local database storage
    • uv run python scripts/transparency_lab.py - Interactive Transparency Assessment laboratory (PySide6/Qt) for detecting undisclosed bias risk in biomedical papers
  • Transparency Assessment Tools:
    • uv run python transparency_analyzer_cli.py assess --doc-id 12345 - Assess transparency of a single document by ID
    • uv run python transparency_analyzer_cli.py assess --query "cardiovascular exercise" --limit 50 - Assess documents matching a search query
    • uv run python transparency_analyzer_cli.py assess --has-fulltext --limit 100 - Assess documents with full text available
    • uv run python transparency_analyzer_cli.py stats - Show transparency assessment statistics
    • uv run python transparency_analyzer_cli.py show --doc-id 12345 - Show detailed assessment for a document
    • uv run python transparency_analyzer_cli.py export --output results.json - Export assessments to JSON
    • uv run python transparency_analyzer_cli.py export --output results.csv - Export assessments to CSV
    • uv run python clinicaltrials_import_cli.py download --output-dir ~/clinicaltrials - Download ClinicalTrials.gov bulk data (~10GB)
    • uv run python clinicaltrials_import_cli.py import --input-dir ~/clinicaltrials - Import ClinicalTrials.gov trials to database
    • uv run python clinicaltrials_import_cli.py status - Show ClinicalTrials.gov import statistics
    • uv run python retraction_watch_cli.py import --file retraction_watch.csv - Import Retraction Watch CSV data
    • uv run python retraction_watch_cli.py lookup --doi 10.1234/example - Look up retraction status by DOI
    • uv run python retraction_watch_cli.py status - Show Retraction Watch import statistics
    • Note: Transparency assessment works fully offline using local Ollama models and documents in the database
  • PDF Processing Tools:
    • uv run python examples/pdf_processor_demo.py - PySide6 demo application for PDF section segmentation (biomedical publications)
    • uv run python tests/test_pdf_processor.py paper.pdf - Command-line test script for PDF processor library
  • Demonstrations:
    • uv run python examples/agent_demo.py - Multi-agent workflow demonstration
    • uv run python examples/citation_demo.py - Citation extraction examples
    • uv run python examples/reporting_demo.py - Report generation examples
    • uv run python examples/counterfactual_demo.py - Counterfactual analysis demonstration

OpenAthens Authentication

BMLibrarian includes secure OpenAthens authentication for accessing institutional journal subscriptions:

Key Features

  • Secure Session Management: JSON-based storage with 600 file permissions (no pickle vulnerability)
  • Browser Automation: Interactive login via Playwright
  • Cookie-Based Authentication: Automatic cookie injection for authenticated downloads
  • Session Validation Caching: Performance optimization with configurable TTL
  • HTTPS Enforcement: All institutional URLs must use HTTPS
  • Network Connectivity Checks: Pre-authentication validation

Security Improvements Implemented

  1. JSON Serialization: Replaced pickle to eliminate code execution vulnerability
  2. File Permissions: Session files stored with 600 permissions (owner read/write only)
  3. Cookie Pattern Matching: Specific regex patterns for OpenAthens/SAML/Shibboleth cookies
  4. Configurable Parameters: No magic numbers, all timeouts/intervals configurable
  5. URL Validation: HTTPS requirement and format validation
  6. Browser Crash Handling: Graceful cleanup on browser failures
  7. Session Cache TTL: Reduces validation overhead during batch downloads

Usage

import asyncio

from bmlibrarian.utils.openathens_auth import OpenAthensConfig, OpenAthensAuth
from bmlibrarian.utils.pdf_manager import PDFManager

# Configure OpenAthens (the institution URL must use HTTPS)
config = OpenAthensConfig(
    institution_url='https://institution.openathens.net/login',
    session_max_age_hours=24,
    session_cache_ttl=60
)

# Authenticate (interactive browser login)
auth = OpenAthensAuth(config)
asyncio.run(auth.login_interactive())

# Use with PDFManager for authenticated downloads
# (`document` is a document record loaded elsewhere)
pdf_manager = PDFManager(openathens_auth=auth)
pdf_path = pdf_manager.download_pdf(document)

Documentation

  • User Guide: doc/users/openathens_guide.md - Complete usage guide with examples
  • Security Documentation: doc/developers/openathens_security.md - Security architecture and best practices
  • Unit Tests: tests/test_openathens_auth.py - Comprehensive test suite

PDF Discovery and Download

BMLibrarian includes an intelligent PDF retrieval system that discovers and downloads full-text PDFs using multiple strategies:

Workflow

  1. Discovery: Finds available PDF sources via PMC, Unpaywall, DOI, and direct URLs
  2. Direct HTTP Download: Attempts fast HTTP downloads from discovered sources (prioritizes open access)
  3. Browser Fallback: Uses Playwright browser automation for Cloudflare-protected or anti-bot protected sites

Key Features

  • Multi-Source Discovery: Searches PMC, Unpaywall, CrossRef, and DOI.org
  • Priority-Based Selection: Automatically selects best source (open access preferred)
  • Browser Fallback: Handles Cloudflare verification, embedded PDF viewers, and anti-bot protections
  • Year-Based Organization: PDFs stored in YYYY/filename.pdf structure
  • Database Integration: Automatically updates document records with PDF paths

Usage

from bmlibrarian.discovery import download_pdf_for_document
from pathlib import Path

# Download PDF with discovery workflow
result = download_pdf_for_document(
    document={'doi': '10.1038/nature12373', 'id': 123},
    output_dir=Path('~/pdfs').expanduser(),
    unpaywall_email='user@example.com',  # Recommended
    use_browser_fallback=True  # Falls back to browser if HTTP fails
)

if result.success:
    print(f"Downloaded: {result.file_path}")
    print(f"Source: {result.source.source_type.value}")

Configuration

{
  "unpaywall_email": "user@example.com",
  "discovery": {
    "timeout": 30,
    "prefer_open_access": true,
    "use_browser_fallback": true,
    "browser_headless": true,
    "browser_timeout": 60000
  }
}

Documentation

  • User Guide: doc/users/pdf_download_guide.md - Complete usage guide
  • Browser Downloader: doc/users/BROWSER_DOWNLOADER.md - Browser-based download details

PDF Export System

BMLibrarian includes a professional PDF export system for converting markdown-formatted research reports to publication-quality PDF documents.

Key Features

  • Pure Python: Uses ReportLab (BSD-licensed, free for redistribution)
  • Cross-Platform: Works on all major operating systems without system dependencies
  • Professional Quality: Publication-ready documents with proper formatting
  • Full Markdown Support: Headings, lists, tables, code blocks, links, emphasis
  • Custom Styling: Configurable fonts, colors, page sizes, and margins
  • Page Management: Automatic page numbers, headers, footers, and timestamps

Quick Start

from pathlib import Path
from bmlibrarian.exporters import PDFExporter

# Create exporter with default settings
exporter = PDFExporter()

# Export markdown report to PDF
exporter.export_report(
    report_content=final_report,  # Markdown-formatted report
    output_path=Path("research_report.pdf"),
    research_question="What are the cardiovascular benefits of exercise?",
    citation_count=45,
    document_count=128
)

Command-Line Tool

# Basic export (uses A4 paper size by default - international standard)
uv run python export_to_pdf.py report.md -o report.pdf

# With metadata and US Letter format
uv run python export_to_pdf.py report.md -o report.pdf \
    --title "Research Report" \
    --author "Dr. Smith" \
    --letter --font-size 12

# Research report format with metadata
uv run python export_to_pdf.py report.md -o report.pdf \
    --research-report \
    --citation-count 45 \
    --document-count 128

Configuration Options

from bmlibrarian.exporters import PDFExportConfig, PDFExporter
from reportlab.lib.pagesizes import A4

config = PDFExportConfig(
    page_size=A4,  # or reportlab.lib.pagesizes.letter for US Letter
    base_font_size=12,
    heading_color=(0.1, 0.1, 0.3),  # RGB
    include_page_numbers=True,
    include_timestamp=True,
    include_header=True
)

exporter = PDFExporter(config)

Dependencies

  • reportlab (BSD License): Core PDF generation
  • markdown (BSD License): Markdown parsing
  • Pygments (BSD License): Syntax highlighting

All dependencies are free for commercial use and redistribution with no licensing fees.

Documentation

  • User Guide: doc/users/pdf_export_guide.md - Complete usage guide with examples
  • API Reference: See src/bmlibrarian/exporters/pdf_exporter.py for full API documentation

Architecture

BMLibrarian uses a sophisticated multi-agent architecture with enum-based workflow orchestration:

Core Components

  • Multi-Agent System: Specialized AI agents for different literature analysis tasks
  • Enum-Based Workflow: Flexible step orchestration with meaningful names and repeatable steps
  • Task Queue Orchestration: SQLite-based queue system for memory-efficient processing
  • Database Backend: PostgreSQL with pgvector extension for semantic search
  • Local LLM Integration: Ollama service for privacy-preserving AI inference

Agent Types

  1. QueryAgent: Natural language to PostgreSQL query conversion
  2. DocumentScoringAgent: Scores each document's relevance to the user's question (1-5 scale)
  3. CitationFinderAgent: Extracts relevant passages from high-scoring documents
  4. ReportingAgent: Synthesizes citations into medical publication-style reports
  5. CounterfactualAgent: Analyzes documents to generate research questions for finding contradictory evidence
  6. EditorAgent: Creates balanced comprehensive reports integrating all evidence
  7. FactCheckerAgent: Evaluates biomedical statements (yes/no/maybe) with literature evidence for training data auditing
  8. PICOAgent: Extracts Population, Intervention, Comparison, and Outcome components from research papers for systematic reviews
  9. StudyAssessmentAgent: Evaluates research quality, study design, methodological rigor, bias risk, and trustworthiness of biomedical evidence
  10. PRISMA2020Agent: Assesses systematic reviews and meta-analyses against PRISMA 2020 reporting guidelines (27-item checklist with suitability pre-screening)
  11. PaperCheckerAgent: Validates medical abstract claims against contradictory literature using multi-strategy search (semantic + HyDE + keyword) with evidence-based verdicts (supports/contradicts/undecided)
  12. TransparencyAgent: Detects undisclosed bias risk by assessing funding disclosure, conflict of interest, data availability, trial registration, and author contributions using offline LLM analysis with bulk metadata enrichment (PubMed grants, ClinicalTrials.gov sponsors, Retraction Watch)

Document Card Factory System

BMLibrarian uses a factory pattern for creating document cards with consistent functionality in Qt/PySide6.

Key Features:

  • Three-State PDF Buttons: VIEW (local PDF), FETCH (download from URL), UPLOAD (manual upload)
  • Consistent Styling: Unified appearance using centralized stylesheet system
  • Context-Aware Rendering: Different card styles for literature, scoring, citations, etc.
  • Extensible Design: Easy to add new card variations or contexts

Factory Classes:

  • DocumentCardFactoryBase: Abstract base class with common utilities
  • QtDocumentCardFactory: Qt-specific implementation with QFrame cards and integrated PDF buttons

Usage Example:

from bmlibrarian.gui.qt.qt_document_card_factory import QtDocumentCardFactory
from bmlibrarian.gui.qt.document_card_factory_base import DocumentCardData, CardContext

factory = QtDocumentCardFactory(parent=parent_widget)
card_data = DocumentCardData(
    doc_id=12345,
    title="Example Study",
    authors=["Smith J", "Johnson A"],
    year=2023,
    relevance_score=4.5,
    pdf_url="https://example.com/paper.pdf",
    context=CardContext.LITERATURE,
    show_pdf_button=True
)
card = factory.create_card(card_data)

PDF Button States:

  • VIEW (Blue): Local PDF exists → Opens in viewer
  • FETCH (Orange): URL available → Downloads then transitions to VIEW
  • UPLOAD (Green): No PDF → File picker then transitions to VIEW
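
The state selection described above reduces to two checks, made in order. The following is an illustrative sketch. `PDFButtonState` and `button_state` are hypothetical names, not the identifiers used in the Qt factory.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class PDFButtonState(Enum):
    VIEW = "view"      # local PDF exists
    FETCH = "fetch"    # URL available, not yet downloaded
    UPLOAD = "upload"  # nothing available, ask the user for a file


def button_state(local_path: Optional[Path], pdf_url: Optional[str]) -> PDFButtonState:
    """Pick the button state from what is known about the document's PDF."""
    if local_path is not None and local_path.exists():
        return PDFButtonState.VIEW
    if pdf_url:
        return PDFButtonState.FETCH
    return PDFButtonState.UPLOAD
```

After a successful fetch or upload, calling the function again with the new local path yields VIEW, which matches the transitions listed above.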

See Documentation:

  • Developer guide: doc/developers/document_card_factory_system.md
  • Demo: examples/document_card_factory_demo.py

Multi-Model Query Generation

BMLibrarian supports using multiple AI models to generate diverse database queries for improved document retrieval. This feature leverages the strengths of different models to create query variations that often find more relevant literature than single-model approaches.

Key Features:

  • Query Diversity: Generate 1-3 queries per model using up to 3 different models
  • Improved Coverage: Typically finds 20-40% more relevant documents
  • Serial Execution: Simple serial processing optimized for local Ollama + PostgreSQL instances
  • Automatic De-duplication: Query and document ID de-duplication handled automatically
  • Backward Compatible: Feature flag system (disabled by default)

Configuration (~/.bmlibrarian/config.json):

{
  "query_generation": {
    "multi_model_enabled": true,
    "models": [
      "medgemma-27b-text-it-Q8_0:latest",
      "gpt-oss:20b",
      "medgemma4B_it_q8:latest"
    ],
    "queries_per_model": 1,
    "execution_mode": "serial",
    "deduplicate_results": true,
    "show_all_queries_to_user": true,
    "allow_query_selection": true
  }
}
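
As a rough illustration of how this section might be read, the sketch below loads the "query_generation" block and clamps it to the limits stated under Key Features (up to 3 models, 1-3 queries per model, disabled by default). The function name and the DEFAULTS dictionary are assumptions made for this example; the project's own configuration loader may behave differently.

```python
import json
from pathlib import Path
from typing import Any

# Defaults assumed for this sketch; the feature is documented as disabled by default.
DEFAULTS: dict[str, Any] = {
    "multi_model_enabled": False,
    "models": [],
    "queries_per_model": 1,
}


def load_query_generation_config(path: Path) -> dict[str, Any]:
    """Read the "query_generation" section and clamp it to the documented limits."""
    raw: dict[str, Any] = {}
    if path.is_file():
        raw = json.loads(path.read_text(encoding="utf-8")).get("query_generation", {})
    cfg = {**DEFAULTS, **raw}
    # Up to 3 models, and between 1 and 3 queries per model.
    cfg["models"] = list(cfg["models"])[:3]
    cfg["queries_per_model"] = max(1, min(3, int(cfg["queries_per_model"])))
    return cfg
```

A missing file simply yields the defaults, which keeps single-model behavior unchanged.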

Architecture Highlights:

  • Serial Execution: Simple for-loops (not parallel) prevent resource bottlenecks with local instances
  • ID-Only Queries: Fast document ID retrieval (~10x faster) followed by single bulk document fetch
  • Type-Safe Results: Dataclasses (QueryGenerationResult, MultiModelQueryResult) for all query results
  • Error Resilience: Model failures handled gracefully, system continues with available models
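
The serial, failure-tolerant pattern described above can be sketched as follows. This is not the project's generator: generate and search_ids are hypothetical callables supplied by the caller (standing in for an LLM call and an ID-only database query), and case-insensitive query de-duplication is an assumption of this example.

```python
from typing import Callable, Iterable


def generate_queries_serially(
    question: str,
    models: Iterable[str],
    generate: Callable[[str, str], str],
) -> list[str]:
    """Ask each model in turn, skip models that fail, and de-duplicate the queries."""
    queries: list[str] = []
    seen: set[str] = set()
    for model in models:  # plain for-loop, no parallelism
        try:
            query = generate(model, question).strip()
        except Exception:
            continue  # a failing model must not stop the run
        key = query.lower()
        if query and key not in seen:
            seen.add(key)
            queries.append(query)
    return queries


def collect_document_ids(
    queries: Iterable[str],
    search_ids: Callable[[str], Iterable[int]],
) -> list[int]:
    """Run ID-only searches and merge the results, preserving first-seen order."""
    merged: dict[int, None] = {}
    for query in queries:
        for doc_id in search_ids(query):
            merged.setdefault(doc_id, None)
    return list(merged)
```

The merged ID list would then feed a single bulk fetch of full documents, so each document is loaded once regardless of how many queries matched it.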

Performance:

  • Overhead: ~2-3x slower than single-model (typically 5-15 seconds vs 2-5 seconds)
  • Benefit: 20-40% more relevant documents with 2-3 models
  • Recommended: Start with 2 models, 1 query each for best balance

See Documentation:

  • User guide: doc/users/multi_model_query_guide.md
  • Technical docs: doc/developers/multi_model_architecture.md

Workflow Orchestration System

The new enum-based workflow system (workflow_steps.py) provides:

  • WorkflowStep Enum: Meaningful step names instead of brittle numbering
  • Repeatable Steps: Query refinement, threshold adjustment, citation requests
  • Branching Logic: Conditional step execution and error recovery
  • Context Management: State preservation across step executions
  • Auto Mode Support: Graceful handling of non-interactive execution

Workflow Steps

COLLECT_RESEARCH_QUESTION → GENERATE_AND_EDIT_QUERY → SEARCH_DOCUMENTS → 
REVIEW_SEARCH_RESULTS → SCORE_DOCUMENTS → EXTRACT_CITATIONS → 
GENERATE_REPORT → PERFORM_COUNTERFACTUAL_ANALYSIS → 
SEARCH_CONTRADICTORY_EVIDENCE → EDIT_COMPREHENSIVE_REPORT → 
REVIEW_AND_REVISE_REPORT → EXPORT_REPORT
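
For orientation, the sequence above can be written as an enum with a linear successor function. This is an illustrative sketch, not a copy of workflow_steps.py: the real module defines its own WorkflowStep and adds repeatable steps and branching, and default_next_step is a hypothetical helper.

```python
from enum import Enum, auto
from typing import Optional


class WorkflowStep(Enum):
    """Illustrative enum listing the steps in the order shown above."""
    COLLECT_RESEARCH_QUESTION = auto()
    GENERATE_AND_EDIT_QUERY = auto()
    SEARCH_DOCUMENTS = auto()
    REVIEW_SEARCH_RESULTS = auto()
    SCORE_DOCUMENTS = auto()
    EXTRACT_CITATIONS = auto()
    GENERATE_REPORT = auto()
    PERFORM_COUNTERFACTUAL_ANALYSIS = auto()
    SEARCH_CONTRADICTORY_EVIDENCE = auto()
    EDIT_COMPREHENSIVE_REPORT = auto()
    REVIEW_AND_REVISE_REPORT = auto()
    EXPORT_REPORT = auto()


def default_next_step(step: WorkflowStep) -> Optional[WorkflowStep]:
    """Return the step that follows in the linear order, or None after the last one."""
    ordered = list(WorkflowStep)
    index = ordered.index(step)
    return ordered[index + 1] if index + 1 < len(ordered) else None
```

Referring to steps by name rather than by position means a step can be inserted or reordered without renumbering every caller.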

Iterative Capabilities

  • Query Refinement: When search results are insufficient
  • Threshold Adjustment: For better citation extraction
  • Citation Requests: Agents can request more evidence during report generation
  • Report Revision: Iterative improvement of generated reports
  • Evidence Enhancement: Counterfactual analysis for finding contradictory studies

Queue System

  • QueueManager: SQLite-based persistent task queuing
  • AgentOrchestrator: Coordinates multi-agent workflows
  • WorkflowExecutor: Manages step execution with context tracking
  • Task Priorities: HIGH, NORMAL, LOW priority levels
  • Batch Processing: Memory-efficient handling of large document sets
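
To make the idea of a SQLite-backed priority queue concrete, here is a minimal sketch. SimpleTaskQueue, TaskPriority, and the table layout are assumptions made for this example and do not describe QueueManager's actual schema or interface.

```python
import sqlite3
from enum import IntEnum
from typing import Optional


class TaskPriority(IntEnum):
    """Hypothetical numeric encoding; lower values are served first."""
    HIGH = 0
    NORMAL = 1
    LOW = 2


class SimpleTaskQueue:
    """Minimal persistent queue; the table layout is an assumption of this sketch."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "priority INTEGER NOT NULL, "
            "payload TEXT NOT NULL, "
            "done INTEGER NOT NULL DEFAULT 0)"
        )

    def enqueue(self, payload: str, priority: TaskPriority = TaskPriority.NORMAL) -> None:
        """Store one task; it survives restarts when db_path points at a file."""
        self.conn.execute(
            "INSERT INTO tasks (priority, payload) VALUES (?, ?)",
            (int(priority), payload),
        )
        self.conn.commit()

    def dequeue(self) -> Optional[str]:
        """Return the oldest pending task of the highest priority, or None if empty."""
        row = self.conn.execute(
            "SELECT id, payload FROM tasks WHERE done = 0 ORDER BY priority, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]
```

Because tasks are pulled one at a time, a worker never needs to hold the full document set in memory.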

Project Structure

bmlibrarian/
├── src/bmlibrarian/           # Main source code
│   ├── agents/                # Multi-agent system
│   │   ├── __init__.py        # Agent module exports
│   │   ├── base.py            # BaseAgent foundation class
│   │   ├── query_agent.py     # Natural language query processing
│   │   ├── scoring_agent.py   # Document relevance scoring
│   │   ├── citation_agent.py  # Citation extraction from documents
│   │   ├── reporting_agent.py # Report synthesis and formatting
│   │   ├── counterfactual_agent.py # Counterfactual analysis for contradictory evidence
│   │   ├── editor_agent.py    # Comprehensive report editing and integration
│   │   ├── queue_manager.py   # SQLite-based task queue system
│   │   ├── orchestrator.py    # Multi-agent workflow coordination
│   │   ├── transparency_data.py # Transparency assessment data models and constants
│   │   ├── transparency_agent.py # TransparencyAgent for undisclosed bias risk detection
│   │   └── query_generation/  # Multi-model query generation system
│   │       ├── __init__.py    # Query generation module exports
│   │       ├── data_types.py  # Type-safe dataclasses for query results
│   │       └── generator.py   # Multi-model query generator
│   ├── importers/             # External data source importers
│   │   ├── __init__.py        # Importer module exports
│   │   ├── medrxiv_importer.py # MedRxiv preprint importer (multi-format extraction)
│   │   ├── medrxiv_content_extractor.py # Multi-format content extractor (text/HTML/XML/PDF)
│   │   ├── medrxiv_meca_importer.py # MedRxiv MECA bulk importer (AWS S3)
│   │   ├── pubmed_importer.py # PubMed E-utilities importer (targeted imports)
│   │   ├── pubmed_bulk_importer.py # PubMed FTP bulk importer (complete mirror)
│   │   ├── pdf_matcher.py     # LLM-based PDF matching and import (DOI/PMID/title matching)
│   │   ├── clinicaltrials_importer.py # ClinicalTrials.gov bulk importer (sponsor classification)
│   │   ├── retraction_watch_importer.py # Retraction Watch CSV importer (retraction detection)
│   │   └── README.md          # Importer documentation
│   ├── embeddings/            # Document embedding generation
│   │   ├── __init__.py        # Embeddings module exports
│   │   └── document_embedder.py # Document embedder (uses Ollama)
│   ├── exporters/             # Export functionality (PDF, HTML, etc.)
│   │   ├── __init__.py        # Exporters module exports
│   │   └── pdf_exporter.py    # Markdown to PDF exporter using ReportLab
│   ├── pdf_processor/         # PDF processing and segmentation for biomedical publications
│   │   ├── __init__.py        # PDF processor module exports
│   │   ├── models.py          # Data models (TextBlock, Section, Document, SectionType)
│   │   ├── extractor.py       # PDF text extraction with layout analysis (PyMuPDF)
│   │   ├── segmenter.py       # Section segmentation using NLP and heuristics
│   │   └── README.md          # PDF processor documentation
│   ├── discovery/             # Full-text PDF discovery system
│   │   ├── __init__.py        # Discovery module exports
│   │   ├── data_types.py      # Type-safe dataclasses (PDFSource, DiscoveryResult, etc.)
│   │   ├── resolvers.py       # Source resolvers (PMC, Unpaywall, DOI, OpenAthens)
│   │   └── full_text_finder.py # Discovery orchestrator
│   ├── benchmarking/          # Model benchmarking system
│   │   ├── __init__.py        # Benchmarking module exports
│   │   ├── data_types.py      # Type-safe dataclasses (BenchmarkRun, AlignmentMetrics, etc.)
│   │   ├── database.py        # Database operations (benchmarking schema)
│   │   └── runner.py          # BenchmarkRunner orchestration
│   ├── cli/                   # Modular CLI architecture
│   │   ├── __init__.py        # CLI module exports
│   │   ├── config.py          # Configuration management
│   │   ├── ui.py              # User interface components
│   │   ├── query_processing.py # Query editing and search
│   │   ├── formatting.py      # Report formatting and export
│   │   ├── workflow.py        # Workflow orchestration
│   │   └── workflow_steps.py  # Enum-based workflow step definitions
│   ├── gui/                   # Graphical user interfaces (PySide6/Qt)
│   │   ├── __init__.py        # GUI module exports
│   │   └── qt/                # Qt/PySide6-based GUI components
│   │       ├── __init__.py    # Qt module entry point
│   │       ├── core/          # Core application infrastructure
│   │       ├── plugins/       # Plugin system (research, fact_checker, etc.)
│   │       ├── widgets/       # Reusable Qt widgets
│   │       ├── resources/     # Resources and styling (dpi_scale, stylesheets)
│   │       └── qt_document_card_factory.py  # Qt document card factory
│   ├── lab/                   # Experimental tools and interfaces
│   │   ├── __init__.py        # Lab module exports
│   │   ├── query_lab.py       # QueryAgent experimental GUI
│   │   ├── pico_lab.py        # PICOAgent experimental GUI for PICO component extraction
│   │   ├── study_assessment_lab.py # StudyAssessmentAgent experimental GUI for study quality evaluation
│   │   ├── prisma2020_lab.py  # PRISMA2020Agent experimental GUI for PRISMA 2020 compliance assessment
│   │   └── transparency_lab.py # TransparencyAgent experimental GUI for undisclosed bias risk assessment
│   ├── factchecker/           # Fact-checker module (PostgreSQL-based)
│   │   ├── __init__.py        # Fact-checker module exports
│   │   ├── agent/             # Fact-checker agent
│   │   │   ├── __init__.py
│   │   │   └── fact_checker_agent.py  # FactCheckerAgent (orchestrates multi-agent workflow)
│   │   ├── db/                # Database operations
│   │   │   ├── __init__.py
│   │   │   └── database.py    # FactCheckerDB (PostgreSQL factcheck schema)
│   │   ├── cli/               # CLI application
│   │   │   ├── __init__.py
│   │   │   ├── app.py         # Main CLI entry point
│   │   │   ├── commands.py    # Command handlers
│   │   │   └── formatters.py  # Output formatting
│   │   └── gui/               # Review GUI application
│   │       ├── __init__.py
│   │       ├── review_app.py  # Main review application
│   │       ├── data_manager.py    # Database queries
│   │       ├── annotation_manager.py  # Annotation logic
│   │       ├── statement_display.py   # Statement UI
│   │       ├── citation_display.py    # Citation cards
│   │       └── dialogs.py     # Login/export dialogs
│   └── paperchecker/          # PaperChecker module (abstract fact-checking)
│       ├── __init__.py        # PaperChecker module exports
│       ├── data_models.py     # Type-safe dataclasses (Statement, Verdict, etc.)
│       ├── agent.py           # PaperCheckerAgent (main orchestrator)
│       ├── database.py        # PaperCheckDB (PostgreSQL papercheck schema)
│       ├── components/        # Sub-components
│       │   ├── statement_extractor.py   # Extract claims from abstracts
│       │   ├── counter_statement_generator.py  # Generate semantic negations
│       │   ├── hyde_generator.py        # HyDE abstract and keyword generation
│       │   ├── search_coordinator.py    # Multi-strategy search orchestration
│       │   └── verdict_analyzer.py      # Evidence analysis and verdict generation
│       └── cli/               # CLI application
│           ├── app.py         # Main CLI entry point
│           ├── commands.py    # Command handlers
│           └── formatters.py  # Output formatting
├── tests/                     # Comprehensive test suite
│   ├── test_query_agent.py    # Query processing tests
│   ├── test_scoring_agent.py  # Document scoring tests
│   ├── test_citation_agent.py # Citation extraction tests
│   ├── test_reporting_agent.py # Report generation tests
│   ├── test_counterfactual_agent.py # Counterfactual analysis tests
│   ├── test_transparency_agent.py # Transparency assessment tests (43 tests)
│   └── paperchecker/          # PaperChecker test suite
│       ├── test_statement_extractor.py   # Statement extraction tests
│       ├── test_counter_generator.py     # Counter-statement tests
│       ├── test_hyde_generator.py        # HyDE generation tests
│       ├── test_search_coordinator.py    # Search tests
│       ├── test_verdict_analyzer.py      # Verdict tests
│       └── test_end_to_end.py            # Full workflow tests
├── examples/                  # Demonstration scripts
│   ├── agent_demo.py          # Multi-agent workflow examples
│   ├── citation_demo.py       # Citation extraction demonstrations
│   ├── reporting_demo.py      # Report generation examples
│   └── counterfactual_demo.py # Counterfactual analysis demonstrations
├── doc/                       # Comprehensive documentation
│   ├── users/                 # End-user guides
│   │   ├── query_agent_guide.md
│   │   ├── citation_guide.md
│   │   ├── reporting_guide.md
│   │   ├── counterfactual_guide.md
│   │   ├── fact_checker_guide.md
│   │   ├── fact_checker_review_guide.md  # Fact-checker review GUI guide
│   │   ├── medrxiv_import_guide.md  # MedRxiv import guide
│   │   ├── document_embedding_guide.md  # Document embedding guide
│   │   ├── document_interrogation_guide.md  # Document interrogation tab guide
│   │   ├── pdf_import_guide.md  # PDF import and matching guide
│   │   ├── study_assessment_guide.md  # Study quality assessment guide
│   │   ├── prisma2020_guide.md  # PRISMA 2020 compliance assessment guide
│   │   ├── multi_model_query_guide.md  # Multi-model query generation guide
│   │   ├── paper_checker_guide.md  # PaperChecker overview and quick start
│   │   ├── paper_checker_cli_guide.md  # PaperChecker CLI reference
│   │   ├── paper_checker_lab_guide.md  # PaperChecker laboratory guide
│   │   ├── paper_reviewer_lab_guide.md  # Paper Reviewer laboratory guide
│   │   ├── full_text_discovery_guide.md  # Full-text PDF discovery guide
│   │   ├── transparency_assessment_guide.md  # Transparency assessment user guide
│   │   ├── clinicaltrials_import_guide.md  # ClinicalTrials.gov import guide
│   │   └── retraction_watch_guide.md  # Retraction Watch import guide
│   └── developers/            # Technical documentation
│       ├── agent_module.md
│       ├── citation_system.md
│       ├── reporting_system.md
│       ├── counterfactual_system.md
│       ├── fact_checker_system.md
│       ├── study_assessment_system.md  # Study quality assessment system
│       ├── prisma2020_system.md  # PRISMA 2020 compliance assessment system
│       ├── document_interrogation_ui_spec.md  # Document interrogation UI specification
│       ├── multi_model_architecture.md  # Multi-model architecture docs
│       ├── paper_checker_architecture.md  # PaperChecker system design and architecture
│       ├── full_text_discovery_system.md  # Full-text PDF discovery architecture
│       └── transparency_assessment_system.md  # Transparency assessment system architecture
├── scripts/                   # Utility scripts and laboratory tools
│   ├── query_lab.py           # QueryAgent experimental laboratory GUI
│   ├── pico_lab.py            # PICOAgent experimental laboratory GUI
│   ├── study_assessment_lab.py # StudyAssessmentAgent laboratory GUI
│   ├── prisma2020_lab.py      # PRISMA2020Agent laboratory GUI
│   ├── paper_weight_lab.py    # PaperWeightAssessmentAgent laboratory GUI
│   ├── paper_checker_lab.py   # PaperChecker laboratory GUI
│   ├── paper_reviewer_lab.py  # Paper Reviewer laboratory GUI (comprehensive assessment)
│   ├── transparency_lab.py    # TransparencyAgent laboratory GUI (bias risk assessment)
│   ├── export_review_package.py # Export SQLite review packages
│   ├── export_human_evaluations.py # Export human annotations to JSON
│   ├── import_human_evaluations.py # Re-import human evaluations
│   ├── chunk_worker.py        # Background worker for semantic chunk queue
│   ├── rechunk_semantic_chunks.py # Re-chunk documents CLI
│   └── pdf_verification_gui.py # PDF verification review GUI
├── data/                      # Temporary test data (gitignored)
├── bmlibrarian_cli.py         # Interactive CLI application with full multi-agent workflow
├── bmlibrarian_qt.py          # Qt-based main entry point
├── bmlibrarian_research_gui.py # Desktop research GUI application (modular entry point)
├── fact_checker_cli.py        # Fact-checker CLI for training data auditing
├── fact_checker_review_gui.py # Human review and annotation GUI for fact-checking results
├── fact_checker_stats.py      # Comprehensive statistical analysis for fact-checker evaluations
├── paper_checker_cli.py       # PaperChecker CLI for fact-checking medical abstracts against literature
├── model_benchmark_cli.py     # Model benchmarking CLI for evaluating document scoring models
├── medrxiv_import_cli.py      # MedRxiv preprint import CLI (multi-format extraction)
├── medrxiv_meca_cli.py        # MedRxiv MECA bulk sync CLI (AWS S3)
├── pubmed_import_cli.py       # PubMed E-utilities import CLI (targeted imports)
├── pubmed_bulk_cli.py         # PubMed FTP bulk download/import CLI (complete mirror)
├── pubmed_repair_cli.py       # PubMed download repair CLI (scan/fix corrupted gzip files)
├── pmc_bulk_cli.py            # PMC Open Access bulk download/import CLI
├── europe_pmc_pdf_cli.py      # Europe PMC PDF bulk download CLI
├── pdf_import_cli.py          # PDF import CLI with LLM-based metadata extraction and matching
├── transparency_analyzer_cli.py # Transparency assessment CLI for detecting undisclosed bias risk
├── clinicaltrials_import_cli.py # ClinicalTrials.gov bulk download/import CLI
├── retraction_watch_cli.py    # Retraction Watch CSV import CLI
├── migrate_config_to_db.py    # Settings migration CLI
├── export_to_pdf.py           # Markdown to PDF export CLI tool
├── initial_setup_and_download.py  # Database setup and battle-testing script
├── setup_wizard.py            # PySide6 setup wizard
├── baseline_schema.sql        # Base PostgreSQL schema definition
├── migrations/                # Database migration scripts
├── test_database.env.example  # Example environment file for testing
├── SETUP_GUIDE.md             # Comprehensive setup and testing guide
├── pyproject.toml             # Project configuration and dependencies
├── uv.lock                    # Locked dependency versions
├── .env                       # Environment configuration
└── README.md                  # Project description

Development Notes

Project Maturity

  • Current State: Full multi-agent architecture implemented with comprehensive testing and documentation
  • Core Features: Query processing, document scoring, citation extraction, and report generation are fully functional
  • Production Ready: Complete system with queue orchestration, error handling, and quality control

Development Principles

  • Modern Python Standards: Uses pyproject.toml, type hints, and Python >=3.12
  • Enum-Based Architecture: Flexible workflow orchestration with meaningful step names
  • Comprehensive Testing: Unit tests for all agents with >95% coverage
  • Documentation First: Both developer and user documentation for all features
  • AI-Powered: Local LLM integration via Ollama for privacy-preserving processing
  • Scalable Architecture: Queue-based processing for memory-efficient large-scale operations
  • Iterative Workflows: Support for repeatable steps and agent-driven refinement

Database Safety

  • CRITICAL: Never modify or drop the production database "knowledgebase"
  • Development: Use "bmlibrarian_dev" database for testing/migration experiments
  • Production Access: Read-only access unless explicitly instructed otherwise
  • Data Integrity: All document IDs are programmatically verified to prevent hallucination
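
One way to back the rules above with a programmatic check is a guard that runs before any statement is executed. The sketch below is a crude first-keyword heuristic with hypothetical names (PROTECTED_DATABASES, assert_safe_to_execute); it does not replace a read-only database role, and nothing here is taken from DatabaseManager.

```python
# Databases that must never receive write statements (from the rules above).
PROTECTED_DATABASES = frozenset({"knowledgebase"})

# Leading SQL keywords treated as writes in this sketch; the list is an assumption.
WRITE_KEYWORDS = frozenset({"DROP", "TRUNCATE", "DELETE", "ALTER", "UPDATE", "INSERT", "CREATE"})


def assert_safe_to_execute(db_name: str, sql: str) -> None:
    """Raise PermissionError if a write statement targets a protected database."""
    words = sql.split(None, 1)
    first_word = words[0].upper() if words else ""
    if db_name in PROTECTED_DATABASES and first_word in WRITE_KEYWORDS:
        raise PermissionError(
            f"Refusing to run {first_word} against protected database '{db_name}'"
        )
```

Calling assert_safe_to_execute("bmlibrarian_dev", "DROP TABLE scratch") passes silently, while the same statement against "knowledgebase" raises.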

Code Quality Standards

  • Testing: Write comprehensive unit tests for every new module
  • Documentation: Create both user guides (doc/users/) and developer docs (doc/developers/)
  • Type Safety: Use type hints throughout the codebase
  • Error Handling: Robust error recovery and logging
  • Model Standards: Only use approved models (gpt-oss:20b, medgemma4B_it_q8:latest)
  • Temporal Precision: Use specific years instead of vague temporal references (e.g., "In a 2023 study" NOT "In a recent study")

Agent Development Guidelines

  • BaseAgent Pattern: All agents inherit from BaseAgent with standardized interfaces
  • Configuration Integration: Agents must use get_model() and get_agent_config() from config system
  • Parameter Filtering: Filter agent config to only include supported parameters (temperature, top_p, etc.)
  • Queue Integration: New agents should support queue-based processing
  • Workflow Integration: Agents should work with enum-based workflow system
  • Connection Testing: All agents must implement connection testing methods
  • Progress Tracking: Support progress callbacks for long-running operations
  • Document ID Integrity: Always use real database IDs, never mock/fabricated references
  • Step Handler Methods: Implement appropriate workflow step handlers for agent actions
  • No Artificial Limits: Process ALL documents unless explicitly configured otherwise
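
The Parameter Filtering guideline can be illustrated with a few lines of Python. SUPPORTED_PARAMS and filter_agent_config are names invented for this sketch, and the supported set shown is only the two parameters named above; each agent would declare its own.

```python
from typing import Any, Collection, Mapping

# Assumed set of parameters an agent accepts; extend per agent as needed.
SUPPORTED_PARAMS = frozenset({"temperature", "top_p"})


def filter_agent_config(
    config: Mapping[str, Any],
    supported: Collection[str] = SUPPORTED_PARAMS,
) -> dict[str, Any]:
    """Keep only the keys an agent supports so unknown settings are never passed on."""
    return {key: value for key, value in config.items() if key in supported}
```

Filtering before construction means an extra key in the user's configuration is ignored rather than causing an unexpected-keyword error.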

Workflow Development Guidelines

  • WorkflowStep Enum: Use meaningful names for new workflow steps
  • Repeatable Steps: Mark steps as repeatable when they support iteration
  • Branching Logic: Implement conditional execution and error recovery
  • Context Management: Preserve state across step executions
  • Auto Mode Support: Ensure steps work in non-interactive mode

Usage Examples

Research GUI Application

BMLibrarian includes a comprehensive desktop research application built with PySide6/Qt:

# Start the research GUI application
uv run python bmlibrarian_research_gui.py

# Research GUI Features:
# - Multi-line text input for medical research questions
# - Interactive/automated workflow toggle
# - Visual workflow progress with collapsible step cards
# - Real-time agent execution with proper model configuration
# - Formatted markdown report preview with scrolling
# - Full integration with BMLibrarian's multi-agent system

# Command line options:
uv run python bmlibrarian_research_gui.py --auto "research question"  # Automated execution
uv run python bmlibrarian_research_gui.py --quick                    # Quick mode with limits
uv run python bmlibrarian_research_gui.py --max-results 100          # Custom search limits
uv run python bmlibrarian_research_gui.py --score-threshold 3.0      # Custom relevance threshold

The Research GUI provides:

  • Desktop Application: Native cross-platform desktop interface using PySide6
  • Visual Workflow: Collapsible cards showing real-time progress through 11 workflow steps
  • Agent Integration: Uses configured models from ~/.bmlibrarian/config.json
  • Document Processing: Scores ALL found documents by default (no artificial limits)
  • Citation Extraction: Processes ALL documents above relevance threshold
  • Report Generation: Full markdown rendering with GitHub-style formatting
  • Configuration Support: Respects agent models, parameters, and thresholds from config
  • Performance Modes: Normal (all documents) vs Quick (limited for speed)
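
For readers extending the command line, the options listed above could be declared with argparse roughly as follows. This is a sketch under assumptions: the flag names come from the examples above, but the help texts, the absence of default values, and the function names are illustrative and may not match the actual script.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented flags; defaults are left unset on purpose."""
    parser = argparse.ArgumentParser(prog="bmlibrarian_research_gui.py")
    parser.add_argument("--auto", metavar="QUESTION",
                        help="run the workflow automatically for this research question")
    parser.add_argument("--quick", action="store_true",
                        help="quick mode with limits")
    parser.add_argument("--max-results", type=int,
                        help="custom search limit")
    parser.add_argument("--score-threshold", type=float,
                        help="custom relevance threshold")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)
```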

Configuration GUI Application

BMLibrarian includes a modern graphical configuration interface built with PySide6/Qt:

# Start the desktop configuration GUI
uv run python bmlibrarian_config_gui.py

# GUI Features:
# - Native desktop application with tabbed interface
# - Document Interrogation tab: Interactive document viewer with AI chatbot for Q&A
# - Separate configuration tabs for each agent
# - Model selection with live refresh from Ollama server
# - Parameter adjustment with sliders and input fields
# - Configuration save/load functionality
# - Connection testing to verify Ollama availability
# - Reset to defaults option
# - Cross-platform compatibility

# Command line options:
uv run python bmlibrarian_config_gui.py --debug            # Enable debug mode

The GUI provides:

  • Native Desktop App: Cross-platform desktop application using PySide6
  • Document Interrogation: Interactive split-pane interface for document Q&A
    • Left pane: Document viewer (PDF, Markdown, text files), 60% width
    • Right pane: Chat interface with dialogue bubbles, 40% width
    • File selector and model dropdown in top bar
    • Support for programmatic document loading from other plugins
    • Message history with user/AI distinction
    • Real-time chat with selected Ollama model
  • Agent Configuration: Individual tabs for Query, Scoring, Citation, Reporting, Counterfactual, and Editor agents
  • Model Management: Dropdown selection with live model refresh from Ollama
  • Parameter Tuning: Interactive sliders for temperature, top-p, and agent-specific settings
  • General Settings: Ollama server configuration, database settings, and CLI defaults
  • File Operations: Save/load configuration files with JSON format
  • Connection Testing: Verify Ollama server connectivity and list available models

Fact-Checker Review GUI Application

BMLibrarian includes a human review and annotation interface for fact-checking results:

# Start the Fact-Checker Review GUI
uv run python fact_checker_review_gui.py

# Load JSON file (auto-creates SQLite database for annotations)
uv run python fact_checker_review_gui.py --input-file results.json

# Load existing database directly
uv run python fact_checker_review_gui.py --input-file results.db

# Incremental mode: only show statements without AI evaluations
uv run python fact_checker_review_gui.py --input-file results.json --incremental

The Fact-Checker Review GUI provides:

  • Database Auto-Creation: Automatically creates SQLite database from JSON files (e.g., results.json → results.db)
  • Intelligent Merging: If database exists, imports new statements from JSON without overwriting existing annotations
  • CLI-Consistent Behavior: Same database workflow as the fact-checker CLI for seamless integration
  • Real-Time Persistence: All annotations saved directly to database as you review
  • Incremental Mode: Filter to show only unevaluated statements (consistent with CLI)
  • Multi-User Support: Track annotations by different reviewers with annotator metadata
  • Evidence Review: Examine supporting citations with expandable cards showing full abstracts
  • Annotation Comparison: View original, AI, and human annotations side-by-side

Database Workflow (matches CLI):

  1. Load results.json: Checks if results.db exists
  2. If DB exists: Merges new statements from JSON (skips existing with evaluations/annotations)
  3. If DB doesn't exist: Creates new database and imports all JSON data
  4. All annotations are saved to the database in real-time
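
The merge behavior in the steps above can be sketched with the standard sqlite3 module. The table layout and the merge_statements function are assumptions made for this example, and the sketch simplifies the rule by skipping every statement that already exists, whether or not it has been annotated.

```python
import sqlite3
from typing import Iterable, Mapping


def merge_statements(conn: sqlite3.Connection, statements: Iterable[Mapping[str, object]]) -> int:
    """Insert statements not yet present; existing rows and their annotations stay untouched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements ("
        "statement_id INTEGER PRIMARY KEY, "
        "text TEXT NOT NULL, "
        "annotation TEXT)"
    )
    inserted = 0
    for item in statements:
        # INSERT OR IGNORE leaves a row alone when its primary key already exists.
        cursor = conn.execute(
            "INSERT OR IGNORE INTO statements (statement_id, text) VALUES (?, ?)",
            (item["id"], item["text"]),
        )
        inserted += cursor.rowcount
    conn.commit()
    return inserted
```

Running the import twice on the same JSON is therefore harmless: the second pass inserts nothing and overwrites nothing.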

This ensures that the GUI and CLI provide identical database management behavior, making it easy to switch between interfaces or use both for different tasks.

Fact-Checker Distribution System for Inter-Rater Reliability

BMLibrarian includes a complete distribution system for sending fact-check results to external reviewers without requiring PostgreSQL installation. This enables inter-rater reliability analysis with multiple independent human annotators.

Complete Workflow:

  1. Export Review Package (PostgreSQL → SQLite):

    uv run python export_review_package.py --output review_package.db --exported-by username
    • Creates self-contained SQLite database with:
      • All statements and AI evaluations
      • Evidence citations with full document abstracts
      • Document metadata (titles, PMIDs, DOIs)
      • NO human annotations from other reviewers
    • Typical size: 100-500 MB for 1000 statements
    • Ready for distribution via file sharing
  2. Distribute to External Reviewers:

    • Send .db file + fact_checker_review_gui.py to reviewers
    • No PostgreSQL installation required
    • Works offline with full functionality
  3. Reviewer Annotation (SQLite):

    uv run python fact_checker_review_gui.py --user alice --db-file review_package.db
    • Read-write mode: Annotations saved to SQLite in real-time
    • Full abstract display for all citations
    • Same interface as PostgreSQL version
    • Supports blind mode and incremental mode
  4. Export Human Evaluations (SQLite → JSON):

    uv run python export_human_evaluations.py --db-file review_package.db --annotator alice -o alice.json
    • Lightweight JSON export (1-10 KB per statement)
    • Contains: statement_id, statement_text, annotation, explanation
    • Reviewer sends back only the small JSON file
  5. Re-import to PostgreSQL (JSON → PostgreSQL):

    uv run python import_human_evaluations.py alice.json bob.json charlie.json
    • Creates/updates annotator records with username tagging
    • Validates statements match by ID and text
    • Inserts/updates annotations (one per annotator per statement)
    • Reports statistics (inserted, updated, errors)
    • A repeated annotation (same annotator, same statement) overwrites the earlier one
  6. Analyze Inter-Rater Agreement:

    -- PostgreSQL query
    SELECT * FROM factcheck.calculate_inter_annotator_agreement();
    SELECT * FROM factcheck.v_inter_annotator_agreement;
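
The validation and overwrite rules of step 5 can be sketched in plain Python. Everything here is illustrative: import_annotation and the in-memory store stand in for the PostgreSQL tables, and only the field names statement_id, statement_text, and annotation are taken from the export format described in step 4.

```python
from typing import Mapping

# (annotator, statement_id) -> annotation; one entry per annotator per statement.
Annotations = dict[tuple[str, int], str]


def import_annotation(
    store: Annotations,
    statements: Mapping[int, str],
    annotator: str,
    record: Mapping[str, object],
) -> str:
    """Validate one exported record and upsert it; returns 'inserted', 'updated' or 'error'."""
    statement_id = record.get("statement_id")
    text_matches = (
        isinstance(statement_id, int)
        and statements.get(statement_id) == record.get("statement_text")
    )
    if not text_matches:
        return "error"  # unknown ID, or the text differs from the stored statement
    key = (annotator, statement_id)
    outcome = "updated" if key in store else "inserted"
    store[key] = str(record.get("annotation"))
    return outcome
```

Counting the returned outcomes gives the inserted/updated/error statistics mentioned in step 5.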

Key Features:

  • Database Abstraction: Unified interface supporting both PostgreSQL and SQLite backends
  • Self-Contained Packages: All data needed for review in single .db file
  • No Dependencies: Reviewers don't need PostgreSQL, just Python + PySide6
  • Validation: Statement text matching prevents mismatches during import
  • Multi-Reviewer Support: Track annotations by username for inter-rater analysis
  • Security: Audit trail via export_history, encrypted distribution recommended

Documentation:

  • Quick Start Guide: doc/users/FACT_CHECKER_DISTRIBUTION_QUICKSTART.md
  • Implementation Plan: doc/developers/FACT_CHECKER_DISTRIBUTION_PLAN.md
  • User Guide: doc/users/fact_checker_distribution_guide.md (if exists)

Architecture:

  • src/bmlibrarian/factchecker/db/abstract_db.py: Abstract database interface
  • src/bmlibrarian/factchecker/db/sqlite_db.py: SQLite implementation
  • src/bmlibrarian/factchecker/db/postgresql_db.py: PostgreSQL wrapper
  • src/bmlibrarian/factchecker/db/sqlite_schema.sql: Complete SQLite schema
  • export_review_package.py: Review package export script
  • export_human_evaluations.py: Human annotations export script
  • import_human_evaluations.py: PostgreSQL import script

Fact-Checker Statistical Analysis

BMLibrarian includes a comprehensive statistical analysis tool (fact_checker_stats.py) for evaluating fact-checker performance and inter-rater reliability. The tool calculates multiple metrics with proper statistical rigor.

Statistical Metrics Calculated:

  • Concordance rates: Agreement between AI evaluations and either expected answers or human annotations, reported with 95% confidence intervals computed as Wilson score intervals for binomial proportions
  • Cohen's kappa: Inter-rater reliability coefficient with standard errors and 95% confidence intervals
  • Confusion matrices: Cross-tabulation of evaluations with accuracy, precision, recall, and F1-scores
  • Confidence calibration: Relationship between AI confidence levels (low/medium/high) and actual accuracy
  • Chi-square tests: Statistical significance testing for categorical data (p < 0.05)
  • Category-specific transitions: Analysis of evaluation changes:
    • Yes → No transitions: Percentage of statements where evaluations changed from "yes" to "no"
    • No → Yes transitions: Percentage of statements where evaluations changed from "no" to "yes"
    • Certainty changes: Percentage moving to "maybe" (increased uncertainty)
    • Stability: Percentage with unchanged evaluations
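
The category-specific transitions listed above reduce to counting (before, after) pairs. The sketch below is a hypothetical helper, not the code in fact_checker_stats.py; the category names and the catch-all "other" bucket (for example, "maybe" to "yes") are assumptions of this example.

```python
from collections import Counter
from typing import Iterable


def transition_percentages(pairs: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Percentage of (before, after) evaluation pairs falling in each transition category."""
    counts: Counter[str] = Counter()
    total = 0
    for before, after in pairs:
        total += 1
        if before == after:
            counts["stable"] += 1
        elif after == "maybe":
            counts["to_maybe"] += 1
        elif (before, after) == ("yes", "no"):
            counts["yes_to_no"] += 1
        elif (before, after) == ("no", "yes"):
            counts["no_to_yes"] += 1
        else:
            counts["other"] += 1
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```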

Usage:

# Console output only
uv run python fact_checker_stats.py

# Export to CSV files
uv run python fact_checker_stats.py --export-csv stats_output/

# Create visualization plots (confusion matrices, calibration curves, transition charts)
uv run python fact_checker_stats.py --export-csv stats_output/ --plot

Output Files:

  • ai_vs_expected.csv: Raw data for AI evaluations vs expected answers
  • ai_vs_human.csv: Raw data for AI evaluations vs human annotations
  • human_pairs.csv: Paired human annotations for inter-rater analysis
  • summary_statistics.json: Complete statistical results in JSON format
  • confusion_matrix_ai_vs_expected.png: Heatmap visualization
  • confidence_calibration.png: Calibration curve with error bars
  • transition_analysis.png: Bar charts showing category transitions

Key Features:

  • Rigorous Statistics: Uses Wilson score intervals for binomial proportions, Fleiss standard errors for kappa
  • Three Comparisons: AI vs Expected, AI vs Human, Human vs Human inter-rater agreement
  • Significance Testing: Chi-square tests for independence with p-value interpretation
  • Confidence Assessment: Evaluates whether AI confidence levels correlate with actual accuracy
  • Transition Analysis: Identifies patterns in evaluation changes for temporal validity studies
  • Publication-Ready: Generates formatted reports and high-resolution plots (300 DPI)

Statistical Methods:

  • Wilson score interval for concordance rate confidence intervals (better coverage than normal approximation)
  • Fleiss et al. (1969) formula for Cohen's kappa standard errors
  • Pearson chi-square test for categorical independence
  • Landis & Koch (1977) interpretation scale for kappa values
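
For reference, the two core quantities named above can be computed with the standard formulas in a few lines. This is a self-contained sketch with function names chosen for illustration; it omits the Fleiss standard error for kappa and is not taken from fact_checker_stats.py.

```python
import math
from collections import Counter
from typing import Sequence


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half_width, centre + half_width)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Unlike the normal approximation, the Wilson interval stays inside [0, 1] even for small samples or proportions near 0 or 1, which is why it is preferred for concordance rates.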

Documentation:

  • Complete guide: doc/users/FACT_CHECKER_STATS_GUIDE.md
  • Statistical methods and interpretation guidelines included
  • Example output with real-world interpretation

Enum-Based Workflow System

from bmlibrarian.agents import (
    QueryAgent, DocumentScoringAgent, CitationFinderAgent, 
    ReportingAgent, CounterfactualAgent, EditorAgent, AgentOrchestrator
)
from bmlibrarian.cli.workflow_steps import (
    WorkflowStep, WorkflowDefinition, WorkflowExecutor, 
    create_default_research_workflow, StepResult
)

# Initialize workflow system
workflow_definition = create_default_research_workflow()
workflow_executor = WorkflowExecutor(workflow_definition)

# Initialize orchestrator and agents
orchestrator = AgentOrchestrator(max_workers=4)
query_agent = QueryAgent(orchestrator=orchestrator)
scoring_agent = DocumentScoringAgent(orchestrator=orchestrator)
citation_agent = CitationFinderAgent(orchestrator=orchestrator)
reporting_agent = ReportingAgent(orchestrator=orchestrator)
counterfactual_agent = CounterfactualAgent(orchestrator=orchestrator)
editor_agent = EditorAgent(orchestrator=orchestrator)

# Set up workflow context
user_question = "What are the cardiovascular benefits of exercise?"
workflow_executor.add_context('research_question', user_question)

# Execute workflow steps (step_handler is a caller-supplied callable, not defined in this snippet)
current_step = workflow_definition.steps[0]
while current_step:
    execution = workflow_executor.execute_step(current_step, step_handler)
    workflow_executor.execution_history.append(execution)
    
    if execution.result == StepResult.SUCCESS:
        current_step = workflow_definition.get_next_step(current_step, workflow_executor.context)
    elif execution.result == StepResult.BRANCH:
        current_step = workflow_executor.get_context('branch_to_step')
    else:
        break

# Get final results from context
final_report = workflow_executor.get_context('comprehensive_report')
counterfactual_analysis = workflow_executor.get_context('counterfactual_analysis')
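
The SUCCESS/BRANCH loop above can be tried without a database or Ollama using a self-contained toy. Everything in this sketch (ToyStep, ToyResult, NEXT_STEP, toy_handler, run_workflow) is hypothetical and only mirrors the control flow; it is not BMLibrarian's WorkflowStep, WorkflowExecutor, or handler signature.

```python
from enum import Enum, auto
from typing import Any, Callable, Optional


class ToyStep(Enum):
    """Hypothetical steps, named for illustration only."""
    COLLECT_QUESTION = auto()
    SEARCH = auto()
    REFINE_QUERY = auto()
    REPORT = auto()


class ToyResult(Enum):
    """Hypothetical step outcomes."""
    SUCCESS = auto()
    BRANCH = auto()
    FAILURE = auto()


# Default successor of each step; None ends the workflow
NEXT_STEP: dict[ToyStep, Optional[ToyStep]] = {
    ToyStep.COLLECT_QUESTION: ToyStep.SEARCH,
    ToyStep.SEARCH: ToyStep.REPORT,
    ToyStep.REFINE_QUERY: ToyStep.SEARCH,
    ToyStep.REPORT: None,
}


def toy_handler(step: ToyStep, context: dict[str, Any]) -> ToyResult:
    """Run one step against the shared context and report the outcome."""
    if step is ToyStep.SEARCH and not context.get("documents"):
        # Nothing found: ask the loop to jump to query refinement
        context["branch_to_step"] = ToyStep.REFINE_QUERY
        return ToyResult.BRANCH
    if step is ToyStep.REFINE_QUERY:
        # Pretend the refined query found one document
        context["documents"] = ["doc-1"]
    if step is ToyStep.REPORT:
        count = len(context["documents"])
        context["report"] = f"{count} document(s) for: {context['research_question']}"
    return ToyResult.SUCCESS


def run_workflow(
    handler: Callable[[ToyStep, dict[str, Any]], ToyResult],
    context: dict[str, Any],
) -> list[tuple[ToyStep, ToyResult]]:
    """Execute steps until the workflow ends or a step fails; return the history."""
    history: list[tuple[ToyStep, ToyResult]] = []
    current: Optional[ToyStep] = ToyStep.COLLECT_QUESTION
    while current is not None:
        result = handler(current, context)
        history.append((current, result))
        if result is ToyResult.SUCCESS:
            current = NEXT_STEP[current]
        elif result is ToyResult.BRANCH:
            current = context["branch_to_step"]
        else:
            break
    return history
```

Calling `run_workflow(toy_handler, {"research_question": "..."})` visits SEARCH twice: the first visit branches to REFINE_QUERY because no documents exist yet, and the second succeeds and moves on to REPORT.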

Basic Multi-Agent Workflow (Legacy)

# For direct agent usage without workflow orchestration
from bmlibrarian.agents import (
    QueryAgent, DocumentScoringAgent, CitationFinderAgent, 
    ReportingAgent, CounterfactualAgent, AgentOrchestrator
)

# Initialize orchestrator and agents
orchestrator = AgentOrchestrator(max_workers=4)
query_agent = QueryAgent(orchestrator=orchestrator)
scoring_agent = DocumentScoringAgent(orchestrator=orchestrator)
citation_agent = CitationFinderAgent(orchestrator=orchestrator)
reporting_agent = ReportingAgent(orchestrator=orchestrator)
counterfactual_agent = CounterfactualAgent(orchestrator=orchestrator)

# Manual workflow execution
user_question = "What are the cardiovascular benefits of exercise?"
documents = query_agent.search_documents(user_question)
# Evaluate each document once and keep only those that return a result
scored_docs = [(doc, score) for doc in documents
               if (score := scoring_agent.evaluate_document(user_question, doc))]
citations = citation_agent.process_scored_documents_for_citations(
    user_question=user_question, scored_documents=scored_docs, score_threshold=2.5)
report = reporting_agent.generate_citation_based_report(
    user_question=user_question, citations=citations, format_output=True)

Key Features Demonstrated

  • Enum-Based Workflow: Flexible step orchestration with meaningful names
  • Iterative Processing: Repeatable steps for query refinement and evidence enhancement
  • Natural Language Processing: Convert questions to database queries
  • Relevance Assessment: AI-powered document scoring (1-5 scale)
  • Citation Extraction: Extract specific passages that answer questions
  • Evidence Synthesis: Generate professional medical reports with proper references
  • Counterfactual Analysis: Generate research questions to find contradictory evidence
  • Comprehensive Editing: Balanced report integration with all evidence types
  • Quality Control: Document verification and evidence strength assessment
  • Confidence Assessment: Evaluate evidence reliability with contradictory evidence search
  • Agent-Driven Refinement: Agents can request more citations during report generation
  • Auto Mode Support: Non-interactive execution with graceful error handling
  • Scalable Processing: Queue-based batch processing for large datasets

Important Instructions and Reminders

When developing new agents or features:

  1. Always inherit from BaseAgent for consistent interfaces
  2. Use configuration system: Load models via get_model() and settings via get_agent_config()
  3. Filter configuration parameters: Only pass supported parameters to agent constructors
  4. Process ALL documents by default: No artificial limits unless explicitly configured
  5. Implement comprehensive testing with realistic test data
  6. Create both user and developer documentation for all new features
  7. Never create or modify the production database without explicit permission
  8. Ensure document ID verification to prevent citation hallucination
  9. Support queue-based processing for scalability
  10. Include progress tracking for long-running operations
  11. Use enum-based workflow system for new workflow steps (workflow_steps.py)
  12. Use modular GUI architecture for new GUI features (see src/bmlibrarian/gui/)
  13. Include counterfactual analysis capabilities where appropriate for evidence validation
  14. Implement workflow step handlers for agent integration with orchestration system
  15. Support auto mode execution with graceful fallbacks for interactive features
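
Item 8 (document ID verification) can be sketched as a simple membership check. The Citation record and split_verified helper below are hypothetical names for illustration, and the set of known IDs is assumed to come from a database lookup made elsewhere; this is not BMLibrarian's actual verification code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record used only in this sketch."""
    document_id: int
    passage: str


def split_verified(
    citations: list[Citation], known_ids: set[int]
) -> tuple[list[Citation], list[Citation]]:
    """Separate citations whose document_id is known from those that are not."""
    verified = [c for c in citations if c.document_id in known_ids]
    rejected = [c for c in citations if c.document_id not in known_ids]
    return verified, rejected
```

Rejected citations should be logged and dropped rather than passed on to report generation, so that a report never references a document the database does not contain.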

Testing and Quality Assurance:

  • Run full test suite: uv run python -m pytest tests/
  • Test CLI: uv run python bmlibrarian_cli.py --quick
  • Test Research GUI: uv run python bmlibrarian_research_gui.py --auto "test question" --quick
  • Test Configuration GUI: uv run python bmlibrarian_config_gui.py
  • Test agent demos: uv run python examples/agent_demo.py
  • Test counterfactual analysis: uv run python examples/counterfactual_demo.py
  • Verify Ollama connection before LLM operations
  • Validate all citations reference real database documents
  • Check evidence strength assessments are appropriate
  • Verify counterfactual analysis generates meaningful research questions
  • Ensure agents use configured models from config.json
  • Test document processing without artificial limits

The "golden rules" of programming for BMLibrarian

  1. Never trust input from users, external data, network or file data: Always validate and sanitize input. Never trust that it will be in the expected format or contain the expected data.
  2. No magic numbers: Never hardcode numeric literals. Define them as named constants or configuration values, especially when they are used in multiple places.
  3. No hardcoded paths: Never hardcode file paths. Define them as named constants or configuration values, especially when they are used in multiple places.
  4. All model communication happens through the python ollama library: Never use raw HTTP requests to communicate with Ollama. Always use the ollama library.
  5. All PostgreSQL database communication happens through the database manager: Never use a psycopg connection directly, and never modify the database structure/schema without a proper migration.
  6. All parameters must have type hints: No exceptions.
  7. All functions, methods, and classes must have docstrings: No exceptions.
  8. All errors must be handled, logged, and reported to the user: No exceptions.
  9. No inline style sheets: All stylesheets must be generated by the stylesheet generator / centralised styling system (stylesheet_generator.py).
  10. No hardcoded pixel values: All dimensions must be calculated from font metrics or relative to other elements, generally using our DPI font scaling system (dpi_scale.py).
  11. Prefer reusable pure functions over larger, more complex structures: Where possible, factor such pure functions out into generally useful libraries.
  12. All modules need to be documented in markdown format in doc/users for the end user, and doc/developers for developers. Important information for the AI assistant goes into doc/llm.
  13. All database migrations MUST be idempotent: Use CREATE TABLE IF NOT EXISTS, CREATE INDEX IF NOT EXISTS, and DO $$ ... IF NOT EXISTS ... END $$ blocks for ALTER TABLE. Migrations must be safe to run multiple times without errors or data loss.
  14. Migration files must NOT contain their own tracking code: The MigrationManager handles migration tracking via the bmlibrarian_migrations table. Never create or insert into public.schema_migrations or similar tables inside migration files.
  15. NEVER modify the production database without explicit permission: Use the development database for testing migrations. Production changes require team approval.
  16. Use get_document_details() for fetching document metadata: When loading documents in widgets/components, always use the canonical get_document_details(document_id) function from bmlibrarian.database. This ensures consistent field names, pre-formatted authors, and proper PMID extraction across all UI components. Never write inline SQL for document fetching in UI code.
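
Rules 1, 2, 6, 7, and 8 can be illustrated together in one small function. This is a minimal sketch: parse_relevance_score, MIN_SCORE, and MAX_SCORE are hypothetical names chosen for the example (the 1-5 range matches the document scoring scale described earlier), and in real code the bounds might come from configuration instead.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Named constants instead of magic numbers (rule 2)
MIN_SCORE = 1
MAX_SCORE = 5


def parse_relevance_score(raw: str) -> Optional[int]:
    """Parse an untrusted score string into an integer within the allowed range.

    Args:
        raw: Text received from a user, file, or model response.

    Returns:
        The score as an int, or None if the input is not a valid score.
    """
    try:
        # Never assume the input has the expected format (rule 1)
        score = int(raw.strip())
    except (AttributeError, ValueError):
        # Handle and log the error instead of letting it propagate silently (rule 8)
        logger.warning("Rejected non-integer score: %r", raw)
        return None
    if not MIN_SCORE <= score <= MAX_SCORE:
        logger.warning("Rejected out-of-range score: %d", score)
        return None
    return score
```

Returning None leaves the decision about how to report the problem to the caller, which can surface it to the user in whatever way fits the CLI or GUI.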