Skip to content

Latest commit

 

History

History
172 lines (133 loc) · 7.49 KB

File metadata and controls

172 lines (133 loc) · 7.49 KB

Agent Instructions - pgmpy

This file provides guidance for all AI agents when working with code in this repository.

Project Overview

pgmpy is a Python library for Causal AI using DAGs, Bayesian Networks, and related models. It provides tools for causal discovery, parameter estimation, inference, causal identification, and causal effect estimation.

Development Setup

pip install -e .[tests]        # bash
pip install -e ".[tests]"      # zsh

For PyTorch/GPU support: pip install -e .[torch,tests]

Common Commands

pgmpy uses pytest for testing and pre-commit for code quality. Here are some common commands:

# Run all tests
pytest -v pgmpy

# Run a specific test file
pytest -v pgmpy/tests/test_models/test_DiscreteBayesianNetwork.py

# Run a specific test class or method
pytest -v pgmpy/tests/test_models/test_DiscreteBayesianNetwork.py::TestBayesianModelCreation::test_add_cpds

# Linting (pre-commit runs black, flake8, isort, blackdoc)
pre-commit run --all-files

Prefer the smallest relevant pytest target first, then broaden to larger suites only if needed.

Instructions for Tasks

  1. Always make a plan before coding.
  2. For behavior changes and bug fixes, prefer writing tests first (Test-Driven Development) when practical. For pure documentation or non-behavioral refactors, add tests only if behavior is affected. Avoid adding too many tests and try to combine tests when possible.
  3. Follow existing patterns in the codebase. Always check for similar implementations before creating new ones.
  4. Use type hints and docstrings for new or modified public methods.
  5. Run the smallest relevant pytest target after changes to ensure correctness. Broaden to larger suites as needed. Run pre-commit when it is available in the environment.
  6. Avoid redundant checks in the code. For example, if a variable is always expected to be a list, do not add checks to verify that it is a list. Try to avoid adding try except blocks unless absolutely necessary. If you need to add error handling, make sure to be explicit about the expected exceptions and handle them appropriately.
  7. Check if the method that you are using has a deprecation warning. If it does, try to use the recommended alternative instead of the deprecated method.
  8. When making changes to codebase, aim for modularity and reusability.
  9. When adding new functionality, consider how it can be integrated with existing components and whether it can be implemented in a way that allows for future extensions or modifications without requiring significant changes to the existing codebase.
  10. Preserve backwards compatibility unless the user explicitly requests or approves a breaking change. For migrations and refactors, prefer adding new APIs alongside compatibility shims before removing old paths.
  11. If a required command or dependency is unavailable in the environment, state that explicitly and use the best available validation instead of silently skipping verification.
  12. For code that supports multiple backends or optional dependencies, preserve existing numpy and torch behavior where applicable, and guard optional-dependency tests appropriately.

Code Style

  • Formatter: black
  • Import sorting: isort
  • Linter: flake8 + ruff (line length: 120)
  • Docstring formatting: blackdoc
  • Pre-commit hooks auto-run on commit after pre-commit install

Standard Workflow

# 1. Create feature or a bugfix branch (human workflow; optional for agents
# already working in an assigned branch/worktree)
git checkout -b feature/your-feature dev

# 2. Make changes with tests
# ... edit files ...

# 3. Verify
pre-commit run --all-files
pytest pgmpy

Architecture

Base Graph Classes (pgmpy/base/)

All models build on NetworkX graph classes. Key bases: DAG, PDAG, MAG, ADMG, UndirectedGraph. These support node role annotations (exposures, outcomes, confounders, mediators, etc.).

Models (pgmpy/models/)

Representation of models:

  • DiscreteBayesianNetwork — standard BN with TabularCPD factors
  • LinearGaussianBayesianNetwork — continuous Gaussian with LinearGaussianCPD
  • FunctionalBayesianNetwork — arbitrary distributions via FunctionalCPD (requires torch/pyro)
  • DynamicBayesianNetwork — time-series BNs
  • DiscreteMarkovNetwork — undirected graphical models
  • JunctionTree, FactorGraph, ClusterGraph — inference data structures
  • SEM — structural equation models
  • NaiveBayes - Special case of BN.

Factors (pgmpy/factors/)

Probability representations attached to models:

  • discrete/DiscreteFactor, TabularCPD, NoisyORCPD, JointProbabilityDistribution
  • continuous/LinearGaussianCPD
  • hybrid/FunctionalCPD (Pyro distribution-based)

Estimators (pgmpy/estimators/)

Estimators for learning model parameters from data:

  • Parameter learning: legacy compatibility layer for MaximumLikelihoodEstimator, BayesianEstimator, ExpectationMaximization
  • Structure learning: PC, HillClimbSearch, ExhaustiveSearch, GES, MmhcEstimator, TreeSearch
  • Scoring: K2, BDeu, BDs, BIC, AIC (and Gaussian, and conditional gaussian variants)

Parameter Estimation (pgmpy/parameter_estimator/)

Canonical implementation location for parameter learning from data:

  • Discrete parameter learning: MaximumLikelihoodEstimator, BayesianEstimator, ExpectationMaximization

Structure Scores (pgmpy/structure_score/)

Canonical implementation location for structure scoring utilities. Prefer this over importing structure scores from pgmpy.estimators.

Inference (pgmpy/inference/)

Computing posterior distributions given model and evidence:

  • Exact: VariableElimination, BeliefPropagation
  • Approximate: ApproxInference, GibbsSampling
  • Causal: CausalInference (do-calculus)
  • Dynamic: DBNInference

Causal Discovery (pgmpy/causal_discovery/)

Learning causal structure from data:

  • Constraint-based: PC, FCI
  • Score-based: GES, MMHC

Causal Identification (pgmpy/causal_identification/)

Identification of causal effects from a given causal graph:

  • Generalized adjustment sets: Adjustment
  • Frontdoor criterion: Frontdoor

Causal Effect Estimation (pgmpy/causal_estimation/)

Estimating causal effects from data given a causal graph:

  • Adjustment regression: NaiveAdjustmentRegressor
  • Doubly robust: DoublyMLRegressor
  • IV estimation: NaiveIVRegressor

Example Datasets (pgmpy/datasets/)

Set of example datasets. list_datasets method provides all available datasets.

Example models (pgmpy/example_models/)

Set of example models. list_models method provides all available models.

Metrics (pgmpy/metrics/)

Evaluation metrics for causal discovery and models against given dataset.

  • Model evaluation: CorrelationScore, FisherC, ImpliedCIs, SHD, StructureScore.

Backend System (pgmpy/global_vars.py)

Global config object switches between numpy and torch backends:

from pgmpy.global_vars import config

config.set_backend("torch", device="cuda")

Also provides functionality to specify configuration options on a global level.

I/O (pgmpy/readwrite/)

Use load and save functions in DiscreteBayesianNetwork and LinearGaussianBayesianNetwork for model serialization. Many formats are supported.

Tests (pgmpy/tests/)

Mirror the main package structure. Every subpackage and module has a corresponding test_ directory, and test_ file.