Agent Instructions - pgmpy

This file provides guidance for all AI agents when working with code in this repository.

Project Overview

pgmpy is a Python library for Causal AI using DAGs, Bayesian Networks, and related models. It provides tools for causal discovery, parameter estimation, inference, causal identification, and causal effect estimation.

Development Setup

pip install -e .[tests]        # bash
pip install -e ".[tests]"      # zsh

For PyTorch/GPU support: pip install -e .[torch,tests]

Common Commands

pgmpy uses pytest for testing and pre-commit for code quality. Here are some common commands:

# Run all tests
pytest -v pgmpy

# Run a specific test file
pytest -v pgmpy/tests/test_models/test_DiscreteBayesianNetwork.py

# Run a specific test class or method
pytest -v pgmpy/tests/test_models/test_DiscreteBayesianNetwork.py::TestBayesianModelCreation::test_add_cpds

# Linting (pre-commit runs black, flake8, isort, blackdoc)
pre-commit run --all-files

Prefer the smallest relevant pytest target first, then broaden to larger suites only if needed.

Instructions for Tasks

Always make a plan before coding.
For behavior changes and bug fixes, prefer writing tests first (Test-Driven Development) when practical. For pure documentation or non-behavioral refactors, add tests only if behavior is affected. Avoid adding too many tests and try to combine tests when possible.
Follow existing patterns in the codebase. Always check for similar implementations before creating new ones.
Use type hints and docstrings for new or modified public methods.
Run the smallest relevant pytest target after changes to ensure correctness. Broaden to larger suites as needed. Run pre-commit when it is available in the environment.
Avoid redundant checks in the code. For example, if a variable is always expected to be a list, do not add checks to verify that it is a list. Try to avoid adding try except blocks unless absolutely necessary. If you need to add error handling, make sure to be explicit about the expected exceptions and handle them appropriately.
Check if the method that you are using has a deprecation warning. If it does, try to use the recommended alternative instead of the deprecated method.
When making changes to codebase, aim for modularity and reusability.
When adding new functionality, consider how it can be integrated with existing components and whether it can be implemented in a way that allows for future extensions or modifications without requiring significant changes to the existing codebase.
Preserve backwards compatibility unless the user explicitly requests or approves a breaking change. For migrations and refactors, prefer adding new APIs alongside compatibility shims before removing old paths.
If a required command or dependency is unavailable in the environment, state that explicitly and use the best available validation instead of silently skipping verification.
For code that supports multiple backends or optional dependencies, preserve existing numpy and torch behavior where applicable, and guard optional-dependency tests appropriately.

Code Style

Formatter: black
Import sorting: isort
Linter: flake8 + ruff (line length: 120)
Docstring formatting: blackdoc
Pre-commit hooks auto-run on commit after pre-commit install

Standard Workflow

# 1. Create feature or a bugfix branch (human workflow; optional for agents
# already working in an assigned branch/worktree)
git checkout -b feature/your-feature dev

# 2. Make changes with tests
# ... edit files ...

# 3. Verify
pre-commit run --all-files
pytest pgmpy

Architecture

Base Graph Classes (`pgmpy/base/`)

All models build on NetworkX graph classes. Key bases: DAG, PDAG, MAG, ADMG, UndirectedGraph. These support node role annotations (exposures, outcomes, confounders, mediators, etc.).

Models (`pgmpy/models/`)

Representation of models:

DiscreteBayesianNetwork — standard BN with TabularCPD factors
LinearGaussianBayesianNetwork — continuous Gaussian with LinearGaussianCPD
FunctionalBayesianNetwork — arbitrary distributions via FunctionalCPD (requires torch/pyro)
DynamicBayesianNetwork — time-series BNs
DiscreteMarkovNetwork — undirected graphical models
JunctionTree, FactorGraph, ClusterGraph — inference data structures
SEM — structural equation models
NaiveBayes - Special case of BN.

Factors (`pgmpy/factors/`)

Probability representations attached to models:

discrete/ — DiscreteFactor, TabularCPD, NoisyORCPD, JointProbabilityDistribution
continuous/ — LinearGaussianCPD
hybrid/ — FunctionalCPD (Pyro distribution-based)

Estimators (`pgmpy/estimators/`)

Estimators for learning model parameters from data:

Parameter learning: legacy compatibility layer for MaximumLikelihoodEstimator, BayesianEstimator, ExpectationMaximization
Structure learning: PC, HillClimbSearch, ExhaustiveSearch, GES, MmhcEstimator, TreeSearch
Scoring: K2, BDeu, BDs, BIC, AIC (and Gaussian, and conditional gaussian variants)

Parameter Estimation (`pgmpy/parameter_estimator/`)

Canonical implementation location for parameter learning from data:

Discrete parameter learning: MaximumLikelihoodEstimator, BayesianEstimator, ExpectationMaximization

Structure Scores (`pgmpy/structure_score/`)

Canonical implementation location for structure scoring utilities. Prefer this over importing structure scores from pgmpy.estimators.

Inference (`pgmpy/inference/`)

Computing posterior distributions given model and evidence:

Exact: VariableElimination, BeliefPropagation
Approximate: ApproxInference, GibbsSampling
Causal: CausalInference (do-calculus)
Dynamic: DBNInference

Causal Discovery (`pgmpy/causal_discovery/`)

Learning causal structure from data:

Constraint-based: PC, FCI
Score-based: GES, MMHC

Causal Identification (`pgmpy/causal_identification/`)

Identification of causal effects from a given causal graph:

Generalized adjustment sets: Adjustment
Frontdoor criterion: Frontdoor

Causal Effect Estimation (`pgmpy/causal_estimation/`)

Estimating causal effects from data given a causal graph:

Adjustment regression: NaiveAdjustmentRegressor
Doubly robust: DoublyMLRegressor
IV estimation: NaiveIVRegressor

Example Datasets (`pgmpy/datasets/`)

Set of example datasets. list_datasets method provides all available datasets.

Example models (`pgmpy/example_models/`)

Set of example models. list_models method provides all available models.

Metrics (`pgmpy/metrics/`)

Evaluation metrics for causal discovery and models against given dataset.

Model evaluation: CorrelationScore, FisherC, ImpliedCIs, SHD, StructureScore.

Backend System (`pgmpy/global_vars.py`)

Global config object switches between numpy and torch backends:

from pgmpy.global_vars import config

config.set_backend("torch", device="cuda")

Also provides functionality to specify configuration options on a global level.

I/O (`pgmpy/readwrite/`)

Use load and save functions in DiscreteBayesianNetwork and LinearGaussianBayesianNetwork for model serialization. Many formats are supported.

Tests (`pgmpy/tests/`)

Mirror the main package structure. Every subpackage and module has a corresponding test_ directory, and test_ file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent Instructions - pgmpy

Project Overview

Development Setup

Common Commands

Instructions for Tasks

Code Style

Standard Workflow

Architecture

Base Graph Classes (`pgmpy/base/`)

Models (`pgmpy/models/`)

Factors (`pgmpy/factors/`)

Estimators (`pgmpy/estimators/`)

Parameter Estimation (`pgmpy/parameter_estimator/`)

Structure Scores (`pgmpy/structure_score/`)

Inference (`pgmpy/inference/`)

Causal Discovery (`pgmpy/causal_discovery/`)

Causal Identification (`pgmpy/causal_identification/`)

Causal Effect Estimation (`pgmpy/causal_estimation/`)

Example Datasets (`pgmpy/datasets/`)

Example models (`pgmpy/example_models/`)

Metrics (`pgmpy/metrics/`)

Backend System (`pgmpy/global_vars.py`)

I/O (`pgmpy/readwrite/`)

Tests (`pgmpy/tests/`)

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

Agent Instructions - pgmpy

Project Overview

Development Setup

Common Commands

Instructions for Tasks

Code Style

Standard Workflow

Architecture

Base Graph Classes (pgmpy/base/)

Models (pgmpy/models/)

Factors (pgmpy/factors/)

Estimators (pgmpy/estimators/)

Parameter Estimation (pgmpy/parameter_estimator/)

Structure Scores (pgmpy/structure_score/)

Inference (pgmpy/inference/)

Causal Discovery (pgmpy/causal_discovery/)

Causal Identification (pgmpy/causal_identification/)

Causal Effect Estimation (pgmpy/causal_estimation/)

Example Datasets (pgmpy/datasets/)

Example models (pgmpy/example_models/)

Metrics (pgmpy/metrics/)

Backend System (pgmpy/global_vars.py)

I/O (pgmpy/readwrite/)

Tests (pgmpy/tests/)

Base Graph Classes (`pgmpy/base/`)

Models (`pgmpy/models/`)

Factors (`pgmpy/factors/`)

Estimators (`pgmpy/estimators/`)

Parameter Estimation (`pgmpy/parameter_estimator/`)

Structure Scores (`pgmpy/structure_score/`)

Inference (`pgmpy/inference/`)

Causal Discovery (`pgmpy/causal_discovery/`)

Causal Identification (`pgmpy/causal_identification/`)

Causal Effect Estimation (`pgmpy/causal_estimation/`)

Example Datasets (`pgmpy/datasets/`)

Example models (`pgmpy/example_models/`)

Metrics (`pgmpy/metrics/`)

Backend System (`pgmpy/global_vars.py`)

I/O (`pgmpy/readwrite/`)

Tests (`pgmpy/tests/`)