StatsPAI: The Agent-Native Causal Inference & Econometrics Toolkit for Python
StatsPAI is the agent-native Python package for causal inference and applied econometrics. One import, 390+ functions, covering the complete empirical research workflow — from classical econometrics to cutting-edge ML/AI causal methods to publication-ready tables in Word, Excel, and LaTeX.
Designed for AI agents: every function returns structured result objects with self-describing schemas (list_functions(), describe_function(), function_schema()), making StatsPAI the first econometrics toolkit purpose-built for LLM-driven research workflows — while remaining fully ergonomic for human researchers.
It brings R's Causal Inference Task View (fixest, did, rdrobust, gsynth, DoubleML, MatchIt, CausalImpact, ...) and Stata's core econometrics commands into a single, consistent Python API.
NEW in v0.6: sp.interactive(fig) — a Stata Graph Editor-style WYSIWYG plot editor for Jupyter, with 29 academic themes, real-time preview, and auto-generated reproducible code.
Built by the team behind CoPaper.AI · Stanford REAP Program
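To make the self-describing schema idea concrete, here is a minimal sketch of how a registry could answer list_functions(), describe_function(), and function_schema() by introspecting function signatures. It uses only the standard library. The registry, the `register` decorator, the schema layout, and the toy `did` stub with its `cluster` parameter are assumptions for illustration, not StatsPAI's actual implementation.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry: the helper names mirror those listed above, but the
# internals are an illustration only, not StatsPAI's real code.
_REGISTRY: Dict[str, Callable] = {}


def register(func):
    """Add a function to the registry under its own name."""
    _REGISTRY[func.__name__] = func
    return func


def list_functions():
    """Return the sorted names of all registered functions."""
    return sorted(_REGISTRY)


def describe_function(name):
    """Return the first docstring line of a registered function."""
    doc = inspect.getdoc(_REGISTRY[name]) or ""
    return doc.splitlines()[0] if doc else ""


def function_schema(name):
    """Build a JSON-style schema from the function's signature."""
    sig = inspect.signature(_REGISTRY[name])
    params, required = {}, []
    for pname, p in sig.parameters.items():
        type_name = getattr(p.annotation, "__name__", str(p.annotation))
        params[pname] = {"type": type_name}
        if p.default is inspect.Parameter.empty:
            required.append(pname)
        else:
            params[pname]["default"] = p.default
    return {
        "name": name,
        "description": describe_function(name),
        "parameters": params,
        "required": required,
    }


@register
def did(data: list, y: str, treat: str, time: str, cluster: str = None):
    """Toy difference-in-differences stub that performs no estimation."""
    return None


schema = function_schema("did")
```

An agent can pass a dictionary like `schema` straight into a tool-use definition, which is the practical benefit of keeping signatures and docstrings machine-readable.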
Why StatsPAI?

| Pain point | Stata | R | StatsPAI |
| --- | --- | --- | --- |
| Scattered packages | One environment, but $695+/yr license | 20+ packages with incompatible APIs | One import, unified API |
| Publication tables | outreg2 (limited formats) | modelsummary (best-in-class) | Word + Excel + LaTeX + HTML in every function |
| Robustness checks | Manual re-runs | Manual re-runs | spec_curve() + robustness_report() in one call |
| Heterogeneity analysis | Manual subgroup splits + forest plots | Manual lapply + ggplot | subgroup_analysis() with Wald test |
| Modern ML causal | Limited (no DML, no causal forest) | Fragmented (DoubleML, grf, SuperLearner separate) | DML, Causal Forest, Meta-Learners, TMLE, DeepIV |
| Neural causal models | None | None | TARNet, CFRNet, DragonNet |
| Causal discovery | None | pcalg (complex API) | notears(), pc_algorithm() |
| Policy learning | None | policytree (standalone) | policy_tree() + policy_value() |
| Result objects | Inconsistent across commands | Inconsistent across packages | Unified CausalResult with .summary(), .plot(), .to_latex(), .cite() |
| Interactive plot editing | Graph Editor (no code export) | None | sp.interactive(): GUI editing with auto-generated code |

WYSIWYG plot editor with 29 themes & auto code generation
Jupyter ipywidgets
Every result object has:
    result.summary()   # Formatted text summary
    result.plot()      # Appropriate visualization
    result.to_latex()  # LaTeX table
    result.to_docx()   # Word document
    result.cite()      # BibTeX citation for the method
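As a rough illustration of why a shared result interface is convenient, the sketch below defines a hypothetical `ToyResult` class that exposes three of the methods listed above. The class name, its fields, the normal-approximation confidence interval, and the placeholder BibTeX entry are all assumptions; the real CausalResult object may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass
class ToyResult:
    """Hypothetical stand-in for a unified result object (not CausalResult)."""

    method: str
    estimate: float
    std_error: float
    bibtex: str

    def summary(self):
        # 95% interval under a normal approximation (an assumption here).
        lo = self.estimate - 1.96 * self.std_error
        hi = self.estimate + 1.96 * self.std_error
        return (
            f"{self.method}: estimate={self.estimate:.3f} "
            f"(SE {self.std_error:.3f}), 95% CI [{lo:.3f}, {hi:.3f}]"
        )

    def to_latex(self):
        # Two-row table: point estimate, then standard error in parentheses.
        return (
            "\\begin{tabular}{lc}\n"
            f"{self.method} & {self.estimate:.3f} \\\\\n"
            f" & ({self.std_error:.3f}) \\\\\n"
            "\\end{tabular}"
        )

    def cite(self):
        return self.bibtex


# The BibTeX string is a placeholder, not a real reference.
res = ToyResult("DiD", 0.25, 0.10, "@misc{placeholder_key, note={placeholder}}")
text = res.summary()
table = res.to_latex()
```

Because every estimator would return an object with the same methods, downstream code such as table builders or report generators does not need to branch on the estimation method.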
Interactive Plot Editor — Python's Answer to Stata Graph Editor
Stata users know the Graph Editor: double-click a figure to enter a WYSIWYG editing interface — drag fonts, change colors, adjust layout. This has been a Stata-exclusive experience. In Python, matplotlib produces static images — changing a title font size means editing code and re-running.
sp.interactive(fig) turns any matplotlib figure into a live editing panel — figure preview on the left, property controls on the right, just like Stata's Graph Editor. But it does two things Stata can't:
29 academic themes, one-click switching. From AER journal style to ggplot, FiveThirtyEight, dark presentation mode — select and see the result instantly. Stata's scheme requires regenerating the plot; here it's real-time.
Every edit auto-generates reproducible Python code. Adjust title size, change colors, add annotations in the GUI — the editor records each operation as standard matplotlib code (ax.set_title(...), ax.spines[...].set_visible(...)). Copy with one click, paste into your script, and it reproduces exactly. Stata's Graph Editor cannot export edits to do-file commands.
Auto/Manual rendering modes: Auto refreshes the preview on every change; Manual batches edits for a single Apply — useful for large figures or slow machines.
    import statspai as sp

    result = sp.did(df, y='wage', treat='policy', time='year')
    fig, ax = result.plot()
    editor = sp.interactive(fig)  # opens the editor

    # After editing in the GUI:
    editor.copy_code()  # prints reproducible Python code
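One way an editor could turn GUI edits into reproducible code is to record each method call as a line of source while forwarding it to the real object. The sketch below shows that record-and-forward idea with a small proxy class. `RecordingProxy` and `FakeAxes` are hypothetical names invented for this example, and nothing here is taken from how sp.interactive() is actually implemented.

```python
class RecordingProxy:
    """Forward method calls to a target and log each call as source code."""

    def __init__(self, target, name):
        self._target = target
        self._name = name
        self.log = []

    def __getattr__(self, attr):
        # Only reached for names not set in __init__, i.e. the target's methods.
        method = getattr(self._target, attr)

        def wrapper(*args, **kwargs):
            parts = [repr(a) for a in args]
            parts += [f"{key}={value!r}" for key, value in kwargs.items()]
            self.log.append(f"{self._name}.{attr}({', '.join(parts)})")
            return method(*args, **kwargs)

        return wrapper

    def code(self):
        """Return the recorded calls as a replayable script."""
        return "\n".join(self.log)


class FakeAxes:
    """Stand-in for a matplotlib Axes so the sketch has no dependencies."""

    def __init__(self):
        self.title = None
        self.xlabel = None

    def set_title(self, text, fontsize=None):
        self.title = (text, fontsize)

    def set_xlabel(self, text):
        self.xlabel = text


target = FakeAxes()
ax = RecordingProxy(target, "ax")
ax.set_title("Event study", fontsize=14)
ax.set_xlabel("Years since policy")
generated = ax.code()
```

The recorded lines are plain method calls, so pasting `generated` into a script that has a real `ax` in scope would repeat the same edits. The proxy only handles method calls; plain attribute reads would need separate treatment.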
Utilities

| Function | Description | Stata equivalent |
| --- | --- | --- |
| label_var(), label_vars() | Variable labeling | label var |
| describe() | Data description | describe |
| pwcorr() | Pairwise correlation with significance stars | pwcorr, star(.05) |
| winsor() | Winsorization | winsor2 |
| read_data() | Multi-format data reader | use / import |

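For readers unfamiliar with winsorization, the sketch below shows the operation itself: values beyond chosen lower and upper quantiles are replaced by those quantile values rather than dropped. It is a standard-library illustration only. The function names, the default cut points, and the linear-interpolation quantile rule are assumptions, and the signature of StatsPAI's winsor() is not shown in this document.

```python
import math


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] quantile range, keeping their order."""
    ordered = sorted(values)
    lo_cut = quantile(ordered, lower)
    hi_cut = quantile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


wages = [1, 2, 3, 4, 100]
# Clip only the upper tail at the 75th percentile, which is 4 for this list.
clipped = winsorize(wages, lower=0.0, upper=0.75)
```

Unlike trimming, the sample size is unchanged: the outlier 100 is pulled in to the cut point instead of being removed.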
Installation
pip install statspai
With optional dependencies:
pip install statspai[plotting] # matplotlib, seaborn
pip install statspai[fixest] # pyfixest for high-dimensional FE
One package, one import, consistent .summary() / .plot() / .to_latex() across all methods. Stata requires paid add-ons; R requires 20+ packages with different interfaces.

| Advantage | Detail |
| --- | --- |
| Modern ML causal methods | DML, Causal Forest, Meta-Learners (S/T/X/R/DR), TMLE, DeepIV, TARNet/CFRNet/DragonNet, Policy Trees, all in one place. Stata has almost none of these. R has them scattered across incompatible packages. |
| Robustness automation | spec_curve(), robustness_report(), subgroup_analysis(): no manual re-running. Neither Stata nor R offers this out-of-the-box. |
| Free & open source | MIT license, $0. Stata costs $695–$1,595/year. |
| Python ecosystem | Integrates naturally with pandas, scikit-learn, PyTorch, Jupyter, cloud pipelines. |
| Auto-citations | Every causal method has .cite() returning the correct BibTeX. Neither Stata nor R does this. |
| Interactive Plot Editor | sp.interactive(): Stata Graph Editor-style GUI in Jupyter with 29 themes and auto-generated reproducible code. Stata's Graph Editor can't export edits to a do-file; R has no equivalent. |

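The idea behind a specification curve can be sketched in a few lines: enumerate every combination of analytic choices, re-estimate the effect under each, and sort the estimates. The example below is a deliberately simplified illustration that uses a difference in means as the estimator and only the standard library. The toy data, the choice sets, and the function names are all assumptions; a real specification curve, including whatever spec_curve() does internally, would typically fit regression models.

```python
import itertools
import math
from statistics import fmean

# Toy data invented for this sketch.
rows = [
    {"y": 10.0, "treat": 0, "age": 25},
    {"y": 12.0, "treat": 0, "age": 40},
    {"y": 14.0, "treat": 1, "age": 30},
    {"y": 18.0, "treat": 1, "age": 55},
]

# Two analytic decisions, each with two options (hypothetical choices).
SAMPLES = {"all": lambda r: True, "under_50": lambda r: r["age"] < 50}
OUTCOMES = {"level": lambda y: y, "log": math.log}


def diff_in_means(data, transform):
    """Treated-minus-control mean of the transformed outcome."""
    treated = [transform(r["y"]) for r in data if r["treat"] == 1]
    control = [transform(r["y"]) for r in data if r["treat"] == 0]
    return fmean(treated) - fmean(control)


def spec_curve_sketch(data):
    """Estimate the effect under every combination of choices, sorted by size."""
    results = []
    for (s_name, keep), (o_name, transform) in itertools.product(
        SAMPLES.items(), OUTCOMES.items()
    ):
        subset = [r for r in data if keep(r)]
        results.append((s_name, o_name, diff_in_means(subset, transform)))
    return sorted(results, key=lambda spec: spec[2])


curve = spec_curve_sketch(rows)
```

Plotting the sorted estimates against their specification labels gives the familiar curve, which makes it easy to see whether a result depends on one particular set of choices.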
Where Stata still wins

| Advantage | Detail |
| --- | --- |
| Battle-tested at scale | 40+ years of production use in economics. Edge cases are well-handled. |
| Speed on very large datasets | Stata's compiled C backend is faster for simple OLS/FE on datasets with millions of rows. |
| Survey data & complex designs | svy: prefix, stratification, clustering. Stata's survey support is unmatched. |
| Mature documentation | Every command has a PDF manual with worked examples. Community is massive. |
| Journal acceptance | Referees in some fields trust Stata output by default. |

Where R still wins

| Advantage | Detail |
| --- | --- |
| Cutting-edge methods | New econometric methods (e.g., fixest, did2s, HonestDiD) often appear in R first. |
| ggplot2 visualization | R's grammar of graphics is more flexible than matplotlib for complex figures. |
| modelsummary | R's modelsummary is the gold standard for regression tables. StatsPAI's is close but not yet identical. |
| CRAN quality control | CRAN packages must pass automated checks (R CMD check) and CRAN maintainer review before release. Python packages vary in quality. |
| Spatial econometrics | spdep, spatialreg: R has a deeper spatial ecosystem. |

Theme switching fix: Themes now fully reset rcParams before applying, so switching between themes (e.g. ggplot → academic) correctly updates all visual properties
Apply button fix: The Apply button was being clipped on the Layout tab; it is now pinned to the bottom of the panel
Error visibility: Widget callback errors now surface in the status bar instead of being silently swallowed
Auto mode: Always refreshes preview when toggled for immediate feedback
Theme tab: Moved to first position; color pickers show confirmation feedback
Code generation: Auto-generate reproducible code with text selection support
Panel: Major expansion of panel() — Hausman test, Breusch-Pagan LM, Pesaran CD, Wooldridge autocorrelation, panel unit root tests; added panel_summary_plot(), fe_plot(), re_comparison_plot()
RD: New rd_diagnostics() suite — bandwidth sensitivity, placebo cutoffs, donut-hole robustness, covariate balance at cutoff, density test
IV / 2SLS: Rewritten ivreg() with proper first-stage diagnostics (Cragg-Donald, Kleibergen-Paap), weak IV detection, Sargan-Hansen overidentification test, Anderson canonical correlation test, Stock-Yogo critical values
Matching: Enhanced match() — added CEM (Coarsened Exact Matching), optimal matching, genetic matching; improved balance diagnostics with Love plot and standardized mean difference
DAG: Expanded dag() with 15+ built-in example DAGs (dag_example()), dag_simulate() for data generation from causal graphs, backdoor/frontdoor criterion identification
Causal Impact: Enhanced Bayesian structural time-series with automatic model selection and improved inference
AI Agent Registry: Expanded list_functions(), describe_function(), function_schema(), search_functions() for LLM/agent tool-use integration
Unified CausalResult object with .summary(), .plot(), .to_latex(), .to_docx(), .cite()
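The matching entry above mentions balance diagnostics based on the standardized mean difference (SMD), the quantity a Love plot displays for each covariate. A minimal sketch of one common SMD formula follows, using the average of the two group variances as the denominator. The function name is hypothetical, and StatsPAI may use a different variant of the formula.

```python
import math
from statistics import fmean, variance


def standardized_mean_difference(treated, control):
    """Mean difference divided by the root of the averaged group variances."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (fmean(treated) - fmean(control)) / pooled_sd


# Toy covariate values invented for this example.
smd = standardized_mean_difference([2, 4, 6], [1, 3, 5])
```

Here both groups have sample variance 4 and the means differ by 1, so the SMD is 0.5. A common rule of thumb treats absolute values below 0.1 as adequately balanced.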
About
StatsPAI Inc. is the research infrastructure company behind CoPaper.AI — the AI co-authoring platform for empirical research, born out of Stanford's REAP program.
CoPaper.AI — Upload your data, set your research question, and produce a fully reproducible academic paper with code, tables, and formatted output. Powered by StatsPAI under the hood. copaper.ai