oshaughnessy-junior/simulation_manager_demo_hyperpipe_pop

hyperpipe_demo_sim

A self-contained toy demo showing the RIFT hyperpipeline and simulation manager working together for adaptive population inference. The problem is intentionally simple so the full loop can be validated without real physics.

What it demonstrates

Hyperpipeline (create_eos_posterior_pipeline): adaptive iterative inference over generic parameters. Each iteration evaluates a batch of candidate points, fits a surrogate posterior, and concentrates the next batch near high-probability regions — no hand-tuned grid required.
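
The evaluate / fit / refocus loop described above can be illustrated with a minimal sketch. This is not the RIFT algorithm: the Gaussian surrogate, the importance weighting, and every name below (`ln_likelihood`, `adaptive_fit`, `PEAK`, `WIDTH`) are assumptions made for illustration, and the target is a made-up 2-D Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy target: an isotropic 2-D Gaussian lnL.
PEAK = np.array([0.3, -0.2])
WIDTH = 0.2

def ln_likelihood(points):
    """Return lnL for each row of `points` (shape (n, 2))."""
    return -0.5 * np.sum((points - PEAK) ** 2, axis=1) / WIDTH**2

def gaussian_logpdf(points, mean, cov):
    """Log-density of a multivariate normal, written out with NumPy only."""
    diff = points - mean
    solved = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * solved, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + points.shape[1] * np.log(2 * np.pi))

def adaptive_fit(n_iterations=5, n_per_iteration=1000):
    # Iteration 0: a broad uniform batch over the box [-1, 1]^2.
    points = rng.uniform(-1.0, 1.0, size=(n_per_iteration, 2))
    ln_proposal = np.full(n_per_iteration, -np.log(4.0))
    for _ in range(n_iterations):
        # Importance weights: likelihood divided by the proposal density.
        ln_w = ln_likelihood(points) - ln_proposal
        w = np.exp(ln_w - ln_w.max())
        w /= w.sum()
        # Gaussian surrogate: weighted mean and covariance of this batch.
        mean = w @ points
        diff = points - mean
        cov = (w[:, None] * diff).T @ diff
        # Next batch: drawn from a widened surrogate so the tails stay covered.
        proposal_cov = 4.0 * cov + 1e-6 * np.eye(2)
        points = rng.multivariate_normal(mean, proposal_cov, size=n_per_iteration)
        ln_proposal = gaussian_logpdf(points, mean, proposal_cov)
    return mean, cov

mean, cov = adaptive_fit()
```

After a few iterations `mean` should sit close to `PEAK` and the diagonal of `cov` close to `WIDTH**2`, even though the first batch was spread over the whole box.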

Simulation manager (GMMArchive): a persistent, content-addressed cache that sits between the hyperpipe and the likelihood computation. Every worker call checks the cache first; results survive across iterations and across separate runs. In a real application this cache would hold expensive stellar-evolution or radiation-transport outputs; here it holds cheap GMM likelihood evaluations, but the interface is identical.
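
The check-the-cache-first pattern can be sketched as follows. `ToyArchive` is a hypothetical stand-in, not the `GMMArchive` class itself; the SHA-256 hash of canonical JSON and the `get_or_compute` method are assumptions. Only the 16-character hash directory and the per-hash `result.json` follow the layout mentioned in the Notes.

```python
import hashlib
import json
import tempfile
from pathlib import Path

class ToyArchive:
    """Hypothetical stand-in for a content-addressed result cache."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def key(params):
        # Canonical JSON (sorted keys) so equal parameter sets hash identically.
        blob = json.dumps(params, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:16]

    def get_or_compute(self, params, compute):
        path = self.root / self.key(params) / "result.json"
        if path.exists():
            return json.loads(path.read_text())["lnL"], True   # cache hit
        value = compute(params)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"params": params, "lnL": value}))
        return value, False                                    # cache miss

calls = []

def expensive(params):
    """Placeholder for a costly simulation; records each real evaluation."""
    calls.append(params)
    return -0.5 * (params["mu1_x"] - 1.398) ** 2

archive = ToyArchive(tempfile.mkdtemp())
first = archive.get_or_compute({"mu1_x": 1.4, "w1": 0.6}, expensive)
second = archive.get_or_compute({"w1": 0.6, "mu1_x": 1.4}, expensive)
```

The second call passes the same parameters in a different key order and is served from disk, so `expensive` runs only once.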

The toy problem

50 two-dimensional "mass pair" observations are drawn from a fixed two-component GMM (seed 42, true params below). The pipeline recovers the seven population parameters of the model, including the event rate.

True model:

| param | value |
| --- | --- |
| w1 (weight of component 1) | 0.6 |
| mu1_x, mu1_y (log₁₀ mean 1) | 1.398, 1.477 (25, 30 M☉) |
| mu2_x, mu2_y (log₁₀ mean 2) | 1.845, 1.778 (70, 60 M☉) |
| log_sigma (log₁₀ σ in dex) | −1.0 (σ = 0.1 dex scatter in log₁₀ mass) |
| log_R | depends on VT model (see below) |

The GMM operates in log₁₀-mass space (columns of observations.dat are log₁₀(m/M☉)). This ensures strictly positive masses and gives a lognormal distribution in physical mass space.
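
As a rough illustration of what drawing from this model looks like, the sketch below samples 50 log₁₀-mass pairs from the true parameters listed above. It is not the code in create_data.py: the function name `draw_log_masses` is hypothetical, a single isotropic σ shared by both components is assumed, no selection weighting is applied, and the random stream will not reproduce observations.dat even with the same seed value.

```python
import numpy as np

# True parameters copied from the table above.
W1 = 0.6
MU1 = np.array([1.398, 1.477])   # log10 of (25, 30) Msun
MU2 = np.array([1.845, 1.778])   # log10 of (70, 60) Msun
SIGMA = 10.0 ** -1.0             # log_sigma = -1.0, i.e. 0.1 dex

def draw_log_masses(n, rng):
    """Draw n (log10 m1, log10 m2) pairs from the two-component GMM."""
    from_first = rng.random(n) < W1                      # component labels
    means = np.where(from_first[:, None], MU1, MU2)      # per-draw mean
    return means + SIGMA * rng.standard_normal((n, 2))   # isotropic scatter

rng = np.random.default_rng(42)
log_masses = draw_log_masses(50, rng)
masses = 10.0 ** log_masses   # physical masses: positive by construction
```

Exponentiating the Gaussian draws is what makes every mass strictly positive and the physical-mass distribution lognormal.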

Event rate and selection function. The full Poisson likelihood is

lnL = − R·VT_eff + N·ln(R) + Σ_k ln p(m_k | θ)

where VT_eff = ∫ VT(m) p(m|θ) dm is the effective surveyed volume. Two selection models are available (set VT_MODEL in the Makefile):

| VT_MODEL | VT(m1,m2) | R_true | log10(R_true) |
| --- | --- | --- | --- |
| uniform | 1 | N = 50 | ≈ 1.7 |
| chirp_mass | Mchirp^(15/6) | N / VT_eff ≈ 10⁻³ | ≈ −3 |

where Mchirp = (m1·m2)^(3/5) / (m1+m2)^(1/5). The chirp-mass scaling arises because GW detector range ∝ Mchirp^(5/6), so surveyed volume ∝ Mchirp^(15/6). create_data.py generates observations drawn from the VT-biased population and prints the true R.
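
The pieces of the likelihood can be written down in a few lines. The sketch below is an assumption-laden illustration, not the code in gmm_worker.py or create_data.py: the function names are hypothetical, VT_eff is estimated by a plain Monte Carlo average over population samples, and any normalization the repository applies to VT may differ. It also checks numerically that, for fixed mass parameters, lnL peaks at R = N / VT_eff.

```python
import numpy as np

def chirp_mass(m1, m2):
    """Mchirp = (m1*m2)^(3/5) / (m1+m2)^(1/5)."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2

def vt(m1, m2, model="uniform"):
    """Selection function VT(m1, m2) for the two models in the table."""
    if model == "uniform":
        return np.ones_like(m1)
    if model == "chirp_mass":
        return chirp_mass(m1, m2) ** 2.5   # 15/6 = 5/2
    raise ValueError(f"unknown VT model: {model}")

def vt_eff(log_mass_samples, model):
    """Monte Carlo estimate of VT_eff from samples of p(m|theta), shape (n, 2)."""
    m1, m2 = 10.0 ** log_mass_samples.T
    return vt(m1, m2, model).mean()

def ln_likelihood(log_R, vt_eff_value, ln_p_obs):
    """lnL = -R*VT_eff + N*ln(R) + sum_k ln p(m_k|theta)."""
    R = 10.0 ** log_R
    n_obs = len(ln_p_obs)
    return -R * vt_eff_value + n_obs * np.log(R) + np.sum(ln_p_obs)

# Uniform model: VT_eff = 1, so the rate term should peak at R = N = 50.
n_obs = 50
ln_p_obs = np.zeros(n_obs)   # placeholder; the mass term does not depend on R
log_R_grid = np.linspace(0.0, 3.0, 3001)
lnL_grid = np.array([ln_likelihood(x, 1.0, ln_p_obs) for x in log_R_grid])
best_log_R = log_R_grid[np.argmax(lnL_grid)]
```

Setting d lnL / dR = −VT_eff + N/R to zero gives R = N / VT_eff, which is why `best_log_R` lands at log10(50) ≈ 1.7 for the uniform model.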

For the chirp_mass model, VT grows steeply with mass, so the observations are weighted toward the high-mass component (component 2) and the pipeline should prefer it.

Known degeneracy: swapping components 1 ↔ 2 (exchanging the two means and mapping w1 → 1 − w1) leaves the likelihood unchanged, so the posterior is bimodal. The pipeline converges to one mode; this is correct behaviour, not a failure.
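
The label-swap symmetry is easy to confirm numerically. The sketch below uses a hypothetical helper, `ln_gmm_density`, for a two-component isotropic 2-D mixture (an assumed form consistent with the single log_sigma parameter), and evaluates it on arbitrary test points rather than on observations.dat.

```python
import numpy as np

def ln_gmm_density(x, w1, mu1, mu2, sigma):
    """ln p(x|theta) for a two-component isotropic 2-D GMM in log10-mass space."""
    def ln_component(mu):
        r2 = np.sum((x - mu) ** 2, axis=1)
        return -0.5 * r2 / sigma**2 - np.log(2 * np.pi * sigma**2)
    # Mixture in log space: log(w1*N1 + (1-w1)*N2) without underflow.
    return np.logaddexp(np.log(w1) + ln_component(mu1),
                        np.log1p(-w1) + ln_component(mu2))

rng = np.random.default_rng(0)
x = rng.normal(1.6, 0.3, size=(50, 2))   # arbitrary test points
mu1 = np.array([1.398, 1.477])
mu2 = np.array([1.845, 1.778])

original = ln_gmm_density(x, 0.6, mu1, mu2, 0.1).sum()
swapped = ln_gmm_density(x, 0.4, mu2, mu1, 0.1).sum()   # labels exchanged
```

`original` and `swapped` agree to floating-point precision, so a sampler has no basis for choosing between the two labelings.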

Files

| File | Purpose |
| --- | --- |
| create_data.py | Generate observations.dat and true_gmm.json |
| gmm_archive.py | GMMArchive: persistent lnL cache (simulation manager) |
| gmm_worker.py | Hyperpipe-compatible worker executable |
| Makefile | Orchestrates all build steps |
| hyperpipe_conf.yaml | YAML config for the util_RIFT_hyperpipe.py driver |

How to run

Requires a Python environment with RIFT, NumPy, and SciPy (e.g. conda activate my_rift) and a condor submit node. See INSTALL.md for pixi quickstart instructions.

Path A — inline Makefile (default)

All pipeline configuration is written directly by shell commands in the Makefile; the initial grid is a separate Make target.

# 1. Generate synthetic observations (deterministic, seed=42)
make observations.dat

# 2. Generate initial random parameter grid
make initial_grid.dat

# 3. Smoke-test the worker locally — no condor needed
make test_worker

# 4. Build the condor DAG
make rundir

# 5. Submit
make submit

Path B — util_RIFT_hyperpipe.py + hyperpipe_conf.yaml

Set USE_HYPERPIPE=1 to delegate pipeline construction to the Hydra-based driver. It reads hyperpipe_conf.yaml, generates the initial grid internally, and assembles all args files. Dynamic paths (observations file, archive directory, VT model) are injected via environment variables resolved at runtime by OmegaConf.

USE_HYPERPIPE=1 make observations.dat test_worker   # smoke-test
USE_HYPERPIPE=1 make observations.dat rundir        # build DAG
make submit                                          # submit (same as Path A)

Or using pixi tasks:

pixi run smoke-test-hyperpipe   # smoke-test
pixi run build-dag              # observations + rundir
make submit

Shared knobs

make clean removes all generated files and directories (both paths). Until then, the gmm_archive/ cache persists across runs, so repeated evaluations of the same points are free.

Key Makefile variables (top of file):

| variable | default | meaning |
| --- | --- | --- |
| N_SAMPLES_PER_JOB | 1000 | new grid points evaluated per iteration |
| N_ITERATIONS | 5 | number of adaptive iterations |
| NCHUNK | 50 | points per condor MARG job (Path A only) |
| EXPLODE_JOBS | 3 | parallel POST jobs per iteration |
| VT_MODEL | uniform | selection function (uniform or chirp_mass) |
| USE_HYPERPIPE | 0 | set to 1 to use Path B |

Switching VT models: run make clean before changing VT_MODEL so that observations.dat and the gmm_archive/ cache are regenerated consistently with the new selection function.

Visualizing results

From rundir/ after the DAG completes:

cd rundir
plot_posterior_corner.py \
    --posterior-file grid-5.dat \
    --parameter w1 --parameter mu1_x --parameter mu1_y --parameter log_R \
    --composite-file all.marg_net \
    --composite-file-has-labels \
    --lnL-cut 15 \
    --use-all-composite-but-grayscale

grid-5.dat is the final posterior sample set (replace 5 with the actual last iteration). all.marg_net accumulates all likelihood evaluations. --lnL-cut 15 discards points more than 15 log-units below the peak; raise it if the posterior looks clipped.
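
The effect of the cut can be pictured with a small sketch. It operates on a made-up array of lnL values and does not assume anything about the column layout of all.marg_net or about how plot_posterior_corner.py applies the cut internally; `apply_lnL_cut` is a hypothetical helper.

```python
import numpy as np

def apply_lnL_cut(lnL, cut=15.0):
    """Boolean mask keeping points within `cut` log-units of the maximum lnL."""
    lnL = np.asarray(lnL, dtype=float)
    return lnL >= lnL.max() - cut

# Made-up values on the scale quoted in the Notes (peak near -380).
lnL = np.array([-380.0, -383.5, -394.9, -395.1, -420.0])
mask = apply_lnL_cut(lnL)
kept = lnL[mask]
```

With a peak of −380 and a cut of 15, the threshold is −395, so the last two points are dropped. A larger cut keeps more of the low-likelihood tail in the plot.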

Notes

Archive concurrency. Each condor job writes to its own content-addressed subdirectory (gmm_archive/<hash16>/), so concurrent writes never collide. index.jsonl may gain duplicate entries under races, but the per-hash result.json is authoritative.
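
If duplicate index entries ever become a nuisance, they can be collapsed after the fact. The sketch below is a guess at how that might look: the record fields (`hash`, `lnL`) are hypothetical, since the actual schema of index.jsonl is not described here, and `dedupe_index` is not part of gmm_archive.py.

```python
import json

def dedupe_index(lines, key="hash"):
    """Keep the first JSONL record seen for each hash; drop later duplicates."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue   # tolerate blank lines
        record = json.loads(line)
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique

# Made-up index content with one duplicated entry.
index_lines = [
    '{"hash": "a3f1c09b7e2d4f68", "lnL": -381.2}',
    '{"hash": "5d0e9c1a2b3c4d5e", "lnL": -390.7}',
    '{"hash": "a3f1c09b7e2d4f68", "lnL": -381.2}',
]
records = dedupe_index(index_lines)
```

Because the per-hash result.json is the authoritative copy, dropping repeated index lines loses no information.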

OSG / shared filesystem. The current setup assumes all condor jobs share the submit-node filesystem. For OSG (no shared filesystem), the archive would need to be transferred per-job or backed by OSDF.

lnL scale. With 50 observations, the optimum lnL is of order −380. The mcsamplerAdaptiveVolume sampler (--sampler-method AV) handles this scale correctly; the default mcsampler underflows.
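
To give a feel for why this scale is delicate, the sketch below shows one generic way that likelihoods near exp(−380) can underflow in double precision, and the usual log-space remedy. It is not a diagnosis of mcsampler or a description of either sampler's internals; `log_mean_exp` and the sample values are illustrative assumptions.

```python
import math

lnL_values = [-380.0, -381.5, -383.0]   # made-up values near the quoted scale

# exp(-380) is tiny but still representable in float64 ...
linear = math.exp(-380.0)
# ... while its square (as in a variance-like term) underflows to zero.
squared = linear * linear

def log_mean_exp(values):
    """ln of the mean of exp(values), computed stably by factoring out the peak."""
    peak = max(values)
    total = sum(math.exp(v - peak) for v in values)
    return peak + math.log(total / len(values))

ln_mean_L = log_mean_exp(lnL_values)
```

Subtracting the peak before exponentiating keeps every intermediate value between 0 and 1, so the result stays accurate regardless of how negative lnL is.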

About

Demo of using RIFT hyperpipe+simulation_manager to perform population inference with a low-dimensional GMM
