Hi!
I'm trying to use UltraNest with a very expensive likelihood whose evaluation at a single point of the parameter space needs to be parallelized using MPI.
The likelihood is automatically vectorized.
However, if I'm not mistaken, UltraNest uses MPI to parallelize some internal computation/live point proposal. This conflict causes a bug in my program.
Here's a dummy code that reproduces a heavy likelihood parallelized using MPI:
import time
import numpy as np
from mpi4py import MPI
import emcee
import ultranest
sampler = 'emcee'
class dummy_class():
def __init__(self, data, comm):
self.data = data
self.comm = comm
self.rank = comm.Get_rank()
self.size = comm.Get_size()
def _compute_sub(self, theta, local_idx):
# mimic an heavy likelihood that scales linearly with the number of data points
time.sleep(len(self.data[local_idx])/5000)
# like is vectorized over theta
res = np.sum(self.data[local_idx])/np.exp(theta**2)
return res
def log_like(self, theta):
nparams = np.atleast_1d(theta).shape[0] # number of multiple values of theta to compute
# find indexes for each rank
chunk_size = len(self.data) // self.size
remainder = len(self.data) % self.size
start_idx = self.rank * chunk_size + min(self.rank, remainder)
end_idx = start_idx + chunk_size + (1 if self.rank < remainder else 0)
local_idx = np.arange(start_idx, end_idx)
# compute
local_res = self._compute_sub(theta, local_idx)
res = self.comm.allreduce(local_res, op=MPI.SUM)
return np.log(np.abs(res))
I was able to use this likelihood with emcee in the following way:
if __name__ == "__main__" and sampler=='emcee':
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
np.random.seed(123)
data = np.random.rand(10000)/500
dummy_obj = dummy_class(data, comm)
theta_true = 0.
theta_bounds = [-10.,10.]
def log_prior(theta):
# vectorized over theta
condition = np.logical_and(theta_bounds[0] <= theta, theta <= theta_bounds[1])
return np.where(condition,
0.0,
-np.inf)
def log_prior_fn(theta):
# all processes have the same `theta` of the root
if rank == 0:
theta = theta
else:
theta = None
theta = comm.bcast(theta, root = 0)
# vectorized over theta
lp = log_prior(theta)
# we compute the likelihood only where lp is finite
to_calc = np.isfinite(lp)
ll = np.where(to_calc,
dummy_obj.log_like(theta),
0.0)
return ll+lp
# Using emcee.
nwalkers = 10
nsteps = 100
ndim = 1
filename = "test_emcee.h5"
if rank == 0:
backend = emcee.backends.HDFBackend(filename)
backend.reset(nwalkers, ndim)
initial_state = np.random.normal(loc=theta_true, scale=1., size=(nwalkers,ndim))
progress = True
store=True,
else:
backend = None
initial_state = None
progress = False
store=False
initial_state = comm.bcast(initial_state, root=0)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prior_fn, backend=backend, vectorize=True)
sampler.run_mcmc(initial_state,
nsteps,
store = store,
progress = progress)
The idea of this script is that each MPI process run an emcee sampler so that each of them calls the likelihood function. This is necessary, otherwise the non-root ranks never compute the likelihood on their chunk of data. However, only the root process stores the results and prints the progress. Also, inside the likelihood function the parameters used to compute the likelihood by each rank are forced to be those of the root for consistency between the various sampler.
This example works and there is effectively a large speed up in the code.
I can't produce anything similar with UltraNest. I tried with this code here:
if __name__ == "__main__" and sampler=='ultranest':
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
np.random.seed(123)
data = np.random.rand(10000)/500
dummy_obj = dummy_class(data, comm) # each rank
theta_true = 0.
theta_bounds = [-10.,10.]
param_names = ['theta']
def prior_transform(cube):
width = theta_bounds[1] - theta_bounds[0]
return cube * width + theta_bounds[0]
def log_prior_fn(theta):
# all processes have the same `theta` of the root
if rank == 0:
theta = theta
else:
theta = None
theta = comm.bcast(theta, root = 0)
res = dummy_obj.log_like(theta)
return res.T[0] # ultranest trick to match shapes
# no store and progress options as in emcee, could be a problem
sampler = ultranest.ReactiveNestedSampler(param_names,
log_prior_fn,
prior_transform,
log_dir="ultranest",
vectorized=True)
# try to disable ultranest MPI parallelization
sampler.use_mpi=False
sampler.mpi_size=1
sampler.mpi_rank=0
# each run the sampler
result = sampler.run()
sampler.print_results()
The program runs when started with a single MPI process (mpirun -np 1 python test.py), but fails for np > 1 with error:
[ultranest] Sampling 400 live points from prior ...
^^^^^^^^^^^^^
File "/home/mt/softwares/miniconda3/envs/chimeragw-stable/lib/python3.11/site-packages/ultranest/integrator.py", line 2560, in run_iter
for _result in self.run_iter(
File "/home/mt/softwares/miniconda3/envs/chimeragw-stable/lib/python3.11/site-packages/ultranest/integrator.py", line 1435, in _widen_roots_beyond_initial_plateau
File "/home/mt/softwares/miniconda3/envs/chimeragw-stable/lib/python3.11/site-packages/ultranest/integrator.py", line 1524, in _widen_roots
AssertionError
assert num_live_points_missing >= 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Does anyone know a workaround for this problem?
Thanks!
Hi!
I'm trying to use
UltraNestwith a very expensive likelihood whose evaluation at a single point of the parameter space needs to be parallelized using MPI.The likelihood is automatically vectorized.
However, if I'm not mistaken,
UltraNestuses MPI to parallelize some internal computation/live point proposal. This conflict causes a bug in my program.Here's a dummy code that reproduces a heavy likelihood parallelized using MPI:
I was able to use this likelihood with
emceein the following way:The idea of this script is that each MPI process run an
emceesampler so that each of them calls the likelihood function. This is necessary, otherwise the non-root ranks never compute the likelihood on their chunk of data. However, only the root process stores the results and prints the progress. Also, inside the likelihood function the parameters used to compute the likelihood by each rank are forced to be those of the root for consistency between the various sampler.This example works and there is effectively a large speed up in the code.
I can't produce anything similar with
UltraNest. I tried with this code here:The program runs when started with a single MPI process (
mpirun -np 1 python test.py), but fails fornp > 1with error:Does anyone know a workaround for this problem?
Thanks!