interp1d causing errors while running train_DESI.py #56

@Nikhil0504

Hello Maintainers,

I am trying to train Spender on DESI DR1 data. Running the command python train/train_DESI.py ./DATA/ outfile -b 64 fails with the following error:

[...]
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torchinterp1d/interp1d.py", line 155, in backward
    gradients = torch.autograd.grad(
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 394, in grad
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torchinterp1d/interp1d.py", line 155, in backward
    gradients = torch.autograd.grad(
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 394, in grad
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torchinterp1d/interp1d.py", line 155, in backward
    gradients = torch.autograd.grad(
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 394, in grad
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torchinterp1d/interp1d.py", line 155, in backward
    gradients = torch.autograd.grad(
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 394, in grad
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torchinterp1d/interp1d.py", line 155, in backward
    gradients = torch.autograd.grad(
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 394, in grad
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: thread constructor failed: Resource temporarily unavailable
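
"Resource temporarily unavailable" is the message for EAGAIN, which suggests that the operating system refused to start another thread. The following is a minimal diagnostic sketch, assuming (not confirmed) that a per-user process or thread limit is involved. The function name thread_limit_report is hypothetical, and on some systems a different kernel limit than RLIMIT_NPROC may govern thread creation.

```python
import resource
import threading


def thread_limit_report():
    """Collect limits that may be relevant when thread creation fails with EAGAIN."""
    # RLIMIT_NPROC caps the number of processes per user; on Linux it also
    # counts threads. Other platforms may enforce a separate thread limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nproc_soft": soft,  # resource.RLIM_INFINITY means "unlimited"
        "nproc_hard": hard,
        # Only threads started from Python are counted here, not threads
        # created internally by native libraries.
        "python_threads": threading.active_count(),
    }


if __name__ == "__main__":
    for name, value in thread_limit_report().items():
        print(f"{name}: {value}")
```

Running this in the same shell and virtual environment as the training script shows the limits that the failing process would inherit.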

After adding torch.autograd.set_detect_anomaly(True) to train_DESI.py, I got a more detailed traceback that shows which forward call triggers the error:

/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.13/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in Interp1dBackward. Traceback of forward call that caused the error:
  File "/Users/ng27753/Astronomy_Research/spender/train/train_DESI.py", line 463, in <module>
    train(models, instruments, trainloaders, validloaders, n_epoch=n_epoch,
  File "/Users/ng27753/Astronomy_Research/spender/train/train_DESI.py", line 309, in train
    losses = get_losses(
  File "/Users/ng27753/Astronomy_Research/spender/train/train_DESI.py", line 176, in get_losses
    loss, sim_loss, s = _losses(model, instrument, batch, similarity=similarity, slope=slope)
  File "/Users/ng27753/Astronomy_Research/spender/train/train_DESI.py", line 159, in _losses
    loss = model.loss(spec, w, instrument, z=z, s=s)
  File "/Users/ng27753/Astronomy_Research/spender/spender/model.py", line 557, in loss
    s, x, y_, valid = self._forward(y, instrument=instrument, z=z, s=s, normalize=normalize)
  File "/Users/ng27753/Astronomy_Research/spender/spender/model.py", line 489, in _forward
    reconstruction, valid = self.decoder.transform(restframe, instrument=instrument, z=z, return_valid=True)
  File "/Users/ng27753/Astronomy_Research/spender/spender/model.py", line 344, in transform
    spectrum = interp1d(wave_redshifted, x, wave_obs)
  File "/Users/ng27753/Astronomy_Research/spender/.venv/lib/python3.13/site-packages/torch/autograd/function.py", line 581, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:127.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

Here is my pip freeze as well:
❯ pip freeze
absl-py==2.3.1
accelerate==1.10.1
astropy==6.0.1
astropy-iers-data==0.2025.10.20.0.39.8
certifi==2025.10.5
charset-normalizer==3.4.4
contourpy==1.3.0
cycler==0.12.1
filelock==3.19.1
fonttools==4.60.1
fsspec==2025.9.0
GPUtil==1.4.0
grpcio==1.76.0
h5py==3.14.0
hf-xet==1.2.0
huggingface-hub==0.36.0
humanize==4.13.0
idna==3.11
importlib_metadata==8.7.0
importlib_resources==6.5.2
Jinja2==3.1.6
kiwisolver==1.4.7
Markdown==3.9
MarkupSafe==3.0.3
matplotlib==3.9.4
mpmath==1.3.0
networkx==3.2.1
nflows==0.14
numpy==1.26.4
packaging==25.0
pillow==11.3.0
protobuf==6.33.0
psutil==7.1.1
pyerfa==2.0.1.5
pyparsing==3.2.5
python-dateutil==2.9.0.post0
PyYAML==6.0.3
requests==2.32.5
safetensors==0.6.2
six==1.17.0
spender==0.2.8
sympy==1.14.0
tensorboard==2.20.0
tensorboard-data-server==0.7.2
torch==2.1.0
torchinterp1d==1.1
tqdm==4.67.1
typing_extensions==4.15.0
urllib3==2.5.0
Werkzeug==3.1.3
zipp==3.23.0
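
The first traceback references a python3.9 site-packages path, while the anomaly-mode traceback references python3.13, so it may be worth confirming which interpreter and package versions the failing run actually uses. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the interpreter that is actually running this script.
    print("python:", sys.version.split()[0])
    print("executable:", sys.executable)
    relevant = ["torch", "torchinterp1d", "spender"]
    for name, version in installed_versions(relevant).items():
        print(f"{name}: {version}")
```

Running it with the same interpreter that launches train_DESI.py reports the versions seen by that environment, which can then be compared against the pip freeze output above.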

The problem seems to originate in the code below, but I have not been able to find a fix:

spender/spender/model.py

Lines 344 to 361 in 21df63e

spectrum = interp1d(wave_redshifted, x, wave_obs)

# need to zero out parts of the spectrum that our outside of the restframe range (see #34)
valid = wave_obs[None,:] > self.wave_rest[0] * (1 + z[:,None])
valid &= wave_obs[None,:] < self.wave_rest[-1] * (1 + z[:,None])
spectrum[~valid] = 0

# convolve with LSF
if instrument.lsf is not None:
    spectrum = instrument.lsf(spectrum.unsqueeze(1)).squeeze(1)

# apply calibration function to observed spectrum
if instrument is not None and instrument.calibration is not None:
    spectrum = instrument.calibration(wave_obs, spectrum)

if return_valid:
    return spectrum, valid
return spectrum
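
To make the intent of the first few lines of this snippet explicit, here is a pure-Python sketch of the same two steps for a single spectrum: resampling the redshifted model onto the observed wavelength grid, then zeroing values outside the redshifted rest-frame range. It is not the torchinterp1d implementation and has no autograd. The function names are hypothetical, and wave_redshifted = wave_rest * (1 + z) is an assumption about how that variable is defined upstream.

```python
from bisect import bisect_right


def interp_linear(x, y, x_new):
    """Linearly interpolate samples (x, y) at x_new; x must be increasing."""
    out = []
    for xn in x_new:
        # Index of the left neighbour, clamped so that i and i + 1 are valid.
        i = min(max(bisect_right(x, xn) - 1, 0), len(x) - 2)
        t = (xn - x[i]) / (x[i + 1] - x[i])
        out.append(y[i] + t * (y[i + 1] - y[i]))
    return out


def transform(wave_rest, flux_rest, wave_obs, z):
    """Redshift a rest-frame spectrum and resample it on the observed grid."""
    wave_redshifted = [w * (1 + z) for w in wave_rest]  # assumed definition
    spectrum = interp_linear(wave_redshifted, flux_rest, wave_obs)
    # Zero out observed wavelengths outside the redshifted rest-frame range,
    # using strict inequalities as in the snippet above.
    lo, hi = wave_redshifted[0], wave_redshifted[-1]
    valid = [lo < w < hi for w in wave_obs]
    spectrum = [s if ok else 0.0 for s, ok in zip(spectrum, valid)]
    return spectrum, valid


if __name__ == "__main__":
    spectrum, valid = transform(
        [1000.0, 2000.0, 3000.0], [1.0, 2.0, 3.0],
        [1500.0, 3000.0, 5000.0, 7000.0], z=1.0,
    )
    print(spectrum, valid)
```

In this toy example the first and last observed wavelengths fall outside the redshifted range, so they are extrapolated by the interpolation step and then set to zero by the mask.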
