Issue:
The current configuration schema requires train_years, valid_years, and test_years to each be specified as a two-element list representing a single continuous range, e.g.:
data:
  train_years: [1979, 2018]
  valid_years: [2018, 2020]
These are expanded with range() in credit/parser.py:
train_years = [str(year) for year in range(train_years_range[0], train_years_range[1])]
valid_years = [str(year) for year in range(valid_years_range[0], valid_years_range[1])]
The same pattern is used for test_years in credit/applications/rollout_les.py and credit/applications/rollout_wrf.py [identified by Claude].
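For reference, the current expansion treats the second element as an exclusive upper bound, because it is passed straight to range(). A minimal illustration using the values from the example config above (the variable names mirror the credit/parser.py snippet; the config loading around it is omitted and the hard-coded values are only for demonstration):

```python
# Values taken from the example config above (hard-coded for illustration).
train_years_range = [1979, 2018]
valid_years_range = [2018, 2020]

# Same expansion as in the parser snippet: range() excludes the end year.
train_years = [str(year) for year in range(train_years_range[0], train_years_range[1])]
valid_years = [str(year) for year in range(valid_years_range[0], valid_years_range[1])]

print(train_years[0], train_years[-1], len(train_years))  # 1979 2017 39
print(valid_years)  # ['2018', '2019']
```

So 2018 belongs to the validation split only, and 2020 is in neither split.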
Request:
Some applications require a temporal split with discontinuous year segments, e.g. training on the CESM2-LE and evaluating performance on different climate states.
Ideally, the config would accept something like:
data:
  train_years: [[1960, 2039]]
  valid_years: [[1950, 1954], [2045, 2049]]
  test_years: [[1955, 1959], [2040, 2044]]
Proposed fix [from Claude]:
Support a list-of-lists syntax, with backward compatibility preserved for the existing two-element list format. The parser would detect the format and flatten accordingly:
def parse_year_ranges(year_spec):
    """Accept either [start, end] or [[s1, e1], [s2, e2], ...]"""
    if isinstance(year_spec[0], list):
        years = []
        for seg in year_spec:
            years.extend(range(seg[0], seg[1]))
        return [str(y) for y in years]
    else:
        return [str(y) for y in range(year_spec[0], year_spec[1])]
This change would need to be applied consistently across:
- credit/parser.py (training and validation)
- credit/applications/rollout_les.py (testing)
- credit/applications/rollout_wrf.py (testing)
- Any other scripts that consume *_years from conf
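Since the same logic would be needed in several scripts, one option is a single shared helper that every consumer imports. The sketch below is a slightly more defensive variant of the function above, not an existing part of the codebase: the name expand_year_spec, the YearSpec alias, and the validation rules (non-empty, two-element segments, start < end, no overlapping segments) are all assumptions for illustration. End years stay exclusive to match the current range() behaviour.

```python
from typing import List, Sequence, Union

# Hypothetical type alias: either [start, end] or [[s1, e1], [s2, e2], ...].
YearSpec = Union[Sequence[int], Sequence[Sequence[int]]]


def expand_year_spec(year_spec: YearSpec) -> List[str]:
    """Expand a year spec into a flat list of year strings (end years exclusive)."""
    if not year_spec:
        raise ValueError("year specification is empty")

    # Legacy format: a flat list of integers is treated as one segment.
    if all(isinstance(v, int) for v in year_spec):
        segments = [year_spec]
    else:
        segments = year_spec

    years: List[int] = []
    for seg in segments:
        if not isinstance(seg, (list, tuple)) or len(seg) != 2:
            raise ValueError(f"expected a [start, end] pair, got {seg!r}")
        start, end = seg
        if start >= end:
            raise ValueError(f"start must be less than end, got {seg!r}")
        years.extend(range(start, end))

    # Assumed rule: segments within one split should not overlap.
    if len(set(years)) != len(years):
        raise ValueError("year segments overlap")

    return [str(y) for y in years]


legacy = expand_year_spec([2018, 2020])
segmented = expand_year_spec([[1950, 1954], [2045, 2049]])
print(legacy)     # ['2018', '2019']
print(segmented)  # ['1950', '1951', '1952', '1953', '2045', '2046', '2047', '2048']
```

Note that under the end-exclusive convention, the example config in the request would leave 1954, 1959, 2039, 2044, and 2049 out of every split. If those segments are meant to be inclusive, either the end years in the config would need to be one higher or the convention would need to change, which would affect existing configs.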
Related change:
Once discontinuous segments are supported, a natural extension would be to allow weighted validation loss across segments — e.g., if a user wants the model to perform well across two climate regimes but prioritizes one, they could weight the validation metric accordingly rather than evaluating segments equally. For now, a pooled evaluation across all segments is sufficient.
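To make the distinction concrete, here is a rough sketch of pooled versus user-weighted aggregation. It assumes each segment already has a mean validation loss and a sample count; the function names, the example numbers, and the weight normalisation are hypothetical and not taken from the existing trainer.

```python
from typing import Sequence


def pooled_loss(segment_losses: Sequence[float], segment_counts: Sequence[int]) -> float:
    """Sample-count-weighted mean, i.e. all segments evaluated as one pool."""
    total = sum(segment_counts)
    return sum(loss * n for loss, n in zip(segment_losses, segment_counts)) / total


def weighted_loss(segment_losses: Sequence[float], segment_weights: Sequence[float]) -> float:
    """User-weighted mean of per-segment losses; weights are normalised to sum to 1."""
    total = sum(segment_weights)
    return sum(loss * w for loss, w in zip(segment_losses, segment_weights)) / total


# Made-up per-segment mean losses and sample counts for two climate regimes.
losses = [0.20, 0.40]
counts = [100, 300]

pooled = pooled_loss(losses, counts)        # approximately 0.35
prioritised = weighted_loss(losses, [3, 1])  # approximately 0.25, first segment favoured
print(pooled, prioritised)
```

With pooling, the larger segment dominates the metric; explicit weights would let a user override that regardless of segment length.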