A differentiable Julia port of the Community Land Model version 5 (CLM5/CTSM) — the land surface component of the Community Earth System Model (CESM).
CLM.jl reproduces the full CLM5 biogeophysics in pure Julia, enabling automatic differentiation for gradient-based parameter calibration, GPU-ready architecture with BitVector masks, and composability with the Julia scientific ecosystem (ForwardDiff.jl, Enzyme.jl, Flux.jl, Optim.jl).
This codebase was written almost entirely by AI agents running in a continuous "Ralph" loop. No human has hand-written any meaningful amount of the code here, and the author has read only a small fraction of it. What you see is the output of a large amount of agentic engineering — autonomous agents reading the original Fortran, porting it, writing tests, validating against reference output, and iterating — supervised at the goal level, not the line level.
This project exists to find out what can be done with agentic engineering on a serious scientific codebase — not to demonstrate what should be done. It is a probe into the limits of the approach, and should be read as such.
Implications you must take seriously:
- 🧪 It is a research artifact, not production software. Do not use it for science, operations, or decisions you care about without independent verification.
- 🔍 Validation is partial. The numbers below are real, but they cover a narrow slice (one site, SP mode, selected variables). Whole subsystems are implemented but unvalidated, and "tests pass" means the agents' own tests pass.
- 🐛 Expect latent bugs. Plausible-looking code that no human has reviewed can be subtly or badly wrong in ways tests don't catch. Treat every result as suspect until you've checked it yourself.
- 📝 Provenance is unusual. Much of the design rationale lives in agent logs and commit history rather than in a human's head.
Use entirely at your own risk. No warranty, no guarantees of correctness, fitness, or scientific validity. If you build on this, verify everything.
CLM.jl has been validated in SP mode against Fortran CLM5 (CTSM release-clm5.0) for a full-year simulation at the Bow at Banff site (51.2°N, 115.6°W, 2003, restart-initialized from a 2002 cold-start spinup). The figure below shows 7-day running averages for 12 key output variables:
| Variable | Mean Bias | RMSE | Correlation |
|---|---|---|---|
| 2-m Air Temperature | +0.01 K | 0.15 K | 0.999 |
| Absorbed Solar Radiation | +0.5% | 1.2 W/m² | 0.999 |
| Latent Heat Flux | +3% | 1.8 W/m² | 0.98 |
| Canopy Transpiration | +12% | 2.4 W/m² | 0.97 |
| Sensible Heat, Ground Temp, Snow, Runoff | < 5% | — | > 0.98 |
Scope of validation: these results are for a single site, single year, SP (satellite-phenology) mode, against one CTSM release. They are genuinely encouraging but they do not establish correctness across sites, configurations, or the biogeochemistry subsystems. See the warning above.
- Surface albedo with two-stream canopy radiative transfer
- Snow radiative transfer (SNICAR) with aerosol-snow interactions
- Turbulent fluxes via Monin-Obukhov similarity (bare ground, canopy, lake, urban)
- Photosynthesis (Farquhar/Collatz) with Ball-Berry and Medlyn stomatal conductance
- LUNA photosynthetic optimization (Vcmax/Jmax acclimation)
- Plant hydraulic stress (PHS)
- Snow hydrology (12 layers, compaction, grain-size evolution)
- Soil hydrology (Richards equation, Clapp-Hornberger / van Genuchten)
- Soil temperature with phase change
- Lake temperature and hydrology
- Urban canyon energy balance (CLMU)
- Irrigation, hillslope lateral flow
- Carbon-nitrogen cycling (CN mode) with allocation, respiration
- Decomposition (Century cascade and MIMICS)
- Nitrification-denitrification, nitrogen leaching
- Fire (Li 2014), gap mortality, dynamic vegetation (CNDV)
- Methane (CH4), VOC emissions, dust emission
- All 50+ data structs parameterized on
{FT<:Real}for dual-number propagation - ~520 smoothed discontinuities (
smooth_max,smooth_min,smooth_heaviside) - Pure-Julia LU fallback for banded matrix solves with Dual numbers
- Built-in
CalibrationProblemframework with ForwardDiff gradients - 7 tunable parameters:
vcmax25_scale,medlyn_slope,csoilc,jmax25top_sf,baseflow_scalar,fff,ksat_scale
using CLM
# Run a full simulation
clm_run!(
fsurdat = "path/to/surfdata.nc",
paramfile = "path/to/params.nc",
fforcing = "path/to/forcing/",
fhistory = "output/history.nc"
)
# Or initialize and step manually
inst, bounds, filt, tm = clm_initialize!(
fsurdat = "path/to/surfdata.nc",
paramfile = "path/to/params.nc"
)
clm_drv!(config, inst, filt, filt_ia, bounds, ...)using CLM, ForwardDiff
prob = CalibrationProblem(
params = [
CalibrationParameter("vcmax25_scale", 1.0, setter!, (0.5, 2.0)),
CalibrationParameter("medlyn_slope", 4.1, setter!, (1.0, 10.0)),
],
targets = [CalibrationTarget("LH", getter, obs_LH, 1.0)],
fsurdat = "surfdata.nc",
paramfile = "params.nc",
fforcing = "forcing/"
)
# AD gradients + Armijo line search
result = calibrate(prob; maxiter=20)src/
constants/ Physical constants, control flags, PFT parameters
types/ 50+ mutable structs (SoA layout, {FT<:Real} parameterized)
infrastructure/ Solvers, I/O, initialization, filters, subgrid
biogeophys/ Radiation, turbulence, hydrology, snow, soil, photosynthesis
biogeochem/ Phenology, decomposition, nutrient cycling, fire, CH4, VOC
driver/ Timestep driver, initialization, top-level run
calibration/ AD-based calibration framework
Key design decisions:
- SoA (Structure of Arrays) layout matching Fortran CLM for GPU compatibility
- BitVector masks replace Fortran integer filter arrays — no dynamic rebuild, GPU-ready
- Fortran variable names preserved for line-by-line traceability (
h2osoi_liq,t_soisno, etc.) - Fixed-size snow arrays padded to
nlevsno=12— no dynamic resizing
| Configuration | Wall-clock (1 year, single-point) |
|---|---|
| Fortran CLM5 (ifort -O2) | ~2 s |
| CLM.jl (Float64) | ~4 s |
| CLM.jl + ForwardDiff (1 param) | ~8 s |
| CLM.jl + ForwardDiff (7 params) | ~28 s |
julia --project=. -e 'using Test; include("test/runtests.jl")'15,528 tests: unit tests, end-to-end SP simulation, AD gradient verification (6 climate scenarios), calibration framework, parameter recovery, CN integration, and Enzyme feasibility.
- Julia 1.10+
- NCDatasets.jl, ForwardDiff.jl, JSON.jl
- CLM5 surface data and parameter files (NetCDF)
CLM.jl is, first and foremost, an experiment in agentic software engineering. The porting work was driven by a Ralph-style loop: an AI coding agent run repeatedly against a standing set of instructions, each iteration picking up the next unit of work, reading the original Fortran, writing the Julia translation, adding tests, validating, and committing — with a human steering goals and priorities rather than authoring code.
The working pattern, roughly:
- Read the Fortran completely for the target module (
/installs/clm/). - Translate to Julia following the conventions in
CLAUDE.md(SoA layout, preserved variable names, BitVector masks). - Test — unit tests plus finite-difference derivative checks where applicable.
- Validate against reference Fortran output where a harness exists.
- Iterate until parity, then move to the next module. Repeat, autonomously, for a long time.
Some of the larger pushes (GPU kernelization, reverse-mode AD, BGC subsystems) were run as multi-agent workflows — fan-out batches of agents working in parallel, with adversarial verification passes. The accumulated decisions, dead-ends, and lessons live in the commit history, PORTING_LOG.md, and the agents' own memory.
What this means for you:
- The code is idiomatic and traceable because the loop was told to keep it that way — but consistency is not correctness.
- Coverage is broad but uneven. Biogeophysics is the most exercised; biogeochemistry is implemented but largely unvalidated.
- If you find a bug, you are likely the first human to look at that code closely. Issues and fixes are very welcome (see Contributing).
This is shared in the spirit of finding out what is possible. Calibrate your trust accordingly.
If you use CLM.jl, please cite the repository:
Eythorsson, D. (2026). CLM.jl: A differentiable Julia port of the Community Land Model. https://github.com/DarriEy/CLM.jl
Issues and pull requests are welcome. Please run the full test suite before submitting changes:
julia --project=. -e 'using Test; include("test/runtests.jl")'The original contributions in this repository (the Julia translation, AD/GPU work, test and parity harnesses) are licensed under the MIT License.
CLM.jl is a port — and therefore a derivative work — of the Fortran CTSM/CLM5 source, which is distributed by UCAR/NCAR under a BSD 3-Clause license. That upstream copyright notice and license are retained in the NOTICE file and continue to govern the portions of this work derived from CTSM.
CLM.jl is based on the Community Land Model developed by the National Center for Atmospheric Research (NCAR) and the broader CESM community. The original Fortran CLM5 is described in Lawrence et al. (2019).
