CVPR 2026
Nan Yang · Julian Straub · Fan Zhang · Richard Newcombe · Jakob Engel · Lingni Ma
Meta Reality Labs Research
LAMP tracks 3D human motion from egocentric multi-camera headsets via early disentanglement of observer and target motion. Using known device 6-DoF motion and calibration, 2D body keypoints from all cameras over a temporal window are lifted into a unified 3D world reference frame, and an end-to-end trained spatio-temporal transformer fits 3D human motion directly to this 3D ray cloud. This "lift-then-fit" approach achieves state-of-the-art results on monocular benchmarks while significantly outperforming baselines on the targeted egocentric setting.
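To make the lifting step concrete, here is a minimal sketch of turning one 2D keypoint into a ray in the world frame. It assumes an ideal pinhole camera and uses hypothetical names (`Pinhole`, `Ray`, `lift_keypoint`) that are not taken from the LAMP code; the headset's real camera model and the repository's implementation may differ.

```python
import math
from typing import NamedTuple


class Pinhole(NamedTuple):
    """Hypothetical ideal pinhole intrinsics (an assumption, not LAMP's camera model)."""
    fx: float
    fy: float
    cx: float
    cy: float


class Ray(NamedTuple):
    origin: tuple     # camera center in world coordinates
    direction: tuple  # unit vector in world coordinates


def lift_keypoint(uv, cam, R_world_cam, t_world_cam):
    """Lift a pixel (u, v) to a 3D ray expressed in the world frame.

    R_world_cam (3x3, row-major) and t_world_cam (3,) map camera coordinates
    to world coordinates, i.e. the device pose composed with the calibration.
    """
    u, v = uv
    # Back-project the pixel to a unit direction in the camera frame.
    d_cam = ((u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy, 1.0)
    norm = math.sqrt(sum(c * c for c in d_cam))
    d_cam = tuple(c / norm for c in d_cam)
    # Rotate into the world frame; the ray starts at the camera center.
    d_world = tuple(
        sum(R_world_cam[i][j] * d_cam[j] for j in range(3)) for i in range(3)
    )
    return Ray(tuple(t_world_cam), d_world)


# Example: a camera at (1, 0, 0), rotated 90 degrees about the y axis,
# observing a keypoint at its principal point.
cam = Pinhole(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
R = ((0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))
ray = lift_keypoint((320.0, 240.0), cam, R, (1.0, 0.0, 0.0))
print(ray)
```

Repeating this for every keypoint, camera, and frame in the temporal window would give the kind of 3D ray cloud the transformer is fitted to.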
LAMP needs an NVIDIA GPU with a driver that supports CUDA 12. The CUDA runtime, cuDNN, and TensorRT are installed into the virtual environment via pip, so no system-wide CUDA toolkit is required — only the driver comes from the host. We tested on Fedora with a CUDA 12 driver.
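A quick way to check the host driver before installing is to read the CUDA version that `nvidia-smi` reports. The sketch below is not part of the repository; the function names are hypothetical, and it assumes the `nvidia-smi` banner contains a `CUDA Version: X.Y` field, as recent drivers print.

```python
import re
import shutil
import subprocess


def max_cuda_version(smi_output):
    """Return (major, minor) parsed from nvidia-smi banner text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_supports_cuda12():
    """True if nvidia-smi is on PATH and reports a CUDA version of 12 or later."""
    if shutil.which("nvidia-smi") is None:
        return False
    out = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    ).stdout
    version = max_cuda_version(out)
    return version is not None and version[0] >= 12


if __name__ == "__main__":
    print("CUDA 12 capable driver:", driver_supports_cuda12())
```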
# Install uv (https://docs.astral.sh/uv/) if not already installed.
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment with uv
uv venv .venv --python 3.12
source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt
uv pip install wheel_stub
uv pip install --no-build-isolation 'tensorrt-cu12==10.16.1.11'
# Headless OpenCV avoids GL/display errors on servers.
uv pip uninstall opencv-python || true
uv pip install --force-reinstall opencv-python-headless==4.13.0.92
# Smoke test
python scripts/smoke_test.py
The same setup can be run with the convenience script:
bash scripts/install.sh
source .venv/bin/activate
Required runtime artifacts:
- LAMP SMPL checkpoint as a plain .pt state dict from facebook/LAMP.
- SMPL neutral .pkl from the official SMPL download source.
- A flat recording folder from the facebook/LAMP dataset containing video.vrs, closed_loop_trajectory.csv, online_calibration.jsonl, and semidense_points.csv.gz.
- Optional RF-DETR weights; when omitted, RF-DETR downloads its default checkpoint to ~/.cache/lamp.
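Before running, it can help to confirm that a recording folder has the expected flat layout. The following is a small sketch, not a script shipped with LAMP; `missing_recording_files` is a hypothetical helper, and the file names are copied from the artifact list above.

```python
from pathlib import Path

# File names taken from the artifact list above.
REQUIRED_RECORDING_FILES = (
    "video.vrs",
    "closed_loop_trajectory.csv",
    "online_calibration.jsonl",
    "semidense_points.csv.gz",
)


def missing_recording_files(recording_dir):
    """Return the required files that are absent from a flat recording folder."""
    root = Path(recording_dir)
    return [name for name in REQUIRED_RECORDING_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_recording_files("./data/test-library")
    print("missing:", missing or "none")
```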
We host the model checkpoint and the sample data recorded with Project Aria Gen2 on HuggingFace. You can download them via the following script:
bash scripts/fetch_artifacts.sh
The above script will NOT download the SMPL model. Please download basicmodel_neutral_lbs_10_207_0_v1.0.0.pkl from the official SMPL download source (Login -> Downloads -> SMPLIFY_CODE_V2.ZIP).
The official SMPL .pkl stores its arrays as Chumpy objects that no longer load under modern NumPy/Python. Chumpy cannot be installed in the LAMP environment, so strip it out once in a throwaway environment and keep only the resulting plain-NumPy .pkl:
# One-time conversion, isolated from the LAMP venv.
uv python install 3.10
uv venv /tmp/smpl_clean --python 3.10 && source /tmp/smpl_clean/bin/activate
uv pip install pip setuptools wheel "numpy<1.24"
uv pip install chumpy==0.70 --no-build-isolation # chumpy's setup.py imports pip
# Load with chumpy, convert the chumpy arrays to plain NumPy, write to data/.
# Run this from the repo root.
python - <<'EOF'
import pickle
import numpy as np
src = "/path/to/basicmodel_neutral_lbs_10_207_0_v1.0.0.pkl"
with open(src, "rb") as f:  # latin1 decodes the Python-2 pickle
    data = pickle.load(f, encoding="latin1")
clean = {k: (np.array(v) if "chumpy" in str(type(v)).lower() else v)
         for k, v in data.items()}
with open("data/SMPL_NEUTRAL.pkl", "wb") as f:
    pickle.dump(clean, f)
print("wrote data/SMPL_NEUTRAL.pkl")
EOF
deactivate
That writes the chumpy-free model straight to data/SMPL_NEUTRAL.pkl, ready for LAMP.
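To confirm the conversion worked, the resulting file can be loaded in the LAMP environment and scanned for leftover Chumpy values. This is an illustrative sketch with hypothetical helper names (`chumpy_entries`, `check_smpl_pickle`); it assumes the pickle holds a flat dict, as the conversion snippet does.

```python
import pickle


def chumpy_entries(data):
    """Return the keys whose values are still Chumpy objects."""
    return [k for k, v in data.items() if type(v).__module__.startswith("chumpy")]


def check_smpl_pickle(path):
    """Load a converted SMPL pickle and list any remaining Chumpy entries."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return chumpy_entries(data)


if __name__ == "__main__":
    leftovers = check_smpl_pickle("data/SMPL_NEUTRAL.pkl")
    print("chumpy entries:", leftovers or "none")
```

If unpickling itself raises `ModuleNotFoundError` for `chumpy`, the file still contains Chumpy objects and the conversion should be rerun.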
From the repo root, with the venv activated:
python -m lamp.app.cli run \
--recording ./data/test-library \
--checkpoint ./ckpts/lamp_smpl_aria_gen2.pt \
--smpl-model-path ./data/SMPL_NEUTRAL.pkl
Optionally, set the ground-plane height before starting to improve global pose estimation accuracy: in the viewer's Floor folder, drag Floor Z (m) to the floor and click Select floor.
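When launching from a script, the run command above can be assembled in Python with a quick check that its inputs exist. This is a sketch only: `build_run_command` is a hypothetical helper, and it uses just the three flags shown in the command above.

```python
import sys
from pathlib import Path


def build_run_command(recording, checkpoint, smpl_model):
    """Return the argv for `python -m lamp.app.cli run` after checking inputs exist."""
    recording, checkpoint, smpl_model = map(Path, (recording, checkpoint, smpl_model))
    if not recording.is_dir():
        raise FileNotFoundError(f"recording folder not found: {recording}")
    for path in (checkpoint, smpl_model):
        if not path.is_file():
            raise FileNotFoundError(f"file not found: {path}")
    return [
        sys.executable, "-m", "lamp.app.cli", "run",
        "--recording", str(recording),
        "--checkpoint", str(checkpoint),
        "--smpl-model-path", str(smpl_model),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repo root with the venv activated.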
- RF-DETR for person detection.
- ViTPose for 2D keypoint estimation.
- MotionBERT for the inspiration of the spatio-temporal transformer.
- SMPL and smplx package for the human body model.
- Boxer for the camera projection functions.
- Viser for the interactive 3D web visualizer.
@inproceedings{yang2026lamp,
title = {{LAMP}: Localization Aware Multi-camera People Tracking in Metric {3D} World},
author = {Yang, Nan and Straub, Julian and Zhang, Fan and Newcombe, Richard and Engel, Jakob and Ma, Lingni},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}
LAMP is CC-BY-NC licensed, as found in the LICENSE file.


