📄 Paper | 🤗 HuggingFace (Model checkpoints) | 🛢️ Pre-training Data | 🏆 Mars-Bench (Downstream tasks)
Mirali Purohit1,2†, Bimal Gajera1*, Irish Mehta1*, Bhanu Tokas1*,
Jacob Adler1, Steven Lu2, Scott Dickenshied1, Serina Diniega2,
Brian Bue2, Umaa Rebbapragada2, Hannah Kerner1
1Arizona State University
2Jet Propulsion Laboratory, California Institute of Technology
*Equal Contribution †Corresponding Author
We introduce MOMO, the first multi-sensor foundation model for Mars remote sensing. MOMO uses model merging to integrate representations learned independently from three key Martian orbital sensors (HiRISE, CTX, and THEMIS), which span resolutions from 0.25 m/pixel to 100 m/pixel.
Central to our method is a novel Equal Validation Loss (EVL) strategy, which aligns checkpoints across sensors based on validation loss similarity before fusion via task arithmetic. This ensures models are merged at compatible convergence stages, leading to improved stability and generalization.
MOMO is trained on approximately 12 million Mars orbital samples and evaluated on 9 downstream tasks from Mars-Bench. It outperforms ImageNet pre-trained, Earth observation foundation model, sensor-specific pre-training, and fully-supervised baselines, with particularly consistent gains on segmentation tasks.
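To make the merging idea above more concrete, here is a minimal sketch of checkpoint selection by validation loss followed by task arithmetic. It is not the MOMO implementation: the function names, the rule for choosing the shared target loss, the `scale` parameter, and the assumption that all sensor models start from a common `base` initialization are illustrative choices made for this example.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical type: a "state" maps parameter names to values that support
# +, - and scalar * (floats here; torch tensors or NumPy arrays also work).
State = Mapping[str, float]


def pick_equal_loss_checkpoints(
    curves: Mapping[str, List[Tuple[int, float]]],
) -> Dict[str, int]:
    """For each sensor, pick the step whose validation loss is closest to a shared target.

    Assumption (not from the paper): the target is the largest of the
    per-sensor minimum losses, i.e. a loss value every sensor reaches.
    """
    target = max(min(loss for _, loss in curve) for curve in curves.values())
    return {
        sensor: min(curve, key=lambda point: abs(point[1] - target))[0]
        for sensor, curve in curves.items()
    }


def task_arithmetic_merge(
    base: State, experts: List[State], scale: float = 1.0
) -> Dict[str, float]:
    """Add the scaled sum of task vectors (expert - base) back onto the base."""
    merged = {}
    for name, base_value in base.items():
        task_vector_sum = sum(expert[name] - base_value for expert in experts)
        merged[name] = base_value + scale * task_vector_sum
    return merged
```

In this sketch, `curves` would hold `(training_step, validation_loss)` pairs logged per sensor. The steps returned by `pick_equal_loss_checkpoints` identify which sensor-specific checkpoints to pass as `experts` to `task_arithmetic_merge`.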

MOMO can be effectively applied across a wide range of resolutions and a broad spectrum of Martian remote sensing tasks, including large-scale crater or landslide mapping and precise boulder localization.
```bash
# Install the package with core dependencies
pip install -e .
# Install with development dependencies (for testing, linting, etc.)
pip install -e ".[dev]"
```

Requires Python 3.10+ and CUDA 12.x for GPU support.
Pre-trained model weights are available on HuggingFace for three ViT architectures (ViT-Small, ViT-Base, ViT-Large).
```python
import torch
from huggingface_hub import hf_hub_download

# Download MOMO ViT-Base checkpoint
path = hf_hub_download(repo_id="Mirali33/MOMO", filename="vit-b-16/momo.pth")
checkpoint = torch.load(path, map_location="cpu", weights_only=False)
```

Replace `vit-b-16` with `vit-s-16` or `vit-l-16` for other architectures, and `momo.pth` with `ctx.pth`, `hirise.pth`, `themis.pth`, or `hirise_ctx_themis.pth` for sensor-specific checkpoints.
| File | Description |
|---|---|
| `ctx.pth` | Pre-trained on CTX (ConTeXt Camera) |
| `hirise.pth` | Pre-trained on HiRISE (High Resolution Imaging Science Experiment) |
| `themis.pth` | Pre-trained on THEMIS (THermal EMission Imaging System) |
| `hirise_ctx_themis.pth` | Pre-trained jointly on all three sensors |
| `momo.pth` | MOMO merged model via task arithmetic + EVL (main contribution) |
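The checkpoint names above follow an `<architecture>/<file>` pattern, so a small helper can build the `filename` argument and catch typos before a download is attempted. This is a sketch only: `checkpoint_filename` is a hypothetical helper, not part of the MOMO package, and it assumes every architecture folder contains all five files listed in the table.

```python
# Architectures and checkpoint variants named in this README.
ARCHITECTURES = ("vit-s-16", "vit-b-16", "vit-l-16")
VARIANTS = ("momo", "ctx", "hirise", "themis", "hirise_ctx_themis")


def checkpoint_filename(arch: str = "vit-b-16", variant: str = "momo") -> str:
    """Return the repo-relative checkpoint path, e.g. 'vit-b-16/momo.pth'."""
    if arch not in ARCHITECTURES:
        raise ValueError(f"unknown architecture {arch!r}; expected one of {ARCHITECTURES}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{arch}/{variant}.pth"
```

The returned string can be passed as `filename` to `hf_hub_download` together with `repo_id="Mirali33/MOMO"`, for example `checkpoint_filename("vit-l-16", "ctx")` for the CTX-only ViT-Large weights.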
If you use MOMO in your research, please use the following citation:
```bibtex
@InProceedings{Purohit_2026_CVPR,
    author    = {Purohit, Mirali and Gajera, Bimal and Mehta, Irish and Tokas, Bhanu and Adler, Jacob and Lu, Steven and Dickenshied, Scott and Diniega, Serina and Bue, Brian and Rebbapragada, Umaa and Kerner, Hannah},
    title     = {MOMO: Mars Orbital MOdel Foundation Model for Mars Orbital Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {27772-27782}
}
```

Please reach out to Mirali Purohit (mpurohi3@asu.edu) if you have any questions or issues regarding MOMO.
