MatPES (Materials Potential Energy Surface) is a potential energy surface dataset with near-complete coverage of the periodic table, designed to train foundation potentials (FPs), i.e., machine learning interatomic potentials (MLIPs) for materials. MatPES is an initiative by the Materialyze.AI lab and the Materials Project to address critical deficiencies in existing PES datasets.
| Version | Date | Description | Download |
|---|---|---|---|
| 2025.2 | 15 Apr 2026 | Addition of Bader and DDEC6 charges; removed a small number of duplicated structures. | PBE, r2SCAN |
| 2025.1 | 6 Mar 2025 | Initial release (~400k structures) | PBE, r2SCAN |
| - | 6 Mar 2025 | Atomic reference energies | PBE, r2SCAN |
- Accuracy. MatPES is computed using static DFT calculations with stringent convergence criteria. Please refer to the `MatPESStaticSet` in pymatgen for details.
- Comprehensiveness. MatPES structures are sampled using a 2-stage version of DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling from a greatly expanded configuration of MD structures.
- Quality. MatPES includes computed data from the PBE functional, as well as the high fidelity r2SCAN meta-GGA functional with improved description across diverse bonding and chemistries.
MatPES 2025.2 is the latest public release; it extends the initial 2025.1 release (~400,000 structures from 300 K MD simulations) with Bader and DDEC6 atomic charges and removes duplicated structures. The dataset remains much smaller than other PES datasets in the literature, yet FPs trained on it achieve comparable or, in some cases, improved performance and reliability.
MatPES is part of the MatML ecosystem, which also includes MatGL (Materials Graph Library), maml (MAterials Machine Learning), and MatCalc (Materials Calculator).
The MatPES dataset is available on Hugging Face. You can use the `datasets` package to download it:

```python
from datasets import load_dataset

load_dataset("materialyze/matpes", "pbe")
load_dataset("materialyze/matpes", "r2scan")
```

Without any version specifiers, the latest version of each dataset will be returned.
To download a specific version, append a `-<version>` specifier. For example:

```python
load_dataset("materialyze/matpes", "r2scan-2025.2")
```

MatPES datasets are distributed as JSONL (`.jsonl`) files rather than a single JSON array. Each line is one complete, self-contained JSON record for a structure. This delivers several concrete benefits over a monolithic `.json` file:
- Constant memory usage. Stream the dataset one record at a time instead of loading the entire (multi-GB) file into memory before you can touch a single structure.
- Fast, resumable processing. Start iterating from the first line immediately, filter or sample without a full parse, and process in parallel by splitting on line boundaries.
- Appendable and composable. Concatenate, `head`/`tail`, `grep`, or `split` files with standard Unix tools; add new records simply by appending lines.
- Native tooling support. This is the format that `datasets.load_dataset` (and most data-loading pipelines) consume directly, with no custom parsing.
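The constant-memory and appendable properties listed above can be illustrated with a short sketch. It is a minimal illustration using only the Python standard library, not part of the `matpes` package; the helper names `iter_jsonl` and `filter_jsonl` are hypothetical, and no particular record schema is assumed.

```python
import json
from typing import Callable, Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def filter_jsonl(src: str, dst: str, keep: Callable[[dict], bool]) -> int:
    """Write records from src that satisfy `keep` to dst; return the number written."""
    written = 0
    with open(dst, "w", encoding="utf-8") as out:
        for record in iter_jsonl(src):
            if keep(record):
                # One JSON document per line keeps the output valid JSONL.
                out.write(json.dumps(record) + "\n")
                written += 1
    return written
```

Because `iter_jsonl` is a generator, only one record is held in memory at a time, so a predicate passed to `filter_jsonl` can select a subset of a multi-GB file on a modest machine. The keys available to such a predicate depend on the actual MatPES record schema, which should be checked against a downloaded file.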
To read a .jsonl file directly without the datasets package:
```python
import json

with open("MatPES-PBE-2025.2.jsonl", encoding="utf-8") as f:
    data = [json.loads(line) for line in f]
```

The `matpes` Python package, which provides tools for working with the MatPES datasets, can be installed via pip:
```shell
pip install matpes
```

Some command line usage examples:
```shell
# Download the PBE dataset to the current directory.
# You should see a MatPES-PBE-2025.2.jsonl file in your directory.
matpes download pbe

# Extract all entries in the Fe-O chemical system.
matpes data -i MatPES-PBE-2025.2.jsonl --chemsys Fe-O -o Fe-O.jsonl
```

The `matpes.db` module provides functionality to create your own MongoDB database with the downloaded MatPES data, which is extremely useful if you plan to work with the data (e.g., querying, adding entries, etc.) extensively.
We have released a set of MatPES-trained foundation potentials (FPs) in the M3GNet, CHGNet, and TensorNet architectures in the MatGL package. For example, you can load the TensorNet FP trained on MatPES PBE 2025.2 as follows:
```python
import matgl

potential = matgl.load_model("TensorNet-PES-MatPES-PBE-2025.2")
```

Model names follow the format `<architecture>-PES-<dataset>-<dataset-version>`.
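The naming format can be made concrete with a small parser. This is a sketch only: `ModelName`, `parse_model_name`, and `build_model_name` are hypothetical helpers, not part of MatGL, and the parsing assumes that the dataset part may contain hyphens (as in `MatPES-PBE`) while the version part does not.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    architecture: str
    dataset: str
    version: str


def parse_model_name(name: str) -> ModelName:
    """Split '<architecture>-PES-<dataset>-<dataset-version>' into its parts."""
    architecture, sep, rest = name.partition("-PES-")
    # Assumption: the version is everything after the last hyphen.
    dataset, dash, version = rest.rpartition("-")
    if not (architecture and sep and dataset and dash and version):
        raise ValueError(f"Not a recognised model name: {name!r}")
    return ModelName(architecture, dataset, version)


def build_model_name(architecture: str, dataset: str, version: str) -> str:
    """Assemble a model name in the same format."""
    return f"{architecture}-PES-{dataset}-{version}"
```

For example, `parse_model_name("TensorNet-PES-MatPES-PBE-2025.2")` separates the architecture (`TensorNet`), the dataset (`MatPES-PBE`), and the dataset version (`2025.2`). Whether a given combination corresponds to a released model must still be checked against the MatGL documentation.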
These FPs can be used easily with the MatCalc package to rapidly compute properties. For example:
```python
from matcalc.elasticity import ElasticityCalc

calculator = ElasticityCalc("TensorNet-PES-MatPES-PBE-2025.2")
calculator.calc(structure)
```

We have provided Jupyter notebooks demonstrating how to load the MatPES dataset, train a model, and perform fine-tuning.
If you use the MatPES dataset, please cite the following work:
Kaplan, A. D.; Liu, R.; Qi, J.; Ko, T. W.; Deng, B.; Riebesell, J.; Ceder, G.; Persson, K. A.; Ong, S. P. A Foundational Potential Energy Surface Dataset for Materials. arXiv 2025. DOI: 10.48550/arXiv.2503.04070.

In addition, if you use any of the pre-trained FPs or architectures, please cite the references provided on the architecture used as well as MatGL.