Dataset
Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Test
Species content of dataset
Name :
Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Test
Authors :
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
Description :
Test split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures in the full MAD-1.5 dataset are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures, organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces above 100 eV/Angstrom, followed by uncertainty-based filtering using last-layer prediction rigidity (LLPR). The test split (~10% of the cleaned data, excluding monomers, dimers, and trimers, which are fixed in the training split) uses a stratified split method consistent with the training and validation splits. Subset-resolved MAE for PET-MAD-1.5-S on this test set is 11.09 meV/atom (energy) and 36.81 meV/Angstrom (forces). A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads.
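The heuristic force filter and the reported error metrics can be illustrated with a minimal Python sketch. It rests on several assumptions that the description does not confirm: the 100 eV/Angstrom threshold is applied here to the per-atom force norm (it could equally be per component), the energy MAE is taken as the mean absolute energy error divided by the number of atoms in each structure, and the force MAE is averaged over all Cartesian components. The function names are hypothetical and are not part of any MAD or ColabFit tooling.

```python
import math

# Threshold quoted in the dataset description (eV/Angstrom).
FORCE_CUTOFF = 100.0


def max_force_norm(forces):
    """Largest per-atom force norm in one structure; forces is a list of (fx, fy, fz)."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def passes_force_filter(forces, cutoff=FORCE_CUTOFF):
    """Assumed form of the heuristic filter: keep a structure if no atom exceeds the cutoff."""
    return max_force_norm(forces) <= cutoff


def energy_mae_mev_per_atom(e_pred, e_ref, n_atoms):
    """Mean absolute per-atom energy error in meV/atom; inputs are total energies in eV."""
    errors = [abs(p - r) / n for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return 1000.0 * sum(errors) / len(errors)


def force_mae_mev_per_angstrom(f_pred, f_ref):
    """Mean absolute force error over all components in meV/Angstrom; inputs in eV/Angstrom."""
    errors = [
        abs(p - r)
        for struct_pred, struct_ref in zip(f_pred, f_ref)
        for atom_pred, atom_ref in zip(struct_pred, struct_ref)
        for p, r in zip(atom_pred, atom_ref)
    ]
    return 1000.0 * sum(errors) / len(errors)
```

With these conventions, a structure containing a single atom with a 150 eV/Angstrom force would be rejected, while the metrics would be evaluated only on the structures that remain after both filtering steps.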
Cite As :
Malosso, C., Bigi, F., Pegolo, P., Abbott, J. W., Loche, P., Rossi, M., Ceriotti, M., and Mazitov, A. "Massive Atomic Diversity MAD-1.5 r2SCAN Test." ColabFit, 2026.
Date Added :
2026-05-21
License :
CC-BY-4.0
Downloads :
0
Num. Configurations :
18,314
Num. Atoms :
321,704
Calculated Property Types :
atomic_forces
atomization_energy
cauchy_stress
energy
Elements :
Ac (0.08%)
Ag (0.4%)
Al (0.9%)
Am (0.08%)
Ar (0.09%)
As (0.67%)
At (0.08%)
Au (0.3%)
B (1.13%)
Ba (0.46%)
Be (0.34%)
Bi (0.48%)
Bk (0.08%)
Br (0.88%)
C (11.5%)
Ca (0.78%)
Cd (0.52%)
Ce (0.16%)
Cf (0.08%)
Cl (2.16%)
Cm (0.08%)
Co (0.58%)
Cr (0.38%)
Cs (0.74%)
Cu (0.77%)
Dy (0.07%)
Er (0.09%)
Es (0.08%)
Eu (0.07%)
F (3.85%)
Fe (0.69%)
Fm (0.08%)
Fr (0.07%)
Ga (0.57%)
Gd (0.06%)
Ge (0.85%)
H (16.64%)
He (0.09%)
Hf (0.44%)
Hg (0.31%)
Ho (0.06%)
I (0.92%)
In (0.53%)
Ir (0.3%)
K (1.24%)
Kr (0.1%)
La (0.21%)
Li (0.81%)
Lu (0.17%)
Md (0.08%)
Mg (0.55%)
Mn (0.49%)
Mo (0.56%)
N (5.82%)
Na (1.05%)
Nb (0.58%)
Nd (0.08%)
Ne (0.1%)
Ni (0.84%)
No (0.07%)
Np (0.08%)
O (18.69%)
Os (0.19%)
P (1.63%)
Pa (0.08%)
Pb (0.54%)
Pd (0.46%)
Pm (0.07%)
Po (0.1%)
Pr (0.09%)
Pt (0.4%)
Pu (0.07%)
Ra (0.08%)
Rb (0.54%)
Re (0.26%)
Rh (0.34%)
Rn (0.09%)
Ru (0.28%)
S (2.94%)
Sb (0.66%)
Sc (0.44%)
Se (1.41%)
Si (1.11%)
Sm (0.07%)
Sn (0.65%)
Sr (0.74%)
Ta (0.44%)
Tb (0.07%)
Tc (0.17%)
Te (0.76%)
Th (0.11%)
Ti (0.53%)
Tl (0.44%)
Tm (0.09%)
U (0.1%)
V (0.54%)
W (0.35%)
Xe (0.12%)
Y (0.63%)
Yb (0.13%)
Zn (0.7%)
Zr (0.58%)
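The percentages above can be reproduced in principle from the per-structure atomic symbols. The sketch below assumes that each percentage is the share of all atoms in the split belonging to that element (rather than, say, the share of configurations containing it), which the card does not state explicitly. The function name and the toy input are hypothetical.

```python
from collections import Counter


def element_percentages(configurations):
    """Percentage of all atoms belonging to each element.

    configurations: iterable of per-structure lists of chemical symbols.
    Returns a dict mapping element symbol to its percentage, rounded to 2 decimals.
    """
    counts = Counter(symbol for symbols in configurations for symbol in symbols)
    total = sum(counts.values())
    return {el: round(100.0 * n / total, 2) for el, n in sorted(counts.items())}


# Toy input: one water molecule and one CO molecule (not taken from the dataset).
toy = [["H", "H", "O"], ["C", "O"]]
print(element_percentages(toy))  # {'C': 20.0, 'H': 40.0, 'O': 40.0}
```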
Methods :
DFT-r2SCAN
Software :
FHI-aims v250806
Name: Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Test
Extended ID: Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Test__Malosso-Bigi-Pegolo-Abbott-Loche-Rossi-Ceriotti-Mazitov__DS_7q71mf99le0c_0
DOI: None
Publication Link: https://doi.org/10.48550/arXiv.2603.02089
Data Source Link: https://doi.org/10.24435/materialscloud:jc-9f
Other Links:
https://github.com/lab-cosmo/upet
No uploaded content is transferred in ownership from the original creators to ColabFit. All content is distributed under the license specified by its contributor who has stated that he or she has the authority to share it under the specified license.