Dataset
Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Val
Name :
Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Val
Authors :
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
Description :
Validation split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies and covers 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures in MAD-1.5 are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, a k-point density of 8 Angstrom^-1, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures, organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces above 100 eV/Angstrom, followed by filtering based on LLPR (last-layer prediction rigidity) uncertainty estimates. The validation split (~10% of the cleaned data) is drawn with a stratified split method consistent with the training and test splits. A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used alongside this one during model training, with separate prediction heads for each functional.
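The heuristic force filter and a split that samples each subset at the same rate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the per-atom force-norm criterion, the hypothetical `subset` key, the helper names `passes_force_filter` and `stratified_val_split`, and the fixed random seed are all assumptions, and the LLPR-based filtering step is not reproduced.

```python
import random
from collections import defaultdict

import numpy as np

# Threshold taken from the description: forces above 100 eV/Angstrom are outliers.
FORCE_CUTOFF = 100.0  # eV/Angstrom


def passes_force_filter(forces, cutoff=FORCE_CUTOFF):
    """Return True if no atom's force vector norm exceeds `cutoff`.

    `forces` is an (n_atoms, 3) array in eV/Angstrom. Testing the per-atom
    norm (rather than single Cartesian components) is an assumption.
    """
    norms = np.linalg.norm(np.asarray(forces, dtype=float), axis=1)
    return bool(np.all(norms <= cutoff))


def stratified_val_split(records, val_frac=0.1, seed=0):
    """Split records into (rest, val), drawing ~val_frac from every subset.

    Each record is a dict with a hypothetical "subset" key; stratifying on
    the subset label is an assumption about how the split was made.
    """
    by_subset = defaultdict(list)
    for rec in records:
        by_subset[rec["subset"]].append(rec)

    rng = random.Random(seed)
    rest, val = [], []
    for subset in sorted(by_subset):  # sorted for reproducible ordering
        group = list(by_subset[subset])
        rng.shuffle(group)
        n_val = round(val_frac * len(group))
        val.extend(group[:n_val])
        rest.extend(group[n_val:])
    return rest, val


# Toy usage: 20 "bulk" and 10 "molecules" records with zero forces,
# plus one bulk record whose 150 eV/Angstrom force should be filtered out.
records = [{"subset": "bulk", "forces": np.zeros((4, 3))} for _ in range(20)]
records += [{"subset": "molecules", "forces": np.zeros((3, 3))} for _ in range(10)]
records.append({"subset": "bulk", "forces": np.array([[150.0, 0.0, 0.0]])})

clean = [r for r in records if passes_force_filter(r["forces"])]
rest, val = stratified_val_split(clean, val_frac=0.1)
print(len(clean), len(rest), len(val))  # 30 27 3
```

Splitting each subset separately keeps rare structure types (for example nanoclusters) represented in the validation split in roughly the same proportion as in the training data.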
Cite As :
Malosso, C., Bigi, F., Pegolo, P., Abbott, J. W., Loche, P., Rossi, M., Ceriotti, M., and Mazitov, A. "Massive Atomic Diversity MAD-1.5 r2SCAN Val." ColabFit, 2026. https://doi.org/None.
ColabFit ID :
Extended ID :
Date Added :
2026-05-21
License :
CC-BY-4.0
Downloads :
0
Num. Configurations :
18,305
Num. Atoms :
320,218
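Dividing the two counts above gives the mean system size in this split, roughly 17.5 atoms per configuration:

```python
# Counts as listed for this validation split.
num_configurations = 18_305
num_atoms = 320_218

mean_atoms = num_atoms / num_configurations
print(f"{mean_atoms:.2f}")  # 17.49
```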
Calculated Property Types :
atomic_forces
atomization_energy
cauchy_stress
energy
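Atomization energies are commonly obtained by subtracting isolated-atom reference energies from the total energy. The sketch below assumes that convention; the reference energies used for this dataset are not given here, so the numbers in the example are hypothetical placeholders, and sign conventions differ between sources.

```python
from collections import Counter


def atomization_energy(total_energy, symbols, isolated_atom_energies):
    """Total energy minus the summed isolated-atom reference energies.

    `isolated_atom_energies` maps element symbol -> reference energy (eV).
    The convention E_atomization = E_total - sum(E_atom) is an assumption.
    """
    counts = Counter(symbols)
    reference = sum(n * isolated_atom_energies[el] for el, n in counts.items())
    return total_energy - reference


# Hypothetical values in eV, for illustration only.
refs = {"H": -1.0, "O": -2.0}
e_at = atomization_energy(-14.0, ["O", "H", "H"], refs)
print(e_at)  # -10.0
```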
Elements :
Ac (0.08%)
Ag (0.52%)
Al (0.92%)
Am (0.08%)
Ar (0.09%)
As (0.83%)
At (0.08%)
Au (0.34%)
B (1.09%)
Ba (0.56%)
Be (0.33%)
Bi (0.56%)
Bk (0.08%)
Br (0.93%)
C (11.93%)
Ca (0.75%)
Cd (0.44%)
Ce (0.16%)
Cf (0.07%)
Cl (2.32%)
Cm (0.08%)
Co (0.6%)
Cr (0.35%)
Cs (0.7%)
Cu (0.81%)
Dy (0.08%)
Er (0.08%)
Es (0.07%)
Eu (0.06%)
F (3.55%)
Fe (0.67%)
Fm (0.08%)
Fr (0.07%)
Ga (0.68%)
Gd (0.07%)
Ge (0.9%)
H (16.55%)
He (0.09%)
Hf (0.4%)
Hg (0.34%)
Ho (0.07%)
I (0.86%)
In (0.47%)
Ir (0.31%)
K (1.15%)
Kr (0.1%)
La (0.22%)
Li (0.89%)
Lu (0.17%)
Md (0.08%)
Mg (0.6%)
Mn (0.47%)
Mo (0.5%)
N (5.64%)
Na (0.99%)
Nb (0.53%)
Nd (0.07%)
Ne (0.09%)
Ni (0.74%)
No (0.08%)
Np (0.07%)
O (18.31%)
Os (0.17%)
P (1.63%)
Pa (0.09%)
Pb (0.51%)
Pd (0.5%)
Pm (0.07%)
Po (0.09%)
Pr (0.08%)
Pt (0.32%)
Pu (0.08%)
Ra (0.09%)
Rb (0.56%)
Re (0.29%)
Rh (0.35%)
Rn (0.09%)
Ru (0.28%)
S (3.26%)
Sb (0.62%)
Sc (0.42%)
Se (1.38%)
Si (1.15%)
Sm (0.08%)
Sn (0.57%)
Sr (0.71%)
Ta (0.45%)
Tb (0.07%)
Tc (0.12%)
Te (0.78%)
Th (0.1%)
Ti (0.54%)
Tl (0.42%)
Tm (0.09%)
U (0.11%)
V (0.54%)
W (0.31%)
Xe (0.11%)
Y (0.67%)
Yb (0.15%)
Zn (0.7%)
Zr (0.61%)
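The percentages above appear to be shares of all atoms in the split (an assumption; the page does not say). A composition table of this kind could be computed from per-structure element symbols as sketched below; the helper name `element_percentages` and the toy input are hypothetical.

```python
from collections import Counter


def element_percentages(structures, decimals=2):
    """Percentage of all atoms belonging to each element.

    `structures` is an iterable of per-structure lists of element symbols.
    """
    counts = Counter()
    for symbols in structures:
        counts.update(symbols)
    total = sum(counts.values())
    return {el: round(100 * n / total, decimals) for el, n in sorted(counts.items())}


# Toy input: 11 atoms in total (6 H, 3 O, 2 C).
structures = [["H", "H", "O"], ["C", "O", "O"], ["H", "H", "H", "H", "C"]]
pct = element_percentages(structures)
print(pct)  # {'C': 18.18, 'H': 54.55, 'O': 27.27}
```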
Methods :
DFT-r2SCAN
Software :
FHI-aims v250806
Publication Link :
Data Source Link :
Other Links :
Spec File :
Configuration Sets by Name :
Configuration Sets by ID :
Name: Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Val
Extended ID: Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Val__Malosso-Bigi-Pegolo-Abbott-Loche-Rossi-Ceriotti-Mazitov__DS_lzf5dc7nytxe_0
DOI: None
Publication Link: https://doi.org/10.48550/arXiv.2603.02089
Data Source Link: https://doi.org/10.24435/materialscloud:jc-9f
Other Links:
https://github.com/lab-cosmo/upet
No uploaded content is transferred in ownership from the original creators to ColabFit. All content is distributed under the license specified by its contributor who has stated that he or she has the authority to share it under the specified license.