Dataset

Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Val




Name :
Massive_Atomic_Diversity_MAD-1.5_r2SCAN_Val
Authors :
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
Description :
Validation split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The validation split (~10% of cleaned data) uses a stratified split method consistent with the training and test splits. A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads.
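The heuristic step of the outlier removal described above can be illustrated with a minimal sketch. It assumes the 100 eV/Angstrom threshold applies to the per-atom force norm; the description does not say whether norms or individual components are tested. The function name `passes_force_filter` and the toy force arrays are hypothetical and are not taken from the dataset's tooling. The LLPR uncertainty-based step is not shown.

```python
import numpy as np

FORCE_LIMIT = 100.0  # eV/Angstrom, the threshold quoted in the description


def passes_force_filter(forces, limit=FORCE_LIMIT):
    """Return True if no atom has a force norm above `limit`.

    Assumption: the threshold is applied to per-atom force norms.
    """
    forces = np.asarray(forces, dtype=float).reshape(-1, 3)
    if forces.size == 0:
        return True  # nothing to reject in an empty structure
    return bool(np.linalg.norm(forces, axis=1).max() <= limit)


# Toy data (hypothetical): one ordinary structure and one clear outlier.
structures = {
    "ok": [[0.1, 0.0, 0.0], [0.0, -0.2, 0.0]],
    "outlier": [[150.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
}
kept = [name for name, f in structures.items() if passes_force_filter(f)]
```

Here `kept` retains only the structure whose largest force norm stays at or below the limit.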
Cite As :
Malosso, C., Bigi, F., Pegolo, P., Abbott, J. W., Loche, P., Rossi, M., Ceriotti, M., and Mazitov, A. "Massive Atomic Diversity MAD-1.5 r2SCAN Val." ColabFit, 2026.
ColabFit ID :
Date Added :
2026-05-21
License :
CC-BY-4.0
Downloads :
0
Num. Configurations :
18,305
Num. Atoms :
320,218
Calculated Property Types :
atomic_forces atomization_energy cauchy_stress energy
Elements :
Ac (0.08%) Ag (0.52%) Al (0.92%) Am (0.08%) Ar (0.09%) As (0.83%) At (0.08%) Au (0.34%) B (1.09%) Ba (0.56%) Be (0.33%) Bi (0.56%) Bk (0.08%) Br (0.93%) C (11.93%) Ca (0.75%) Cd (0.44%) Ce (0.16%) Cf (0.07%) Cl (2.32%) Cm (0.08%) Co (0.6%) Cr (0.35%) Cs (0.7%) Cu (0.81%) Dy (0.08%) Er (0.08%) Es (0.07%) Eu (0.06%) F (3.55%) Fe (0.67%) Fm (0.08%) Fr (0.07%) Ga (0.68%) Gd (0.07%) Ge (0.9%) H (16.55%) He (0.09%) Hf (0.4%) Hg (0.34%) Ho (0.07%) I (0.86%) In (0.47%) Ir (0.31%) K (1.15%) Kr (0.1%) La (0.22%) Li (0.89%) Lu (0.17%) Md (0.08%) Mg (0.6%) Mn (0.47%) Mo (0.5%) N (5.64%) Na (0.99%) Nb (0.53%) Nd (0.07%) Ne (0.09%) Ni (0.74%) No (0.08%) Np (0.07%) O (18.31%) Os (0.17%) P (1.63%) Pa (0.09%) Pb (0.51%) Pd (0.5%) Pm (0.07%) Po (0.09%) Pr (0.08%) Pt (0.32%) Pu (0.08%) Ra (0.09%) Rb (0.56%) Re (0.29%) Rh (0.35%) Rn (0.09%) Ru (0.28%) S (3.26%) Sb (0.62%) Sc (0.42%) Se (1.38%) Si (1.15%) Sm (0.08%) Sn (0.57%) Sr (0.71%) Ta (0.45%) Tb (0.07%) Tc (0.12%) Te (0.78%) Th (0.1%) Ti (0.54%) Tl (0.42%) Tm (0.09%) U (0.11%) V (0.54%) W (0.31%) Xe (0.11%) Y (0.67%) Yb (0.15%) Zn (0.7%) Zr (0.61%)
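The element list above follows a regular `Symbol (share%)` pattern, so it can be parsed for quick inspection. The following is a minimal sketch, not part of any ColabFit tooling; the helper name `parse_element_shares` is hypothetical, and the sample string is a four-element excerpt of the list rather than the full field.

```python
import re


def parse_element_shares(text):
    """Parse 'Sym (x.xx%)' pairs into a dict mapping symbol -> percentage."""
    pairs = re.findall(r"([A-Z][a-z]?)\s*\(([\d.]+)%\)", text)
    return {symbol: float(pct) for symbol, pct in pairs}


# Excerpt of the Elements field (values copied from the list above).
sample = "C (11.93%) H (16.55%) N (5.64%) O (18.31%)"
shares = parse_element_shares(sample)

# Elements ordered from most to least abundant in the excerpt.
ranked = sorted(shares, key=shares.get, reverse=True)
```

Applied to the full field, the same function would give the number of distinct elements via `len(shares)` and allow a sanity check that the shares sum to roughly 100%.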
Methods :
DFT-r2SCAN
Software :
FHI-aims v250806
Spec File :
Configuration Sets by Name :
Configuration Sets by ID :

No uploaded content is transferred in ownership from the original creators to ColabFit. All content is distributed under the license specified by its contributor who has stated that he or she has the authority to share it under the specified license.