Dataset
Massive_Atomic_Diversity_MAD_train
Download Original Data Files
68.1 MB
Download Dataset Parquet Files
97.6 MB
Species content of dataset
Name :
Massive_Atomic_Diversity_MAD_train
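Extended ID :
Massive_Atomic_Diversity_MAD_train__Mazitov-Chorna-Fraux-Bercx-Pizzi-De-Ceriotti__DS_h8s4lfyits34_0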
ColabFit ID :
Files :
Description :
The training split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
Authors :
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti
DOI :
10.60732/f5b6ea1b
https://commons.datacite.org/doi.org/10.60732/f5b6ea1b
https://doi.datacite.org/dois/10.60732%2Ff5b6ea1b
https://doi.org/10.60732/f5b6ea1b
Cite as: Mazitov, A., Chorna, S., Fraux, G., Bercx, M., Pizzi, G., De, S., and Ceriotti, M. "Massive Atomic Diversity MAD train." ColabFit, 2025. https://doi.org/10.60732/f5b6ea1b.
For other citation formats, see the DataCite Fabrica page for this dataset.
Num. Configurations :
76,482
Num. Atoms :
2,064,229
Downloads :
0
Calculated Property Types :
atomic_forces
cauchy_stress
energy
Elements :
Ag (0.42%)
Al (0.86%)
Ar (0.01%)
As (0.66%)
Au (0.28%)
B (0.95%)
Ba (0.39%)
Be (0.26%)
Bi (0.47%)
Br (0.85%)
C (13.52%)
Ca (0.75%)
Cd (0.42%)
Ce (0.01%)
Cl (2.48%)
Co (0.62%)
Cr (0.38%)
Cs (0.69%)
Cu (0.8%)
Dy (0.01%)
Er (0.01%)
Eu (0.01%)
F (3.59%)
Fe (0.7%)
Ga (0.59%)
Gd (0.01%)
Ge (0.82%)
H (18.63%)
He (0.02%)
Hf (0.42%)
Hg (0.32%)
Ho (0.01%)
I (0.85%)
In (0.45%)
Ir (0.25%)
K (1.09%)
Kr (0.01%)
La (0.01%)
Li (0.86%)
Lu (0.01%)
Mg (0.5%)
Mn (0.5%)
Mo (0.51%)
N (5.84%)
Na (1.01%)
Nb (0.6%)
Nd (0.01%)
Ne (0.01%)
Ni (0.83%)
O (19.2%)
Os (0.11%)
P (1.63%)
Pb (0.47%)
Pd (0.45%)
Pm (0.01%)
Po (0.02%)
Pr (0.01%)
Pt (0.33%)
Rb (0.56%)
Re (0.22%)
Rh (0.29%)
Rn (0.01%)
Ru (0.2%)
S (3.2%)
Sb (0.58%)
Sc (0.38%)
Se (1.48%)
Si (1.17%)
Sm (0.01%)
Sn (0.57%)
Sr (0.71%)
Ta (0.39%)
Tb (0.01%)
Tc (0.07%)
Te (0.76%)
Ti (0.61%)
Tl (0.4%)
Tm (0.01%)
V (0.55%)
W (0.29%)
Xe (0.04%)
Y (0.67%)
Yb (0.01%)
Zn (0.61%)
Zr (0.66%)
Methods :
DFT-PBEsol
Software :
VASP
Publication Link :
https://doi.org/10.48550/arXiv.2506.19674
Data Source Link :
https://doi.org/10.24435/materialscloud:vd-e8
Name: Massive_Atomic_Diversity_MAD_train
Extended ID: Massive_Atomic_Diversity_MAD_train__Mazitov-Chorna-Fraux-Bercx-Pizzi-De-Ceriotti__DS_h8s4lfyits34_0
Description: The training split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
Authors:
Arslan Mazitov
Sofiia Chorna
Guillaume Fraux
Marnik Bercx
Giovanni Pizzi
Sandip De
Michele Ceriotti
DOI: 10.60732/f5b6ea1b
Number of Configurations: 76,482
Number of Atoms: 2,064,229
Publication Link: https://doi.org/10.48550/arXiv.2506.19674
Data Source Link: https://doi.org/10.24435/materialscloud:vd-e8
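The totals listed above allow a quick sanity check of the dataset's scale. The following is a minimal sketch, not part of the dataset tooling: it uses only the configuration count, atom count, and four of the element percentages from this page, and it assumes that each percentage is the element's share of all atoms in the dataset (the page does not say so explicitly). The function and constant names are hypothetical.

```python
# Figures copied from this dataset page.
NUM_CONFIGURATIONS = 76_482
NUM_ATOMS = 2_064_229

# Four of the listed element percentages, as rounded on the page.
# Assumption: each value is the element's share of all atoms in the dataset.
ELEMENT_SHARE_PERCENT = {"O": 19.2, "H": 18.63, "C": 13.52, "N": 5.84}


def mean_atoms_per_configuration(num_atoms: int, num_configurations: int) -> float:
    """Average structure size implied by the two totals."""
    return num_atoms / num_configurations


def approximate_atom_count(share_percent: float, num_atoms: int) -> int:
    """Rough atom count for one element; accuracy is limited by the rounded percentage."""
    return round(num_atoms * share_percent / 100)


if __name__ == "__main__":
    mean_size = mean_atoms_per_configuration(NUM_ATOMS, NUM_CONFIGURATIONS)
    print(f"mean atoms per configuration: {mean_size:.2f}")
    for symbol, share in ELEMENT_SHARE_PERCENT.items():
        print(f"{symbol}: ~{approximate_atom_count(share, NUM_ATOMS):,} atoms")
    combined = sum(ELEMENT_SHARE_PERCENT.values())
    print(f"O + H + C + N combined: {combined:.2f}% of atoms")
```

Under these assumptions the average configuration holds roughly 27 atoms, and O, H, C, and N together account for a little over half of all atoms. The per-element counts are estimates only, since the page rounds each percentage to two decimals.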
No uploaded content is transferred in ownership from the original creators to ColabFit. All content is distributed under the license specified by its contributor who has stated that he or she has the authority to share it under the specified license.