Dataset
Alex_MP-20_train
Download Original Data Files
191.5 MB
Download Dataset Parquet Files
325.3 MB
Species content of dataset
Name :
Alex_MP-20_train
Description :
The train split of the Alex_MP-20 dataset, which contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. The data were processed as follows: (1) structures containing Tc, Pm, or any element with atomic number 84 or higher were excluded; (2) structures were relaxed with DFT using the PBE functional so that their energies are consistent; (3) for the training set, structures with more than 20 atoms in the unit cell were removed; (4) for the training set, structures with an energy above the hull greater than 0.1 eV/atom were removed.
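The selection rules in the description can be sketched as a simple predicate. This is a minimal illustration, not the authors' pipeline: the `Structure` class, the `keep_for_training` function, and the small `ATOMIC_NUMBERS` table are hypothetical names introduced here, and the DFT relaxation step is not reproduced.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lookup table covering only the symbols used in the examples below.
ATOMIC_NUMBERS = {
    "H": 1, "Li": 3, "O": 8, "Na": 11, "Cl": 17,
    "Fe": 26, "Tc": 43, "Pm": 61, "Po": 84, "U": 92,
}

EXCLUDED_SYMBOLS = {"Tc", "Pm"}   # excluded by name in the description
MIN_EXCLUDED_Z = 84               # atomic number 84 or higher is excluded
MAX_ATOMS = 20                    # more than 20 atoms per unit cell is removed
MAX_E_ABOVE_HULL = 0.1            # eV/atom; higher values are removed


@dataclass
class Structure:
    """Toy stand-in for a crystal structure (assumed fields, not the dataset schema)."""
    symbols: List[str]            # one chemical symbol per atom in the unit cell
    energy_above_hull: float      # eV/atom


def keep_for_training(structure: Structure) -> bool:
    """Return True if the structure passes the filters listed in the description."""
    for symbol in structure.symbols:
        if symbol in EXCLUDED_SYMBOLS or ATOMIC_NUMBERS[symbol] >= MIN_EXCLUDED_Z:
            return False
    if len(structure.symbols) > MAX_ATOMS:
        return False
    return structure.energy_above_hull <= MAX_E_ABOVE_HULL


candidates = [
    Structure(["Na", "Cl"], 0.0),        # kept
    Structure(["Tc", "O", "O"], 0.0),    # contains Tc
    Structure(["U", "O", "O"], 0.0),     # atomic number 84 or higher
    Structure(["Fe"] * 21, 0.0),         # more than 20 atoms
    Structure(["Li", "O"], 0.15),        # too far above the hull
]
kept = [s for s in candidates if keep_for_training(s)]
```

Both thresholds are treated as inclusive limits here: a structure with exactly 20 atoms or exactly 0.1 eV/atom above the hull is kept, following the wording "more than" and "higher than".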
Authors :
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie
DOI :
10.60732/8d6afc67
https://commons.datacite.org/doi.org/10.60732/8d6afc67
https://doi.datacite.org/dois/10.60732%2F8d6afc67
https://doi.org/10.60732/8d6afc67
Cite as: Zeni, C., Pinsler, R., Zügner, D., Fowler, A., Horton, M., Fu, X., Wang, Z., Shysheya, A., Crabbé, J., Ueda, S., Sordillo, R., Sun, L., Smith, J., Nguyen, B., Schulz, H., Lewis, S., Huang, C., Lu, Z., Zhou, Y., Yang, H., Hao, H., Li, J., Yang, C., Li, W., Tomioka, R., and Xie, T. "Alex MP-20 train." ColabFit, 2025. https://doi.org/10.60732/8d6afc67.
For other citation formats, see the DataCite Fabrica page for this dataset.
Num. Configurations :
540,162
Num. Atoms :
5,184,565
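As a quick consistency check, the two counts above imply an average structure size well under the 20-atom limit stated in the description. The variable names below are illustrative only.

```python
# Counts as reported on this page.
num_configurations = 540_162
num_atoms = 5_184_565

# Average number of atoms per configuration (unit cell).
mean_atoms_per_configuration = num_atoms / num_configurations

print(f"{mean_atoms_per_configuration:.2f} atoms per configuration")  # prints "9.60 atoms per configuration"
```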
Downloads :
8
Calculated Property Types :
electronic_band_gap
energy_above_hull
Elements :
Ag (1.84%)
Al (1.41%)
As (0.89%)
Au (1.85%)
B (0.39%)
Ba (1.35%)
Be (0.36%)
Bi (0.96%)
Br (2.34%)
C (0.36%)
Ca (1.18%)
Cd (1.79%)
Ce (1.06%)
Cl (2.75%)
Co (0.8%)
Cr (0.3%)
Cs (1.17%)
Cu (1.62%)
Dy (1.7%)
Er (1.6%)
Eu (0.17%)
F (3.0%)
Fe (0.59%)
Ga (1.98%)
Gd (0.04%)
Ge (0.99%)
H (1.08%)
Hf (0.47%)
Hg (1.9%)
Ho (1.68%)
I (1.81%)
In (1.93%)
Ir (0.86%)
K (1.37%)
La (2.0%)
Li (2.01%)
Lu (0.24%)
Mg (1.27%)
Mn (0.61%)
Mo (0.24%)
N (0.83%)
Na (1.55%)
Nb (0.32%)
Nd (1.9%)
Ni (1.41%)
O (6.11%)
Os (0.3%)
P (0.77%)
Pb (1.2%)
Pd (2.1%)
Pr (1.93%)
Pt (1.41%)
Rb (1.31%)
Re (0.14%)
Rh (1.45%)
Ru (0.64%)
S (3.01%)
Sb (0.88%)
Sc (1.31%)
Se (2.96%)
Si (0.92%)
Sm (1.54%)
Sn (1.39%)
Sr (1.16%)
Ta (0.3%)
Tb (1.77%)
Te (2.19%)
Ti (0.59%)
Tl (2.23%)
Tm (1.61%)
V (0.42%)
W (0.17%)
Y (1.61%)
Yb (0.16%)
Zn (1.79%)
Zr (0.65%)
Methods :
DFT-PBE
Software :
VASP
Name: Alex_MP-20_train
Extended ID: Alex_MP-20_train__Zeni-Pinsler-Zugner-Fowler-Horton-Fu-Wang-Shysheya-Crabbe-Ueda-Sordillo-Sun-Smith-Nguyen-Schulz-Lewis-Huang-Lu-Zhou-Yang-Hao-Li-Yang-Li-Tomioka-Xie__DS_uluw9723f2n4_0
Publication Link: https://doi.org/10.1038/s41586-025-08628-5
Data Source Link: https://github.com/microsoft/mattergen
No uploaded content is transferred in ownership from the original creators to ColabFit. All content is distributed under the license specified by its contributor who has stated that he or she has the authority to share it under the specified license.