The ColabFit Exchange: Data for Advanced Materials Science

Welcome to the ColabFit Exchange! This is an online resource for the discovery, exploration and submission of datasets for data-driven interatomic potential (DDIP) development for materials science and chemistry applications. ColabFit's goal is to increase the Findability, Accessibility, Interoperability, and Reusability (FAIR) of DDIP data by providing convenient access to well-curated and standardized first-principles and experimental datasets. Content on the ColabFit Exchange is open source and freely available.

Datasets
 

Results: 388

23-Single-Element-DNPs_RSCDD_2023-Ag
Description: Configurations of Ag from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Ag__Andolina-Saidi__DS_q4h7q8q0fnve_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Ag
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Ag
Number of Configurations: 3,795
Number of Elements: 1
Number of Atoms: 104,827

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
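The ColabFit IDs on this page, such as the one for the Ag set above, appear to follow a fixed pattern: the dataset name, the author surnames joined by hyphens, and a `DS_<hash>_<version>` suffix, separated by double underscores. The Python sketch below splits an ID into those parts. The layout is inferred from the examples in this listing, not taken from ColabFit documentation, and `ColabFitID` and `parse_colabfit_id` are hypothetical names. Splitting authors on hyphens would break for hyphenated surnames.

```python
from typing import NamedTuple


class ColabFitID(NamedTuple):
    """Components of a ColabFit dataset ID (layout inferred from this listing)."""
    name: str
    authors: list[str]
    dataset_hash: str
    version: int


def parse_colabfit_id(cf_id: str) -> ColabFitID:
    # Assumed layout: <name>__<Author1-Author2-...>__DS_<hash>_<version>
    # rsplit from the right so the last two "__" separators are used.
    name, authors, ds_part = cf_id.rsplit("__", 2)
    if not ds_part.startswith("DS_"):
        raise ValueError(f"unexpected dataset suffix: {ds_part!r}")
    dataset_hash, version = ds_part[len("DS_"):].rsplit("_", 1)
    return ColabFitID(name, authors.split("-"), dataset_hash, int(version))


if __name__ == "__main__":
    print(parse_colabfit_id(
        "23-Single-Element-DNPs_RSCDD_2023-Ag__Andolina-Saidi__DS_q4h7q8q0fnve_0"
    ))
```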
23-Single-Element-DNPs_RSCDD_2023-Al
Description: Configurations of Al from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Al__Andolina-Saidi__DS_8y775we7um7w_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Al
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Al
Number of Configurations: 2,572
Number of Elements: 1
Number of Atoms: 88,139

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Au
Description: Configurations of Au from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Au__Andolina-Saidi__DS_iie3c31ar46x_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Au
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Au
Number of Configurations: 3,601
Number of Elements: 1
Number of Atoms: 89,366

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Co
Description: Configurations of Co from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Co__Andolina-Saidi__DS_jt0lax9yd15r_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Co
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Co
Number of Configurations: 3,356
Number of Elements: 1
Number of Atoms: 67,320

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Cu
Description: Configurations of Cu from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Cu__Andolina-Saidi__DS_dc3o40aou2le_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Cu
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Cu
Number of Configurations: 3,366
Number of Elements: 1
Number of Atoms: 96,568

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Ge
Description: Configurations of Ge from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Ge__Andolina-Saidi__DS_90fnsmavv1am_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Ge
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Ge
Number of Configurations: 2,895
Number of Elements: 1
Number of Atoms: 195,270

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-I
Description: Configurations of I from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-I__Andolina-Saidi__DS_gc1y80tpyylb_0
Name: 23-Single-Element-DNPs_RSCDD_2023-I
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: I
Number of Configurations: 4,532
Number of Elements: 1
Number of Atoms: 115,902

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Kr
Description: Configurations of Kr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Kr__Andolina-Saidi__DS_omnl1yy49sdh_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Kr
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Kr
Number of Configurations: 2,975
Number of Elements: 1
Number of Atoms: 97,920

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Li
Description: Configurations of Li from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Li__Andolina-Saidi__DS_wyo91w20wlgm_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Li
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Li
Number of Configurations: 2,536
Number of Elements: 1
Number of Atoms: 93,724

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Mg
Description: Configurations of Mg from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Mg__Andolina-Saidi__DS_mevqyitwxukc_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Mg
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Mg
Number of Configurations: 3,004
Number of Elements: 1
Number of Atoms: 58,567

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Mo
Description: Configurations of Mo from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Mo__Andolina-Saidi__DS_bjkftn0ug9r3_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Mo
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Mo
Number of Configurations: 3,718
Number of Elements: 1
Number of Atoms: 66,612

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Nb
Description: Configurations of Nb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Nb__Andolina-Saidi__DS_zbxayq0diq6l_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Nb
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Nb
Number of Configurations: 3,246
Number of Elements: 1
Number of Atoms: 56,191

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Ni
Description: Configurations of Ni from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Ni__Andolina-Saidi__DS_nue14kckbkdh_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Ni
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Ni
Number of Configurations: 3,817
Number of Elements: 1
Number of Atoms: 75,534

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Os
Description: Configurations of Os from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Os__Andolina-Saidi__DS_098x6q7kbeat_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Os
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Os
Number of Configurations: 4,779
Number of Elements: 1
Number of Atoms: 117,968

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Pb
Description: Configurations of Pb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Pb__Andolina-Saidi__DS_k065jfggbq43_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Pb
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Pb
Number of Configurations: 5,350
Number of Elements: 1
Number of Atoms: 119,252

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Pd
Description: Configurations of Pd from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Pd__Andolina-Saidi__DS_g0sb0h7usqw7_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Pd
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Pd
Number of Configurations: 3,478
Number of Elements: 1
Number of Atoms: 140,196

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Pt
Description: Configurations of Pt from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Pt__Andolina-Saidi__DS_0zgz34a90a6i_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Pt
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Pt
Number of Configurations: 2,609
Number of Elements: 1
Number of Atoms: 62,152

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Re
Description: Configurations of Re from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Re__Andolina-Saidi__DS_gzbvdicu231p_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Re
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Re
Number of Configurations: 5,029
Number of Elements: 1
Number of Atoms: 101,248

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Sb
Description: Configurations of Sb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Sb__Andolina-Saidi__DS_z90lfjg88fzo_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Sb
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Sb
Number of Configurations: 5,529
Number of Elements: 1
Number of Atoms: 122,289

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Sr
Description: Configurations of Sr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Sr__Andolina-Saidi__DS_o3itca7mk80r_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Sr
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Sr
Number of Configurations: 3,155
Number of Elements: 1
Number of Atoms: 49,426

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Ti
Description: Configurations of Ti from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Ti__Andolina-Saidi__DS_ofpcyxez6xsc_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Ti
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Ti
Number of Configurations: 5,665
Number of Elements: 1
Number of Atoms: 153,659

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Zn
Description: Configurations of Zn from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Zn__Andolina-Saidi__DS_xsad0btc0bsn_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Zn
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Zn
Number of Configurations: 4,052
Number of Elements: 1
Number of Atoms: 107,039

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
23-Single-Element-DNPs_RSCDD_2023-Zr
Description: Configurations of Zr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements, intended as training data for deep neural network potentials (DNPs). Each element set contains ~4000 structures on average, with 27 atoms per structure. Configuration metadata includes the Materials Project ID where available, as well as the temperatures at which the MD trajectories were calculated. These temperatures are the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.

ColabFit ID: 23-Single-Element-DNPs_RSCDD_2023-Zr__Andolina-Saidi__DS_h9w8v7289utq_0
Name: 23-Single-Element-DNPs_RSCDD_2023-Zr
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Zr
Number of Configurations: 4,730
Number of Elements: 1
Number of Atoms: 81,165

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
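The MD temperature rule repeated in each of the 23 per-element descriptions can be written as a small function. This is a minimal sketch: `md_temperatures` is a hypothetical name, the descriptions do not say which case applies at exactly MT = 2000 K (the sketch assumes the three-temperature case there), and the melting points in the usage lines are approximate handbook values, not metadata from these datasets; the authors' MT values may differ.

```python
def md_temperatures(melting_k: float) -> list[float]:
    """MD trajectory temperatures (K) implied by the stated melting-temperature rule.

    MT < 2000 K -> MT, 0.25*MT
    MT > 2000 K -> MT, 0.6*MT, 0.25*MT
    Assumption: MT == 2000 K (not covered by the text) uses the three-temperature case.
    """
    if melting_k <= 0:
        raise ValueError("melting temperature must be positive")
    fractions = (1.0, 0.25) if melting_k < 2000.0 else (1.0, 0.6, 0.25)
    return [f * melting_k for f in fractions]


if __name__ == "__main__":
    # Approximate handbook melting points (assumption, not dataset metadata).
    for element, mt in {"Al": 933.47, "Mo": 2896.0}.items():
        temps = ", ".join(f"{t:.0f} K" for t in md_temperatures(mt))
        print(f"{element} (MT = {mt} K): {temps}")
```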
23-Single-Element-DNPs_all_trajectories
Description: The full trajectories from the VASP runs used to generate the 23-Single-Element-DNPs training sets. Configuration sets are available for each element.

ColabFit ID: 23-Single-Element-DNPs_all_trajectories__Andolina-Saidi__DS_whwcq2rzt2of_0
Name: 23-Single-Element-DNPs_all_trajectories
Authors: Christopher M. Andolina, Wissam A. Saidi
Elements: Ag, Al, Au, Co, Cu, Ge, I, Kr, Li, Mg, Mo, Nb, Ni, Os, Pb, Pd, Pt, Re, Sb, Sr, Ti, Zn, Zr
Number of Configurations: 108,644
Number of Elements: 23
Number of Atoms: 2,352,424

Links:
https://github.com/saidigroup/23-Single-Element-DNPs
https://doi.org/10.1039/D3DD00046J
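The counts listed on this page make it possible to compare the 23 curated per-element sets with the full-trajectory dataset, and to see how the mean number of atoms per configuration compares with the "27 atoms per structure" stated in the descriptions. The sketch below simply re-types the configuration and atom counts from this listing; `CURATED_COUNTS` and `summarize` are hypothetical names.

```python
# (configurations, atoms) for each curated 23-Single-Element-DNPs set, copied from this listing
CURATED_COUNTS = {
    "Ag": (3795, 104827), "Al": (2572, 88139), "Au": (3601, 89366),
    "Co": (3356, 67320), "Cu": (3366, 96568), "Ge": (2895, 195270),
    "I": (4532, 115902), "Kr": (2975, 97920), "Li": (2536, 93724),
    "Mg": (3004, 58567), "Mo": (3718, 66612), "Nb": (3246, 56191),
    "Ni": (3817, 75534), "Os": (4779, 117968), "Pb": (5350, 119252),
    "Pd": (3478, 140196), "Pt": (2609, 62152), "Re": (5029, 101248),
    "Sb": (5529, 122289), "Sr": (3155, 49426), "Ti": (5665, 153659),
    "Zn": (4052, 107039), "Zr": (4730, 81165),
}
# Counts listed for 23-Single-Element-DNPs_all_trajectories
FULL_CONFIGS, FULL_ATOMS = 108644, 2352424


def summarize(counts):
    """Return total configurations, total atoms and mean atoms per configuration."""
    n_conf = sum(c for c, _ in counts.values())
    n_atoms = sum(a for _, a in counts.values())
    return n_conf, n_atoms, n_atoms / n_conf


if __name__ == "__main__":
    n_conf, n_atoms, mean_atoms = summarize(CURATED_COUNTS)
    print(f"curated sets: {n_conf} configurations, {n_atoms} atoms")
    print(f"share of full-trajectory configurations: {n_conf / FULL_CONFIGS:.1%}")
    print(f"mean atoms per configuration: {mean_atoms:.1f}")
    # Per-element atoms per configuration, smallest to largest
    by_ratio = sorted(CURATED_COUNTS.items(), key=lambda kv: kv[1][1] / kv[1][0])
    for element, (c, a) in by_ratio:
        print(f"{element:>2}: {a / c:5.1f} atoms/configuration")
```

Per-element values spread around the overall mean, so the stated figure is best read as an approximate average.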
3BPA_isolated_atoms
Description: Isolated reference C, H, O and N atoms from the 3BPA dataset, used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_isolated_atoms__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_gaq0fec8i6ik_0
Name: 3BPA_isolated_atoms
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 4
Number of Elements: 4
Number of Atoms: 4

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_test_1200K
Description: Test configurations from 3BPA MD simulations performed at 1200 K, used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_test_1200K__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_xaasqmrdv28s_0
Name: 3BPA_test_1200K
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 2,139
Number of Elements: 4
Number of Atoms: 57,753

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_test_300K
Description: Test configurations from 3BPA MD simulations performed at 300 K, used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_test_300K__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_i4hwummq4c7j_0
Name: 3BPA_test_300K
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 1,669
Number of Elements: 4
Number of Atoms: 45,063

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_test_600K
Description: Test configurations from 3BPA MD simulations performed at 600 K, used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_test_600K__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_epqxcavda3kt_0
Name: 3BPA_test_600K
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 2,138
Number of Elements: 4
Number of Atoms: 57,726

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_test_dih_beta120
Description: Test configurations from the 3BPA dataset in the alpha-gamma dihedral plane, with the dihedral angle beta fixed at 120 degrees. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_test_dih_beta120__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_8cg3pdvxt0pa_0
Name: 3BPA_test_dih_beta120
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 2,347
Number of Elements: 4
Number of Atoms: 63,369

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_test_dih_beta150
Description: Test configurations from the 3BPA dataset in the alpha-gamma dihedral plane, with the dihedral angle beta fixed at 150 degrees. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_test_dih_beta150__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_199ama4h9t7m_0
Name: 3BPA_test_dih_beta150
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 2,350
Number of Elements: 4
Number of Atoms: 63,450

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_test_dih_beta180
Description: Test configurations from the 3BPA dataset in the alpha-gamma dihedral plane, with the dihedral angle beta fixed at 180 degrees. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_test_dih_beta180__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_ilijykl0t0fn_0
Name: 3BPA_test_dih_beta180
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 2,350
Number of Elements: 4
Number of Atoms: 63,450

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_train_300K
Description: Training configurations from 3BPA MD simulations performed at 300 K, used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_train_300K__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_hu0btdblv8x6_0
Name: 3BPA_train_300K
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 500
Number of Elements: 4
Number of Atoms: 13,500

Links:
https://doi.org/10.1021/acs.jctc.1c00647
3BPA_train_mixed
Description: Training configurations from 3BPA MD simulations performed at 300 K, 600 K and 1200 K, used to showcase the performance of linear atomic cluster expansion (ACE) force fields as machine-learning models of the potential energy surfaces of organic molecules.

ColabFit ID: 3BPA_train_mixed__Kovács-Oord-Kucera-Allen-Cole-Ortner-Csányi__DS_v88913tvdxbr_0
Name: 3BPA_train_mixed
Authors: Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
Elements: C, H, N, O
Number of Configurations: 500
Number of Elements: 4
Number of Atoms: 13,500

Links:
https://doi.org/10.1021/acs.jctc.1c00647
ABC2D6-16_PRL_2018
Description: Dataset used to train a machine learning model to calculate density functional theory-quality formation energies of all ~2 × 10^6 pristine ABC2D6 elpasolite crystals that can be made up of main-group elements (up to bismuth).

ColabFit ID: ABC2D6-16_PRL_2018__Faber-Lindmaa-Lilienfeld-Armiento__DS_0ady7a8a8n6p_0
Name: ABC2D6-16_PRL_2018
Authors: Felix Faber, Alexander Lindmaa, O. Anatole von Lilienfeld, Rickard Armiento
Elements: Al, Ar, As, B, Ba, Be, Bi, Br, C, Ca, Cl, Cs, F, Ga, Ge, H, He, I, In, K, Kr, Li, Mg, N, Na, Ne, O, P, Pb, Rb, S, Sb, Se, Si, Sn, Sr, Te, Tl, Xe
Number of Configurations: 21,882
Number of Elements: 39
Number of Atoms: 218,820

Links:
https://qmml.org/datasets.html
https://doi.org/10.1103/PhysRevLett.117.135502
AENET_amorphous_LiSi_JCP2021
Description: The amorphous LiSi dataset comprises 45,169 atomic structures with compositions Li(x)Si (0.0 ≤ x ≤ 4.75) and the corresponding energies and interatomic forces, which were generated using an iterative approach based on an evolutionary algorithm and subsequent refinement, as described in detail in Ref. [15]. The data include bulk, surface, and cluster structures with system sizes of up to 608 atoms. The energies and forces of the LiSi structures were obtained from DFT calculations using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [10] and projector-augmented wave pseudopotentials [16], as implemented in the Vienna Ab initio Simulation Package (VASP) [17,18]. A plane-wave basis set with an energy cutoff of 520 eV was used to represent the wavefunctions, and a uniform gamma-centered k-point grid was used for Brillouin zone integration, with a mesh density such that the number of k-points is at least 1000 divided by the number of atoms. The atomic positions and lattice parameters of all structures were optimized until residual forces were below 20 meV/Å. This dataset was also used to construct the ANN potential in Refs. [15] and [19]. [10] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). [15] N. Artrith, A. Urban, and G. Ceder, J. Chem. Phys. 148, 241711 (2018). [16] P. E. Blöchl, Phys. Rev. B 50, 17953–17979 (1994). [17] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169–11186 (1996). [18] G. Kresse and J. Furthmüller, Comput. Mater. Sci. 6, 15–50 (1996). [19] N. Artrith, A. Urban, Y. Wang, and G. Ceder, arXiv:1901.09272, https://arxiv.org/pdf/1901.09272.pdf

ColabFit ID: AENET_amorphous_LiSi_JCP2021__Chen-Morawietz-Markland-Artrith__DS_0nnbymlcjota_0
Name: AENET_amorphous_LiSi_JCP2021
Authors: Michael S. Chen, Tobias Morawietz, Thomas E. Markland, Nongnuch Artrith
Elements: Li, Si
Number of Configurations: 44,652
Number of Elements: 2
Number of Atoms: 5,741,142

Links:
https://doi.org/10.24435/materialscloud:dx-ct
http://doi.org/10.1063/5.0063880
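The k-point criterion in the AENET_amorphous_LiSi_JCP2021 description (at least 1000 k-points divided by the number of atoms) can be turned into a concrete mesh. The sketch below is one plausible construction, not the authors' procedure: it assumes the mesh subdivisions are scaled in proportion to the reciprocal-lattice vector lengths, and the function names are hypothetical.

```python
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reciprocal_lengths(lattice):
    """Lengths of the reciprocal vectors (2*pi factor omitted) for lattice vectors given as rows."""
    a1, a2, a3 = lattice
    volume = abs(_dot(a1, _cross(a2, a3)))
    b_vectors = (_cross(a2, a3), _cross(a3, a1), _cross(a1, a2))
    return [math.sqrt(_dot(b, b)) / volume for b in b_vectors]


def kpoint_mesh(lattice, n_atoms, kppra=1000):
    """Smallest mesh (n1, n2, n3), proportional to the reciprocal lengths,
    whose total number of k-points is at least kppra / n_atoms (assumed rule)."""
    target = kppra / n_atoms
    lengths = reciprocal_lengths(lattice)
    longest = max(lengths)
    n = 1
    while True:
        # n subdivisions along the longest reciprocal vector, proportionally fewer elsewhere
        mesh = tuple(max(1, math.ceil(n * length / longest - 1e-9)) for length in lengths)
        if mesh[0] * mesh[1] * mesh[2] >= target:
            return mesh
        n += 1


# Cubic 5 Å cell with 8 atoms: at least 125 k-points are required.
print(kpoint_mesh([[5.0, 0, 0], [0, 5.0, 0], [0, 0, 5.0]], 8))  # (5, 5, 5)
```

Because the target shrinks as the cell grows, a 608-atom structure needs a mesh with only 2 k-points in total under this rule, while a small bulk cell needs a dense mesh.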
AENET_liquid_water_dataset_JCP2021
Description: The water dataset comprises energies and forces of 9,189 condensed-phase structures. The data were obtained in an iterative procedure described in detail in Ref. [4]. The final ANN potential was employed in Refs. [4,5] to analyze temperature-dependent Raman spectra of liquid water. The dataset contains structures from four iterations. Initial structures (iteration 0) were obtained from classical and path-integral AIMD simulations of bulk liquid water in a cubic box containing 64 water molecules at 300 K, as reported in Ref. [6]. Distorted configurations with higher forces were added by randomly displacing the Cartesian coordinates of these configurations. Iteration 1 contains 500 configurations from MD simulations with the fully flexible SPC/E flex water model [7] at a 25% increased water density (simulation box with 80 water molecules) and an elevated temperature (T = 500 K), in order to sample highly repulsive configurations. Structures in iteration 2 were obtained from classical MD simulations with preliminary ANN potentials at T = 300 K, 325 K, 350 K, and 370 K, using cubic boxes with 64 molecules at the corresponding experimental densities. The final iteration 3 data contain structures from preliminary ANN simulations with classical and with quantum nuclei over a wide range of temperatures (T = 258 K, 268 K, 280 K, 290 K, 300 K, 310 K, 320 K, 330 K, 340 K, 350 K, 360 K, and 370 K), using cubic boxes with 64 molecules at the corresponding experimental densities. Energies and atomic forces were calculated with the CP2K program [8,9] using the revPBE exchange-correlation functional [10,11] with D3 dispersion correction [12], following the setup reported in Ref. [4].
Atomic cores were represented using the dual-space Goedecker-Teter-Hutter pseudopotentials [13], Kohn-Sham orbitals were expanded in the TZV2P basis set within the GPW method [14], and the density was represented by an auxiliary plane-wave basis with a cutoff of 400 Ry. [1] A. Kokalj, J. Mol. Graphics Modell. 17, 176–179 (1999). [2] N. Artrith and A. Urban, Comput. Mater. Sci. 114, 135–150 (2016). [3] N. Artrith, A. Urban, and G. Ceder, Phys. Rev. B 96, 014112 (2017). [4] T. Morawietz, O. Marsalek, S. R. Pattenaude, L. M. Streacker, D. Ben-Amotz, and T. E. Markland, J. Phys. Chem. Lett. 9, 851 (2018). [5] T. Morawietz, A. S. Urbina, P. K. Wise, X. Wu, W. Lu, D. Ben-Amotz, and T. E. Markland, J. Phys. Chem. Lett. 10, 6067 (2019). [6] O. Marsalek and T. E. Markland, J. Phys. Chem. Lett. 8, 1545 (2017). [7] X. B. Zhang, Q. L. Liu, and A. M. Zhu, Fluid Phase Equilib. 262, 210 (2007). [8] J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, and J. Hutter, Comput. Phys. Commun. 167, 103 (2005). [9] J. Hutter, M. Iannuzzi, F. Schiffmann, and J. VandeVondele, WIREs Comput. Mol. Sci. 4, 15 (2014). [10] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). [11] Y. Zhang and W. Yang, Phys. Rev. Lett. 80, 890 (1998). [12] S. Grimme, J. Antony, S. Ehrlich, and H. Krieg, J. Chem. Phys. 132, 154104 (2010). [13] S. Goedecker, M. Teter, and J. Hutter, Phys. Rev. B 54, 1703 (1996). [14] B. G. Lippert, J. Hutter, and M. Parrinello, Mol. Phys. 92, 477 (1997).

ColabFit ID: AENET_liquid_water_dataset_JCP2021__Chen-Morawietz-Markland-Artrith__DS_v4bk24pq0had_0
Name: AENET_liquid_water_dataset_JCP2021
Authors: Michael S. Chen, Tobias Morawietz, Thomas E. Markland, Nongnuch Artrith
Elements: H, O
Number of Configurations: 9,189
Number of Elements: 2
Number of Atoms: 1,788,288

Links:
https://doi.org/10.24435/materialscloud:dx-ct
http://doi.org/10.1063/5.0063880
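The water entry mentions that distorted, higher-force configurations were created by randomly displacing the Cartesian coordinates of the iteration-0 structures. The displacement distribution and magnitude are not given in the description, so the sketch below simply assumes an independent uniform shift of each coordinate within a hypothetical maximum `max_shift` (in Å); it illustrates the idea rather than reproducing the authors' script.

```python
import random


def displace(positions, max_shift, rng=None):
    """Return a new list of positions in which every Cartesian coordinate is shifted
    independently by a uniform random amount in [-max_shift, max_shift] (assumed scheme)."""
    rng = rng or random.Random()
    return [[coord + rng.uniform(-max_shift, max_shift) for coord in atom]
            for atom in positions]


# Hypothetical water molecule (O, H, H) in Å; a seeded generator makes the result reproducible.
water = [[0.000, 0.000, 0.000], [0.757, 0.586, 0.000], [-0.757, 0.586, 0.000]]
distorted = displace(water, max_shift=0.1, rng=random.Random(42))
```

Returning a new list leaves the reference geometry untouched, so many distorted copies can be drawn from one starting structure.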
AFF_JCP_2022
Description: Approximately 145,000 configurations of alkane, aspirin, alpha-glucose and uracil, partly taken from the MD-17 dataset, used in training an 'Atomic Neural Net' model.

ColabFit ID: AFF_JCP_2022__Li-Zhou-Sebastian-Wu-Gu__DS_5lhmgnxhuia3_0
Name: AFF_JCP_2022
Authors: Hao Li, Musen Zhou, Jessalyn Sebastian, Jianzhong Wu, Mengyang Gu
Elements: C, H, N, O
Number of Configurations: 143,770
Number of Elements: 4
Number of Atoms: 1,911,240

Links:
https://github.com/UncertaintyQuantification/AFF/tree/master
https://doi.org/10.1063/5.0088017
ANI-1
Description: ANI-1 is a dataset of 20 million non-equilibrium conformations with calculated energies. The conformations are based on a subset of the GDB-11 dataset, with each molecule containing between 1 and 8 heavy atoms and the heavy-atom species limited to C, N and O. Configuration sets are provided for standard- and high-energy conformations (high energy being defined as more than 275 kcal/mol above the lowest-energy conformer) and, within each, by number of heavy atoms per molecule.

ColabFit ID: ANI-1__Smith-Isayev-Roitberg__DS_p4evspy1ntcs_0
Name: ANI-1
Authors: Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg
Elements: C, H, N, O
Number of Configurations: 24,416,306
Number of Elements: 4
Number of Atoms: 392,606,016

Links:
https://doi.org/10.6084/m9.figshare.c.3846712.v1
https://doi.org/10.1038/sdata.2017.193
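The standard/high-energy split used for the ANI-1 configuration sets can be expressed as a relative-energy filter. This is a sketch, assuming energies are already in kcal/mol and that "lowest-energy conformer" refers to the lowest-energy conformer of the same molecule; the function name and input layout are hypothetical, not part of the ColabFit or ANI-1 tooling.

```python
from collections import defaultdict

HIGH_ENERGY_CUTOFF = 275.0  # kcal/mol above the lowest-energy conformer of the same molecule


def split_by_relative_energy(conformers, cutoff=HIGH_ENERGY_CUTOFF):
    """conformers: iterable of (molecule_id, energy in kcal/mol).
    Returns (standard, high_energy) lists of the same tuples."""
    by_molecule = defaultdict(list)
    for mol_id, energy in conformers:
        by_molecule[mol_id].append(energy)

    standard, high_energy = [], []
    for mol_id, energies in by_molecule.items():
        e_min = min(energies)  # reference: lowest-energy conformer of this molecule
        for energy in energies:
            target = high_energy if energy - e_min > cutoff else standard
            target.append((mol_id, energy))
    return standard, high_energy


# Made-up energies for two hypothetical molecules.
data = [("a", 0.0), ("a", 100.0), ("a", 300.0), ("b", -50.0), ("b", 226.0)]
standard, high = split_by_relative_energy(data)
```

Using a per-molecule reference keeps the threshold meaningful across molecules whose absolute energies differ by thousands of kcal/mol.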
ANI-1x
Description: ANI-1x contains DFT calculations for approximately 5 million molecular conformations. Starting from an initial training set, an active learning method was used to iteratively add conformations from regions where the existing data were insufficiently diverse. Additional conformations were sampled from existing molecular databases such as GDB-11 and ChEMBL. Each of these starting configurations was then sampled with one of four techniques: molecular dynamics sampling, normal mode sampling, dimer sampling, or torsion sampling.

ColabFit ID: ANI-1x__Smith-Zubatyuk-Nebgen-Lubbers-Barros-Roitberg-Isayev-Tretiak__DS_ko3rpzre7bea_0
Name: ANI-1x
Authors: Justin S. Smith, Roman Zubatyuk, Benjamin Nebgen, Nicholas Lubbers, Kipton Barros, Adrian E. Roitberg, Olexandr Isayev, Sergei Tretiak
Elements: C, H, N, O
Number of Configurations: 4,956,005
Number of Elements: 4
Number of Atoms: 75,700,481

Links:
https://doi.org/10.6084/m9.figshare.c.4712477.v1
https://doi.org/10.1038/s41597-020-0473-z
ANI-2x-B973c-def2mTZVP
Description: ANI-2x-B973c-def2mTZVP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the B97-3c level of theory using the def2-mTZVP basis set. Configuration sets are divided by number of atoms per structure. Force corrections and dipoles are recorded in the metadata.

ColabFit ID: ANI-2x-B973c-def2mTZVP__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_gjr8gi1tb8wh_0
Name: ANI-2x-B973c-def2mTZVP
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 9,643,594
Number of Elements: 7
Number of Atoms: 146,656,635

Links:
https://doi.org/10.5281/zenodo.10108942
https://doi.org/10.1021/acs.jctc.0c00121
ANI-2x-wB97MD3BJ-def2TZVPP
Description: ANI-2x-wB97MD3BJ-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the wB97M-D3(BJ) level of theory (wB97M with D3 dispersion correction and Becke-Johnson damping), using the def2-TZVPP basis set. Configuration sets are divided by number of atoms per structure. Uncorrected SCF energies and dipoles are recorded in the metadata.

ColabFit ID: ANI-2x-wB97MD3BJ-def2TZVPP__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_yxeu6us2i6dh_0
Name: ANI-2x-wB97MD3BJ-def2TZVPP
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 9,650,572
Number of Elements: 7
Number of Atoms: 146,715,621

Links:
https://doi.org/10.5281/zenodo.10108942
https://doi.org/10.1021/acs.jctc.0c00121
ANI-2x-wB97MV-def2TZVPP
Description: ANI-2x-wB97MV-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated at the wB97M-V level of theory using the def2-TZVPP basis set. Configuration sets are divided by number of atoms per structure.

ColabFit ID: ANI-2x-wB97MV-def2TZVPP__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_wxf8f3t3abul_0
Name: ANI-2x-wB97MV-def2TZVPP
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 9,650,572
Number of Elements: 7
Number of Atoms: 146,715,621

Links:
https://doi.org/10.5281/zenodo.10108942
https://doi.org/10.1021/acs.jctc.0c00121
ANI-2x-wB97X-631Gd
Description: ANI-2x-wB97X-631Gd is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in Gaussian 09 at the wB97X level of theory using the 6-31G(d) basis set. Configuration sets are divided by number of atoms per structure.

ColabFit ID: ANI-2x-wB97X-631Gd__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_rde4chkdmm18_0
Name: ANI-2x-wB97X-631Gd
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 9,651,712
Number of Elements: 7
Number of Atoms: 146,736,809

Links:
https://doi.org/10.5281/zenodo.10108942
https://doi.org/10.1021/acs.jctc.0c00121
ANI-Al_NC2021-test
Description: Approximately 2,800 configurations from the test set of a pair of train/test datasets of aluminum in crystalline and melt phases, used for training and testing an ANI neural network model.

ColabFit ID: ANI-Al_NC2021-test__Smith-Nebgen-Mathew-Chen-Lubbers-Burakovsky-Tretiak-Nam-Germann-Fensin-Barros__DS_bnnyjpaqb09i_0
Name: ANI-Al_NC2021-test
Authors: Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, Kipton Barros
Elements: Al
Number of Configurations: 2,872
Number of Elements: 1
Number of Atoms: 371,645

Links:
https://github.com/atomistic-ml/ani-al
https://doi.org/10.1038/s41467-021-21376-0
ANI-Al_NC2021-train
Description: Approximately 2,800 configurations from the training set of a pair of train/test datasets of aluminum in crystalline and melt phases, used for training and testing an ANI neural network model.

ColabFit ID: ANI-Al_NC2021-train__Smith-Nebgen-Mathew-Chen-Lubbers-Burakovsky-Tretiak-Nam-Germann-Fensin-Barros__DS_tz8gsj3xi2g4_0
Name: ANI-Al_NC2021-train
Authors: Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, Kipton Barros
Elements: Al
Number of Configurations: 2,864
Number of Elements: 1
Number of Atoms: 375,121

Links:
https://github.com/atomistic-ml/ani-al
https://doi.org/10.1038/s41467-021-21376-0
Ag-PBE_MSMSE_2021
Description: Approximately 7,600 configurations of Ag used as part of a training dataset for a DP-GEN-based ML model for a Ag-Au nanoalloy potential.

ColabFit ID: Ag-PBE_MSMSE_2021__Wang-Wang-Zhang-Xu-Wang__DS_6e5ljaqa3cfr_0
Name: Ag-PBE_MSMSE_2021
Authors: Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang
Elements: Ag
Number of Configurations: 7,608
Number of Elements: 1
Number of Atoms: 152,318

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=Ag-PBE
https://doi.org/10.48550/arXiv.2108.06232
AgAu-nanoalloy_MSMSE_2021
Description: Approximately 50,000 configurations of Au, Ag and AuAg used as part of a training dataset for a DP-GEN-based ML model for a Ag-Au nanoalloy potential.

ColabFit ID: AgAu-nanoalloy_MSMSE_2021__Wang-Wang-Zhang-Xu-Wang__DS_t0ec1irmeo59_0
Name: AgAu-nanoalloy_MSMSE_2021
Authors: Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang
Elements: Ag, Au
Number of Configurations: 51,771
Number of Elements: 2
Number of Atoms: 1,188,220

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=AgAu-nanoalloy
https://doi.org/10.48550/arXiv.2108.06232
AgPd_NPJ_2021
Description: The dataset consists of energies, forces and virials for Ag-Pd systems computed with DFT (VASP). The data were used to build an actively learned dataset, which was then used to compare MTP and SOAP-GAP potentials.

ColabFit ID: AgPd_NPJ_2021__Rosenbrock-Gubaev-Shapeev-Pártay-Bernstein-Csányi-Hart__DS_y5m3hunroa7x_0
Name: AgPd_NPJ_2021
Authors: Conrad W. Rosenbrock, Konstantin Gubaev, Alexander V. Shapeev, Livia B. Pártay, Noam Bernstein, Gábor Csányi, Gus L. W. Hart
Elements: Ag, Pd
Number of Configurations: 1,691
Number of Elements: 2
Number of Atoms: 14,180

Links:
https://github.com/msg-byu/agpd
https://doi.org/10.1038/s41524-020-00477-2
AlNiCu_AIP_2020
Description: This dataset is formed from two parts: single-species datasets for Al, Ni, and Cu from the NOMAD Encyclopedia, and multi-species datasets that include Al, Ni and Cu from the NOMAD Archive. Duplicates have been removed from the NOMAD Encyclopedia data. For the multi-species data, only the last configuration step of each NOMAD Archive record was used, because the last configuration typically corresponds to a fully relaxed configuration. The NOMAD unique reference access IDs are retained, along with a subset of their metadata, including whether the supplied configuration comes from a converged calculation, the density functional theory (DFT) code and version, the type of DFT functional, and the total potential energies. This dataset consists of 39.1% Al, 30.7% Ni, and 30.2% Cu and has 27,987 atomic environments in 3,337 structures.

ColabFit ID: AlNiCu_AIP_2020__Onat-Ortner-Kermode__DS_vmnudsz7kx0a_0
Name: AlNiCu_AIP_2020
Authors: Berk Onat, Christoph Ortner, James R. Kermode
Elements: Al, Cu, Ni
Number of Configurations: 1,017
Number of Elements: 3
Number of Atoms: 4,650

Links:
https://github.com/DescriptorZoo/sensitivity-dimensionality-results
https://doi.org/10.1063/5.0016005
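The two filtering steps described for AlNiCu_AIP_2020 (keeping only the final step of each NOMAD Archive record, and removing duplicates from the NOMAD Encyclopedia data) might look like the sketch below. The record layout and the duplicate criterion (exact equality of a hashable key) are assumptions for illustration; the description does not say how duplicates were detected.

```python
def final_frames(records):
    """Map each record ID to its last configuration step; records with no steps are skipped."""
    return {record_id: steps[-1] for record_id, steps in records.items() if steps}


def remove_duplicates(structures, key):
    """Keep the first occurrence of each structure, comparing structures by key(structure)."""
    seen = set()
    unique = []
    for structure in structures:
        k = key(structure)
        if k not in seen:
            seen.add(k)
            unique.append(structure)
    return unique


# Hypothetical records: each step is (chemical formula, made-up total energy in eV).
archive = {"rec-1": [("AlNi", -9.1), ("AlNi", -9.4)], "rec-2": [("Cu3Al", -15.2)]}
encyclopedia = [("Al", -3.7), ("Ni", -5.5), ("Al", -3.7)]
last = final_frames(archive)
unique = remove_duplicates(encyclopedia, key=lambda s: s)
```

In practice, the `key` would need to encode the full structure (species, positions, cell), possibly with rounding, for exact-match deduplication to be meaningful.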
AlNiTi_CMS_2019
Description: This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially trained MTP model; 2) structures for which the MTP had to extrapolate were passed to DFT to be recomputed; 3) the MTP was retrained, including the structures recomputed with DFT; 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset comprised about 375,000 binary and ternary structures, enumerating all possible unit cells based on the BCC, FCC, and HCP lattices with different symmetries and different numbers of atoms.

ColabFit ID: AlNiTi_CMS_2019__Gubaev-Podryabinkin-Hart-Shapeev__DS_dtjyh96dypuu_0
Name: AlNiTi_CMS_2019
Authors: Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev
Elements: Al, Ni, Ti
Number of Configurations: 2,684
Number of Elements: 3
Number of Atoms: 25,067

Links:
https://gitlab.com/kgubaev/accelerating-high-throughput-searches-for-new-alloys-with-active-learning-data
https://doi.org/10.1016/j.commatsci.2018.09.031
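The four-step active-learning scheme in the AlNiTi_CMS_2019 description can be written as a loop. In this sketch, `fit`, `relax`, `extrapolates` and `dft` are hypothetical callables standing in for MTP training, MTP relaxation, the MTP extrapolation check and a DFT calculation. It simplifies step 2 by checking only the relaxed structure rather than every step of the relaxation, and it is not the authors' code.

```python
def active_learning(candidates, training_set, fit, relax, extrapolates, dft, max_iterations=100):
    """Return (model, training_set) once the model no longer extrapolates on any candidate."""
    model = fit(training_set)
    for _ in range(max_iterations):
        # Step 1: relax every candidate with the current model.
        relaxed = [relax(model, s) for s in candidates]
        # Step 2: collect structures on which the model had to extrapolate.
        flagged = [s for s in relaxed if extrapolates(model, s)]
        # Step 4: stop once no candidate triggers extrapolation.
        if not flagged:
            return model, training_set
        # Steps 2-3: recompute flagged structures with DFT, then retrain.
        training_set = training_set + [dft(s) for s in flagged]
        model = fit(training_set)
    raise RuntimeError("active learning did not converge within max_iterations")


# Toy stand-ins: the "model" is the set of known structures,
# and it "extrapolates" on anything it has not seen.
model, data = active_learning(
    candidates=[1, 2, 3],
    training_set=[1],
    fit=frozenset,
    relax=lambda m, s: s,
    extrapolates=lambda m, s: s not in m,
    dft=lambda s: s,
)
```

The `max_iterations` guard is an addition for safety; the described scheme simply repeats until no extrapolation occurs.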
Al_Cu_Mg_GSFE_JMPS2019
Description: Dataset from "Stress-dependence of generalized stacking fault energies": DFT calculations of generalized stacking fault energies (GSFE) for Al, Cu, and Mg.

ColabFit ID: Al_Cu_Mg_GSFE_JMPS2019__Yin-Andric-Curtin__DS_styimm9cal6v_0
Name: Al_Cu_Mg_GSFE_JMPS2019
Authors: Binglun Yin, Predrag Andric, W. A. Curtin
Elements: Al, Cu, Mg
Number of Configurations: 273
Number of Elements: 3
Number of Atoms: 3,264

Links:
https://doi.org/10.24435/materialscloud:2019.0089/v1
https://doi.org/10.1016/j.jmps.2018.09.007
Au-PBE_MSMSE_2021
Description: Approximately 20,000 configurations of Au used as part of a training dataset for a DP-GEN-based ML model for a Ag-Au nanoalloy potential.

ColabFit ID: Au-PBE_MSMSE_2021__Wang-Wang-Zhang-Xu-Wang__DS_tydu7u0h0hhz_0
Name: Au-PBE_MSMSE_2021
Authors: Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang
Elements: Au
Number of Configurations: 19,434
Number of Elements: 1
Number of Atoms: 310,792

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=Au-PBE
https://doi.org/10.48550/arXiv.2108.06232
BA10-18
Description: The dataset (DFT-10B) contains structures of the 10 binary alloys AgCu, AlFe, AlMg, AlNi, AlTi, CoNi, CuFe, CuNi, FeV, and NbNi. Each alloy system includes all possible unit cells with 1-8 atoms for face-centered cubic (fcc) and body-centered cubic (bcc) crystal types, and all possible unit cells with 2-8 atoms for the hexagonal close-packed (hcp) crystal type. This results in 631 fcc, 631 bcc, and 333 hcp structures per alloy, yielding 1595 × 10 = 15,950 unrelaxed structures in total. Lattice parameters for each crystal structure were set according to Vegard's law. Total energies were computed using DFT with projector-augmented wave (PAW) potentials within the generalized gradient approximation (GGA) of Perdew, Burke, and Ernzerhof (PBE), as implemented in the Vienna Ab Initio Simulation Package (VASP). The k-point meshes for sampling the Brillouin zone were constructed using generalized regular grids.

ColabFit ID: BA10-18__Nyshadham-Rupp-Bekker-Shapeev-Mueller-Rosenbrock-Csányi-Wingate-Hart__DS_lifzo8zpa76m_0
Name: BA10-18
Authors: Chandramouli Nyshadham, Matthias Rupp, Brayden Bekker, Alexander V. Shapeev, Tim Mueller, Conrad W. Rosenbrock, Gábor Csányi, David W. Wingate, Gus L. W. Hart
Elements: Ag, Al, Co, Cu, Fe, Mg, Nb, Ni, Ti, V
Number of Configurations: 15,920
Number of Elements: 10
Number of Atoms: 116,380

Links:
https://qmml.org/datasets.html
https://doi.org/10.1038/s41524-019-0189-9
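The structure count and the lattice-parameter rule in the BA10-18 description can be checked with a few lines. Vegard's law is shown in its simplest form, a composition-weighted average of the two elemental lattice parameters; how the authors applied it to each crystal type (for example, the c/a ratio of hcp cells) is not stated here, and the lattice parameters in the example are made-up placeholders.

```python
def vegard(x_a, a_a, a_b):
    """Vegard's law: lattice parameter of A(x)B(1-x) as a linear interpolation
    between the lattice parameters a_a and a_b of the pure elements."""
    if not 0.0 <= x_a <= 1.0:
        raise ValueError("x_a must be a fraction between 0 and 1")
    return x_a * a_a + (1.0 - x_a) * a_b


fcc, bcc, hcp = 631, 631, 333
per_alloy = fcc + bcc + hcp        # 1595 structures per binary alloy
total = per_alloy * 10             # 10 alloy systems -> 15,950 structures
a_mix = vegard(0.25, 4.0, 3.6)     # placeholder lattice parameters in Å
```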
BOTnet_ACAC_2022_Dihedral_scan
Description: Dihedral scan about one of the C-C bonds of the conjugated system. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_Dihedral_scan__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_0xfqsnoawhoo_0
Name: BOTnet_ACAC_2022_Dihedral_scan
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 45
Number of Elements: 3
Number of Atoms: 675

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
BOTnet_ACAC_2022_H_transfer
Description: NEB path of the proton transfer reaction between the two forms of acetylacetone. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_H_transfer__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_ztlvjmuwskom_0
Name: BOTnet_ACAC_2022_H_transfer
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 15
Number of Elements: 3
Number of Atoms: 225

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
BOTnet_ACAC_2022_isolated
Description: Energies of the isolated atoms evaluated at the reference DFT settings. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_isolated__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_e22u3j5sbilx_0
Name: BOTnet_ACAC_2022_isolated
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 3
Number of Elements: 3
Number of Atoms: 3

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
BOTnet_ACAC_2022_test_300K_MD
Description: Test set of decorrelated geometries sampled from 300 K xTB MD. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_test_300K_MD__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_gt0oa9fngod7_0
Name: BOTnet_ACAC_2022_test_300K_MD
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 650
Number of Elements: 3
Number of Atoms: 9,750

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
BOTnet_ACAC_2022_test_600K_MD
Description: Test set of decorrelated geometries sampled from 600 K xTB MD. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_test_600K_MD__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_k0rexhnv58vn_0
Name: BOTnet_ACAC_2022_test_600K_MD
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 650
Number of Elements: 3
Number of Atoms: 9,750

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
BOTnet_ACAC_2022_train_300K_MD
Description: 500 decorrelated geometries sampled from a 300 K xTB MD run. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_train_300K_MD__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_8gqwz5bw1pvq_0
Name: BOTnet_ACAC_2022_train_300K_MD
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 500
Number of Elements: 3
Number of Atoms: 7,500

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
BOTnet_ACAC_2022_train_600K_MD
Description: 500 decorrelated geometries sampled from a 600 K xTB MD run. The acetylacetone dataset was generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at 1 ps intervals, and the resulting configurations were recomputed with density functional theory in the ORCA electronic structure package, using the PBE exchange-correlation functional with D3 dispersion correction, the def2-SVP basis set, and VeryTightSCF convergence settings.

ColabFit ID: BOTnet_ACAC_2022_train_600K_MD__Batatia-Batzner-Kovács-Musaelian-Simm-Drautz-Ortner-Kozinsky-Csányi__DS_632yputkfhpg_0
Name: BOTnet_ACAC_2022_train_600K_MD
Authors: Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Elements: C, H, O
Number of Configurations: 500
Number of Elements: 3
Number of Atoms: 7,500

Links:
https://github.com/davkovacs/BOTNet-datasets
https://doi.org/10.48550/arXiv.2205.06643
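All BOTnet_ACAC_2022 entries above describe configurations taken at 1 ps intervals from a GFN2-xTB MD trajectory. The MD timestep and output frequency are not given in the descriptions, so the sketch below takes the spacing between saved frames as a parameter (the 0.5 fs in the example is a placeholder). It shows only the strided subsampling, not the xTB or ORCA calculations.

```python
def subsample(frames, timestep_fs, interval_ps=1.0):
    """Keep one frame every interval_ps picoseconds from a trajectory
    whose frames are spaced timestep_fs femtoseconds apart (assumed layout)."""
    stride = round(interval_ps * 1000.0 / timestep_fs)
    if stride < 1:
        raise ValueError("interval_ps must be at least one frame spacing")
    return frames[::stride]


trajectory = list(range(10_000))   # placeholder frame indices: 5 ps at 0.5 fs per frame
sampled = subsample(trajectory, timestep_fs=0.5)
```

Sampling at an interval much longer than the MD timestep is what makes the retained geometries approximately decorrelated.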
C7H10O2
Description: 6095 isomers of C7O2H10. Energetics were calculated at the G4MP2 level of theory.

ColabFit ID: C7H10O2__Ramakrishnan-Dral-Rupp-Lilienfeld__DS_olmzgsz9hmxy_0
Name: C7H10O2
Authors: Raghunathan Ramakrishnan, Pavlo Dral, Matthias Rupp, O. Anatole von Lilienfeld
Elements: C, H, O
Number of Configurations: 6,095
Number of Elements: 3
Number of Atoms: 115,805

Links:
https://doi.org/10.6084/m9.figshare.c.978904.v5
https://doi.org/10.1038/sdata.2014.22
CA-9_BB_training
Description: Binning-binning configurations from the CA-9 dataset, used to train the NNP_BB potential. CA-9 consists of configurations of carbon, with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_BB_training__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_l7inbtql4ea9_0
Name: CA-9_BB_training
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 20,012
Number of Elements: 1
Number of Atoms: 1,054,055

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_BB_validation
Description: Binning-binning configurations from the CA-9 dataset, used in the validation step for the NNP_BB potential. CA-9 consists of configurations of carbon, with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_BB_validation__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_8zft35i5sbgd_0
Name: CA-9_BB_validation
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 4,003
Number of Elements: 1
Number of Atoms: 233,034

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_BR_training
Description: Binning-random configurations from the CA-9 dataset, used to train the NNP_BR potential. CA-9 consists of configurations of carbon, with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_BR_training__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_fw7m5d8b0fa9_0
Name: CA-9_BR_training
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 20,013
Number of Elements: 1
Number of Atoms: 1,072,779

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_BR_validation
Description: Binning-random configurations from the CA-9 dataset, used in the validation step for the NNP_BR potential. CA-9 consists of carbon configurations, with curated subsets chosen to test the effects of intentionally selecting dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_BR_validation__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_fgakfcwhn9zc_0
Name: CA-9_BR_validation
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 4,002
Number of Elements: 1
Number of Atoms: 214,310

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_RR_training
Description: Random-random configurations from the CA-9 dataset, used to train the NNP_RR potential. CA-9 consists of carbon configurations, with curated subsets chosen to test the effects of intentionally selecting dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_RR_training__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_tag5zubl21w8_0
Name: CA-9_RR_training
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 20,013
Number of Elements: 1
Number of Atoms: 1,100,042

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_RR_validation
Description: Random-random configurations from the CA-9 dataset, used in the validation step for the NNP_RR potential. CA-9 consists of carbon configurations, with curated subsets chosen to test the effects of intentionally selecting dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_RR_validation__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_mr2qaiqh2sx2_0
Name: CA-9_RR_validation
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 4,002
Number of Elements: 1
Number of Atoms: 218,184

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_test
Description: Test configurations from the CA-9 dataset, used to evaluate the trained NNPs. CA-9 consists of carbon configurations, with curated subsets chosen to test the effects of intentionally selecting dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_test__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_h0mshvvbxlai_0
Name: CA-9_test
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 2,727
Number of Elements: 1
Number of Atoms: 206,302

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_training
Description: Configurations from the CA-9 dataset, used to train the NNP_CA-9 potential. CA-9 consists of carbon configurations, with curated subsets chosen to test the effects of intentionally selecting dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_training__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_9cdtww9pcjqd_0
Name: CA-9_training
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 40,000
Number of Elements: 1
Number of Atoms: 2,195,399

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CA-9_validation
Description: Configurations from the CA-9 dataset, used in the validation step for the NNP_CA-9 potential. CA-9 consists of carbon configurations, with curated subsets chosen to test the effects of intentionally selecting dissimilar configurations when training neural network potentials.

ColabFit ID: CA-9_validation__Hedman-Rothe-Johansson-Sandin-Larsson-Miyamoto__DS_07zjrtacjg3z_0
Name: CA-9_validation
Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Elements: C
Number of Configurations: 8,000
Number of Elements: 1
Number of Atoms: 436,601

Links:
https://doi.org/10.24435/materialscloud:6h-yj
https://doi.org/10.1016/j.cartre.2021.100027
CGM-MLP_natcomm2023_Cr-C_deposition
Description: Training simulations of carbon deposition on a Cr surface, from CGM-MLP_natcomm2023. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_Cr-C_deposition__Zhang-Yi-Lai-Peng-Li__DS_392q5d027yja_0
Name: CGM-MLP_natcomm2023_Cr-C_deposition
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cr
Number of Configurations: 1,192
Number of Elements: 2
Number of Atoms: 298,114

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_Cu-C-O
Description: Training simulations of carbon on an oxygen-contaminated Cu surface, from CGM-MLP_natcomm2023. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_Cu-C-O__Zhang-Yi-Lai-Peng-Li__DS_xho213jy5pf9_0
Name: CGM-MLP_natcomm2023_Cu-C-O
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu, O
Number of Configurations: 1,717
Number of Elements: 3
Number of Atoms: 387,151

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_Cu-C-O_deposition
Description: Training simulations of carbon deposition on a Cu surface, from CGM-MLP_natcomm2023. Despite its name, this set contains no O atoms and appears similar to CGM-MLP_natcomm2023_Cu-C_deposition. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_Cu-C-O_deposition__Zhang-Yi-Lai-Peng-Li__DS_xzo1tvni8bay_0
Name: CGM-MLP_natcomm2023_Cu-C-O_deposition
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 1,694
Number of Elements: 2
Number of Atoms: 326,328

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_Cu-C_deposition
Description: Training simulations of carbon deposition on a Cu surface, from CGM-MLP_natcomm2023. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_Cu-C_deposition__Zhang-Yi-Lai-Peng-Li__DS_vgy50b4qz4p7_0
Name: CGM-MLP_natcomm2023_Cu-C_deposition
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 1,177
Number of Elements: 2
Number of Atoms: 204,591

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_Cu-C_metal_surface
Description: Training simulations of carbon on a Cu metal surface, from CGM-MLP_natcomm2023. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_Cu-C_metal_surface__Zhang-Yi-Lai-Peng-Li__DS_mo7q9ajg1lw8_0
Name: CGM-MLP_natcomm2023_Cu-C_metal_surface
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 520
Number of Elements: 2
Number of Atoms: 122,294

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_GAP_20
Description: The Carbon_GAP_20 dataset, as used in CGM-MLP_natcomm2023. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_GAP_20__Zhang-Yi-Lai-Peng-Li__DS_qslbmhe4bcgg_0
Name: CGM-MLP_natcomm2023_GAP_20
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 6,178
Number of Elements: 2
Number of Atoms: 400,485

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_Ti-C_deposition
Description: Training simulations of carbon deposition on a Ti surface, from CGM-MLP_natcomm2023. This is one of the training sets used to produce an active-learning dataset for exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111), as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr, and Ti surfaces.

ColabFit ID: CGM-MLP_natcomm2023_Ti-C_deposition__Zhang-Yi-Lai-Peng-Li__DS_8frq3jczb0dt_0
Name: CGM-MLP_natcomm2023_Ti-C_deposition
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Ti
Number of Configurations: 1,309
Number of Elements: 2
Number of Atoms: 259,636

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_amorphous_carbon_test
Description: 493 structures from the GAP-20 database, excluding any structures present in the training set. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_amorphous_carbon_test__Zhang-Yi-Lai-Peng-Li__DS_wt0m3lxcdy52_0
Name: CGM-MLP_natcomm2023_screening_amorphous_carbon_test
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C
Number of Configurations: 494
Number of Elements: 1
Number of Atoms: 32,279

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_amorphous_carbon_train
Description: 2,558 structures selected from the GAP-20 database. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_amorphous_carbon_train__Zhang-Yi-Lai-Peng-Li__DS_taxwl7jo4f8a_0
Name: CGM-MLP_natcomm2023_screening_amorphous_carbon_train
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C
Number of Configurations: 2,559
Number of Elements: 1
Number of Atoms: 168,191

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_carbon-cluster@Cu_test
Description: 192 structures uniformly selected from the AIMD simulation, excluding any structures in the training set. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_carbon-cluster@Cu_test__Zhang-Yi-Lai-Peng-Li__DS_gkumjkncy8ft_0
Name: CGM-MLP_natcomm2023_screening_carbon-cluster@Cu_test
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 193
Number of Elements: 2
Number of Atoms: 38,004

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_carbon-cluster@Cu_train
Description: 588 structures selected from the AIMD simulation of the Cu(111) slab, including C1-C18 clusters on the slab. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_carbon-cluster@Cu_train__Zhang-Yi-Lai-Peng-Li__DS_6sjhg1f8fv5j_0
Name: CGM-MLP_natcomm2023_screening_carbon-cluster@Cu_train
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 588
Number of Elements: 2
Number of Atoms: 115,460

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_deposited-carbon@Cu_test
Description: 468 structures uniformly selected from the MD/tfMC simulation, excluding any structures in the training set. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_deposited-carbon@Cu_test__Zhang-Yi-Lai-Peng-Li__DS_5r9dvov8nq7c_0
Name: CGM-MLP_natcomm2023_screening_deposited-carbon@Cu_test
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 469
Number of Elements: 2
Number of Atoms: 156,312

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_deposited-carbon@Cu_train
Description: 1,090 structures uniformly selected from the MD/tfMC simulations run while training the CGM-MLPs. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_deposited-carbon@Cu_train__Zhang-Yi-Lai-Peng-Li__DS_j6w4ru800ukq_0
Name: CGM-MLP_natcomm2023_screening_deposited-carbon@Cu_train
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C, Cu
Number of Configurations: 1,091
Number of Elements: 2
Number of Atoms: 362,898

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
CGM-MLP_natcomm2023_screening_graphite_train
Description: 40 graphite structures with lattice constants ranging from 2.0 to 3.2 Å in 0.03 Å increments. This is one of the datasets used to test screening parameters while producing an active-learning dataset of Cu-C interactions for exploring substrate-catalyzed deposition as a route to the controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.

ColabFit ID: CGM-MLP_natcomm2023_screening_graphite_train__Zhang-Yi-Lai-Peng-Li__DS_jasbxoigo7r4_0
Name: CGM-MLP_natcomm2023_screening_graphite_train
Authors: Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
Elements: C
Number of Configurations: 41
Number of Elements: 1
Number of Atoms: 1,968

Links:
https://github.com/sjtudizhang/CGM-MLP
https://doi.org/10.1038/s41467-023-44525-z
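The graphite description counts 40 structures, while 41 configurations are listed for this entry. If the lattice-constant grid includes both endpoints, 2.0 to 3.2 Å in 0.03 Å steps gives (3.2 - 2.0) / 0.03 + 1 = 41 values, which may account for the difference; this reading is an assumption, not something the source states. A minimal sketch of such a grid, using a hypothetical helper `lattice_grid` that builds it from integer step counts to avoid floating-point drift:

```python
def lattice_grid(start, stop, step):
    """Inclusive grid start, start + step, ..., stop (hypothetical helper)."""
    n_steps = round((stop - start) / step)  # number of intervals, rounded to absorb float error
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


grid = lattice_grid(2.0, 3.2, 0.03)  # lattice constants in Å
print(len(grid), grid[0], grid[-1])  # 41 2.0 3.2
```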
CHON_JCP_2020
Description: This dataset of molecular structures was extracted, via the NOMAD API, from all available structures in the NOMAD Archive that contain only C, H, O, and N. Its composition is 50.42% H, 30.41% C, 10.36% N, and 8.81% O, and it includes 96,804 atomic environments in 5,217 structures.

ColabFit ID: CHON_JCP_2020__Onat-Ortner-Kermode__DS_3td9plyix4ib_0
Name: CHON_JCP_2020
Authors: Berk Onat, Christoph Ortner, James R. Kermode
Elements: C, H, N, O
Number of Configurations: 5,216
Number of Elements: 4
Number of Atoms: 96,736

Links:
https://github.com/DescriptorZoo/sensitivity-dimensionality-results/tree/master/datasets
https://doi.org/10.1063/5.0016005
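A composition breakdown like the one quoted for CHON_JCP_2020 can be computed by pooling element counts over all structures. The sketch assumes the percentages are fractions of all atoms, and it uses a hypothetical in-memory layout (one list of element symbols per structure); the toy molecules are illustrative and not drawn from the dataset.

```python
from collections import Counter


def composition_percent(structures):
    """Percentage of atoms belonging to each element, pooled over all structures.

    `structures` is a list of per-structure element-symbol lists
    (a hypothetical layout chosen for illustration).
    """
    counts = Counter()
    for symbols in structures:
        counts.update(symbols)  # add this structure's atoms to the running tally
    total = sum(counts.values())
    return {el: round(100.0 * c / total, 2) for el, c in sorted(counts.items())}


# Toy example: one water molecule and one methane molecule.
toy = [["O", "H", "H"], ["C", "H", "H", "H", "H"]]
print(composition_percent(toy))  # {'C': 12.5, 'H': 75.0, 'O': 12.5}
```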
COHInPt_schaaf_2023
Description: Training and simulation data from a machine-learning force field applied to steps in the hydrogenation of carbon dioxide to methanol over an indium oxide catalyst, with and without platinum doping.

ColabFit ID: COHInPt_schaaf_2023__Schaaf-Fako-De-Schafer-Csanyi__DS_l9f0rjjqfd67_0
Name: COHInPt_schaaf_2023
Authors: Lars Schaaf, Edvin Fako, Sandip De, Ansgar Schafer, Gabor Csanyi
Elements: C, H, In, O, Pt
Number of Configurations: 1,994
Number of Elements: 5
Number of Atoms: 163,746

Links:
https://doi.org/10.5281/zenodo.8268726
https://doi.org/10.48550/arXiv.2301.09931
COLL_test
Description: Test set from COLL, which consists of configurations taken from molecular collisions of various small organic molecules. Energies and forces for 140,000 random snapshots from these trajectories were recomputed with density functional theory (DFT), using the revPBE functional and the def2-TZVP basis set with D3 dispersion corrections.

ColabFit ID: COLL_test__Gasteiger-Giri-Margraf-Günnemann__DS_anwiyxoek66t_0
Name: COLL_test
Authors: Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann
Elements: C, H, O
Number of Configurations: 9,480
Number of Elements: 3
Number of Atoms: 97,886

Links:
https://doi.org/10.6084/m9.figshare.13289165.v1
https://doi.org/10.48550/arXiv.2011.14115
COLL_train
Description: Training set from COLL, which consists of configurations taken from molecular collisions of various small organic molecules. Energies and forces for 140,000 random snapshots from these trajectories were recomputed with density functional theory (DFT), using the revPBE functional and the def2-TZVP basis set with D3 dispersion corrections.

ColabFit ID: COLL_train__Gasteiger-Giri-Margraf-Günnemann__DS_cifhpgzw3ckj_0
Name: COLL_train
Authors: Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann
Elements: C, H, O
Number of Configurations: 120,000
Number of Elements: 3
Number of Atoms: 1,225,350

Links:
https://doi.org/10.6084/m9.figshare.13289165.v1
https://doi.org/10.48550/arXiv.2011.14115
COLL_validation
Description: Validation set from COLL, which consists of configurations taken from molecular collisions of various small organic molecules. Energies and forces for 140,000 random snapshots from these trajectories were recomputed with density functional theory (DFT), using the revPBE functional and the def2-TZVP basis set with D3 dispersion corrections.

ColabFit ID: COLL_validation__Gasteiger-Giri-Margraf-Günnemann__DS_1y25o4zvyfm0_0
Name: COLL_validation
Authors: Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann
Elements: C, H, O
Number of Configurations: 10,000
Number of Elements: 3
Number of Atoms: 101,847

Links:
https://doi.org/10.6084/m9.figshare.13289165.v1
https://doi.org/10.48550/arXiv.2011.14115
COMP6v2-B973c-def2mTZVP
Description: COMP6v2-B973c-def2mTZVP is the portion of COMP6v2 calculated at the B973c/def2mTZVP level of theory. The COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 (COMP6v2) extends the COMP6 benchmark found at https://github.com/isayev/COMP6. COMP6v2 is a dataset of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (the data used to train the model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; and B973c/def2mTZVP. Each of these COMP6v2 datasets contains the six subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8).

ColabFit ID: COMP6v2-B973c-def2mTZVP__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_0dbwoxq96gga_0
Name: COMP6v2-B973c-def2mTZVP
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 156,330
Number of Elements: 7
Number of Atoms: 3,786,071

Links:
https://doi.org/10.5281/zenodo.10126157
https://doi.org/10.1021/acs.jctc.0c00121
COMP6v2-wB97MD3BJ-def2TZVPP
Description: COMP6v2-wB97MD3BJ-def2TZVPP is the portion of COMP6v2 calculated at the wB97MD3BJ/def2TZVPP level of theory. The COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 (COMP6v2) extends the COMP6 benchmark found at https://github.com/isayev/COMP6. COMP6v2 is a dataset of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (the data used to train the model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; and B973c/def2mTZVP. Each of these COMP6v2 datasets contains the six subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8).

ColabFit ID: COMP6v2-wB97MD3BJ-def2TZVPP__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_mznjdz4oqv11_0
Name: COMP6v2-wB97MD3BJ-def2TZVPP
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 156,353
Number of Elements: 7
Number of Atoms: 3,787,055

Links:
https://doi.org/10.5281/zenodo.10126157
https://doi.org/10.1021/acs.jctc.0c00121
COMP6v2-wB97MV-def2TZVPP
Description: COMP6v2-wB97MV-def2TZVPP is the portion of COMP6v2 calculated at the wB97MV/def2TZVPP level of theory. The COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 (COMP6v2) extends the COMP6 benchmark found at https://github.com/isayev/COMP6. COMP6v2 is a dataset of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (the data used to train the model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; and B973c/def2mTZVP. Each of these COMP6v2 datasets contains the six subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8).

ColabFit ID: COMP6v2-wB97MV-def2TZVPP__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_m6etjqhlu40m_0
Name: COMP6v2-wB97MV-def2TZVPP
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 156,369
Number of Elements: 7
Number of Atoms: 3,787,406

Links:
https://doi.org/10.5281/zenodo.10126157
https://doi.org/10.1021/acs.jctc.0c00121
COMP6v2-wB97X-631Gd
Description: COMP6v2-wB97X-631Gd is the portion of COMP6v2 calculated at the wB97X/631Gd level of theory. The COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 (COMP6v2) extends the COMP6 benchmark found at https://github.com/isayev/COMP6. COMP6v2 is a dataset of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (the data used to train the model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; and B973c/def2mTZVP. Each of these COMP6v2 datasets contains the six subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8).

ColabFit ID: COMP6v2-wB97X-631Gd__Huddleston-Zubatyuk-Smith-Roitberg-Isayev-Pickering-Devereux-Barros__DS_15m7etw1om8a_0
Name: COMP6v2-wB97X-631Gd
Authors: Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
Elements: C, Cl, F, H, N, O, S
Number of Configurations: 157,728
Number of Elements: 7
Number of Atoms: 3,897,978

Links:
https://doi.org/10.5281/zenodo.10126157
https://doi.org/10.1021/acs.jctc.0c00121
C_Gardner_2022
Description: Approximately 115,000 configurations of 200-atom carbon systems, generated by simulated melting, quenching, reheating, and then annealing at the noted temperature. The set includes a variety of carbon structures.

ColabFit ID: C_Gardner_2022__Gardner-Beaulieu-Deringer__DS_mma0w04wa4cs_0
Name: C_Gardner_2022
Authors: John L. A. Gardner, Zoé Faure Beaulieu, Volker L. Deringer
Elements: C
Number of Configurations: 115,206
Number of Elements: 1
Number of Atoms: 23,041,200

Links:
https://github.com/jla-gardner/carbon-data
https://doi.org/10.48550/arXiv.2211.16443
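Because every configuration in C_Gardner_2022 contains 200 atoms, the listed atom count should equal the configuration count times 200. A quick arithmetic check of the two figures listed for this entry:

```python
n_configs = 115_206     # Number of Configurations listed for C_Gardner_2022
atoms_per_config = 200  # each configuration has 200 carbon atoms, per the description
total_atoms = n_configs * atoms_per_config
print(f"{total_atoms:,}")  # 23,041,200, matching the listed Number of Atoms
```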
C_NPJ2020
Description: The dataset consists of energies and forces for monolayer graphene, bilayer graphene, graphite, and diamond in various states, including strained static structures and configurations drawn from ab initio MD trajectories. A total of 4,788 configurations were generated from DFT calculations using the Vienna Ab initio Simulation Package (VASP). The energies and forces are stored in extended XYZ format, with one file per configuration.

ColabFit ID: C_NPJ2020__Wen-Tadmor__DS_qph0akhjv9kv_0
Name: C_NPJ2020
Authors: Mingjian Wen, Ellad B. Tadmor
Elements: C
Number of Configurations: 4,776
Number of Elements: 1
Number of Atoms: 228,852

Links:
https://doi.org/10.6084/m9.figshare.12649811.v1
https://doi.org/10.1038/s41524-020-00390-8
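Extended XYZ stores the atom count on the first line, key=value metadata (including a `Properties` column specification) on the second, and one line per atom after that. A minimal stdlib sketch of reading one such frame follows; the sample frame, its values, and the `energy` key are illustrative assumptions rather than content from C_NPJ2020, and for real files a full parser such as ASE's `ase.io.read` is the safer choice.

```python
import shlex


def read_extxyz_frame(text):
    """Parse a single extended-XYZ frame into (info, columns).

    Minimal sketch: handles the atom-count line, key=value pairs on the comment
    line (quoted values allowed) and S/I/R columns in the Properties spec.
    Logical (L) columns are not handled.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    # shlex keeps quoted values such as Lattice="..." together as one token.
    info = dict(tok.split("=", 1) for tok in shlex.split(lines[1]))
    # Properties=name:type:ncols:... says how each atom line splits into columns.
    spec = info["Properties"].split(":")
    fields = [(spec[i], spec[i + 1], int(spec[i + 2])) for i in range(0, len(spec), 3)]
    columns = {name: [] for name, _, _ in fields}
    for line in lines[2 : 2 + n_atoms]:
        parts = line.split()
        pos = 0
        for name, kind, ncols in fields:
            cast = {"S": str, "I": int}.get(kind, float)  # R (real) columns become floats
            values = [cast(x) for x in parts[pos : pos + ncols]]
            pos += ncols
            columns[name].append(values[0] if ncols == 1 else values)
    return info, columns


# Illustrative two-atom frame (made-up numbers).
sample = """2
Lattice="2.46 0.0 0.0 -1.23 2.13 0.0 0.0 0.0 20.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-18.4 pbc="T T T"
C 0.0 0.0 10.0 0.01 -0.02 0.0
C 1.23 0.71 10.0 -0.01 0.02 0.0
"""
info, cols = read_extxyz_frame(sample)
print(float(info["energy"]), cols["species"], cols["forces"][0])
```

Keeping metadata values as strings and converting only on use (as with `float(info["energy"])`) avoids guessing types for keys the sketch does not know about.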
Carbon_GAP_JCP_2020
Description: GAP-20 describes the properties of bulk crystalline and amorphous carbon phases, crystal surfaces, and defect structures with an accuracy approaching that of direct ab initio simulation, at significantly reduced cost. The final potential is fitted to reference data computed with the optB88-vdW density functional theory (DFT) functional.

ColabFit ID: Carbon_GAP_JCP_2020__Rowe-Deringer-Gasparotto-Csányi-Michaelides__DS_on6fusdw5n8q_0
Name: Carbon_GAP_JCP_2020
Authors: Patrick Rowe, Volker L. Deringer, Piero Gasparotto, Gábor Csányi, Angelos Michaelides
Elements: C
Number of Configurations: 17,227
Number of Elements: 1
Number of Atoms: 1,312,457

Links:
https://www.repository.cam.ac.uk/handle/1810/307452
https://doi.org/10.1063/5.0005084
Carbon_GAP_JCP_2020_train
Description: Training data generated for GAP-20. GAP-20 describes the properties of bulk crystalline and amorphous carbon phases, crystal surfaces, and defect structures with an accuracy approaching that of direct ab initio simulation, at significantly reduced cost. The final potential is fitted to reference data computed with the optB88-vdW density functional theory (DFT) functional.

ColabFit ID: Carbon_GAP_JCP_2020_train__Rowe-Deringer-Gasparotto-Csányi-Michaelides__DS_sz9me9r61581_0
Name: Carbon_GAP_JCP_2020_train
Authors: Patrick Rowe, Volker L. Deringer, Piero Gasparotto, Gábor Csányi, Angelos Michaelides
Elements: C
Number of Configurations: 6,088
Number of Elements: 1
Number of Atoms: 400,275

Links:
https://www.repository.cam.ac.uk/handle/1810/307452
https://doi.org/10.1063/5.0005084
Carbon_allotrope_multilayer_graphene_graphite_PRB2019
Description: The dataset consists of energies and forces for pristine and defected monolayer graphene, bilayer graphene, and graphite in various states. The configurations are generated in two ways: (1) crystals with distortions (compression and stretching of the simulation cell together with random perturbations of atoms), and (2) configurations drawn from ab initio molecular dynamics (AIMD) trajectories at 300, 900, and 1500 K.

For monolayer graphene, the configurations include:
* pristine
  - In-plane compressed and stretched monolayers
  - AIMD trajectories
* defected
  - Configurations from the minimization of a monolayer with a single vacancy
  - AIMD trajectories of monolayers with a single vacancy

For bilayer graphene, the configurations include:
* pristine
  - AB-stacked bilayers with compression and stretching in the basal plane
  - Bilayers with different translational registry (e.g. AA, AB, and SP) at various layer separations
  - Twisted bilayers with different twisting angles at various layer separations
  - AIMD trajectories of twisted bilayers and bilayers in AB and AA stackings
* defected
  - Configurations from the minimization of a bilayer with a single vacancy in each layer
  - AIMD trajectories of a bilayer with a single vacancy in one layer and the other layer pristine
  - AIMD trajectories of a bilayer with a single vacancy in each layer; initial configuration without interlayer bonds
  - AIMD trajectories of a bilayer with a single vacancy in each layer; initial configuration with interlayer bonds formed

For graphite, the configurations include:
* pristine
  - Graphite with compression and stretching in the basal plane
  - Graphite with compression and stretching along the c-axis
  - AIMD trajectories

ColabFit ID: Carbon_allotrope_multilayer_graphene_graphite_PRB2019__Wen-Tadmor__DS_zn8tf2kmgrdm_0
Name: Carbon_allotrope_multilayer_graphene_graphite_PRB2019
Authors: Mingjian Wen, Ellad B. Tadmor
Elements: C
Number of Configurations: 14,179
Number of Elements: 1
Number of Atoms: 656,204

Links:
https://journals.aps.org/prb/supplemental/10.1103/PhysRevB.100.195419/dataset.tar
https://doi.org/10.1103/PhysRevB.100.195419
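As a rough illustration of generation route (1) above, the sketch below strains a cell uniformly in the basal plane and adds Gaussian noise to the atomic positions. It is a minimal sketch under stated assumptions, not the authors' generation script: the function name `distort`, the strain range, and the noise amplitude `sigma` are hypothetical, since the description does not give them.

```python
import numpy as np


def distort(cell, positions, strain, sigma, rng):
    """Hypothetical helper: return a strained, randomly perturbed copy of a configuration.

    cell: (3, 3) array, lattice vectors as rows (Angstrom).
    positions: (N, 3) Cartesian coordinates (Angstrom).
    strain: fractional in-plane strain; negative = compression, positive = stretching.
    sigma: standard deviation (Angstrom) of the Gaussian noise added to each coordinate.
    """
    # Strain only the basal (x, y) plane; the third (vacuum / c-axis) direction is untouched.
    deform = np.diag([1.0 + strain, 1.0 + strain, 1.0])
    new_cell = cell @ deform
    # Move atoms affinely with the cell (fractional coordinates are preserved),
    # then add random perturbations.
    new_pos = positions @ deform + rng.normal(0.0, sigma, size=positions.shape)
    return new_cell, new_pos


# Example: a monolayer graphene primitive cell with 20 Angstrom of vacuum (assumed values).
rng = np.random.default_rng(seed=0)
a = 2.46  # approximate graphene lattice constant, Angstrom
cell = np.array([[a, 0.0, 0.0], [-a / 2, a * np.sqrt(3) / 2, 0.0], [0.0, 0.0, 20.0]])
positions = np.array([[0.0, 0.0, 10.0], [a / 2, a / (2 * np.sqrt(3)), 10.0]])
# Compressed (negative) and stretched (positive) monolayers with small random displacements.
samples = [distort(cell, positions, s, 0.02, rng) for s in np.linspace(-0.04, 0.04, 5)]
```

Straining along the c-axis instead, as in the graphite set, would mean changing the third diagonal entry of `deform` rather than the first two.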
Carolina_Materials
Description: Carolina Materials contains structures used to train several machine learning models for the efficient generation of hypothetical inorganic materials. The database is built using structures from OQMD, Materials Project, and ICSD, as well as ML-generated structures validated by DFT.

ColabFit ID: Carolina_Materials__Zhao-Al-Fahdi-Hu-Siriwardane-Song-Nasiri-Hu__DS_r2r8k3fyb6ny_0
Name: Carolina_Materials
Authors: Yong Zhao, Mohammed Al-Fahdi, Ming Hu, Edirisuriya M. D. Siriwardane, Yuqi Song, Alireza Nasiri, Jianjun Hu
Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Po, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 214,280
Number of Elements: 64
Number of Atoms: 3,168,502

Links:
https://zenodo.org/records/8381476
https://doi.org/10.1002/advs.202100566
Cationic_phenoxyimine_complexes_of_yttrium
Description: This dataset contains DFT calculations that were carried out in conjunction with an experimental investigation of a cationic phenoxyimine yttrium complex as an isoprene polymerization catalyst. Calculations were performed using the Gaussian 09 D.01 suite of programs. Electronic structure calculations were performed at the DFT level using the B3PW91 functional. The Stuttgart-Cologne small-core quasi-relativistic pseudopotential ECP28MWB and its available basis set, including up to the g function, were used to describe yttrium. Similarly, silicon and phosphorus were represented by a Stuttgart-Dresden-Bonn pseudopotential along with the related basis set augmented by a d polarization function (αd(P) = 0.387 and αd(Si) = 0.284). Other atoms were described by a polarized all-electron triple-ζ 6-311G(d,p) basis set. Bulk solvent effects of toluene or THF were simulated using the SMD continuum model. The Grimme empirical correction with the original D3 damping function was used to include dispersion as a single-point correction. Transition-state optimizations were followed by frequency calculations to characterize the stationary points. Intrinsic reaction coordinate calculations were performed to confirm the connectivity of the transition states. Gibbs energies were estimated within the harmonic oscillator approximation at 298 K and 1 atm.

ColabFit ID: Cationic_phenoxyimine_complexes_of_yttrium__Oswald-Verrieux-Breuil-Olivier-Bourbigou-Thuilliez-Vaultier-Taoufik-Perrin-Boisson__DS_y6bp7td4dle0_0
Name: Cationic_phenoxyimine_complexes_of_yttrium
Authors: Alexis D. Oswald, Ludmilla Verrieux, Pierre-Alain R. Breuil, Hélène Olivier-Bourbigou, Julien Thuilliez, Florent Vaultier, Mostafa Taoufik, Lionel Perrin, Christophe Boisson
Elements: Al, B, C, F, H, N, O, Si, Y
Number of Configurations: 109
Number of Elements: 9
Number of Atoms: 9,074

Links:
https://doi.org/10.1021/acs.organomet.2c00238.s001
https://doi.org/10.1021/acs.organomet.2c00238
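Within the harmonic oscillator approximation mentioned above, the vibrational part of the free energy follows directly from the computed frequencies: F_vib = Σ_i [hν_i/2 + k_B T ln(1 − exp(−hν_i / k_B T))]. The sketch below evaluates this from wavenumbers in cm⁻¹. It is an illustration, not Gaussian's own thermochemistry code: the function name is hypothetical, and the translational, rotational, and electronic terms that Gaussian also includes in its Gibbs energies are omitted.

```python
import numpy as np

# Exact SI constants (2019 SI definitions)
H = 6.62607015e-34      # Planck constant, J s
C_CM = 2.99792458e10    # speed of light, cm/s
KB = 1.380649e-23       # Boltzmann constant, J/K
NA = 6.02214076e23      # Avogadro constant, 1/mol


def harmonic_vib_free_energy(wavenumbers_cm, temperature=298.15):
    """Hypothetical helper: vibrational free energy (kJ/mol), harmonic oscillator model.

    Includes the zero-point energy. Non-positive wavenumbers (e.g. the imaginary
    mode of a transition state, which Gaussian prints as a negative frequency)
    are skipped. The default temperature is 298.15 K (298 K in the description).
    """
    nu = np.asarray(wavenumbers_cm, dtype=float)
    nu = nu[nu > 0.0]
    e = H * C_CM * nu                  # quantum h*nu of each mode, J
    x = e / (KB * temperature)
    # Zero-point term plus thermal term kT*ln(1 - exp(-h*nu/kT)) for each mode.
    per_mode = 0.5 * e + KB * temperature * np.log1p(-np.exp(-x))
    return per_mode.sum() * NA / 1000.0
```

High-frequency modes contribute essentially only their zero-point energy at 298 K, whereas low-frequency modes are dominated by the (negative) thermal term, which is why soft modes strongly affect computed Gibbs energies.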
Chig-AIMD_random_test
Description: Test configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10-amino-acid protein, chignolin, and finally collected 2 million biomolecule structures with quantum-level energy and force records.

ColabFit ID: Chig-AIMD_random_test__Wang-He-Li-Shao-Liu__DS_e08dhew7z0r6_0
Name: Chig-AIMD_random_test
Authors: Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Elements: C, H, N, O
Number of Configurations: 199,000
Number of Elements: 4
Number of Atoms: 33,034,000

Links:
https://doi.org/10.1038/s41597-023-02465-9
https://doi.org/10.6084/m9.figshare.22786730.v4
Chig-AIMD_random_train
Description: Training configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10-amino-acid protein, chignolin, and finally collected 2 million biomolecule structures with quantum-level energy and force records.

ColabFit ID: Chig-AIMD_random_train__Wang-He-Li-Shao-Liu__DS_dsikv10na4f8_0
Name: Chig-AIMD_random_train
Authors: Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Elements: C, H, N, O
Number of Configurations: 1,592,800
Number of Elements: 4
Number of Atoms: 264,404,800

Links:
https://doi.org/10.1038/s41597-023-02465-9
https://doi.org/10.6084/m9.figshare.22786730.v4
Chig-AIMD_random_val
Description: Validation configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10-amino-acid protein, chignolin, and finally collected 2 million biomolecule structures with quantum-level energy and force records.

ColabFit ID: Chig-AIMD_random_val__Wang-He-Li-Shao-Liu__DS_u0ggh1iie0uc_0
Name: Chig-AIMD_random_val
Authors: Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Elements: C, H, N, O
Number of Configurations: 199,000
Number of Elements: 4
Number of Atoms: 33,034,000

Links:
https://doi.org/10.1038/s41597-023-02465-9
https://doi.org/10.6084/m9.figshare.22786730.v4
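The listed counts can be cross-checked with simple arithmetic. The short Python sketch below uses the counts copied from the three 'random' entries (the 'scaffold' entries list identical counts) to confirm that the splits sum to roughly the quoted 2 million structures, in an approximately 80/10/10 ratio, with the same number of atoms in every configuration.

```python
# Configuration and atom counts copied from the 'random' Chig-AIMD entries above.
splits = {
    "train": (1_592_800, 264_404_800),
    "val": (199_000, 33_034_000),
    "test": (199_000, 33_034_000),
}

total_configs = sum(n_cfg for n_cfg, _ in splits.values())
print(f"total configurations: {total_configs:,}")  # close to the quoted "2 million"
for name, (n_cfg, n_atoms) in splits.items():
    # Each atom count divides evenly by its configuration count: one chignolin per frame.
    print(f"{name}: {n_cfg / total_configs:.1%} of configurations, "
          f"{n_atoms // n_cfg} atoms per configuration")
```

Running it reports 1,990,800 configurations in total, an 80.0% / 10.0% / 10.0% split, and 166 atoms per configuration in every split.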
Chig-AIMD_scaffold_test
Description: Test configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10-amino-acid protein, chignolin, and finally collected 2 million biomolecule structures with quantum-level energy and force records.

ColabFit ID: Chig-AIMD_scaffold_test__Wang-He-Li-Shao-Liu__DS_zmzkyvep8l55_0
Name: Chig-AIMD_scaffold_test
Authors: Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Elements: C, H, N, O
Number of Configurations: 199,000
Number of Elements: 4
Number of Atoms: 33,034,000

Links:
https://doi.org/10.1038/s41597-023-02465-9
https://doi.org/10.6084/m9.figshare.22786730.v4
Chig-AIMD_scaffold_train
Description: Training configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10-amino-acid protein, chignolin, and finally collected 2 million biomolecule structures with quantum-level energy and force records.

ColabFit ID: Chig-AIMD_scaffold_train__Wang-He-Li-Shao-Liu__DS_7puixss6qd61_0
Name: Chig-AIMD_scaffold_train
Authors: Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Elements: C, H, N, O
Number of Configurations: 1,592,800
Number of Elements: 4
Number of Atoms: 264,404,800

Links:
https://doi.org/10.1038/s41597-023-02465-9
https://doi.org/10.6084/m9.figshare.22786730.v4
Chig-AIMD_scaffold_val
Description: Validation configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10-amino-acid protein, chignolin, and finally collected 2 million biomolecule structures with quantum-level energy and force records.

ColabFit ID: Chig-AIMD_scaffold_val__Wang-He-Li-Shao-Liu__DS_mzz13lim5qfi_0
Name: Chig-AIMD_scaffold_val
Authors: Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Elements: C, H, N, O
Number of Configurations: 199,000
Number of Elements: 4
Number of Atoms: 33,034,000

Links:
https://doi.org/10.1038/s41597-023-02465-9
https://doi.org/10.6084/m9.figshare.22786730.v4
Co-Co_coupling_at_liquid_water-Cu(100)_interfaces_JC2021
Description: This dataset contains data from eight AIMD simulations run in VASP to study electrochemical *CO-*CO coupling (the coupling of two adsorbed *CO molecules) at the liquid water-Cu(100) interface.

ColabFit ID: Co-Co_coupling_at_liquid_water-Cu(100)_interfaces_JC2021__Kristoffersen-Chan__DS_3a3auhtmkv45_0
Name: Co-Co_coupling_at_liquid_water-Cu(100)_interfaces_JC2021
Authors: Henrik H. Kristoffersen, Karen Chan
Elements: C, Cs, Cu, H, Li, O
Number of Configurations: 1,671,203
Number of Elements: 6
Number of Atoms: 226,264,514

Links:
https://doi.org/10.24435/materialscloud:p9-q7
https://doi.org/10.1016/j.jcat.2021.02.023
CoCrFeNiPd_MRL2020
Description: The dataset for "Origin of high strength in the CoCrFeNiPd high-entropy alloy", containing DFT-calculated values for the high-entropy alloy CoCrFeNiPd, created to explore the reasons behind the experimentally observed increase in strength of CoCrFeNiPd compared with CoCrFeNi.

ColabFit ID: CoCrFeNiPd_MRL2020__Yin-Curtin__DS_4w9gf8dxzskh_0
Name: CoCrFeNiPd_MRL2020
Authors: Binglun Yin, W. A. Curtin
Elements: Co, Cr, Fe, Ni, Pd
Number of Configurations: 116
Number of Elements: 5
Number of Atoms: 8,552

Links:
https://doi.org/10.24435/materialscloud:2020.0045/v1
CoNbV_CMS2019
Description: This dataset was generated using the following active learning scheme:
1) candidate structures were relaxed by a partially trained MTP (moment tensor potential) model;
2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed;
3) the MTP was retrained, including the structures that were re-computed with DFT;
4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures.
The original candidate structures for this dataset included about 27,000 configurations that were bcc-like and close-packed (fcc, hcp, etc.) with 8 or fewer atoms in the unit cell and different concentrations of Co, Nb, and V.

ColabFit ID: CoNbV_CMS2019__Gubaev-Podryabinkin-Hart-Shapeev__DS_sn623uhg2d1b_0
Name: CoNbV_CMS2019
Authors: Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev
Elements: Co, Nb, V
Number of Configurations: 383
Number of Elements: 3
Number of Atoms: 2,812

Links:
https://gitlab.com/kgubaev/accelerating-high-throughput-searches-for-new-alloys-with-active-learning-data
https://doi.org/10.1016/j.commatsci.2018.09.031
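The four-step active learning loop described for this dataset can be sketched as a generic driver. This is a minimal, hypothetical sketch: `relax`, `dft`, and `train` are placeholder callables standing in for an MTP relaxation that reports the configurations it had to extrapolate on, a DFT calculation, and MTP retraining. It is not the interface of the MLIP package or the authors' scripts.

```python
from typing import Any, Callable, List, Tuple


def active_learning(
    candidates: List[Any],
    model: Any,
    training_set: List[Tuple[Any, Any]],
    relax: Callable[[Any, Any], List[Any]],
    dft: Callable[[Any], Any],
    train: Callable[[Any, List[Tuple[Any, Any]]], Any],
    max_iter: int = 100,
):
    """Hypothetical driver: relax -> flag extrapolation -> DFT -> retrain, until converged.

    relax(model, structure) is assumed to return the configurations on which the
    model extrapolated while relaxing `structure` (an empty list if none).
    Returns (model, training_set, number_of_retraining_rounds).
    """
    for iteration in range(max_iter):
        # Step 1: relax every candidate and collect the configurations flagged
        # as extrapolative by the current model.
        flagged = [cfg for s in candidates for cfg in relax(model, s)]
        # Step 4: stop once the model no longer extrapolates on any candidate.
        if not flagged:
            return model, training_set, iteration
        # Step 2: recompute the flagged configurations with DFT.
        training_set = training_set + [(cfg, dft(cfg)) for cfg in flagged]
        # Step 3: retrain the model on the enlarged training set.
        model = train(model, training_set)
    raise RuntimeError("active learning did not converge within max_iter iterations")
```

Production MTP workflows typically also select a subset of the flagged configurations (for example with a MaxVol-based criterion) before running DFT; this sketch sends every flagged configuration to DFT.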
Co_dimer_JPCA_2022
Description: This dataset contains dimer molecules of Co(II) with potential energy calculations for structures with ferromagnetic and antiferromagnetic spin configurations. Calculations were carried out in Gaussian 16 with the PBE exchange-correlation functional and the 6-31+G* basis set. All molecules contain the same atomic core region, consisting of the tetrahedral and octahedral Co centers and the three PO2R2 bridging ligands. The ligand exchange provides a broad range of exchange energies (ΔEJ), from +50 to -200 meV, with 80% of the ligands yielding ΔEJ < 10 meV.

ColabFit ID: Co_dimer_JPCA_2022__Ren-Fonseca-Perry-Cheng-Zhang-Hennig__DS_uwqpfuuj6utw_0
Name: Co_dimer_JPCA_2022
Authors: Sijin Ren, Eric Fonseca, William Perry, Hai-Ping Cheng, Xiao-Guang Zhang, Richard Hennig
Elements: C, Cl, Co, H, N, O, P, S
Number of Configurations: 2,162
Number of Elements: 8
Number of Atoms: 188,438

Links:
https://doi.org/10.24435/materialscloud:pe-zv
https://doi.org/10.1021/acs.jpca.1c08950
Co_dimer_JPCA_2022_train
Description: Training data only from the Co_dimer_JPCA_2022 dataset. This dataset contains dimer molecules of Co(II) with potential energy calculations for structures with ferromagnetic and antiferromagnetic spin configurations. Calculations were carried out in Gaussian 16 with the PBE exchange-correlation functional and the 6-31+G* basis set. All molecules contain the same atomic core region, consisting of the tetrahedral and octahedral Co centers and the three PO2R2 bridging ligands. The ligand exchange provides a broad range of exchange energies (ΔEJ), from +50 to -200 meV, with 80% of the ligands yielding ΔEJ < 10 meV.

ColabFit ID: Co_dimer_JPCA_2022_train__Ren-Fonseca-Perry-Cheng-Zhang-Hennig__DS_doo9ltj3vsnu_0
Name: Co_dimer_JPCA_2022_train
Authors: Sijin Ren, Eric Fonseca, William Perry, Hai-Ping Cheng, Xiao-Guang Zhang, Richard Hennig
Elements: C, Cl, Co, H, N, O, P, S
Number of Configurations: 1,798
Number of Elements: 8
Number of Atoms: 154,882

Links:
https://doi.org/10.24435/materialscloud:pe-zv
https://doi.org/10.1021/acs.jpca.1c08950
ComBat
Description: DFT-optimized geometries and properties for Li-S electrolytes. These make up the Computational Database for Li-S Batteries (ComBat), calculated using Gaussian 16 at the B3LYP/6-31+G* level of theory.

ColabFit ID: ComBat__Atwi-Bliss-Makeev-Rajput__DS_xs942mj0c3dx_0
Name: ComBat
Authors: Rasha Atwi, Matthew Bliss, Maxim Makeev, Nav Nidhi Rajput
Elements: C, F, H, Li, N, O, P, S, Si
Number of Configurations: 230
Number of Elements: 9
Number of Atoms: 5,662

Links:
https://github.com/rashatwi/combat/
https://doi.org/10.1038/s41598-022-20009-w
CrCoNi_Cao_2022
Description: Training dataset that captures chemical short-range order in the equiatomic CrCoNi medium-entropy alloy, published with our work "Quantifying chemical short-range order in metallic alloys" (description provided by authors).

ColabFit ID: CrCoNi_Cao_2022__Cao-Sheriff-Freitas__DS_z2mkok0egrm8_0
Name: CrCoNi_Cao_2022
Authors: Yifan Cao, Killian Sheriff, Rodrigo Freitas
Elements: Co, Cr, Ni
Number of Configurations: 1,257
Number of Elements: 3
Number of Atoms: 108,684

Links:
https://github.com/yifan-henry-cao/MachineLearningPotential/blob/main/Training_datasets/Training_Cao_20220823.cfg
https://arxiv.org/abs/2311.01545
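One widely used measure of chemical short-range order is the Warren-Cowley parameter α_ij = 1 − p(j|i) / c_j, where p(j|i) is the fraction of neighbors of type-i atoms that are of type j and c_j is the overall concentration of j; α_ij < 0 signals a preference for i-j pairs and α_ij > 0 signals i-j avoidance. The sketch below computes it for one neighbor shell. It is a generic illustration, not necessarily the measure used by the authors, and it assumes neighbor lists are already available (for example from a cutoff-based neighbor search); the function name is hypothetical.

```python
import numpy as np


def warren_cowley(species, neighbors, elements):
    """Hypothetical helper: alpha[i, j] = 1 - p(j | i) / c_j for one neighbor shell.

    species: length-N sequence of element symbols.
    neighbors: length-N list of index arrays, the neighbor shell of each atom.
    elements: ordered element symbols, e.g. ["Cr", "Co", "Ni"]; each must occur.
    """
    index = {el: k for k, el in enumerate(elements)}
    codes = np.array([index[s] for s in species])
    n = len(elements)
    # counts[i, j]: number of (type-i atom, type-j neighbor) pairs in the shell.
    counts = np.zeros((n, n))
    for a, nbrs in enumerate(neighbors):
        for b in nbrs:
            counts[codes[a], codes[b]] += 1
    # p(j | i): fraction of the neighbors of type-i atoms that are of type j.
    p = counts / counts.sum(axis=1, keepdims=True)
    # c_j: overall concentration of each element.
    c = np.bincount(codes, minlength=n) / len(codes)
    return 1.0 - p / c[np.newaxis, :]
```

For a perfectly alternating A-B chain every neighbor of A is B, so α_AB = 1 − 1/0.5 = −1 and α_AA = 1, while a random solid solution gives values near zero.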
CuPd_CMS2019
Description: This dataset was generated using the following active learning scheme:
1) candidate structures were relaxed by a partially trained MTP (moment tensor potential) model;
2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed;
3) the MTP was retrained, including the structures that were re-computed with DFT;
4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures.
The original candidate structures for this dataset included 40,000 unrelaxed configurations with BCC, FCC, and HCP lattices.

ColabFit ID: CuPd_CMS2019__Gubaev-Podryabinkin-Hart-Shapeev__DS_0ry6z1j8mi8c_0
Name: CuPd_CMS2019
Authors: Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev
Elements: Cu, Pd
Number of Configurations: 522
Number of Elements: 2
Number of Atoms: 2,450

Links:
https://gitlab.com/kgubaev/accelerating-high-throughput-searches-for-new-alloys-with-active-learning-data
https://doi.org/10.1016/j.commatsci.2018.09.031
Cu_FHI-aims_NPJCM_2021
Description: Approximately 46,000 configurations of copper, including small and bulk structures, surfaces, interfaces, point defects, and randomly modified variants. The set also includes structures with displaced or missing atoms.

ColabFit ID: Cu_FHI-aims_NPJCM_2021__Lysogorskiy-Oord-Bochkarev-Menon-Rinaldi-Hammerschmidt-Mrovec-Thompson-Csányi-Ortner-Drautz__DS_39v8vn61tlzl_0
Name: Cu_FHI-aims_NPJCM_2021
Authors: Yury Lysogorskiy, Cas van der Oord, Anton Bochkarev, Sarath Menon, Matteo Rinaldi, Thomas Hammerschmidt, Matous Mrovec, Aidan Thompson, Gábor Csányi, Christoph Ortner, Ralf Drautz
Elements: Cu
Number of Configurations: 46,583
Number of Elements: 1
Number of Atoms: 308,679

Links:
https://doi.org/10.5281/zenodo.4734035
https://doi.org/10.1038/s41524-021-00559-9
DAS_MLIP_CoSb_MgSb
Description: Approximately 850 configurations of CoSb3 and Mg3Sb2 generated using a dual adaptive sampling (DAS) method for use in training machine-learning interatomic potentials (MLIP).

ColabFit ID: DAS_MLIP_CoSb_MgSb__Yang-Zhu-Dong-Wu-Yang-Zhang__DS_eyo7scxvrfq0_0
Name: DAS_MLIP_CoSb_MgSb
Authors: Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, Wenqing Zhang
Elements: Mg, Sb
Number of Configurations: 846
Number of Elements: 2
Number of Atoms: 247,744

Links:
https://doi.org/10.1103/PhysRevB.104.094310
DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_test
Description: Benzene test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_test__Kapil-Engel__DS_ech19sh1koj4_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_test
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H
Number of Configurations: 200
Number of Elements: 2
Number of Atoms: 5,760

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
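The k-point criterion quoted in these entries (maximum spacing 0.06 × 2π Å⁻¹) can be turned into Monkhorst-Pack grid dimensions for a given cell; Quantum ESPRESSO itself takes explicit grid dimensions in its `K_POINTS automatic` card. The sketch below assumes the spacing is measured against reciprocal vectors defined without the 2π factor, so that n_i = ceil(|b_i| / 0.06). The function name is hypothetical and this is not necessarily how the authors generated their grids.

```python
import numpy as np


def monkhorst_pack_grid(cell, max_spacing=0.06):
    """Hypothetical helper: smallest MP grid whose k-point spacing is <= max_spacing.

    cell: (3, 3) array with lattice vectors as rows, in Angstrom.
    max_spacing: maximum spacing in units of 2*pi/Angstrom, i.e. measured against
        reciprocal vectors b_i without the 2*pi factor (a_i . b_j = delta_ij).
    """
    # Rows of inv(cell).T are the reciprocal vectors b_i (without 2*pi).
    recip = np.linalg.inv(np.asarray(cell, dtype=float)).T
    lengths = np.linalg.norm(recip, axis=1)
    # Number of divisions along each b_i so that |b_i| / n_i <= max_spacing.
    return tuple(int(n) for n in np.maximum(1, np.ceil(lengths / max_spacing)))
```

For example, a 10 Å cubic cell has |b_i| = 0.1 Å⁻¹, giving a 2 × 2 × 2 grid, while an orthorhombic 5 × 10 × 20 Å cell gives 4 × 2 × 1: longer cell edges need fewer k-points along the corresponding reciprocal direction.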
DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_train
Description: Benzene training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_train__Kapil-Engel__DS_ce4zytu7ph7e_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_train
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H
Number of Configurations: 1,800
Number of Elements: 2
Number of Atoms: 49,536

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_validation
Description: Benzene validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_validation__Kapil-Engel__DS_xaq6jwha0a7o_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_benzene_validation
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H
Number of Configurations: 200
Number of Elements: 2
Number of Atoms: 6,072

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_test
Description: Glycine test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_test__Kapil-Engel__DS_h3jymhr74qv4_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_test
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, N, O
Number of Configurations: 200
Number of Elements: 4
Number of Atoms: 6,880

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_train
Description: Glycine training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_train__Kapil-Engel__DS_b1n228x610dj_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_train
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, N, O
Number of Configurations: 3,594
Number of Elements: 4
Number of Atoms: 109,940

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_validation
Description: Glycine validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_validation__Kapil-Engel__DS_n1nilf7ralhq_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_glycine_validation
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, N, O
Number of Configurations: 200
Number of Elements: 4
Number of Atoms: 7,120

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_test
Description: Succinic acid test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_test__Kapil-Engel__DS_vbjvssevrff5_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_test
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, O
Number of Configurations: 200
Number of Elements: 3
Number of Atoms: 5,600

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_train
Description: Succinic acid training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_train__Kapil-Engel__DS_pqmtyp5bxd46_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_train
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, O
Number of Configurations: 1,800
Number of Elements: 3
Number of Atoms: 50,400

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_validation
Description: Succinic acid validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_validation__Kapil-Engel__DS_w18131j6r4oo_0
Name: DFT_polymorphs_PNAS_2022_PBE0_MBD_succinic_acid_validation
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, O
Number of Configurations: 200
Number of Elements: 3
Number of Atoms: 5,600

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_benzene_test
Description: Benzene test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_benzene_test__Kapil-Engel__DS_23820czfaq24_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_benzene_test
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H
Number of Configurations: 1,000
Number of Elements: 2
Number of Atoms: 29,736

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_benzene_train
Description: Benzene training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_benzene_train__Kapil-Engel__DS_3qi25f3sxkwr_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_benzene_train
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H
Number of Configurations: 55,000
Number of Elements: 2
Number of Atoms: 1,602,048

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_benzene_validation
Description: Benzene validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_benzene_validation__Kapil-Engel__DS_8ou78o18mvfz_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_benzene_validation
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H
Number of Configurations: 1,000
Number of Elements: 2
Number of Atoms: 29,712

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_glycine_test
Description: Glycine test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_glycine_test__Kapil-Engel__DS_qx4h8j4luk70_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_glycine_test
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, N, O
Number of Configurations: 500
Number of Elements: 4
Number of Atoms: 17,710

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_glycine_train
Description: Glycine training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_glycine_train__Kapil-Engel__DS_op9kvcm7ui6l_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_glycine_train
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, N, O
Number of Configurations: 29,070
Number of Elements: 4
Number of Atoms: 952,650

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_glycine_validation
Description: Glycine validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_glycine_validation__Kapil-Engel__DS_052bb0zufjaz_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_glycine_validation
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, N, O
Number of Configurations: 500
Number of Elements: 4
Number of Atoms: 17,800

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
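The ColabFit IDs in this listing appear to follow the pattern `<Name>__<Authors>__DS_<hash>_<suffix>`, for example `DFT_polymorphs_PNAS_2022_PBE_TS_glycine_train__Kapil-Engel__DS_op9kvcm7ui6l_0`. This pattern is inferred only from the entries shown here, not from a published ColabFit specification, and the parser and field names below are hypothetical. The author field is deliberately left unsplit because surnames such as Mannodi-Kanakkithodi or Head-Gordon contain hyphens themselves.

```python
from typing import NamedTuple


class ColabFitID(NamedTuple):
    name: str     # dataset name, e.g. "HME21_train"
    authors: str  # hyphen-joined surnames, kept whole (surnames may contain hyphens)
    kind: str     # leading tag of the final field; "DS" for every dataset listed here
    hash_id: str  # opaque identifier, e.g. "op9kvcm7ui6l"
    suffix: str   # trailing number; its meaning (e.g. a version) is an assumption


def parse_colabfit_id(cf_id: str) -> ColabFitID:
    # Split from the right so that a name containing "__" would stay intact.
    name, authors, tail = cf_id.rsplit("__", 2)
    kind, rest = tail.split("_", 1)
    hash_id, suffix = rest.rsplit("_", 1)
    return ColabFitID(name, authors, kind, hash_id, suffix)


parsed = parse_colabfit_id(
    "DFT_polymorphs_PNAS_2022_PBE_TS_glycine_train__Kapil-Engel__DS_op9kvcm7ui6l_0"
)
print(parsed.name, parsed.hash_id)
```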
DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_test
Dataset Downloads Coming Soon Description: Succinic acid test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_test__Kapil-Engel__DS_qmsg3kejmsy2_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_test
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, O
Number of Configurations: 500
Number of Elements: 3
Number of Atoms: 14,000

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_train
Dataset Downloads Coming Soon Description: Succinic acid training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_train__Kapil-Engel__DS_0u0k9ghrlkpx_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_train
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, O
Number of Configurations: 29,212
Number of Elements: 3
Number of Atoms: 817,936

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_validation
Dataset Downloads Coming Soon Description: Succinic acid validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.

ColabFit ID: DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_validation__Kapil-Engel__DS_nny0vpj1dy9s_0
Name: DFT_polymorphs_PNAS_2022_PBE_TS_succinic_acid_validation
Authors: Venkat Kapil, Edgar A. Engel
Elements: C, H, O
Number of Configurations: 500
Number of Elements: 3
Number of Atoms: 14,000

Links:
https://doi.org/10.24435/materialscloud:vp-jf
https://doi.org/10.1073/pnas.2111769119
DP-GEN_Cu
Dataset Downloads Coming Soon Description: Approximately 15,000 configurations of copper used to demonstrate the DP-GEN data generator for PES machine learning models.

ColabFit ID: DP-GEN_Cu__Zhang-Wang-Chen-Zeng-Zhang-Wang-E__DS_101uk70asqhq_0
Name: DP-GEN_Cu
Authors: Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, Weinan E
Elements: Cu
Number of Configurations: 15,286
Number of Elements: 1
Number of Atoms: 297,597

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=Cu-dpgen
https://doi.org/10.1016/j.cpc.2020.107206
DeePMD_SE
Dataset Downloads Coming Soon Description: 127,000 configurations from a dataset used to benchmark and train a modified DeePMD model called DeepPot-SE (Deep Potential-Smooth Edition).

ColabFit ID: DeePMD_SE__Zhang-Han-Wang-Saidi-Car-E__DS_k85kj1kiekip_0
Name: DeePMD_SE
Authors: Linfeng Zhang, Jiequn Han, Han Wang, Wissam A. Saidi, Roberto Car, Weinan E
Elements: Al, C, Co, Cr, Cu, Fe, Ge, H, Mn, Mo, N, Ni, O, Pt, S, Si, Ti
Number of Configurations: 127,112
Number of Elements: 17
Number of Atoms: 26,278,380

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=deepmd-se-dataset
https://doi.org/10.48550/arXiv.1805.09003
Fe_nanoparticles_PRB_2023
Dataset Downloads Coming Soon Description: This iron nanoparticle database contains dimers, trimers, and bcc, fcc, hexagonal close-packed (hcp), simple cubic, and diamond crystalline structures. It covers a wide range of cell parameters and also includes rattled structures, bcc-fcc and bcc-hcp transitional structures, surface slabs cleaved from relaxed bulk structures, nanoparticles, and liquid configurations. The energies, forces, and virials were computed at the DFT level of theory using VASP with the PBE functional and standard PAW pseudopotentials for Fe (8 valence electrons, 4s^2 3d^6). The plane-wave kinetic energy cutoff was 400 eV and the energy convergence threshold was 10^-7 eV. All DFT calculations were spin-polarized.

ColabFit ID: Fe_nanoparticles_PRB_2023__Jana-Caro__DS_5h3810yhu4wj_0
Name: Fe_nanoparticles_PRB_2023
Authors: Richard Jana, Miguel A. Caro
Elements: Fe
Number of Configurations: 198
Number of Elements: 1
Number of Atoms: 20,097

Links:
https://doi.org/10.5281/zenodo.7632315
https://doi.org/10.1103/PhysRevB.107.245421
FitSNAP_Fe_NPJ_2021
Dataset Downloads Coming Soon Description: About 2,500 configurations of alpha-Fe used in the training and testing of an ML model, with the goal of building magneto-elastic machine-learning interatomic potentials for large-scale spin-lattice dynamics simulations.

ColabFit ID: FitSNAP_Fe_NPJ_2021__Nikolov-Wood-Cangi-Maillet-Marinica-Thompson-Desjarlais-Tranchida__DS_ej5u0tuycoph_0
Name: FitSNAP_Fe_NPJ_2021
Authors: Svetoslav Nikolov, Mitchell A. Wood, Attila Cangi, Jean-Bernard Maillet, Mihai-Cosmin Marinica, Aidan P. Thompson, Michael P. Desjarlais, Julien Tranchida
Elements: Fe
Number of Configurations: 2,517
Number of Elements: 1
Number of Atoms: 61,526

Links:
https://github.com/FitSNAP
https://doi.org/10.1038/s41524-021-00617-2
Forces_are_not_enough
Dataset Downloads Coming Soon Description: Approximately 300,000 benchmarking configurations derived partly from the MD-17 and LiPS datasets, partly from original simulated water and alanine dipeptide configurations.

ColabFit ID: Forces_are_not_enough__Fu-Wu-Wang-Xie-Keten-Gomez-Bombarelli-Jaakkola__DS_tku3ae1rtxiy_0
Name: Forces_are_not_enough
Authors: Xiang Fu, Zhenghao Wu, Wujie Wang, Tian Xie, Sinan Keten, Rafael Gomez-Bombarelli, Tommi Jaakkola
Elements: C, H, Li, N, O, P, S
Number of Configurations: 295,001
Number of Elements: 7
Number of Atoms: 23,735,083

Links:
https://doi.org/10.5281/zenodo.7196767
https://doi.org/10.48550/arXiv.2210.07237
GDB_9_nature_2014
Dataset Downloads Coming Soon Description: 133,885 configurations of stable small organic molecules composed of C, H, O, N, and F. A subset of GDB-17, with calculated energies, dipole moments, polarizabilities, and enthalpies. Calculations were performed at the B3LYP/6-31G(2df,p) level of theory.

ColabFit ID: GDB_9_nature_2014__Ramakrishnan-Dral-Rupp-Lilienfeld__DS_2s5grfmdez1q_0
Name: GDB_9_nature_2014
Authors: Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Elements: C, F, H, N, O
Number of Configurations: 133,885
Number of Elements: 5
Number of Atoms: 2,407,753

Links:
https://doi.org/10.6084/m9.figshare.c.978904.v5
https://doi.org/10.1038/sdata.2014.22
GFN-xTB_JCIM_2021
Dataset Downloads Coming Soon Description: 10,000 configurations of organosilicon compounds with energies predicted by an improved GFN-xTB Hamiltonian parameterization, using revPBE.

ColabFit ID: GFN-xTB_JCIM_2021__Komissarov-Verstraelen__DS_0x8ozlt9g3y5_0
Name: GFN-xTB_JCIM_2021
Authors: Leonid Komissarov, Toon Verstraelen
Elements: Br, C, Cl, F, H, N, O, P, S, Si
Number of Configurations: 157,367
Number of Elements: 10
Number of Atoms: 4,022,149

Links:
https://doi.org/10.24435/materialscloud:14-4m
https://doi.org/10.1021/acs.jcim.1c01170
GST_GAP_22_extended
Dataset Downloads Coming Soon Description: The extended training dataset for GST_GAP_22, calculated using the PBEsol functional. New configurations, simulated under external electric fields, were labelled with DFT and added to the original reference database. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions.

ColabFit ID: GST_GAP_22_extended__Zhou-Zhang-Ma-Deringer__DS_efcq7a0h6z5w_0
Name: GST_GAP_22_extended
Authors: Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer
Elements: Ge, Sb, Te
Number of Configurations: 2,916
Number of Elements: 3
Number of Atoms: 399,247

Links:
https://doi.org/10.5281/zenodo.8208202
https://doi.org/10.1038/s41928-023-01030-x
GST_GAP_22_main
Dataset Downloads Coming Soon Description: The main training dataset for GST_GAP_22, calculated using the PBEsol functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions.

ColabFit ID: GST_GAP_22_main__Zhou-Zhang-Ma-Deringer__DS_r3hav37ufnmb_0
Name: GST_GAP_22_main
Authors: Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer
Elements: Ge, Sb, Te
Number of Configurations: 2,692
Number of Elements: 3
Number of Atoms: 341,068

Links:
https://doi.org/10.5281/zenodo.8208202
https://doi.org/10.1038/s41928-023-01030-x
GST_GAP_22_refitted
Dataset Downloads Coming Soon Description: The training dataset for GST_GAP_22, recalculated using the PBE functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions.

ColabFit ID: GST_GAP_22_refitted__Zhou-Zhang-Ma-Deringer__DS_jy3ylaf48xg3_0
Name: GST_GAP_22_refitted
Authors: Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer
Elements: Ge, Sb, Te
Number of Configurations: 2,692
Number of Elements: 3
Number of Atoms: 341,004

Links:
https://doi.org/10.5281/zenodo.8208202
https://doi.org/10.1038/s41928-023-01030-x
HDNNP_H2O
Dataset Downloads Coming Soon Description: Approximately 28,000 configurations split into 4 datasets, each using a different functional, used in the training of a high-dimensional neural network potential (HDNNP).

ColabFit ID: HDNNP_H2O__Morawietz-Behler__DS_in2572v33lon_0
Name: HDNNP_H2O
Authors: Tobias Morawietz, Jörg Behler
Elements: H, O
Number of Configurations: 28,678
Number of Elements: 2
Number of Atoms: 2,327,628

Links:
https://doi.org/10.5281/zenodo.2634097
https://doi.org/10.1073/pnas.1602375113
HEA25S_high_entropy_alloys
Dataset Downloads Coming Soon Description: Dataset from "Surface segregation in high-entropy alloys from alchemical machine learning: dataset HEA25S". Includes 10,000 bulk HEA structures (Dataset O), 2,640 HEA surface slabs (Dataset A), 1,000 bulk and 1,000 surface-slab snapshots from molecular dynamics (MD) runs (Datasets B and C), and 500 MD snapshots of 25-element Cantor-style alloy surface slabs. These subsets, along with their respective train, test, and validation splits, are included as configuration sets.

ColabFit ID: HEA25S_high_entropy_alloys__Mazitov-Springer-Lopanitsyna-Fraux-De-Ceriotti__DS_d5zht3ykr8hi_0
Name: HEA25S_high_entropy_alloys
Authors: Arslan Mazitov, Maximilian A. Springer, Nataliya Lopanitsyna, Guillaume Fraux, Sandip De, Michele Ceriotti
Elements: Ag, Au, Co, Cr, Cu, Fe, Hf, Ir, Lu, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, Sc, Ta, Ti, V, W, Y, Zn, Zr
Number of Configurations: 15,004
Number of Elements: 25
Number of Atoms: 633,387

Links:
https://doi.org/10.24435/materialscloud:ps-20
http://doi.org/10.48550/arXiv.2310.07604
HEA25_high_entropy_transition-metal_alloys
Dataset Downloads Coming Soon Description: Dataset from "Modeling high-entropy transition-metal alloys with alchemical compression". Includes 25,000 structures used to fit the potential described in that work, covering 25 d-block transition metals (excluding Tc, Cd, Re, Os and Hg). Each configuration includes a "class" field indicating the crystal class of the structure: class 1, perfect crystals with 3-8 elements per structure; class 2, shuffled positions (standard deviation 0.2 Å) with 3-8 elements per structure; class 3, shuffled positions (standard deviation 0.5 Å) with 3-8 elements per structure; class 4, shuffled positions (standard deviation 0.2 Å) with 3-25 elements per structure. Configuration sets include divisions into fcc and bcc crystals, further split by class as described above.

ColabFit ID: HEA25_high_entropy_transition-metal_alloys__Lopanitsyna-Fraux-Springer-De-Ceriotti__DS_hvx1spm3gmia_0
Name: HEA25_high_entropy_transition-metal_alloys
Authors: Nataliya Lopanitsyna, Guillaume Fraux, Maximilian A. Springer, Sandip De, Michele Ceriotti
Elements: Ag, Au, Co, Cr, Cu, Fe, Hf, Ir, Lu, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, Sc, Ta, Ti, V, W, Y, Zn, Zr
Number of Configurations: 25,627
Number of Elements: 25
Number of Atoms: 1,063,680

Links:
https://doi.org/10.24435/materialscloud:73-yn
http://doi.org/10.48550/arXiv.2212.13254
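The HEA25 "class" field described above is effectively a small lookup table. The sketch below encodes the four classes as stated in the description and filters configurations by class. The record layout (plain dicts with a `"class"` key) and the names `HEA25_CLASSES` and `select_by_class` are hypothetical illustrations, not the ColabFit schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HEAClass:
    description: str
    displacement_std_angstrom: float  # 0.0 for the unperturbed (perfect) crystals
    min_elements: int
    max_elements: int


# Encoding of the "class" values exactly as listed in the HEA25 description.
HEA25_CLASSES = {
    1: HEAClass("perfect crystal", 0.0, 3, 8),
    2: HEAClass("shuffled positions", 0.2, 3, 8),
    3: HEAClass("shuffled positions", 0.5, 3, 8),
    4: HEAClass("shuffled positions", 0.2, 3, 25),
}


def select_by_class(configs, wanted):
    """Keep configurations whose 'class' value is in `wanted`.

    `configs` is assumed to be an iterable of dicts with a 'class' key.
    """
    wanted = set(wanted)
    return [c for c in configs if c["class"] in wanted]


# Toy records: keep only the low-noise, 3-8 element classes.
toy = [{"id": "a", "class": 1}, {"id": "b", "class": 4}, {"id": "c", "class": 2}]
print([c["id"] for c in select_by_class(toy, {1, 2})])
```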
HME21_test
Dataset Downloads Coming Soon Description: The test set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerential Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density.

ColabFit ID: HME21_test__Takamoto-Shinagawa-Motoki-Nakago-Li-Kurata-Watanabe-Yayama-Iriguchi-Asano-Onodera-Ishii-Kudo-Ono-Sawada-Ishitani-Ong-Yamaguchi-Kataoka-Hayashi-Charoenphakdee-Ibuka__DS_cpgyq72fs7uk_0
Name: HME21_test
Authors: So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka
Elements: Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn
Number of Configurations: 2,495
Number of Elements: 37
Number of Atoms: 69,572

Links:
https://doi.org/10.6084/m9.figshare.19658538.v2
https://doi.org/10.1038/s41467-022-30687-9
HME21_train
Dataset Downloads Coming Soon Description: The training set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerential Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density.

ColabFit ID: HME21_train__Takamoto-Shinagawa-Motoki-Nakago-Li-Kurata-Watanabe-Yayama-Iriguchi-Asano-Onodera-Ishii-Kudo-Ono-Sawada-Ishitani-Ong-Yamaguchi-Kataoka-Hayashi-Charoenphakdee-Ibuka__DS_jhfis7syauhm_0
Name: HME21_train
Authors: So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka
Elements: Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn
Number of Configurations: 19,956
Number of Elements: 37
Number of Atoms: 555,050

Links:
https://doi.org/10.6084/m9.figshare.19658538.v2
https://doi.org/10.1038/s41467-022-30687-9
HME21_validation
Dataset Downloads Coming Soon Description: The validation set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerential Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density.

ColabFit ID: HME21_validation__Takamoto-Shinagawa-Motoki-Nakago-Li-Kurata-Watanabe-Yayama-Iriguchi-Asano-Onodera-Ishii-Kudo-Ono-Sawada-Ishitani-Ong-Yamaguchi-Kataoka-Hayashi-Charoenphakdee-Ibuka__DS_vf8hkwxhibio_0
Name: HME21_validation
Authors: So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka
Elements: Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn
Number of Configurations: 2,498
Number of Elements: 37
Number of Atoms: 69,420

Links:
https://doi.org/10.6084/m9.figshare.19658538.v2
https://doi.org/10.1038/s41467-022-30687-9
HO_LiMoNiTi_NPJCM_2020_LiMoNiTi_train
Dataset Downloads Coming Soon Description: Training configurations of Li8Mo2Ni7Ti7O32 from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.

ColabFit ID: HO_LiMoNiTi_NPJCM_2020_LiMoNiTi_train__Cooper-Kästner-Urban-Artrith__DS_1i4q8uyams32_0
Name: HO_LiMoNiTi_NPJCM_2020_LiMoNiTi_train
Authors: April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
Elements: Li, Mo, Ni, O, Ti
Number of Configurations: 824
Number of Elements: 5
Number of Atoms: 46,144

Links:
https://doi.org/10.24435/materialscloud:2020.0037/v1
https://doi.org/10.1038/s41524-020-0323-8
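The HO_LiMoNiTi_NPJCM_2020 descriptions mention extrapolating the total energy by a Taylor expansion. A minimal sketch of the first-order version follows, assuming the forces are the negative gradient of the energy (F = -dE/dr), so that a small displacement d changes the energy by approximately -Σ F_i · d_i. The function name `taylor_energy` is hypothetical, and the published method may apply the expansion differently (for example, to generate extra energy reference points from forces).

```python
import numpy as np


def taylor_energy(e0, forces, displacement):
    """First-order Taylor estimate of the energy of a displaced structure.

    e0: total energy of the reference structure (eV)
    forces: (N, 3) atomic forces at the reference structure (eV/Angstrom)
    displacement: (N, 3) atomic displacements from the reference (Angstrom)

    With F = -dE/dr:  E(r + d) ~= E(r) - sum_i F_i . d_i
    """
    forces = np.asarray(forces, dtype=float)
    displacement = np.asarray(displacement, dtype=float)
    return e0 - float(np.sum(forces * displacement))


# Two atoms: moving along a force lowers the energy, moving against it raises it.
e = taylor_energy(-10.0, [[1.0, 0.0, 0.0], [0.0, -2.0, 0.0]],
                  [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0]])
print(e)
```

The estimate is only reliable for displacements small enough that second-order (curvature) terms can be neglected.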
HO_LiMoNiTi_NPJCM_2020_LiMoNiTi_validation
Dataset Downloads Coming Soon Description: Validation configurations of Li8Mo2Ni7Ti7O32 from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.

ColabFit ID: HO_LiMoNiTi_NPJCM_2020_LiMoNiTi_validation__Cooper-Kästner-Urban-Artrith__DS_fgjil336jos6_0
Name: HO_LiMoNiTi_NPJCM_2020_LiMoNiTi_validation
Authors: April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
Elements: Li, Mo, Ni, O, Ti
Number of Configurations: 1,792
Number of Elements: 5
Number of Atoms: 100,352

Links:
https://doi.org/10.24435/materialscloud:2020.0037/v1
https://doi.org/10.1038/s41524-020-0323-8
HO_LiMoNiTi_NPJCM_2020_bulk_water_train_test
Dataset Downloads Coming Soon Description: Training and testing configurations of bulk water from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.

ColabFit ID: HO_LiMoNiTi_NPJCM_2020_bulk_water_train_test__Cooper-Kästner-Urban-Artrith__DS_jjiywlyfpnme_0
Name: HO_LiMoNiTi_NPJCM_2020_bulk_water_train_test
Authors: April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
Elements: H, O
Number of Configurations: 700
Number of Elements: 2
Number of Atoms: 134,400

Links:
https://doi.org/10.24435/materialscloud:2020.0037/v1
https://doi.org/10.1038/s41524-020-0323-8
HO_LiMoNiTi_NPJCM_2020_bulk_water_validation
Dataset Downloads Coming Soon Description: Validation configurations of bulk water from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.

ColabFit ID: HO_LiMoNiTi_NPJCM_2020_bulk_water_validation__Cooper-Kästner-Urban-Artrith__DS_w3whk1nr1bld_0
Name: HO_LiMoNiTi_NPJCM_2020_bulk_water_validation
Authors: April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
Elements: H, O
Number of Configurations: 2,112
Number of Elements: 2
Number of Atoms: 405,504

Links:
https://doi.org/10.24435/materialscloud:2020.0037/v1
https://doi.org/10.1038/s41524-020-0323-8
HO_LiMoNiTi_NPJCM_2020_water_clusters
Dataset Downloads Coming Soon Description: Configurations of water clusters from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.

ColabFit ID: HO_LiMoNiTi_NPJCM_2020_water_clusters__Cooper-Kästner-Urban-Artrith__DS_or3nu4t64mvk_0
Name: HO_LiMoNiTi_NPJCM_2020_water_clusters
Authors: April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
Elements: H, O
Number of Configurations: 1,848
Number of Elements: 2
Number of Atoms: 33,264

Links:
https://doi.org/10.24435/materialscloud:2020.0037/v1
https://doi.org/10.1038/s41524-020-0323-8
HO_PNAS_2019
Dataset Downloads Coming Soon Description: 1590 configurations of H2O/water with total energy and forces calculated using a hybrid approach at DFT/revPBE0-D3 level of theory.

ColabFit ID: HO_PNAS_2019__Cheng-Engel-Behler-Dellago-Ceriotti__DS_o73heom14iew_0
Name: HO_PNAS_2019
Authors: Bingqing Cheng, Edgar A. Engel, Jörg Behler, Christoph Dellago, Michele Ceriotti
Elements: H, O
Number of Configurations: 1,588
Number of Elements: 2
Number of Atoms: 304,896

Links:
https://archive.materialscloud.org/record/2018.0020/v1
https://doi.org/10.1073/pnas.1815117116
HPt_NC_2022
Dataset Downloads Coming Soon Description: A training dataset of 90,000 configurations with interaction properties between H2 and Pt(111) surfaces.

ColabFit ID: HPt_NC_2022__Vandermause-Xie-Lim-Owen-Kozinsky__DS_fhvr3y1alb97_0
Name: HPt_NC_2022
Authors: Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky
Elements: H, Pt
Number of Configurations: 90,740
Number of Elements: 2
Number of Atoms: 5,706,023

Links:
https://doi.org/10.24435/materialscloud:r0-84
https://doi.org/10.1038/s41467-022-32294-0
H_nature_2022
Dataset Downloads Coming Soon Description: Over 300,000 configurations in an expanded dataset of 19 hydrogen combustion reaction channels. Intrinsic reaction coordinate (IRC) calculations are combined with ab initio molecular dynamics (AIMD) simulations and normal mode (NM) displacement calculations.

ColabFit ID: H_nature_2022__Guan-Das-Stein-Heidar-Zadeh-Bertels-Liu-Haghighatlari-Li-Zhang-Hao-Leven-Head-Gordon-Head-Gordon__DS_lvixye0ynk1o_0
Name: H_nature_2022
Authors: Xingyi Guan, Akshaya Das, Christopher J. Stein, Farnaz Heidar-Zadeh, Luke Bertels, Meili Liu, Mojtaba Haghighatlari, Jie Li, Oufan Zhang, Hongxia Hao, Itai Leven, Martin Head-Gordon, Teresa Head-Gordon
Elements: H, O
Number of Configurations: 350,121
Number of Elements: 2
Number of Atoms: 1,513,654

Links:
https://doi.org/10.6084/m9.figshare.19601689.v3
https://doi.org/10.1038/s41597-022-01330-5
HfO2_DPGEN_PRB_2021
Dataset Downloads Coming Soon Description: Approximately 28,500 configurations of hafnia (HfO2) used in the training of a DP model for the prediction of properties of various hafnia polymorphs, including transition barriers between different phases.

ColabFit ID: HfO2_DPGEN_PRB_2021__Wu-Zhang-Zhang-Liu__DS_qgi1kbtdzwmm_0
Name: HfO2_DPGEN_PRB_2021
Authors: Jing Wu, Yuzhi Zhang, Linfeng Zhang, Shi Liu
Elements: Hf, O
Number of Configurations: 28,564
Number of Elements: 2
Number of Atoms: 2,741,376

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=HfO2-dpgen
https://doi.org/10.1103/PhysRevB.103.024108
HfO2_NPJ_2020
Dataset Downloads Coming Soon Description: 6000 configurations of liquid and amorphous HfO2 generated for use with an active learning ML model.

ColabFit ID: HfO2_NPJ_2020__Sivaraman-Krishnamoorthy-Baur-Holm-Stan-Csányi-Benmore-Vázquez-Mayagoitia__DS_u40eq96ge40x_0
Name: HfO2_NPJ_2020
Authors: Ganesh Sivaraman, Anand Narayanan Krishnamoorthy, Matthias Baur, Christian Holm, Marius Stan, Gábor Csányi, Chris Benmore, Álvaro Vázquez-Mayagoitia
Elements: Hf, O
Number of Configurations: 6,000
Number of Elements: 2
Number of Atoms: 576,000

Links:
https://github.com/argonne-lcf/active-learning-md
https://doi.org/10.1038/s41524-020-00367-7
Hydrogen-induced_insulating_state_SmNiO3
Dataset Downloads Coming Soon Description: A dataset of DFT-calculated energies created to investigate the effect of hydrogen doping on the crystal structure and electronic state of SmNiO3. Configuration sets include sets for apically bonded and side-bonded hydrogen with 1-9 hydrogen atoms.

ColabFit ID: Hydrogen-induced_insulating_state_SmNiO3__Yamauchi-Hamada__DS_r1sl5dye35kg_0
Name: Hydrogen-induced_insulating_state_SmNiO3
Authors: Kunihiko Yamauchi, Ikutaro Hamada
Elements: H, Ni, O, Sm
Number of Configurations: 3,318
Number of Elements: 4
Number of Atoms: 156,419

Links:
https://doi.org/10.24435/materialscloud:4w-qm
https://doi.org/10.48550/arXiv.2210.07656
ISO17_NC_2017
Dataset Downloads Coming Soon Description: 129 molecules of composition C7O2H10 from the QM9 dataset with 5000 conformational geometries apiece. Molecular dynamics data was simulated using the Fritz-Haber Institute ab initio simulation software.

ColabFit ID: ISO17_NC_2017__Vandermause-Xie-Lim-Owen-Kozinsky__DS_ngsaypfb9rnj_0
Name: ISO17_NC_2017
Authors: Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky
Elements: C, H, O
Number of Configurations: 640,855
Number of Elements: 3
Number of Atoms: 12,176,245

Links:
http://quantum-machine.org/datasets/
https://proceedings.neurips.cc/paper/2017/hash/303ed4c69846ab36c2904d3ba8573050-Abstract.html
In2Se3_2D_DPGEN
Dataset Downloads Coming Soon Description: Approximately 11,500 configurations of In2Se3, including monolayer (20-atom slab) and bulk (30-atom supercell) models.

ColabFit ID: In2Se3_2D_DPGEN__Wu-Bai-Huang-Ma-Liu-Liu__DS_v4e6t79tt5xh_0
Name: In2Se3_2D_DPGEN
Authors: Jing Wu, Liyi Bai, Jiawei Huang, Liyang Ma, Jian Liu, Shi Liu
Elements: In, Se
Number of Configurations: 11,523
Number of Elements: 2
Number of Atoms: 248,510

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=In2Se3-2D-dpgen
https://doi.org/10.1103/PhysRevB.104.174107
InP_JPCA2020
Dataset Downloads Coming Soon Description: This dataset was used to generate a multi-element linear SNAP potential for InP, as published in Cusentino, M. A. et al., J. Phys. Chem. A (2020). It was intended to produce an interatomic potential for indium phosphide capable of capturing high-energy defects that result from radiation damage cascades.

ColabFit ID: InP_JPCA2020__Cusentino-Wood-Thompson__DS_03ph25s4yepy_0
Name: InP_JPCA2020
Authors: Mary Alice Cusentino, Mitchell A. Wood, Aidan P. Thompson
Elements: In, P
Number of Configurations: 1,802
Number of Elements: 2
Number of Atoms: 106,761

Links:
https://github.com/FitSNAP/FitSNAP/tree/master/examples/InP_JPCA2020
https://doi.org/10.1021/acs.jpca.0c02450
JARVIS-Polymer-Genome
Dataset Downloads Coming Soon Description: The JARVIS-Polymer-Genome dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Polymer Genome dataset, as created for the linked publication (Huan, T., Mannodi-Kanakkithodi, A., Kim, C. et al.). Structures were curated from existing sources and the original authors' works, removing redundant, identical structures before calculations, and removing redundant datapoints after calculations were performed. Band gap energies were calculated using two different DFT functionals: rPW86 and HSE06; atomization energy was calculated using rPW86. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS-Polymer-Genome__Huan-Mannodi-Kanakkithodi-Kim-Sharma-Pilania-Ramprasad__DS_dbgckv1il6v7_0
Name: JARVIS-Polymer-Genome
Authors: Tran Doan Huan, Arun Mannodi-Kanakkithodi, Chiho Kim, Vinit Sharma, Ghanshyam Pilania, Rampi Ramprasad
Elements: Al, C, Ca, Cd, Cl, F, H, Hf, Mg, N, O, Pb, S, Sn, Ti, Zn, Zr
Number of Configurations: 1,073
Number of Elements: 17
Number of Atoms: 34,441

Links:
https://ndownloader.figshare.com/files/26809907
https://doi.org/10.1038/sdata.2016.12
JARVIS-QM9-DGL
Dataset Downloads Coming Soon Description: The JARVIS-QM9-DGL dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset, originally created as part of the datasets at quantum-machine.org, as implemented with the Deep Graph Library (DGL) Python package. Units for r2 (electronic spatial extent) are a0^2; for alpha (isotropic polarizability), a0^3; for mu (dipole moment), D; for Cv (heat capacity), cal/(mol K). Units for all other properties are eV. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS-QM9-DGL__Ramakrishnan-Dral-Rupp-Lilienfeld__DS_tat5i46x3hkr_0
Name: JARVIS-QM9-DGL
Authors: Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Elements: C, F, H, N, O
Number of Configurations: 130,831
Number of Elements: 5
Number of Atoms: 2,358,210

Links:
https://ndownloader.figshare.com/files/28541196
https://doi.org/10.1038/sdata.2014.22
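Because the JARVIS-QM9-DGL entry stores r2 in a0^2, alpha in a0^3, and Cv in cal/(mol K), a short conversion helper can bring these into Å-based and SI-style units. This is a sketch: the constants are the CODATA 2018 Bohr radius and the thermochemical calorie (4.184 J), and the function names are hypothetical, not part of JARVIS or DGL.

```python
BOHR_TO_ANGSTROM = 0.529177210903  # Bohr radius a0 in Angstrom (CODATA 2018)
CAL_TO_JOULE = 4.184               # thermochemical calorie in joules


def r2_to_angstrom2(r2_bohr2):
    """Electronic spatial extent: a0^2 -> Angstrom^2."""
    return r2_bohr2 * BOHR_TO_ANGSTROM ** 2


def alpha_to_angstrom3(alpha_bohr3):
    """Isotropic polarizability (volume units): a0^3 -> Angstrom^3."""
    return alpha_bohr3 * BOHR_TO_ANGSTROM ** 3


def cv_to_joule(cv_cal_per_mol_k):
    """Heat capacity: cal/(mol K) -> J/(mol K)."""
    return cv_cal_per_mol_k * CAL_TO_JOULE


print(r2_to_angstrom2(1.0), alpha_to_angstrom3(1.0), cv_to_joule(1.0))
```

Dipole moments are already in debye and the remaining properties in eV, so they need no conversion for most uses.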
JARVIS_2DMatPedia
Dataset Downloads Coming Soon Description: The JARVIS-2DMatPedia dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations with 2D materials from the 2DMatPedia database, generated through two methods: a top-down exfoliation approach, using structures of bulk materials from the Materials Project database; and a bottom-up approach, replacing each element in a 2D material with another from the same group (according to column number). JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_2DMatPedia__Zhou-Shen-Costa-Persson-Huck-Lu-Ma-Chen-Tang-Feng__DS_hdv6si8yu2mv_0
Name: JARVIS_2DMatPedia
Authors: Jun Zhou, Lei Shen, Miguel Dias Costa, Kristin A. Persson, Shyue Ping Ong, Patrick Huck, Yunhao Lu, Xiaoyang Ma, Yiming Chen, Hanmei Tang, Yuan Ping Feng
Elements: Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 6,351
Number of Elements: 83
Number of Atoms: 66,295

Links:
https://ndownloader.figshare.com/files/26789006
https://doi.org/10.1038/s41597-019-0097-3
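The bottom-up route described for 2DMatPedia, replacing an element in a 2D material with another element from the same group (column), can be sketched as a simple substitution generator. This is an illustration under stated assumptions: one element species is replaced at a time (all of its sites together), and the group table `GROUP_OF` is a deliberately tiny hypothetical excerpt rather than a full periodic table. It is not the 2DMatPedia authors' actual workflow, which also involves structure relaxation and screening.

```python
# Hypothetical, deliberately tiny periodic-table excerpt: element -> group (column).
GROUP_OF = {"Cr": 6, "Mo": 6, "W": 6, "S": 16, "Se": 16, "Te": 16}


def same_group_substitutions(species):
    """Yield new species lists in which every site of one element is replaced
    by a different element from the same group.

    species: list of element symbols, one per atomic site.
    """
    for old in sorted(set(species)):
        group = GROUP_OF.get(old)
        if group is None:
            continue  # element not in the excerpt table; skip it
        candidates = sorted(e for e, g in GROUP_OF.items() if g == group and e != old)
        for new in candidates:
            yield [new if s == old else s for s in species]


# MoS2 unit cell sites (illustrative): Mo -> Cr, W and S -> Se, Te
for variant in same_group_substitutions(["Mo", "S", "S"]):
    print(variant)
```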
JARVIS_AGRA_CHO
Dataset Downloads Coming Soon Description: The JARVIS_AGRA_CHO dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_AGRA_CHO__Chen-Gariepy-Chen-Yao-Anand-Liu-Feugmo-Tamblyn-Singh__DS_z3s0qui5vg5c_0
Name: JARVIS_AGRA_CHO
Authors: Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh
Elements: C, Co, Cu, Fe, H, Mo, Ni, O
Number of Configurations: 216
Number of Elements: 8
Number of Atoms: 14,472

Links:
https://figshare.com/ndownloader/files/41923284
https://doi.org/10.1021/acscatal.2c03675
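Each entry in this listing follows a simple "Key: value" layout, so its summary fields can be parsed and cross-checked. The sketch below is a hypothetical helper (not a ColabFit API) that parses the JARVIS_AGRA_CHO entry and checks that the length of the Elements list matches "Number of Elements".

```python
# Subset of the JARVIS_AGRA_CHO entry, copied from the listing above.
ENTRY = """\
Name: JARVIS_AGRA_CHO
Elements: C, Co, Cu, Fe, H, Mo, Ni, O
Number of Configurations: 216
Number of Elements: 8
Number of Atoms: 14,472
https://doi.org/10.1021/acscatal.2c03675
"""

def parse_entry(text):
    """Parse 'Key: value' lines into a dict; lines without ': ' (e.g. URLs) are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def summarize_entry(fields):
    """Convert the count fields to integers and check the element count."""
    elements = [e.strip() for e in fields["Elements"].split(",")]
    to_int = lambda s: int(s.replace(",", ""))  # counts use thousands separators
    return {
        "name": fields["Name"],
        "elements_consistent": len(elements) == to_int(fields["Number of Elements"]),
        "configurations": to_int(fields["Number of Configurations"]),
        "atoms": to_int(fields["Number of Atoms"]),
    }

summary = summarize_entry(parse_entry(ENTRY))
print(summary)
```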
JARVIS_AGRA_CO
Description: The JARVIS_AGRA_CO dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset of Chen et al., as used in the automated graph representation algorithm (AGRA) training data: a collection of DFT calculations for training a graph-representation method that extracts the local chemical environment of adsorption sites on metallic surfaces. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_AGRA_CO__Chen-Gariepy-Chen-Yao-Anand-Liu-Feugmo-Tamblyn-Singh__DS_1ynp8v7d3er5_0
Name: JARVIS_AGRA_CO
Authors: Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh
Elements: C, Co, Cu, Fe, Mo, Ni, O
Number of Configurations: 194
Number of Elements: 7
Number of Atoms: 12,804

Links:
https://figshare.com/ndownloader/files/41923284
https://doi.org/10.1021/acscatal.2c03675
JARVIS_AGRA_COOH
Description: The JARVIS_AGRA_COOH dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset of Chen et al., as used in the automated graph representation algorithm (AGRA) training data: a collection of DFT calculations for training a graph-representation method that extracts the local chemical environment of adsorption sites on metallic surfaces. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_AGRA_COOH__Chen-Gariepy-Chen-Yao-Anand-Liu-Feugmo-Tamblyn-Singh__DS_bggn2p6qw8zv_0
Name: JARVIS_AGRA_COOH
Authors: Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh
Elements: C, Co, Cu, Fe, H, Mo, Ni, O
Number of Configurations: 280
Number of Elements: 8
Number of Atoms: 19,040

Links:
https://figshare.com/ndownloader/files/41923284
https://doi.org/10.1021/acscatal.2c03675
JARVIS_AGRA_O
Description: The JARVIS_AGRA_O dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the training set of the oxygen reduction reaction (ORR) dataset of Batchelor et al., as used in the automated graph representation algorithm (AGRA) training data: a collection of DFT calculations for training a graph-representation method that extracts the local chemical environment of adsorption sites on metallic surfaces. Bulk calculations used an 8 x 8 x 4 k-point grid. Training adsorption energies were calculated on slabs with a 4 x 4 x 1 k-point grid, while testing energies used a 3 x 3 x 1 grid. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_AGRA_O__Batchelor-Pedersen-Winther-Castelli-Jacobsen-Rossmeisl__DS_ikx8oaixaoz0_0
Name: JARVIS_AGRA_O
Authors: Thomas A.A. Batchelor, Jack K. Pedersen, Simon H. Winther, Ivano E. Castelli, Karsten W. Jacobsen, Jan Rossmeisl
Elements: Ir, O, Pd, Pt, Rh, Ru
Number of Configurations: 1,000
Number of Elements: 6
Number of Atoms: 17,000

Links:
https://figshare.com/ndownloader/files/41923284
https://doi.org/10.1016/j.joule.2018.12.015
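The k-point grids quoted above can be made concrete with a sketch. It assumes the grids are unshifted Monkhorst-Pack grids, which the description does not state; `monkhorst_pack` is an illustrative helper, and DFT codes usually reduce these points by symmetry, so the number of points actually computed may be smaller.

```python
from itertools import product

def monkhorst_pack(q1, q2, q3):
    """Fractional k-points of an unshifted Monkhorst-Pack grid.

    Along an axis with q divisions: u_r = (2r - q - 1) / (2q), r = 1..q.
    """
    def axis(q):
        return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]
    return list(product(axis(q1), axis(q2), axis(q3)))

bulk = monkhorst_pack(8, 8, 4)        # bulk calculations
train_slab = monkhorst_pack(4, 4, 1)  # training adsorption energies (slabs)
test_slab = monkhorst_pack(3, 3, 1)   # testing energies (slabs)

# Full (unreduced) grid sizes: 8*8*4 = 256, 4*4*1 = 16, 3*3*1 = 9
print(len(bulk), len(train_slab), len(test_slab))
```

A single division along the slab normal places that coordinate at 0, which is the usual choice for slabs separated by vacuum. With this formula, odd grids such as 3 x 3 x 1 contain the Gamma point, while even grids such as 4 x 4 x 1 do not.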
JARVIS_AGRA_OH
Description: The JARVIS_AGRA_OH dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the training set of the oxygen reduction reaction (ORR) dataset of Batchelor et al., as used in the automated graph representation algorithm (AGRA) training data: a collection of DFT calculations for training a graph-representation method that extracts the local chemical environment of adsorption sites on metallic surfaces. Bulk calculations used an 8 x 8 x 4 k-point grid. Training adsorption energies were calculated on slabs with a 4 x 4 x 1 k-point grid, while testing energies used a 3 x 3 x 1 grid. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_AGRA_OH__Batchelor-Pedersen-Winther-Castelli-Jacobsen-Rossmeisl__DS_osavzsrz5mgf_0
Name: JARVIS_AGRA_OH
Authors: Thomas A.A. Batchelor, Jack K. Pedersen, Simon H. Winther, Ivano E. Castelli, Karsten W. Jacobsen, Jan Rossmeisl
Elements: H, Ir, O, Pd, Pt, Rh, Ru
Number of Configurations: 877
Number of Elements: 7
Number of Atoms: 15,786

Links:
https://figshare.com/ndownloader/files/41923284
https://doi.org/10.1016/j.joule.2018.12.015
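Dividing each listed atom count by its configuration count gives the mean system size per configuration. For the five AGRA subsets this division comes out to whole numbers, which suggests (though the listing does not state it) that every configuration within a subset has the same number of atoms. The counts below are copied from the entries above; `mean_atoms_per_configuration` is an illustrative helper.

```python
# (Number of Atoms, Number of Configurations) from the AGRA listings above
AGRA_COUNTS = {
    "JARVIS_AGRA_CHO": (14_472, 216),
    "JARVIS_AGRA_CO": (12_804, 194),
    "JARVIS_AGRA_COOH": (19_040, 280),
    "JARVIS_AGRA_O": (17_000, 1_000),
    "JARVIS_AGRA_OH": (15_786, 877),
}

def mean_atoms_per_configuration(n_atoms, n_configs):
    """Average number of atoms per configuration."""
    return n_atoms / n_configs

for name, (atoms, configs) in AGRA_COUNTS.items():
    print(f"{name}: {mean_atoms_per_configuration(atoms, configs):.1f} atoms/configuration")
```

The CO2RR-derived subsets (CHO, CO, COOH) average 66 to 68 atoms per configuration, whereas the ORR-derived subsets (O, OH) average 17 to 18.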
JARVIS_ALIGNN_FF
Description: The JARVIS_ALIGNN_FF dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset is a subset of the JARVIS DFT dataset, filtered to retain only the first, last, middle, maximum-energy and minimum-energy structures from each calculation run. Calculation run snapshots are additionally filtered for uniqueness, and the dataset contains only perfect structures. DFT energies, stresses and forces in this dataset were used to train an atomistic line graph neural network (ALIGNN)-based force-field (FF) model. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_ALIGNN_FF__Choudhary-DeCost-Major-Butler-Thiyagalingam-Tavazza__DS_720rbshv96l1_0
Name: JARVIS_ALIGNN_FF
Authors: Kamal Choudhary, Brian DeCost, Lily Major, Keith Butler, Jeyan Thiyagalingam, Francesca Tavazza
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 306,411
Number of Elements: 89
Number of Atoms: 3,193,703

Links:
https://ndownloader.figshare.com/files/38522315
https://doi.org/10.1039/D2DD00096B
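The snapshot filter described above can be sketched for a single calculation run. This is a simplified illustration, not the ALIGNN-FF code: it assumes "middle" means index n // 2, and it stands in for the uniqueness filter by dropping duplicate indices only, whereas the actual filter presumably compares structures.

```python
def select_snapshots(energies):
    """Indices of the first, middle, last, minimum-energy and maximum-energy
    frames of one run, with duplicate indices removed (order preserved)."""
    if not energies:
        return []
    n = len(energies)
    candidates = [
        0,                                        # first frame
        n // 2,                                   # middle frame (assumed definition)
        n - 1,                                    # last frame
        min(range(n), key=energies.__getitem__),  # minimum-energy frame
        max(range(n), key=energies.__getitem__),  # maximum-energy frame
    ]
    # dict.fromkeys keeps the first occurrence of each index
    return list(dict.fromkeys(candidates))

# Hypothetical total energies (eV) along a relaxation trajectory
run_energies = [-1.0, -1.5, -2.0, -1.8, -2.2, -2.1]
print(select_snapshots(run_energies))
```

Here the maximum-energy frame coincides with the first frame, so the run contributes four snapshots rather than five.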
JARVIS_C2DB
Description: The JARVIS-C2DB dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations from the Computational 2D Materials Database (C2DB), which provides a variety of properties for 2D materials across more than 30 different crystal structures. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_C2DB__Haastrup-Strange-Pandey-Deilmann-Schmidt-Hinsche-Gjerding-Torelli-Larsen-Riis-Jensen-Gath-Jacobsen-Mortensen-Olsen-Thygesen__DS_8hgxhsfkcfa7_0
Name: JARVIS_C2DB
Authors: Sten Haastrup, Mikkel Strange, Mohnish Pandey, Thorsten Deilmann, Per S Schmidt, Nicki F Hinsche, Morten N Gjerding, Daniele Torelli, Peter M Larsen, Anders C Riis-Jensen, Jakob Gath, Karsten W Jacobsen, Jens Jørgen Mortensen, Thomas Olsen, Kristian S Thygesen
Elements: Ag, Al, As, Au, B, Ba, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 3,520
Number of Elements: 61
Number of Atoms: 17,990

Links:
https://ndownloader.figshare.com/files/28682010
https://doi.org/10.1088/2053-1583/aacfc1
JARVIS_CFID_3D_8_18_2022
Description: The JARVIS_CFID_3D_8_18_2022 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_CFID_3D_8_18_2022__Choudhary-Garrity-Reid-DeCost-Biacchi-Walker-Trautt-Hattrick-Simpers-Kusne-Centrone-Davydov-Jiang-Pachter-Cheon-Reed-Agrawal-Qian-Sharma-Zhuang-Kalinin-Sumpter-Pilania-Acar-Mandal-Haule-Vanderbilt-Rabe-Tavazza__DS_np1noc8iip47_0
Name: JARVIS_CFID_3D_8_18_2022
Authors: Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 55,631
Number of Elements: 89
Number of Atoms: 561,738

Links:
https://doi.org/10.6084/m9.figshare.6815699
https://doi.org/10.1038/s41524-020-00440-1
JARVIS_CFID_OQMD
Description: The JARVIS_CFID_OQMD dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Open Quantum Materials Database (OQMD), created to hold information about the electronic structure and stability of inorganic materials to aid materials discovery. Calculations were performed at the DFT level of theory in VASP, using the PBE functional with PAW potentials. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_CFID_OQMD__Kirklin-Saal-Meredig-Thompson-Doak-Aykol-Rühl-Wolverton__DS_u8strp7hm0cy_0
Name: JARVIS_CFID_OQMD
Authors: Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 459,991
Number of Elements: 89
Number of Atoms: 2,366,255

Links:
https://ndownloader.figshare.com/files/24981170
https://doi.org/10.1038/npjcompumats.2015.10
JARVIS_DFT_2D_3_12_2021
Description: The JARVIS_DFT_2D_3_12_2021 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 2D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_DFT_2D_3_12_2021__Choudhary-Garrity-Reid-DeCost-Biacchi-Walker-Trautt-Hattrick-Simpers-Kusne-Centrone-Davydov-Jiang-Pachter-Cheon-Reed-Agrawal-Qian-Sharma-Zhuang-Kalinin-Sumpter-Pilania-Acar-Mandal-Haule-Vanderbilt-Rabe-Tavazza__DS_4ml1yrigmar0_0
Name: JARVIS_DFT_2D_3_12_2021
Authors: Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
Elements: Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, F, Fe, Ga, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Number of Configurations: 1,074
Number of Elements: 81
Number of Atoms: 7,903

Links:
https://ndownloader.figshare.com/files/26808917
https://doi.org/10.1038/s41524-020-00440-1
JARVIS_DFT_3D_12_12_2022
Description: The JARVIS_DFT_3D_12_12_2022 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_DFT_3D_12_12_2022__Choudhary-Garrity-Reid-DeCost-Biacchi-Walker-Trautt-Hattrick-Simpers-Kusne-Centrone-Davydov-Jiang-Pachter-Cheon-Reed-Agrawal-Qian-Sharma-Zhuang-Kalinin-Sumpter-Pilania-Acar-Mandal-Haule-Vanderbilt-Rabe-Tavazza__DS_t02h9eraswtv_0
Name: JARVIS_DFT_3D_12_12_2022
Authors: Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 75,909
Number of Elements: 89
Number of Atoms: 785,250

Links:
https://doi.org/10.6084/m9.figshare.6815699
https://doi.org/10.1038/s41524-020-00440-1
JARVIS_DFT_3D_8_18_2021
Description: The JARVIS_DFT_3D_8_18_2021 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_DFT_3D_8_18_2021__Choudhary-Garrity-Reid-DeCost-Biacchi-Walker-Trautt-Hattrick-Simpers-Kusne-Centrone-Davydov-Jiang-Pachter-Cheon-Reed-Agrawal-Qian-Sharma-Zhuang-Kalinin-Sumpter-Pilania-Acar-Mandal-Haule-Vanderbilt-Rabe-Tavazza__DS_soziqjohq6hm_0
Name: JARVIS_DFT_3D_8_18_2021
Authors: Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 55,631
Number of Elements: 89
Number of Atoms: 561,741

Links:
https://doi.org/10.6084/m9.figshare.6815699
https://doi.org/10.1038/s41524-020-00440-1
JARVIS_EPC_2D
Description: The JARVIS_EPC_2D dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations sourced from the JARVIS-DFT-2D dataset and re-relaxed with Quantum ESPRESSO. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_EPC_2D__Wines-Choudhary-Biacchi-Garrity-Tavazza__DS_7jji22dy5hix_0
Name: JARVIS_EPC_2D
Authors: Daniel Wines, Kamal Choudhary, Adam J. Biacchi, Kevin F. Garrity, Francesca Tavazza
Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cl, Co, Cr, Cu, F, Fe, Ga, Ge, H, Hf, I, In, Ir, K, La, Li, Mg, Mo, N, Na, Nb, Ni, O, P, Pb, Pd, Pt, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 161
Number of Elements: 55
Number of Atoms: 788

Links:
https://figshare.com/ndownloader/files/38950433
https://doi.org/10.1021/acs.nanolett.2c04420
JARVIS_MEGNet
Description: The JARVIS-MEGNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations of 3D materials, with properties from the 2018 version of the Materials Project, as used to train the MEGNet ML model. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_MEGNet__Chen-Ye-Zuo-Zheng-Ong__DS_9yr94hhj1k33_0
Name: JARVIS_MEGNet
Authors: Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 69,234
Number of Elements: 89
Number of Atoms: 2,070,948

Links:
https://ndownloader.figshare.com/files/26724977
https://doi.org/10.1021/acs.chemmater.9b01294
JARVIS_MEGNet2
Description: The JARVIS-MEGNet2 dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains 133K materials with formation energies from the Materials Project, as used to train the MEGNet ML model. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_MEGNet2__Chen-Ye-Zuo-Zheng-Ong__DS_5ar73nonq6l1_0
Name: JARVIS_MEGNet2
Authors: Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 133,420
Number of Elements: 89
Number of Atoms: 3,880,497

Links:
https://ndownloader.figshare.com/files/28332741
https://doi.org/10.1021/acs.chemmater.9b01294
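The formation energies mentioned above follow the standard per-atom definition, Delta E_f = (E_total - sum_i n_i mu_i) / sum_i n_i, where n_i is the number of atoms of element i and mu_i is its reference (elemental) energy per atom. The sketch below applies that definition to made-up numbers for a hypothetical compound "AB2". Published Materials Project values may include additional energy corrections that this sketch omits.

```python
def formation_energy_per_atom(total_energy, composition, reference_energies):
    """Formation energy per atom (eV/atom).

    total_energy       -- total energy of the cell (eV)
    composition        -- {element: number of atoms in the cell}
    reference_energies -- {element: reference energy per atom (eV)}
    """
    n_atoms = sum(composition.values())
    reference = sum(n * reference_energies[el] for el, n in composition.items())
    return (total_energy - reference) / n_atoms

# Hypothetical numbers: E_total = -15 eV for one A and two B atoms,
# with mu_A = -3 eV/atom and mu_B = -5 eV/atom, so (-15 - (-13)) / 3 = -2/3
e_f = formation_energy_per_atom(
    total_energy=-15.0,
    composition={"A": 1, "B": 2},
    reference_energies={"A": -3.0, "B": -5.0},
)
print(f"{e_f:.4f} eV/atom")
```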
JARVIS_Materials_Project_2020
Description: The JARVIS_Materials_Project_2020 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains 127,000 configurations of 3D materials from the Materials Project database. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_Materials_Project_2020__Jain-Ong-Hautier-Chen-Richards-Dacek-Cholia-Gunter-Skinner-Ceder-Persson__DS_gpsibs9f47k4_0
Name: JARVIS_Materials_Project_2020
Authors: Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 126,335
Number of Elements: 89
Number of Atoms: 3,725,727

Links:
https://ndownloader.figshare.com/files/26791259
https://doi.org/10.1063/1.4812323
JARVIS_Materials_Project_84K
Description: The JARVIS_Materials_Project_84K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains 84,000 configurations of 3D materials from the Materials Project database. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_Materials_Project_84K__Jain-Ong-Hautier-Chen-Richards-Dacek-Cholia-Gunter-Skinner-Ceder-Persson__DS_1s8172fnm2ct_0
Name: JARVIS_Materials_Project_84K
Authors: Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 83,426
Number of Elements: 89
Number of Atoms: 2,339,932

Links:
https://ndownloader.figshare.com/files/24979850
https://doi.org/10.1063/1.4812323
JARVIS_OMDB
Description: The JARVIS_OMDB dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Organic Materials Database (OMDB), a dataset of 12,500 crystal structures intended for training models that predict properties of complex, lattice-periodic organic crystals with large numbers of atoms per unit cell. The dataset covers 69 space groups and 65 elements, with an average of 82 atoms per unit cell. It also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_OMDB__Olsthoorn-Geilhufe-Borysov-Balatsky__DS_hrt0twm503tr_0
Name: JARVIS_OMDB
Authors: Bart Olsthoorn, R. Matthias Geilhufe, Stanislav S. Borysov, Alexander V. Balatsky
Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, U, V, W, Y, Zn, Zr
Number of Configurations: 12,497
Number of Elements: 65
Number of Atoms: 1,061,362

Links:
https://ndownloader.figshare.com/files/28501761
https://doi.org/10.1002/qute.201900023
JARVIS_OQMD_no_CFID
Description: The JARVIS_OQMD_no_CFID dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Open Quantum Materials Database (OQMD), created to hold information about the electronic structure and stability of inorganic materials to aid materials discovery. Calculations were performed at the DFT level of theory in VASP, using the PBE functional with PAW potentials. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_OQMD_no_CFID__Kirklin-Saal-Meredig-Thompson-Doak-Aykol-Rühl-Wolverton__DS_untxw8nljf92_0
Name: JARVIS_OQMD_no_CFID
Authors: Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 811,782
Number of Elements: 89
Number of Atoms: 5,017,701

Links:
https://ndownloader.figshare.com/files/26790182
https://doi.org/10.1038/npjcompumats.2015.10
JARVIS_Open_Catalyst_100K
Description: The JARVIS_Open_Catalyst_100K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the Open Catalyst Project (OCP) dataset: the 100K training split, plus the remaining validation and test data. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_Open_Catalyst_100K__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_armtbsouma25_0
Name: JARVIS_Open_Catalyst_100K
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 124,943
Number of Elements: 56
Number of Atoms: 9,720,870

Links:
https://figshare.com/ndownloader/files/40902845
https://doi.org/10.1021/acscatal.0c04525
JARVIS_Open_Catalyst_10K
Description: The JARVIS_Open_Catalyst_10K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the Open Catalyst Project (OCP) dataset: the 10K training split, plus the remaining validation and test data. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_Open_Catalyst_10K__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_6dcul0b17524_0
Name: JARVIS_Open_Catalyst_10K
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 34,943
Number of Elements: 56
Number of Atoms: 2,720,316

Links:
https://figshare.com/ndownloader/files/40566122
https://doi.org/10.1021/acscatal.0c04525
JARVIS_Open_Catalyst_All
Description: The JARVIS_Open_Catalyst_All dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the Open Catalyst Project (OCP) dataset: the 460,328-configuration training split, plus the remaining validation and test data. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_Open_Catalyst_All__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_3u17g7ukggi5_0
Name: JARVIS_Open_Catalyst_All
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 485,269
Number of Elements: 56
Number of Atoms: 37,728,919

Links:
https://figshare.com/ndownloader/files/40902845
https://doi.org/10.1021/acscatal.0c04525
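Subtracting the stated training-split size from each listed configuration count gives the size of the remaining validation and test data. The sketch assumes the training sizes implied by the names (10K, 100K) and by the All description (460,328) are exact counts. Under that assumption the 10K and 100K subsets share the same 24,943-configuration remainder, while the All subset comes out two lower; the listing does not explain the difference.

```python
# (Number of Configurations, training-split size) for each OCP subset above
OCP_SUBSETS = {
    "JARVIS_Open_Catalyst_10K": (34_943, 10_000),
    "JARVIS_Open_Catalyst_100K": (124_943, 100_000),
    "JARVIS_Open_Catalyst_All": (485_269, 460_328),
}

def implied_validation_and_test(total, train):
    """Configurations left over once the training split is removed."""
    return total - train

for name, (total, train) in OCP_SUBSETS.items():
    print(f"{name}: {implied_validation_and_test(total, train):,} validation + test")
```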
JARVIS_QE_TB
Description: The JARVIS_QE_TB dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations generated with Quantum ESPRESSO. JARVIS is a set of tools and datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_QE_TB__Garrity-Choudhary__DS_e471qdt7c6db_0
Name: JARVIS_QE_TB
Authors: Kevin F. Garrity, Kamal Choudhary
Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 829,576
Number of Elements: 64
Number of Atoms: 2,578,920

Links:
https://ndownloader.figshare.com/files/29070555
https://doi.org/10.1103/PhysRevMaterials.7.044603
JARVIS_QM9_STD_JCTC
Description: The JARVIS_QM9_STD_JCTC dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset, originally created as part of the datasets at quantum-machine.org. Units for r2 (electronic spatial extent) are a0^2 (Bohr^2); for alpha (isotropic polarizability), a0^3 (Bohr^3); for mu (dipole moment), D; for Cv (heat capacity), cal/(mol K). Units for all other properties are eV. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. For the first iteration of DFT calculations, Gaussian 09's default electronic and geometry thresholds were used for all molecules. For molecules that failed to reach SCF convergence, ultrafine grids were invoked in a second iteration for evaluating the XC energy contributions. In a third iteration, molecules among those still unconverged that had relaxed to saddle points were identified, and their SCF criteria were further tightened using the keyword scf(maxcycle=200, verytight). Molecules that still featured imaginary frequencies entered a fourth iteration using the keywords opt(calcfc, maxstep=5, maxcycles=1000); calcfc constructs a Hessian in the first step of the geometry relaxation for eigenvector following. In the fifth and final iteration, all molecules that still failed to reach convergence were converged using opt(calcall, maxstep=1, maxcycles=1000).

ColabFit ID: JARVIS_QM9_STD_JCTC__Ramakrishnan-Dral-Rupp-Lilienfeld__DS_jz1q9juw7ycj_0
Name: JARVIS_QM9_STD_JCTC
Authors: Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Elements: C, F, H, N, O
Number of Configurations: 130,829
Number of Elements: 5
Number of Atoms: 2,359,192

Links:
https://ndownloader.figshare.com/files/28715319
https://doi.org/10.1038/sdata.2014.22
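Because r2, alpha and Cv are stored in atomic or calorie-based units, a conversion step is often needed before mixing them with SI or Angstrom-based data. The sketch below uses the CODATA 2018 Bohr radius and the thermochemical calorie (4.184 J). The helper names are illustrative and are not part of JARVIS or QM9 tooling.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius, in Angstrom
CAL_IN_JOULE = 4.184               # thermochemical calorie, in J

def r2_to_angstrom2(r2_bohr2):
    """Electronic spatial extent: Bohr^2 -> Angstrom^2."""
    return r2_bohr2 * BOHR_IN_ANGSTROM ** 2

def alpha_to_angstrom3(alpha_bohr3):
    """Isotropic polarizability: Bohr^3 -> Angstrom^3."""
    return alpha_bohr3 * BOHR_IN_ANGSTROM ** 3

def cv_to_joule(cv_cal_per_mol_k):
    """Heat capacity: cal/(mol K) -> J/(mol K)."""
    return cv_cal_per_mol_k * CAL_IN_JOULE

# One unit of each quantity, converted (roughly 0.2800, 0.1482 and 4.184)
print(r2_to_angstrom2(1.0), alpha_to_angstrom3(1.0), cv_to_joule(1.0))
```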
JARVIS_QMOF
Description: The JARVIS_QMOF dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Quantum Metal-Organic Frameworks (QMOF) dataset, comprising quantum-chemical properties for more than 14,000 experimentally synthesized MOFs. QMOF contains "DFT-ready" data: structures were filtered to remove those with omitted, overlapping, unbonded or deleted atoms, along with other kinds of problematic structures noted in the literature. Data were generated with a high-throughput DFT workflow at the PBE-D3(BJ) level of theory using VASP. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_QMOF__Rosen-Iyer-Ray-Yao-Aspuru-Guzik-Gagliardi-Notestein-Snurr__DS_221svb9fxfk7_0
Name: JARVIS_QMOF
Authors: Andrew S. Rosen, Shaelyn M. Iyer, Debmalya Ray, Zhenpeng Yao, Alán Aspuru-Guzik, Laura Gagliardi, Justin M. Notestein, Randall Q. Snurr
Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Number of Configurations: 20,425
Number of Elements: 79
Number of Atoms: 2,321,633
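Each entry in this listing uses a fixed `Field: value` layout, so the element list can be checked against the reported element count (79 for JARVIS_QMOF). A minimal sketch, assuming the entry text is available as plain strings; `parse_field`, `parse_elements` and `parse_count` are hypothetical helpers, not part of any ColabFit tool.

```python
def parse_field(line: str) -> tuple[str, str]:
    """Split a 'Field: value' line into (field, value)."""
    field, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a 'Field: value' line: {line!r}")
    return field.strip(), value.strip()


def parse_elements(line: str) -> list[str]:
    """'Elements: Ag, Al, ...' -> ['Ag', 'Al', ...]."""
    field, value = parse_field(line)
    if field != "Elements":
        raise ValueError(f"expected an Elements line, got {field!r}")
    return [s.strip() for s in value.split(",") if s.strip()]


def parse_count(line: str) -> int:
    """'Number of Atoms: 2,321,633' -> 2321633 (thousands separators removed)."""
    _, value = parse_field(line)
    return int(value.replace(",", ""))


elements_line = (
    "Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, "
    "Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, "
    "Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, "
    "S, Sb, Sc, Se, Si, Sm, Sn, Sr, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr"
)
elements = parse_elements(elements_line)

# The listed symbols should be unique and match "Number of Elements".
assert len(set(elements)) == len(elements)
assert len(elements) == parse_count("Number of Elements: 79")
print(len(elements), "Np" in elements, "Ar" in elements)  # 79 True False
```

The same helpers apply to any entry in the listing, which makes it easy to select datasets that cover a required set of elements.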

Links:
https://figshare.com/ndownloader/files/30972640
https://doi.org/10.1016/j.matt.2021.02.015
JARVIS_SNUMAT
Description: The JARVIS_SNUMAT dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains band gap data for >10,000 materials, computed using a hybrid functional and considering the stable magnetic ordering. Structure relaxations and band edges were obtained using the PBE XC functional; band gap energies were subsequently obtained using the HSE06 hybrid functional. Both optical and fundamental band gap energies are included. Some gap energies were recalculated including spin-orbit coupling; these are marked in the band gap metadata as "SOC=true". JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_SNUMAT__Kim-Lee-Hong-Yoon-An-Lee-Jeong-Yoo-Kang-Youn-Han__DS_1nbddfnjxbjc_0
Name: JARVIS_SNUMAT
Authors: Sangtae Kim, Miso Lee, Changho Hong, Youngchae Yoon, Hyungmin An, Dongheon Lee, Wonseok Jeong, Dongsun Yoo, Youngho Kang, Yong Youn, Seungwu Han
Elements: Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, F, Fe, Ga, Ge, H, He, Hf, Hg, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ne, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Th, Ti, Tl, V, W, Xe, Y, Zn, Zr
Number of Configurations: 10,481
Number of Elements: 73
Number of Atoms: 216,749

Links:
https://ndownloader.figshare.com/files/38521736
https://doi.org/10.1038/s41597-020-00723-8
JARVIS_TinNet_N
Description: The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-N dataset: a collection assembled to train a machine learning model that assists catalyst design by predicting the chemical reactivity of transition-metal surfaces. The adsorption systems in this dataset consist of {100}-terminated Pt-based bimetallic surfaces doped with a third element. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_TinNet_N__Wang-Pillai-Wang-Achenie-Xin__DS_ffsgl0wufoft_0
Name: JARVIS_TinNet_N
Authors: Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin
Elements: Ag, Au, Cd, Co, Cr, Cu, Fe, H, Hf, Ir, Mn, Mo, N, Nb, Ni, O, Os, Pd, Pt, Re, Rh, Ru, Sc, Tc, V, W, Zn
Number of Configurations: 329
Number of Elements: 27
Number of Atoms: 6,251

Links:
https://figshare.com/ndownloader/files/40934285
https://doi.org/10.1038/s41467-021-25639-8
JARVIS_TinNet_O
Description: The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-O dataset: a collection assembled to train a machine learning model that assists catalyst design by predicting the chemical reactivity of transition-metal surfaces. The adsorption systems in this dataset consist of {111}-terminated metal surfaces. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_TinNet_O__Wang-Pillai-Wang-Achenie-Xin__DS_ok9dbnj53zih_0
Name: JARVIS_TinNet_O
Authors: Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin
Elements: Ag, Al, Au, Bi, Cd, Co, Cr, Cu, Fe, Ga, Hf, In, Ir, La, Mn, Mo, Nb, Ni, O, Os, Pb, Pd, Pt, Re, Rh, Ru, Sc, Sn, Ta, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 747
Number of Elements: 36
Number of Atoms: 12,699

Links:
https://figshare.com/ndownloader/files/40934285
https://doi.org/10.1038/s41467-021-25639-8
JARVIS_TinNet_OH
Description: The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-OH dataset: a collection assembled to train a machine learning model that assists catalyst design by predicting the chemical reactivity of transition-metal surfaces. The adsorption systems in this dataset consist of {111}-terminated metal surfaces. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_TinNet_OH__Wang-Pillai-Wang-Achenie-Xin__DS_mk2e641eop9w_0
Name: JARVIS_TinNet_OH
Authors: Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin
Elements: Ag, Al, Au, Bi, Cd, Co, Cr, Cu, Fe, Ga, H, Hf, In, Ir, La, Mn, Mo, Nb, Ni, O, Os, Pb, Pd, Pt, Re, Rh, Ru, Sc, Sn, Ta, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 748
Number of Elements: 37
Number of Atoms: 13,464

Links:
https://figshare.com/ndownloader/files/40934285
https://doi.org/10.1038/s41467-021-25639-8
JARVIS_mlearn
Description: The JARVIS_mlearn dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the mlearn dataset of Zuo et al.: DFT-calculated energies, forces and stresses for six elemental systems (Cu, Ge, Li, Mo, Ni and Si), assembled to benchmark the performance and cost of machine learning interatomic potentials. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.

ColabFit ID: JARVIS_mlearn__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_g9sra4y8efji_0
Name: JARVIS_mlearn
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Cu, Ge, Li, Mo, Ni, Si
Number of Configurations: 1,566
Number of Elements: 6
Number of Atoms: 115,742

Links:
https://figshare.com/ndownloader/files/40424156
https://doi.org/10.1021/acs.jpca.9b08723
LiGePS_SSE_PBE
Description: Approximately 6,500 configurations of Li10GeP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiGePS datasets from this source. The other uses the PBEsol functional, rather than the PBE functional.

ColabFit ID: LiGePS_SSE_PBE__Huang-Zhang-Wang-Zhao-Cheng-E__DS_4lev0d7cs1yl_0
Name: LiGePS_SSE_PBE
Authors: Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
Elements: Ge, Li, P, S
Number of Configurations: 6,550
Number of Elements: 4
Number of Atoms: 1,479,000

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=LiGePS-SSE-PBE
https://doi.org/10.1063/5.0041849
LiGePS_SSE_PBEsol
Description: Approximately 2,800 configurations of Li10GeP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiGePS datasets from this source. The other uses the PBE functional, rather than the PBEsol functional.

ColabFit ID: LiGePS_SSE_PBEsol__Huang-Zhang-Wang-Zhao-Cheng-E__DS_dhtzh0y2108p_0
Name: LiGePS_SSE_PBEsol
Authors: Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
Elements: Ge, Li, P, S
Number of Configurations: 2,835
Number of Elements: 4
Number of Atoms: 504,350

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=LiGePS-SSE-PBEsol
https://doi.org/10.1063/5.0041849
LiSiPS_SSE_PBE
Description: Approximately 9,100 configurations of Li10SiP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiSiPS datasets from this source. The other uses the PBEsol functional, rather than the PBE functional.

ColabFit ID: LiSiPS_SSE_PBE__Huang-Zhang-Wang-Zhao-Cheng-E__DS_9p3sip4yhiju_0
Name: LiSiPS_SSE_PBE
Authors: Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
Elements: Li, P, S, Si
Number of Configurations: 9,169
Number of Elements: 4
Number of Atoms: 2,101,550

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=LiSiPS-SSE-PBE
https://doi.org/10.1063/5.0041849
LiSiPS_SSE_PBEsol
Description: Approximately 2,300 configurations of Li10SiP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiSiPS datasets from this source. The other uses the PBE functional, rather than the PBEsol functional.

ColabFit ID: LiSiPS_SSE_PBEsol__Huang-Zhang-Wang-Zhao-Cheng-E__DS_th9w66lyxspv_0
Name: LiSiPS_SSE_PBEsol
Authors: Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
Elements: Li, P, S, Si
Number of Configurations: 2,357
Number of Elements: 4
Number of Atoms: 313,150

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=LiSiPS-SSE-PBEsol
https://doi.org/10.1063/5.0041849
LiTiO_Science_2020
Description: This dataset contains configurations of lithium titanate from the publication "Kinetic pathways of ionic transport in fast-charging lithium titanate". To understand the origin of various EELS (electron energy-loss spectroscopy) spectral features, EELS spectra were simulated using the Vienna Ab initio Simulation Package (VASP). For a specific Li in a given configuration, this is done by calculating the DOS and integrated DOS with a Li core hole at the position of that Li, and then computing the EELS from the DOS. The minimum energy paths (MEP) and migration energies of Li were calculated for various compositions, including Li4Ti5O12 with an additional Li carrier, Li5Ti5O12 with an additional Li carrier, and Li7Ti5O12 with a Li vacancy carrier.

ColabFit ID: LiTiO_Science_2020__Chen-Seo__DS_0pv4lpmx2ov3_0
Name: LiTiO_Science_2020
Authors: Tina Chen, Dong-hwa Seo
Elements: Be, Li, O, Ti
Number of Configurations: 849
Number of Elements: 4
Number of Atoms: 150,105

Links:
https://doi.org/10.24435/materialscloud:2020.0006/v1
https://doi.org/10.1126/science.aax3520
MD22_AT_AT
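Description: Dataset containing MD trajectories of AT-AT DNA base pairs from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.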

ColabFit ID: MD22_AT_AT__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_eqt5dbhsmm68_0
Name: MD22_AT_AT
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H, N, O
Number of Configurations: 20,001
Number of Elements: 4
Number of Atoms: 1,200,060

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
MD22_AT_AT_CG_CG
Description: Dataset containing MD trajectories of AT-AT-CG-CG DNA base pairs from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.

ColabFit ID: MD22_AT_AT_CG_CG__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_rx1ei5q0x9gy_0
Name: MD22_AT_AT_CG_CG
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H, N, O
Number of Configurations: 10,153
Number of Elements: 4
Number of Atoms: 1,198,054

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
MD22_Ac_Ala3_NHMe
Description: Dataset containing MD trajectories of the 42-atom tetrapeptide Ac-Ala3-NHMe from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.

ColabFit ID: MD22_Ac_Ala3_NHMe__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_l6awe3jnivnm_0
Name: MD22_Ac_Ala3_NHMe
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H, N, O
Number of Configurations: 85,109
Number of Elements: 4
Number of Atoms: 3,574,578

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
MD22_DHA
Description: Dataset containing MD trajectories of DHA (docosahexaenoic acid) from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.

ColabFit ID: MD22_DHA__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_xwmthngerf71_0
Name: MD22_DHA
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H, O
Number of Configurations: 69,753
Number of Elements: 3
Number of Atoms: 3,906,168

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
MD22_buckyball_catcher
Description: Dataset containing MD trajectories of the buckyball-catcher supramolecule from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.

ColabFit ID: MD22_buckyball_catcher__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_7g8q68fszjvs_0
Name: MD22_buckyball_catcher
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H
Number of Configurations: 6,102
Number of Elements: 2
Number of Atoms: 903,096

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
MD22_double_walled_nanotube
Description: Dataset containing MD trajectories of the double-walled nanotube supramolecule from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.

ColabFit ID: MD22_double_walled_nanotube__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_d92bnafzwynr_0
Name: MD22_double_walled_nanotube
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H
Number of Configurations: 5,032
Number of Elements: 2
Number of Atoms: 1,861,840

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
MD22_stachyose
Description: Dataset containing MD trajectories of the tetrasaccharide stachyose from the MD22 benchmark set. MD22 is a collection of datasets forming a benchmark that can be considered an updated version of the MD17 benchmark datasets, posing greater challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the tetrapeptide Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; the nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here as a separate dataset, as represented on sgdml.org. Calculations were performed using the FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.

ColabFit ID: MD22_stachyose__Chmiela-Vassilev-Galindo-Unke-Kabylda-Sauceda-Tkatchenko-Müller__DS_87ojug1k96ef_0
Name: MD22_stachyose
Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
Elements: C, H, O
Number of Configurations: 27,272
Number of Elements: 3
Number of Atoms: 2,372,664

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.adf0873
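Each MD22 dataset is an MD trajectory of a single system, so its total atom count should divide evenly by its configuration count, and the quotient is the system size. The sketch below performs that check using the counts listed in the MD22 entries; the quotient for MD22_Ac_Ala3_NHMe reproduces the "42-atom tetrapeptide" in its description. `atoms_per_configuration` is a hypothetical helper.

```python
# (number of configurations, number of atoms) as listed in the MD22 entries.
MD22_COUNTS = {
    "MD22_AT_AT": (20_001, 1_200_060),
    "MD22_AT_AT_CG_CG": (10_153, 1_198_054),
    "MD22_Ac_Ala3_NHMe": (85_109, 3_574_578),
    "MD22_DHA": (69_753, 3_906_168),
    "MD22_buckyball_catcher": (6_102, 903_096),
    "MD22_double_walled_nanotube": (5_032, 1_861_840),
    "MD22_stachyose": (27_272, 2_372_664),
}


def atoms_per_configuration(n_configs: int, n_atoms: int) -> int:
    """Return the fixed system size; raise if the counts do not divide evenly."""
    size, remainder = divmod(n_atoms, n_configs)
    if remainder:
        raise ValueError(f"{n_atoms} atoms do not split evenly over {n_configs} configurations")
    return size


sizes = {name: atoms_per_configuration(*counts) for name, counts in MD22_COUNTS.items()}
for name, size in sizes.items():
    print(f"{name}: {size} atoms per configuration")
```

A non-zero remainder would indicate either a variable-size dataset or an inconsistency in the listed counts.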
MISPR
Description: Example dataset for MISPR (Materials Informatics for Structure-Property Relationships) materials science simulation software, with DFT-calculated configuration properties for three different MISPR workflows: nuclear magnetic resonance (NMR) chemical shifts, electrostatic partial charges (ESP) and bond dissociation energies (BDE).

ColabFit ID: MISPR__Atwi-Bliss-Makeev-Rajput__DS_rsao7xrpu9ig_0
Name: MISPR
Authors: Rasha Atwi, Matthew Bliss, Maxim Makeev, Nav Nidhi Rajput
Elements: C, Cl, F, H, N, O, P, S, Si
Number of Configurations: 503
Number of Elements: 9
Number of Atoms: 8,996

Links:
https://doi.org/10.1038/s41598-022-20009-w
https://github.com/rashatwi/mispr-dataset
MTPu_2023
Description: A comprehensive database for the Si-O system, generated using density functional theory simulations and encompassing a wide range of crystal structures, point defects, extended defects and disordered structures.

ColabFit ID: MTPu_2023__Zongo-Sun-Ouellet-Plamondon-Beland__DS_326i4urabisb_0
Name: MTPu_2023
Authors: Karim Zongo, Hao Sun, Claudiane Ouellet-Plamondon, Laurent Karim Beland
Elements: O, Si
Number of Configurations: 1,062
Number of Elements: 2
Number of Atoms: 71,595

Links:
https://gitlab.com/Kazongogit/MTPu
https://doi.org/10.48550/arXiv.2311.15170
Matbench_mp_e_form
Description: Matbench v0.1 test dataset for predicting DFT formation energy from structure, adapted from the Materials Project database. Entries with a formation energy greater than 2.5 eV and those containing noble gases have been removed. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to that of the retrieved data; refer to the Automatminer/Matbench publication for more details. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms that predict a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project.

ColabFit ID: Matbench_mp_e_form__Dunn-Wang-Ganose-Dopp-Jain__DS_5drebe4tktiu_0
Name: Matbench_mp_e_form
Authors: Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain
Elements: Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Number of Configurations: 132,752
Number of Elements: 84
Number of Atoms: 3,869,658

Links:
https://matbench.materialsproject.org/
https://doi.org/10.1038/s41524-020-00406-3
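The two filtering rules in the Matbench_mp_e_form description (formation energy above 2.5 eV, any noble gas present) can be expressed as a single predicate. A minimal sketch, assuming records carry an element set and a formation energy in eV, and taking He, Ne, Ar, Kr, Xe and Rn as the noble gases; the record layout and IDs are hypothetical, and this is not the Matbench code.

```python
NOBLE_GASES = frozenset({"He", "Ne", "Ar", "Kr", "Xe", "Rn"})  # assumed set


def keep_entry(elements: set[str], e_form: float, max_e_form: float = 2.5) -> bool:
    """Keep entries with e_form <= max_e_form (eV) that contain no noble gas."""
    return e_form <= max_e_form and NOBLE_GASES.isdisjoint(elements)


# Hypothetical (id, elements, formation energy in eV) records, for illustration only.
records = [
    ("entry-1", {"Na", "Cl"}, -2.0),
    ("entry-2", {"Xe", "F"}, -0.4),   # removed: contains a noble gas
    ("entry-3", {"Fe", "O"}, 3.1),    # removed: formation energy above 2.5 eV
    ("entry-4", {"Si", "O"}, -3.0),
]

# A list comprehension preserves input order, which matters here because the
# benchmark's nested cross-validation expects the retrieved ordering.
kept = [rid for rid, elements, e_form in records if keep_entry(elements, e_form)]
print(kept)  # ['entry-1', 'entry-4']
```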
Matbench_mp_gap
Description: The Matbench_mp_gap dataset is a Matbench v0.1 test dataset for predicting the DFT PBE band gap from structure, adapted from the Materials Project database. Entries with a formation energy (or energy above the convex hull) greater than 150 meV and those containing noble gases have been removed. Retrieved April 2, 2019. Refer to the Automatminer/Matbench publication for more details. This dataset contains band gaps calculated with PBE DFT by the Materials Project, in eV. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms that predict a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project.

ColabFit ID: Matbench_mp_gap__Dunn-Wang-Ganose-Dopp-Jain__DS_6lq2f26dluql_0
Name: Matbench_mp_gap
Authors: Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain
Elements: Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Number of Configurations: 106,113
Number of Elements: 84
Number of Atoms: 3,184,984

Links:
https://matbench.materialsproject.org/
https://doi.org/10.1038/s41524-020-00406-3
Matbench_perovskites
Description: The Matbench_perovskites dataset is a Matbench v0.1 test dataset for predicting formation energy from crystal structure, adapted from an original dataset generated by Castelli et al. Refer to the Automatminer/Matbench publication for more details. This dataset contains the formation energy of the entire 5-atom perovskite cell, in eV, as calculated by RPBE GGA-DFT. Note that the reference state for oxygen was computed from oxygen's chemical potential in water vapor, not from oxygen molecules, to reflect the application for which these perovskites were studied. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms that predict a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project.

ColabFit ID: Matbench_perovskites__Dunn-Wang-Ganose-Dopp-Jain__DS_sinisy47w8ff_0
Name: Matbench_perovskites
Authors: Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain
Elements: Ag, Al, As, Au, B, Ba, Be, Bi, Ca, Cd, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, Hf, Hg, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 18,928
Number of Elements: 56
Number of Atoms: 94,640

Links:
https://matbench.materialsproject.org/
https://doi.org/10.1039/C2EE22341D
Materials_Project
Description: Configurations from the Materials Project database: an online resource with the goal of computing properties of all inorganic materials.

ColabFit ID: Materials_Project__Jain-Ong-Hautier-Chen-Richards-Dacek-Cholia-Gunter-Skinner-Ceder-Persson__DS_pv1f3dlo5dsc_0
Name: Materials_Project
Authors: Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson
Elements: Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Number of Configurations: 6,342,530
Number of Elements: 89
Number of Atoms: 200,016,161

Links:
https://materialsproject.org
https://doi.org/10.1063/1.4812323
Mg_edmonds_2022
Description: 16,874 configurations of magnesium with energies, stresses and forces calculated at the DFT level of theory.

ColabFit ID: Mg_edmonds_2022__Poul__DS_caktb6z8yiy7_0
Name: Mg_edmonds_2022
Authors: Marvin Poul
Elements: Mg
Number of Configurations: 16,874
Number of Elements: 1
Number of Atoms: 78,617

Links:
https://doi.org/10.17617/3.A3MB7Z
https://doi.org/10.1103/PhysRevB.107.104103
MoNbTaVW_PRB2021
Description: This dataset was originally designed to fit a GAP model for the Mo-Nb-Ta-V-W quinary system that was used to study segregation and defects in the body-centered-cubic refractory high-entropy alloy MoNbTaVW.

ColabFit ID: MoNbTaVW_PRB2021__Byggmästar-Nordlund-Djurabekova__DS_0shp3qrqk9k9_0
Name: MoNbTaVW_PRB2021
Authors: Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
Elements: Mo, Nb, Ta, V, W
Number of Configurations: 2,329
Number of Elements: 5
Number of Atoms: 127,913

Links:
https://doi.org/10.23729/1b845398-5291-4447-b417-1345acdd2eae
https://doi.org/10.1103/PhysRevB.104.104101
Mo_PRM2019
Description: This dataset was designed to enable machine learning of Mo elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.

ColabFit ID: Mo_PRM2019__Byggmästar-Nordlund-Djurabekova__DS_5aeg7va6k305_0
Name: Mo_PRM2019
Authors: Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
Elements: Mo
Number of Configurations: 3,785
Number of Elements: 1
Number of Atoms: 45,667

Links:
https://gitlab.com/acclab/gap-data/-/tree/master/Mo
https://doi.org/10.1103/PhysRevMaterials.4.093802
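The Mo_PRM2019 description mentions rescaling configurations to the correct lattice spacing but does not say how. A minimal sketch, assuming a uniform isotropic rescaling by the ratio of target to original lattice constant; the function name and the lattice constants are hypothetical placeholders. Energies, forces and stresses computed for the original geometry do not apply to the rescaled one, so rescaled configurations need new DFT labels.

```python
Matrix = list[list[float]]


def rescale_configuration(cell: Matrix, positions: Matrix,
                          a_old: float, a_new: float) -> tuple[Matrix, Matrix]:
    """Scale cell vectors and Cartesian positions by a_new / a_old.

    Fractional coordinates are unchanged by this isotropic scaling.
    """
    s = a_new / a_old
    new_cell = [[s * x for x in vec] for vec in cell]
    new_positions = [[s * x for x in pos] for pos in positions]
    return new_cell, new_positions


# Hypothetical two-atom bcc cell with placeholder lattice constants (Å).
a_old, a_new = 3.00, 3.15
cell = [[a_old, 0.0, 0.0], [0.0, a_old, 0.0], [0.0, 0.0, a_old]]
positions = [[0.0, 0.0, 0.0], [a_old / 2, a_old / 2, a_old / 2]]

new_cell, new_positions = rescale_configuration(cell, positions, a_old, a_new)
print(new_cell[0][0], new_positions[1][0] / new_cell[0][0])
```

Scaling cell and positions together keeps the body-centered atom at fractional coordinate 0.5.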
NDSC_TUT_2022
Description: 500 configurations of Mg2 for MD prediction using a model fitted on Al, W, Mg and Si.

ColabFit ID: NDSC_TUT_2022__Allen-Bartok__DS_oqqwzogut1on_0
Name: NDSC_TUT_2022
Authors: Connor Allen, Albert P. Bartok
Elements: Mg
Number of Configurations: 500
Number of Elements: 1
Number of Atoms: 1,000

Links:
https://github.com/ConnorSA/ndsc_tut
https://doi.org/10.48550/arXiv.2207.11828
NENCI-2021
Description: NENCI-2021 is a database of approximately 8,000 benchmark Non-Equilibrium Non-Covalent Interaction (NENCI) energies computed for molecular dimers: intermolecular complexes of biological and chemical relevance, with a particular emphasis on close intermolecular contacts. The dimers are based on those in the S101 database.

ColabFit ID: NENCI-2021__Sparrow-Ernst-Joo-Lao-Jr__DS_0j2smy6relq0_0
Name: NENCI-2021
Authors: Zachary M. Sparrow, Brian G. Ernst, Paul T. Joo, Ka Un Lao, Robert A. DiStasio, Jr
Elements: Br, C, Cl, F, H, Li, N, Na, O, P, S
Number of Configurations: 7,763
Number of Elements: 11
Number of Atoms: 129,402

Links:
https://pubs.aip.org/jcp/article-supplement/199609/zip/184303_1_supplements/
https://doi.org/10.1063/5.0068862
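A non-covalent interaction energy of a dimer is commonly defined in the supermolecular way, as the dimer energy minus the energies of its two monomers. The sketch below illustrates that definition only; the description above does not state the level of theory or corrections used for NENCI-2021, and the energies are hypothetical placeholders, not values from the dataset.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 Hartree in kcal/mol


def interaction_energy(e_dimer: float, e_monomer_a: float, e_monomer_b: float) -> float:
    """Supermolecular interaction energy: E_int = E_AB - E_A - E_B.

    Negative values indicate a bound (attractive) complex.
    """
    return e_dimer - e_monomer_a - e_monomer_b


# Hypothetical total energies in Hartree, for illustration only.
e_int = interaction_energy(-152.1030, -76.0490, -76.0480)
print(f"E_int = {e_int * HARTREE_TO_KCAL_PER_MOL:.2f} kcal/mol")  # E_int = -3.77 kcal/mol
```

Along a non-equilibrium dissociation curve, the same difference is evaluated at each intermolecular separation.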
NEP_PRB_2021
Description: Approximately 7,000 distinct configurations of 2D silicene, silicon and PbTe. The silicon data are taken from http://dx.doi.org/10.1103/PhysRevX.8.041048. The dataset includes predicted force, potential energy and virial values.

ColabFit ID: NEP_PRB_2021__Fan__DS_yg0yzoiwoux8_0
Name: NEP_PRB_2021
Authors: Zheyong Fan
Elements: Pb, Si, Te
Number of Configurations: 7,426
Number of Elements: 3
Number of Atoms: 611,808

Links:
https://doi.org/10.5281/zenodo.5109599
https://doi.org/10.1103/PhysRevB.104.104309
NEP_qHPF_test
Description: The test set of a train/test set pair. The combined datasets comprise approximately 275 configurations of a monolayer quasi-hexagonal-phase fullerene (qHPF) membrane, used to train and test an NEP model.

ColabFit ID: NEP_qHPF_test__Ying__DS_mc8p14cpn2ea_0
Name: NEP_qHPF_test
Authors: Penghua Ying
Elements: C
Number of Configurations: 39
Number of Elements: 1
Number of Atoms: 4,680

Links:
https://doi.org/10.5281/zenodo.7018572
https://doi.org/10.1016/j.eml.2022.101929
NEP_qHPF_train
Description: The training set of a train/test set pair. The combined datasets comprise approximately 275 configurations of a monolayer quasi-hexagonal-phase fullerene (qHPF) membrane, used to train and test an NEP model.

ColabFit ID: NEP_qHPF_train__Ying__DS_umliq1qh50gy_0
Name: NEP_qHPF_train
Authors: Penghua Ying
Elements: C
Number of Configurations: 238
Number of Elements: 1
Number of Atoms: 28,560

Links:
https://doi.org/10.5281/zenodo.7018572
https://doi.org/10.1016/j.eml.2022.101929
NMD-18
Description: 3,000 Al-Ga-In sesquioxides with energies and band gaps: relaxed and Vegard's-law geometries of (Al_x Ga_y In_z)2O3 oxides (x + y + z = 1), with formation energies and band gaps at the DFT-PBE level of theory. Contains all structures from the NOMAD 2018 Kaggle challenge training and leaderboard data. The formation energies and band gap energies were computed with the PBE exchange-correlation functional using the all-electron electronic structure code FHI-aims with tight settings.

ColabFit ID: NMD-18__Sutton-Ghiringhelli-Yamamoto-Lysogorskiy-Blumenthal-Hammerschmidt-Golebiowski-Liu-Ziletti-Scheffler__DS_of1clsgf4nab_0
Name: NMD-18
Authors: Christopher Sutton, Luca M. Ghiringhelli, Takenori Yamamoto, Yury Lysogorskiy, Lars Blumenthal, Thomas Hammerschmidt, Jacek R. Golebiowski, Xiangyue Liu, Angelo Ziletti, Matthias Scheffler
Elements: Al, Ga, In, O
Number of Configurations: 3,000
Number of Elements: 4
Number of Atoms: 185,070

Links:
https://qmml.org/datasets.html
https://doi.org/10.1038/s41524-019-0239-3
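Vegard's law approximates a lattice parameter of a mixed compound as the composition-weighted average of the end-member values, which is how the unrelaxed (Al_x Ga_y In_z)2O3 geometries mentioned above can be estimated from x, y and z. A minimal sketch for a single scalar parameter; the end-member values are hypothetical placeholders rather than data from NMD-18, and the dataset's actual geometry construction may differ (for example, by interpolating full lattice vectors).

```python
def vegard(fractions: dict[str, float], endmember_values: dict[str, float],
           tol: float = 1e-9) -> float:
    """Linear (Vegard's law) interpolation of a lattice parameter.

    fractions maps each cation to its fraction (x, y, z), which must sum to 1.
    """
    if abs(sum(fractions.values()) - 1.0) > tol:
        raise ValueError("cation fractions must satisfy x + y + z = 1")
    return sum(frac * endmember_values[cation] for cation, frac in fractions.items())


# Placeholder end-member lattice parameters in Å (hypothetical, not from the dataset).
a_endmembers = {"Al": 4.8, "Ga": 5.0, "In": 5.4}
a_mix = vegard({"Al": 0.25, "Ga": 0.50, "In": 0.25}, a_endmembers)
print(f"{a_mix:.3f}")  # 5.050
```

Rejecting fractions that do not sum to 1 enforces the x + y + z = 1 constraint stated in the description.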
NNIP_FeH_PRM_2021
Description: Approximately 20,000 configurations from a dataset of alpha-iron and hydrogen. Properties include forces and potential energy, calculated using VASP at the DFT level using the GGA-PBE functional.

ColabFit ID: NNIP_FeH_PRM_2021__Meng-Du-Shinzato-Mori-Yu-Matsubara-Ishikawa-Ogata__DS_nb8hcpibz1dt_0
Name: NNIP_FeH_PRM_2021
Authors: Fan-Shun Meng, Jun-Ping Du, Shuhei Shinzato, Hideki Mori, Peijun Yu, Kazuki Matsubara, Nobuyuki Ishikawa, Shigenobu Ogata
Elements: Fe, H
Number of Configurations: 20,920
Number of Elements: 2
Number of Atoms: 1,870,008

Links:
https://github.com/mengfsou/NNIP-FeH
https://doi.org/10.1103/PhysRevMaterials.5.113606
NNP-Ga2O3
Dataset Downloads Coming Soon Description: 9,200 configurations of beta-Ga2O3, divided into two configuration sets. One contains DFT data for 8,400 configurations simulated at temperatures between 50 K and 600 K; the other contains configurations at a simulation temperature of 0 K.

ColabFit ID: NNP-Ga2O3__Li-Liu-Rohskopf-Gordiz-Henry-Lee-Luo__DS_3qumwe2j8lib_0
Name: NNP-Ga2O3
Authors: Ruiyang Li, Zeyu Liu, Andrew Rohskopf, Kiarash Gordiz, Asegun Henry, Eungkyu Lee, Tengfei Luo
Elements: Ga, O
Number of Configurations: 9,200
Number of Elements: 2
Number of Atoms: 2,944,000

Links:
https://github.com/RuiyangLi6/NNP_Ga2O3
https://doi.org/10.1063/5.0025051
NVNMD_GeTe
Dataset Downloads Coming Soon Description: Approximately 5,000 configurations of GeTe used in training of a non-von Neumann multiplication-less DNN model.

ColabFit ID: NVNMD_GeTe__Mo-Li-Zhao-Zhang-Shi-Li-Liu__DS_0gyf3srv7xh7_0
Name: NVNMD_GeTe
Authors: Pinghui Mo, Chang Li, Dan Zhao, Yujia Zhang, Mengchao Shi, Junhua Li, Jie Liu
Elements: Ge, Te
Number of Configurations: 5,025
Number of Elements: 2
Number of Atoms: 321,600

Links:
https://github.com/LiuGroupHNU/nvnmd
https://doi.org/10.1038/s41524-022-00773-z
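Dividing Number of Atoms by Number of Configurations gives the mean cell size of a dataset. The sketch below does this with exact fractions for four entries above, using the counts exactly as listed. A whole-number mean (as for NEP_qHPF_train, NNP-Ga2O3 and NVNMD_GeTe) is compatible with, but does not prove, a fixed number of atoms per configuration, while a non-integer mean shows that configuration sizes vary.

```python
from fractions import Fraction

# (Number of Configurations, Number of Atoms) copied from the entries above.
LISTED_COUNTS = {
    "NEP_qHPF_train": (238, 28_560),
    "NNIP_FeH_PRM_2021": (20_920, 1_870_008),
    "NNP-Ga2O3": (9_200, 2_944_000),
    "NVNMD_GeTe": (5_025, 321_600),
}


def mean_atoms_per_config(n_configs: int, n_atoms: int) -> Fraction:
    """Exact mean number of atoms per configuration."""
    return Fraction(n_atoms, n_configs)


for name, (n_configs, n_atoms) in LISTED_COUNTS.items():
    mean = mean_atoms_per_config(n_configs, n_atoms)
    # A non-integer mean is impossible if every configuration has the same size.
    label = "whole number" if mean.denominator == 1 else "sizes vary"
    print(f"{name}: {float(mean):.1f} atoms/config ({label})")
```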
N_O_F_columns_non-bonded_vdW_potential_JCP2023
Dataset Downloads Coming Soon Description: This dataset contains structures of materials from the N (15th), O (16th) and F (17th) columns of the periodic table, used for generating a two-body non-bonded van der Waals (vdW) potential.

ColabFit ID: N_O_F_columns_non-bonded_vdW_potential_JCP2023__Geng-Zybin-Naserifar-III__DS_p74559sdjy1q_0
Name: N_O_F_columns_non-bonded_vdW_potential_JCP2023
Authors: Peng Geng, Sergey Zybin, Saber Naserifar, William A. Goddard, III
Elements: As, At, Bi, O, P, Po, S, Sb, Se, Te
Number of Configurations: 262
Number of Elements: 10
Number of Atoms: 1,494

Links:
https://doi.org/10.1063/5.0174188
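Grouping the Elements field of this entry by IUPAC group number shows how the listed elements map onto the N, O and F columns: P, As, Sb and Bi in group 15, O through Po in group 16, and At in group 17. The lookup table below is typed in by hand and covers only these ten elements; it is an illustration, not part of the dataset.

```python
from collections import defaultdict

# IUPAC group numbers for the elements listed in this entry only.
GROUP = {
    "P": 15, "As": 15, "Sb": 15, "Bi": 15,
    "O": 16, "S": 16, "Se": 16, "Te": 16, "Po": 16,
    "At": 17,
}


def by_group(symbols: list[str]) -> dict[int, list[str]]:
    """Bucket element symbols by group number, sorted for stable output."""
    groups: dict[int, list[str]] = defaultdict(list)
    for symbol in symbols:
        groups[GROUP[symbol]].append(symbol)
    return {group: sorted(members) for group, members in sorted(groups.items())}


elements = "As, At, Bi, O, P, Po, S, Sb, Se, Te".split(", ")
print(by_group(elements))
```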
Nb_PRM2019
Dataset Downloads Coming Soon Description: This dataset was designed to enable machine-learning of Nb elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.

ColabFit ID: Nb_PRM2019__Byggmästar-Nordlund-Djurabekova__DS_tbpx3pawtlgt_0
Name: Nb_PRM2019
Authors: Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
Elements: Nb
Number of Configurations: 3,787
Number of Elements: 1
Number of Atoms: 45,641

Links:
https://gitlab.com/acclab/gap-data/-/tree/master/
https://doi.org/10.1103/PhysRevMaterials.4.093802
NequIP_NC_2022
Dataset Downloads Coming Soon Description: Approximately 57,000 configurations from the evaluation datasets for the NequIP graph neural network interatomic potential. Trajectories are taken from the LiPS, LiPO glass melt-quench simulation, and formate decomposition on Cu datasets.

ColabFit ID: NequIP_NC_2022__Batzner-Musaelian-Sun-Geiger-Mailoa-Kornbluth-Molinari-Smidt-Kozinsky__DS_4pbhjtu62o2d_0
Name: NequIP_NC_2022
Authors: Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, Boris Kozinsky
Elements: C, Cu, H, Li, O, P, S
Number of Configurations: 56,856
Number of Elements: 7
Number of Atoms: 7,631,075

Links:
https://doi.org/10.24435/materialscloud:s0-5n
https://doi.org/10.1038/s41467-022-29939-5
NiCoCr_NC2020
Dataset Downloads Coming Soon Description: The face-centered cubic medium-entropy alloy NiCoCr has received considerable attention for its good mechanical properties, uncertain stacking fault energy, etc., some of which have been attributed to chemical short-range order (SRO). Here, we examine the yield strength and misfit volumes of NiCoCr to determine whether SRO has measurably influenced mechanical properties. Polycrystalline strengths show no systematic trend with different processing conditions. Measured misfit volumes in NiCoCr are consistent with those in random binaries. Yield strength prediction of a random NiCoCr alloy matches well with experiments. Finally, we show that standard spin-polarized density functional theory (DFT) calculations of misfit volumes are not accurate for NiCoCr. This implies that DFT may be inaccurate for other subtle structural quantities such as atom-atom bond distance so that caution is required in drawing conclusions about NiCoCr based on DFT. These findings all lead to the conclusion that, under typical processing conditions, SRO in NiCoCr is either negligible or has no systematic measurable effect on strength.

ColabFit ID: NiCoCr_NC2020__Yin-Curtin__DS_5tegg4uvaixz_0
Name: NiCoCr_NC2020
Authors: Binglun Yin, William Curtin
Elements: Co, Cr, Ni
Number of Configurations: 428
Number of Elements: 3
Number of Atoms: 40,624

Links:
https://doi.org/10.24435/materialscloud:s4-g3
https://doi.org/10.1038/s41467-020-16083-1
OC20_IS2RES_val_id
Dataset Downloads Coming Soon Description: OC20_IS2RES_val_id is the in-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_IS2RES_val_id__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_paumiiad6ttu_0
Name: OC20_IS2RES_val_id
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 5,026,511
Number of Elements: 56
Number of Atoms: 406,702,097

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_IS2RES_val_ood_ads
Dataset Downloads Coming Soon Description: OC20_IS2RES_val_ood_ads is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_IS2RES_val_ood_ads__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_71806e6y8u2m_0
Name: OC20_IS2RES_val_ood_ads
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 4,883,196
Number of Elements: 56
Number of Atoms: 390,308,139

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_IS2RES_val_ood_both
Dataset Downloads Coming Soon Description: OC20_IS2RES_val_ood_both is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with both unseen catalyst composition and unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_IS2RES_val_ood_both__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_u9jm7oxyusjc_0
Name: OC20_IS2RES_val_ood_both
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 3,665,193
Number of Elements: 55
Number of Atoms: 308,297,930

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_IS2RES_val_ood_cat
Dataset Downloads Coming Soon Description: OC20_IS2RES_val_ood_cat is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen catalyst composition. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_IS2RES_val_ood_cat__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_ava2xqam2xfi_0
Name: OC20_IS2RES_val_ood_cat
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 5,151,015
Number of Elements: 55
Number of Atoms: 411,767,380

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_train_200K
Dataset Downloads Coming Soon Description: OC20_S2EF_train_200K is the 200K training split of the OC20 Structure to Energy and Forces (S2EF) task.

ColabFit ID: OC20_S2EF_train_200K__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_zdy2xz6y88nl_0
Name: OC20_S2EF_train_200K
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 200,000
Number of Elements: 56
Number of Atoms: 14,631,937

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_train_20M
Dataset Downloads Coming Soon Description: OC20_S2EF_train_20M is the 20-million-structure training subset of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include potential energy, free energy and atomic forces, plus data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, Miller index, shift and others.

ColabFit ID: OC20_S2EF_train_20M__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_otx1qc9f3pm4_0
Name: OC20_S2EF_train_20M
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 20,000,000
Number of Elements: 56
Number of Atoms: 1,465,265,878

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_train_2M
Dataset Downloads Coming Soon Description: OC20_S2EF_train_2M is the 2-million-structure training subset of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include potential energy, free energy and atomic forces, plus data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, Miller index, shift and others.

ColabFit ID: OC20_S2EF_train_2M__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_7qi6dh0ig7sd_0
Name: OC20_S2EF_train_2M
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 2,000,000
Number of Elements: 56
Number of Atoms: 146,496,199

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_train_all
Dataset Downloads Coming Soon Description: OC20_S2EF_train_all is the ~134-million-structure full training set of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_S2EF_train_all__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_jyuwhl30jklq_0
Name: OC20_S2EF_train_all
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 133,934,018
Number of Elements: 56
Number of Atoms: 9,810,895,377

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_val_id
Dataset Downloads Coming Soon Description: OC20_S2EF_val_id is the ~1-million-structure in-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_S2EF_val_id__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_wv9zv6egp9vk_0
Name: OC20_S2EF_val_id
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 999,866
Number of Elements: 56
Number of Atoms: 73,147,343

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_val_ood_ads
Dataset Downloads Coming Soon Description: OC20_S2EF_val_ood_ads is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_S2EF_val_ood_ads__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_cgdsc3gxoamu_0
Name: OC20_S2EF_val_ood_ads
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 999,838
Number of Elements: 56
Number of Atoms: 72,858,155

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_val_ood_both
Dataset Downloads Coming Soon Description: OC20_S2EF_val_ood_both is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring both unseen catalyst composition and unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_S2EF_val_ood_both__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_889euoe7akyy_0
Name: OC20_S2EF_val_ood_both
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 999,944
Number of Elements: 55
Number of Atoms: 84,604,635

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
OC20_S2EF_val_ood_cat
Dataset Downloads Coming Soon Description: OC20_S2EF_val_ood_cat is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring unseen catalyst composition. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.

ColabFit ID: OC20_S2EF_val_ood_cat__Chanussot-Das-Goyal-Lavril-Shuaibi-Riviere-Tran-Heras-Domingo-Ho-Hu-Palizhati-Sriram-Wood-Yoon-Parikh-Zitnick-Ulissi__DS_wmgdq06mzdys_0
Name: OC20_S2EF_val_ood_cat
Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
Elements: Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 999,809
Number of Elements: 55
Number of Atoms: 74,059,718

Links:
https://fair-chem.github.io/core/datasets/oc20.html
https://doi.org/10.1021/acscatal.0c04525
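The OC20 entries above share a regular naming pattern: `OC20_<task>_<split>_<subset>`, where the subset is either a training size (200K, 2M, 20M, all) or a validation domain (id, ood_ads, ood_cat, ood_both). The parser below is a sketch inferred from the names on this page, not an official ColabFit or Open Catalyst Project convention; the `UNSEEN` labels paraphrase the descriptions (unseen adsorbates, unseen catalyst composition, or both).

```python
from typing import NamedTuple

# What each validation domain holds out relative to training, per the descriptions above.
UNSEEN = {
    "id": frozenset(),
    "ood_ads": frozenset({"adsorbate"}),
    "ood_cat": frozenset({"catalyst"}),
    "ood_both": frozenset({"adsorbate", "catalyst"}),
}


class OC20Split(NamedTuple):
    task: str          # "IS2RES" (IS2RE/IS2RS tasks) or "S2EF"
    split: str         # "train" or "val"
    subset: str        # training size or validation domain
    unseen: frozenset  # empty for training subsets and the in-domain split


def parse_oc20_name(name: str) -> OC20Split:
    # maxsplit=3 keeps suffixes such as "ood_both" in one piece.
    prefix, task, split, subset = name.split("_", 3)
    if prefix != "OC20":
        raise ValueError(f"not an OC20 dataset name: {name}")
    unseen = UNSEEN[subset] if split == "val" else frozenset()
    return OC20Split(task, split, subset, unseen)


print(parse_oc20_name("OC20_S2EF_val_ood_both"))
```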
OC22-IS2RE-Train
Dataset Downloads Coming Soon Description: Training configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.

ColabFit ID: OC22-IS2RE-Train__Tran-Lan-Shuaibi-Wood-Goyal-Das-Heras-Domingo-Kolluru-Rizvi-Shoghi-Sriram-Therrien-Abed-Voznyy-Sargent-Ulissi-Zitnick__DS_3h39eqiv9urv_0
Name: OC22-IS2RE-Train
Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
Elements: Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 7,878,688
Number of Elements: 57
Number of Atoms: 634,845,874

Links:
https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#open-catalyst-2022-oc22
https://doi.org/10.1021/acscatal.2c05426
OC22-IS2RE-Validation-in-domain
Dataset Downloads Coming Soon Description: In-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.

ColabFit ID: OC22-IS2RE-Validation-in-domain__Tran-Lan-Shuaibi-Wood-Goyal-Das-Heras-Domingo-Kolluru-Rizvi-Shoghi-Sriram-Therrien-Abed-Voznyy-Sargent-Ulissi-Zitnick__DS_4eb78xs9suoo_0
Name: OC22-IS2RE-Validation-in-domain
Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
Elements: Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 442,843
Number of Elements: 57
Number of Atoms: 35,275,211

Links:
https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#open-catalyst-2022-oc22
https://doi.org/10.1021/acscatal.2c05426
OC22-IS2RE-Validation-out-of-domain
Dataset Downloads Coming Soon Description: Out-of-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.

ColabFit ID: OC22-IS2RE-Validation-out-of-domain__Tran-Lan-Shuaibi-Wood-Goyal-Das-Heras-Domingo-Kolluru-Rizvi-Shoghi-Sriram-Therrien-Abed-Voznyy-Sargent-Ulissi-Zitnick__DS_ikbqbyd3dw25_0
Name: OC22-IS2RE-Validation-out-of-domain
Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
Elements: Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr
Number of Configurations: 521,827
Number of Elements: 52
Number of Atoms: 42,219,955

Links:
https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#open-catalyst-2022-oc22
https://doi.org/10.1021/acscatal.2c05426
OC22-S2EF-Train
Dataset Downloads Coming Soon Description: Training configurations for the structure to total energy and forces task (S2EF) of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.

ColabFit ID: OC22-S2EF-Train__Tran-Lan-Shuaibi-Wood-Goyal-Das-Heras-Domingo-Kolluru-Rizvi-Shoghi-Sriram-Therrien-Abed-Voznyy-Sargent-Ulissi-Zitnick__DS_jgaid7espcoc_0
Name: OC22-S2EF-Train
Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
Elements: Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 8,389,365
Number of Elements: 57
Number of Atoms: 669,615,870

Links:
https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#open-catalyst-2022-oc22
https://doi.org/10.1021/acscatal.2c05426
OC22-S2EF-Validation-in-domain
Dataset Downloads Coming Soon Description: In-domain validation configurations for the structure to total energy and forces (S2EF) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.

ColabFit ID: OC22-S2EF-Validation-in-domain__Tran-Lan-Shuaibi-Wood-Goyal-Das-Heras-Domingo-Kolluru-Rizvi-Shoghi-Sriram-Therrien-Abed-Voznyy-Sargent-Ulissi-Zitnick__DS_f25kywgtdhks_0
Name: OC22-S2EF-Validation-in-domain
Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
Elements: Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Number of Configurations: 407,195
Number of Elements: 57
Number of Atoms: 31,919,751

Links:
https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#open-catalyst-2022-oc22
https://doi.org/10.1021/acscatal.2c05426
OC22-S2EF-Validation-out-of-domain
Dataset Downloads Coming Soon Description: Out-of-domain validation configurations for the structure to total energy and forces (S2EF) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.

ColabFit ID: OC22-S2EF-Validation-out-of-domain__Tran-Lan-Shuaibi-Wood-Goyal-Das-Heras-Domingo-Kolluru-Rizvi-Shoghi-Sriram-Therrien-Abed-Voznyy-Sargent-Ulissi-Zitnick__DS_y8u933mhadjf_0
Name: OC22-S2EF-Validation-out-of-domain
Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
Elements: Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr
Number of Configurations: 459,594
Number of Elements: 52
Number of Atoms: 36,999,141

Links:
https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#open-catalyst-2022-oc22
https://doi.org/10.1021/acscatal.2c05426
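The out-of-domain OC22 entries list 52 elements, versus 57 for the training and in-domain entries. Comparing the Elements fields as sets shows which symbols are absent. This is only a comparison of the listed fields; it says nothing about why those elements are missing from the out-of-domain splits.

```python
def parse_elements(line: str) -> set[str]:
    """Turn an 'Elements: A, B, C' field from this listing into a set of symbols."""
    _, _, symbols = line.partition(":")
    return {s.strip() for s in symbols.split(",") if s.strip()}


# Elements fields copied from OC22-S2EF-Validation-in-domain and -out-of-domain above.
IN_DOMAIN = parse_elements(
    "Elements: Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, "
    "Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, "
    "Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr"
)
OUT_OF_DOMAIN = parse_elements(
    "Elements: Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, "
    "Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, "
    "Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr"
)

missing = sorted(IN_DOMAIN - OUT_OF_DOMAIN)
print(len(IN_DOMAIN), len(OUT_OF_DOMAIN), missing)
```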
OrbNet_Denali
Dataset Downloads Coming Soon Description: All DFT single-point calculations for the OrbNet Denali training set were carried out in Entos Qcore version 0.8.17 at the ωB97X-D3/def2-TZVP level of theory using in-core density fitting with the neese=4 DFT integration grid.

ColabFit ID: OrbNet_Denali__Christensen-Sirumalla-Qiao-OConnor-Smith-Ding-Bygrave-Anandkumar-Welborn-Manby-III__DS_5obi9nxcgmof_0
Name: OrbNet_Denali
Authors: Anders S. Christensen, Sai Krishna Sirumalla, Zhuoran Qiao, Michael B. O'Connor, Daniel G. A. Smith, Feizhi Ding, Peter J. Bygrave, Animashree Anandkumar, Matthew Welborn, Frederick R. Manby, Thomas F. Miller III
Elements: B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si
Number of Configurations: 2,338,215
Number of Elements: 17
Number of Atoms: 104,958,650

Links:
https://doi.org/10.6084/m9.figshare.14883867.v2
https://doi.org/10.1063/5.0061990
PWMLFF_feature_comparison_NPJ2023
Dataset Downloads Coming Soon Description: Partial dataset for "Accuracy evaluation of different machine learning force field features". The included data is limited to that hosted directly on the repository at the related GitHub link. From publication abstract: Predicting energies and forces using machine learning force field (MLFF) depends on accurate descriptions (features) of chemical environment. Despite the numerous features proposed, there is a lack of controlled comparison among them for their universality and accuracy. In this work, we compared several commonly used feature types for their ability to describe physical systems. These different feature types include cosine feature, Gaussian feature, moment tensor potential (MTP) feature, spectral neighbor analysis potential feature, simplified smooth deep potential with Chebyshev polynomials feature and Gaussian polynomials feature, and atomic cluster expansion feature. We evaluated the training root mean square error (RMSE) for the atomic group energy, total energy, and force using linear regression model regarding to the density functional theory results. We applied these MLFF models to an amorphous sulfur system and carbon systems, and the fitting results show that MTP feature can yield the smallest RMSE results compared with other feature types for either sulfur system or carbon system in the disordered atomic configurations. Moreover, as an extending test of other systems, the MTP feature combined with linear regression model can also reproduce similar quantities along the ab initio molecular dynamics trajectory as represented by Cu systems. Our results are helpful in selecting the proper features for the MLFF development.

ColabFit ID: PWMLFF_feature_comparison_NPJ2023__Han-Li-Liu-Li-Wang__DS_cgjdk1e2txjy_0
Name: PWMLFF_feature_comparison_NPJ2023
Authors: Ting Han, Jie Li, Liping Liu, Fengyu Li, Lin-Wang Wang
Elements: C, H, Mg, Ni, O, Si
Number of Configurations: 17,255
Number of Elements: 6
Number of Atoms: 918,240

Links:
https://github.com/LonxunQuantum/PWMLFF_library/tree/main
https://www.doi.org/10.1088/1367-2630/acf2bb
Paramagnetic_lanthanide_compounds
Dataset Downloads Coming Soon Description: This dataset is composed of the fully deuterated Gd(III) analogue d-[GdL] in a variety of solvents, including MeOH, D2O and d6-DMSO.

ColabFit ID: Paramagnetic_lanthanide_compounds__Alnami-Kragskow-Staab-Skelton-Chilton__DS_cm3cr5kgqomw_0
Name: Paramagnetic_lanthanide_compounds
Authors: Barak Alnami, Jon G. C. Kragskow, Jakob K. Staab, Jonathan M. Skelton, Nicholas F. Chilton
Elements: C, Gd, H, N, O, S
Number of Configurations: 41,748
Number of Elements: 6
Number of Atoms: 28,419,876

Links:
https://doi.org/10.1021/jacs.3c01342
https://doi.org/10.48420/22015322.v1
PtNi_alloy_NPJ2022
Dataset Downloads Coming Soon Description: DFT dataset consisting of 6,828 resampled Pt-Ni alloy structures used for training a neural network potential (NNP). The energy and forces of every structure in the resampled database were computed with the Vienna Ab initio Simulation Package (VASP) using the spin-polarized revised Perdew-Burke-Ernzerhof (rPBE) exchange-correlation functional.

ColabFit ID: PtNi_alloy_NPJ2022__Han-Barcaro-Fortunelli-Lysgaard-Vegge-Hansen__DS_l3xhsrlgu8tq_0
Name: PtNi_alloy_NPJ2022
Authors: Shuang Han, Giovanni Barcaro, Alessandro Fortunelli, Steen Lysgaard, Tejs Vegge, Heine Anton Hansen
Elements: Ni, Pt
Number of Configurations: 6,828
Number of Elements: 2
Number of Atoms: 1,074,161

Links:
https://zenodo.org/record/5645281#.Y2CPkeTMJEa
https://doi.org/10.1038/s41524-022-00807-6
QM-22
Dataset Downloads Coming Soon Description: Includes CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.

ColabFit ID: QM-22__Bowman-Qu-Conte-Nandi-Houston-Yu__DS_2qq05tdbyovn_0
Name: QM-22
Authors: Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu
Elements: C, H, O
Number of Configurations: 6,762
Number of Elements: 3
Number of Atoms: 101,430

Links:
https://github.com/jmbowma/QM-22
https://doi.org/10.1063/5.0089200
QM7b_AlphaML
Dataset Downloads Coming Soon Description: Energies, computed with LR-CCSD and hybrid DFT (B3LYP and SCAN0), for 7,211 molecules in QM7b and 52 molecules in the AlphaML showcase database.

ColabFit ID: QM7b_AlphaML__Yang-Lao-Wilkins-Grisafi-Ceriotti-Jr__DS_3a5xucj4yqa8_0
Name: QM7b_AlphaML
Authors: Yang Yang, Ka Un Lao, David M. Wilkins, Andrea Grisafi, Michele Ceriotti, Robert A. DiStasio Jr
Elements: C, Cl, H, N, O, S
Number of Configurations: 29,033
Number of Elements: 6
Number of Atoms: 408,865

Links:
https://doi.org/10.24435/materialscloud:2019.0002/v3
https://doi.org/10.1038/s41597-019-0157-8
QM9x
Dataset Downloads Coming Soon Description: Dataset containing DFT calculations of energy and forces for all configurations in the QM9 dataset, recalculated with the ωB97X functional and 6-31G(d) basis set. Recalculating the energy and forces causes a slight shift of the potential energy surface, so forces act on most configurations in the dataset. QM9x is used as a benchmarking and comparison dataset for Transition1x, which was generated by running Nudged Elastic Band (NEB) calculations with DFT on 10k reactions while saving the intermediate calculations.

ColabFit ID: QM9x__Schreiner-Bhowmik-Vegge-Busk-Winther__DS_d5ug96vla5xy_0
Name: QM9x
Authors: Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
Elements: C, F, H, N, O
Number of Configurations: 133,885
Number of Elements: 5
Number of Atoms: 2,407,753

Links:
https://doi.org/10.6084/m9.figshare.20449701.v2
https://doi.org/10.1038/s41597-022-01870-w
QM_hamiltonian_nature_2019
Dataset Downloads Coming Soon Description: ~100,000 configurations of water, ethanol, malondialdehyde and uracil gathered at the PBE/def2-SVP level of theory using ORCA.

ColabFit ID: QM_hamiltonian_nature_2019__Schütt-Gastegger-Tkatchenko-Müller-Maurer__DS_02cqe6a0bobu_0
Name: QM_hamiltonian_nature_2019
Authors: Kristof T. Schütt, Michael Gastegger, Alexandre Tkatchenko, Klaus-Robert Müller, Reinhard J. Maurer
Elements: C, H, N, O
Number of Configurations: 91,977
Number of Elements: 4
Number of Atoms: 887,799

Links:
http://quantum-machine.org/datasets/
https://doi.org/10.1038/s41467-019-12875-2
REANN_CO2_Ni100
Dataset Downloads Coming Soon Description: Approximately 9,850 configurations of CO2 with a movable Ni(100) surface.

ColabFit ID: REANN_CO2_Ni100__Zhang-Xia-Jiang__DS_s818rozdb6x6_0
Name: REANN_CO2_Ni100
Authors: Yaolong Zhang, Junfan Xia, Bin Jiang
Elements: C, Ni, O
Number of Configurations: 9,850
Number of Elements: 3
Number of Atoms: 384,150

Links:
https://github.com/zhangylch/REANN
https://doi.org/10.1021/acs.jpclett.9b00085
SAIT_semiconductors_ACS_2023_HfO_out-of-domain
Dataset Downloads Coming Soon Description: Out-of-domain configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_HfO_out-of-domain__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_dixgsms1jm98_0
Name: SAIT_semiconductors_ACS_2023_HfO_out-of-domain
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: Hf, O
Number of Configurations: 6,996
Number of Elements: 2
Number of Atoms: 671,616

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_HfO_raw
Dataset Downloads Coming Soon Description: Structures from the SAIT_semiconductors_ACS_2023_HfO dataset, separated into crystal, out-of-domain, and random (generated by randomly distributing 32 Hf and 64 O atoms within the unit cells of the HfO2 crystals) configuration sets. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_HfO_raw__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_ekrypue10aay_0
Name: SAIT_semiconductors_ACS_2023_HfO_raw
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: Hf, O
Number of Configurations: 192,000
Number of Elements: 2
Number of Atoms: 18,431,808

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_HfO_test
Dataset Downloads Coming Soon Description: Test configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_HfO_test__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_dxzkbbocjb0y_0
Name: SAIT_semiconductors_ACS_2023_HfO_test
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: Hf, O
Number of Configurations: 3,510
Number of Elements: 2
Number of Atoms: 336,960

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_HfO_train
Dataset Downloads Coming Soon Description: Training configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_HfO_train__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_ppk44zeithjj_0
Name: SAIT_semiconductors_ACS_2023_HfO_train
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: Hf, O
Number of Configurations: 27,960
Number of Elements: 2
Number of Atoms: 2,684,160

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_HfO_validation
Dataset Downloads Coming Soon Description: Validation configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_HfO_validation__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_89w2iq8fw7qi_0
Name: SAIT_semiconductors_ACS_2023_HfO_validation
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: Hf, O
Number of Configurations: 3,510
Number of Elements: 2
Number of Atoms: 336,960

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_SiN_out-of-domain
Dataset Downloads Coming Soon Description: Out-of-domain configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_SiN_out-of-domain__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_mo7yelvudt4k_0
Name: SAIT_semiconductors_ACS_2023_SiN_out-of-domain
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: N, Si
Number of Configurations: 1,235
Number of Elements: 2
Number of Atoms: 129,675

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_SiN_raw
Dataset Downloads Coming Soon Description: Structures from the SAIT_semiconductors_ACS_2023_SiN dataset, separated into N-only, Si-only, SiN, and out-of-domain melt, quench and relax configuration sets. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_SiN_raw__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_piuigd7monq9_0
Name: SAIT_semiconductors_ACS_2023_SiN_raw
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: N, Si
Number of Configurations: 88,163
Number of Elements: 2
Number of Atoms: 5,204,822

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_SiN_test
Dataset Downloads Coming Soon Description: Test configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_SiN_test__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_orfjpdomkekx_0
Name: SAIT_semiconductors_ACS_2023_SiN_test
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: N, Si
Number of Configurations: 2,866
Number of Elements: 2
Number of Atoms: 165,559

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_SiN_train
Dataset Downloads Coming Soon Description: Training configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_SiN_train__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_5yfdgzb5zhgm_0
Name: SAIT_semiconductors_ACS_2023_SiN_train
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: N, Si
Number of Configurations: 22,510
Number of Elements: 2
Number of Atoms: 1,284,467

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
SAIT_semiconductors_ACS_2023_SiN_validation
Dataset Downloads Coming Soon Description: Validation configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.

ColabFit ID: SAIT_semiconductors_ACS_2023_SiN_validation__Kim-Na-Kim-Cho-Kang-Lee-Choi-Kim-Lee-Kim__DS_5piytwom0j25_0
Name: SAIT_semiconductors_ACS_2023_SiN_validation
Authors: Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
Elements: N, Si
Number of Configurations: 2,822
Number of Elements: 2
Number of Atoms: 159,951

Links:
https://github.com/SAITPublic/MLFF-Framework
https://openreview.net/forum?id=hr9Bd1A9Un
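The descriptions of the SAIT train, validation and test entries do not state a split ratio. However, the configuration counts listed in those entries imply roughly an 80/10/10 partition for both HfO and SiN. The short check below only does that arithmetic on the listed counts. `split_fractions` is a hypothetical helper written for illustration.

```python
def split_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total configuration count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Configuration counts as listed in the SAIT split entries
hfo = split_fractions({"train": 27_960, "validation": 3_510, "test": 3_510})
sin = split_fractions({"train": 22_510, "validation": 2_822, "test": 2_866})
print({k: round(v, 3) for k, v in hfo.items()})  # {'train': 0.799, 'validation': 0.1, 'test': 0.1}
```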
SIMPLE_NN_SiO2
Dataset Downloads Coming Soon Description: 10,000 configurations of SiO2 used as an example for the SIMPLE-NN machine learning model. The dataset includes three crystalline polymorphs (quartz, cristobalite and tridymite) as well as amorphous and liquid-phase SiO2. Structures distorted by compression, monoaxial strain and shear strain were also included in the training set.

ColabFit ID: SIMPLE_NN_SiO2__Lee-Yoo-Jeong-Han__DS_w2ngtxl0ep5d_0
Name: SIMPLE_NN_SiO2
Authors: Kyuhyun Lee, Dongsun Yoo, Wonseok Jeong, Seungwu Han
Elements: O, Si
Number of Configurations: 10,000
Number of Elements: 2
Number of Atoms: 600,000

Links:
https://doi.org/10.17632/pjv2yr7pvr.1
https://doi.org/10.1016/j.cpc.2019.04.014
SN2_JCTC_2019
Dataset Downloads Coming Soon Description: The SN2 dataset was generated as a partner benchmark dataset, along with the 'solvated protein fragments' dataset, for measuring the performance of machine learning models, in particular PhysNet, at describing chemical reactions, long-range interactions, and condensed phase systems. SN2 probes chemical reactions of methyl halides with halide anions, i.e. X- + CH3Y -> CH3X + Y-, and contains structures for all possible combinations of X, Y = F, Cl, Br, I. The dataset also includes various structures for several smaller molecules that can be formed in fragmentation reactions, such as CH3X, HX, CHX or CH2X-, as well as geometries for H2, CH2, CH3+ and XY interhalogen compounds. In total, the dataset provides reference energies, forces, and dipole moments for 452,709 structures calculated at the DSD-BLYP-D3(BJ)/def2-TZVP level of theory using ORCA 4.0.1.

ColabFit ID: SN2_JCTC_2019__Unke-Meuwly__DS_3tspv1150ejj_0
Name: SN2_JCTC_2019
Authors: Oliver T. Unke, Markus Meuwly
Elements: Br, C, Cl, F, H, I
Number of Configurations: 394,684
Number of Elements: 6
Number of Atoms: 2,194,246

Links:
https://doi.org/10.5281/zenodo.2605341
https://doi.org/10.1021/acs.jctc.9b00181
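The SN2 reaction family described above can be enumerated directly. The sketch below uses a hypothetical helper, `sn2_reactions`, and covers all 16 ordered (X, Y) pairs. That includes the symmetric X = Y identity reactions, which "all possible combinations" is read here to include.

```python
from itertools import product

HALOGENS = ["F", "Cl", "Br", "I"]


def sn2_reactions(halogens=HALOGENS):
    """Return 'X- + CH3Y -> CH3X + Y-' strings for every ordered (X, Y) pair."""
    return [f"{x}- + CH3{y} -> CH3{x} + {y}-" for x, y in product(halogens, repeat=2)]


reactions = sn2_reactions()
print(len(reactions))  # 16
print(reactions[1])    # F- + CH3Cl -> CH3F + Cl-
```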
SPICE_2023
Dataset Downloads Coming Soon Description: SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting with proteins. Subsets of the dataset include the following: dipeptides: these provide comprehensive sampling of the covalent interactions found in proteins; solvated amino acids: these provide sampling of protein-water and water-water interactions; PubChem molecules: these sample a very wide variety of drug-like small molecules; monomer and dimer structures from DES370K: these provide sampling of a wide variety of non-covalent interactions; ion pairs: these provide further sampling of Coulomb interactions over a range of distances.

ColabFit ID: SPICE_2023__Eastman-Behara-Dotson-Galvelis-Herr-Horton-Mao-Chodera-Pritchard-Wang-Fabritiis-Markland__DS_kg0dv12aiq97_0
Name: SPICE_2023
Authors: Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, Thomas E. Markland
Elements: Br, C, Ca, Cl, F, H, I, K, Li, N, Na, O, P, S
Number of Configurations: 116,504
Number of Elements: 14
Number of Atoms: 3,382,829

Links:
https://doi.org/10.5281/zenodo.8222043
https://doi.org/10.1038/s41597-022-01882-6
Si-H-GAP_reference
Dataset Downloads Coming Soon Description: A reference set of configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. These configurations were used to evaluate training on a GAP model.

ColabFit ID: Si-H-GAP_reference__Unruh-Meidanshahi-Goodnick-Csányi-Zimányi__DS_a3v4bu6mts6b_0
Name: Si-H-GAP_reference
Authors: Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi
Elements: H, Si
Number of Configurations: 114
Number of Elements: 2
Number of Atoms: 24,895

Links:
https://github.com/dgunruh/Si-H-GAP
https://doi.org/10.1103/PhysRevMaterials.6.065603
Si-H-GAP_training
Dataset Downloads Coming Soon Description: A set of training configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. Includes the virial sigmas used for the configurations in the corresponding publication (virial-sigma-paper), as well as an alternate version in which the virial sigma prefactors are doubled (from 0.025 to 0.05).

ColabFit ID: Si-H-GAP_training__Unruh-Meidanshahi-Goodnick-Csányi-Zimányi__DS_8arvjtldu4fw_0
Name: Si-H-GAP_training
Authors: Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi
Elements: H, Si
Number of Configurations: 392
Number of Elements: 2
Number of Atoms: 65,909

Links:
https://github.com/dgunruh/Si-H-GAP
https://doi.org/10.1103/PhysRevMaterials.6.065603
Si-H-GAP_validation
Dataset Downloads Coming Soon Description: A set of validation configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. These configurations served to augment the reference set as a final benchmark for NEP model performance.

ColabFit ID: Si-H-GAP_validation__Unruh-Meidanshahi-Goodnick-Csányi-Zimányi__DS_vywqeohskg2j_0
Name: Si-H-GAP_validation
Authors: Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi
Elements: H, Si
Number of Configurations: 150
Number of Elements: 2
Number of Atoms: 23,000

Links:
https://github.com/dgunruh/Si-H-GAP
https://doi.org/10.1103/PhysRevMaterials.6.065603
Si_Al_Ti_Seko_PRB_2019_test
Dataset Downloads Coming Soon Description: Test sets from Si_Al_Ti_Seko_PRB_2019. This dataset is compiled from 10,000 selected structures from the ICSD, divided into training and test sets. The dataset was generated to train an MLIP that introduces high-order, linearly independent rotational invariants up to sixth order, based on spherical harmonics. DFT calculations were carried out with VASP using the PBE exchange-correlation functional and an energy cutoff of 400 eV.

ColabFit ID: Si_Al_Ti_Seko_PRB_2019_test__Seko-Togo-Tanaka__DS_dqascvm2baak_0
Name: Si_Al_Ti_Seko_PRB_2019_test
Authors: Atsuto Seko, Atsushi Togo, Isao Tanaka
Elements: Al, Si, Ti
Number of Configurations: 3,989
Number of Elements: 3
Number of Atoms: 197,628

Links:
other
https://doi.org/10.1103/PhysRevB.99.214108
Si_Al_Ti_Seko_PRB_2019_train
Dataset Downloads Coming Soon Description: Training sets from Si_Al_Ti_Seko_PRB_2019. This dataset is compiled from 10,000 selected structures from the ICSD, divided into training and test sets. The dataset was generated to train an MLIP that introduces high-order, linearly independent rotational invariants up to sixth order, based on spherical harmonics. DFT calculations were carried out with VASP using the PBE exchange-correlation functional and an energy cutoff of 400 eV.

ColabFit ID: Si_Al_Ti_Seko_PRB_2019_train__Seko-Togo-Tanaka__DS_swqa99vqo249_0
Name: Si_Al_Ti_Seko_PRB_2019_train
Authors: Atsuto Seko, Atsushi Togo, Isao Tanaka
Elements: Al, Si, Ti
Number of Configurations: 36,155
Number of Elements: 3
Number of Atoms: 1,774,664

Links:
other
https://doi.org/10.1103/PhysRevB.99.214108
Si_JCP_2017
Dataset Downloads Coming Soon Description: A dataset of 64-atom silicon configurations in four phases: cubic diamond, β-tin, R8, and liquid. MD simulations were run at 300, 600 and 900 K for the solid phases and at up to 2500 K for the liquid phase. All relaxations were performed at zero pressure. Additional configurations were prepared by randomly distorting crystal structures. VASP was used with a PAW pseudopotential and the PBE exchange-correlation functional. The k-point mesh was chosen to converge energies to 0.5 meV/atom and stresses to 0.1 kbar. The plane-wave energy cutoff was set to 300 eV. To reduce correlation between data points, the MD data were thinned by keeping one of every 100 consecutive structures from the 300 K MD simulations and one of every 20 structures from the higher-temperature MD simulations.

ColabFit ID: Si_JCP_2017__Cubuk-Malone-Onat-Waterland-Kaxiras__DS_paehju6qhaym_0
Name: Si_JCP_2017
Authors: Ekin D. Cubuk, Brad D. Malone, Berk Onat, Amos Waterland, Efthimios Kaxiras
Elements: Si
Number of Configurations: 1,117
Number of Elements: 1
Number of Atoms: 71,424

Links:
https://doi.org/10.1063/1.4990503
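The MD thinning rule for Si_JCP_2017 can be sketched as a fixed stride over each trajectory. The helper below, `thin_trajectory`, is hypothetical. It assumes frames are stored in time order, that the first frame of each block is the one kept, and that any temperature above 300 K counts as "higher temperature". The source does not say which frame in each block was retained.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def thin_trajectory(frames: Sequence[T], temperature_k: float) -> list[T]:
    """Keep one of every 100 frames at 300 K and one of every 20 at higher temperatures."""
    # Threshold of 300 K is an assumption matching the temperatures named in the description
    stride = 100 if temperature_k <= 300 else 20
    return list(frames[::stride])


frames = list(range(1000))                 # stand-in for 1,000 consecutive MD snapshots
kept_300 = thin_trajectory(frames, 300)    # 10 frames: 0, 100, ..., 900
kept_900 = thin_trajectory(frames, 900)    # 50 frames: 0, 20, ..., 980
```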
Si_PRX_GAP
Dataset Downloads Coming Soon Description: The original DFT training data for the general-purpose silicon interatomic potential described in the associated publication. The kinds of configuration that we include are chosen using intuition and past experience to guide what needs to be included to obtain good coverage pertaining to a range of properties.

ColabFit ID: Si_PRX_GAP__Bartók-Kermode-Bernstein-Csányi__DS_u9wd92plbetq_0
Name: Si_PRX_GAP
Authors: Albert P. Bartók, James Kermode, Noam Bernstein, Gábor Csányi
Elements: Si
Number of Configurations: 2,472
Number of Elements: 1
Number of Atoms: 171,164

Links:
https://doi.org/10.17863/CAM.65004
https://doi.org/10.1103/PhysRevX.8.041048
Silica_NPJCM_2022
Dataset Downloads Coming Soon Description: This dataset was created for the purpose of training an MLIP for silica (SiO2). For initial DFT computations, GPAW (in combination with ASE) was used with LDA, PBE and PBEsol functionals; and VASP with the SCAN functional. All calculations used the projector augmented-wave method. After comparison, it was found that SCAN performed best, and all values were recalculated using SCAN. An energy cutoff of 900 eV and a k-spacing of 0.23 Å⁻¹ were used.

ColabFit ID: Silica_NPJCM_2022__Erhard-Rohrer-Albe-Deringer__DS_14m394gnh3ae_0
Name: Silica_NPJCM_2022
Authors: Linus C. Erhard, Jochen Rohrer, Karsten Albe, Volker L. Deringer
Elements: O, Si
Number of Configurations: 3,074
Number of Elements: 2
Number of Atoms: 268,118

Links:
https://doi.org/10.5281/zenodo.6353683
https://doi.org/10.1038/s41524-022-00768-w
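A k-spacing such as the 0.23 Å⁻¹ quoted for Silica_NPJCM_2022 is typically turned into a k-point mesh by dividing each reciprocal-lattice-vector length by the spacing and rounding up. The sketch below assumes the convention used by VASP's KSPACING tag, with reciprocal vector lengths that include the factor 2π. The source does not say which convention was applied, and the cells below are hypothetical. `kpoint_mesh` is an illustrative helper, not part of any named package.

```python
import math

import numpy as np


def kpoint_mesh(cell: np.ndarray, kspacing: float) -> tuple[int, int, int]:
    """Subdivisions along each reciprocal vector for a given k-spacing in 1/Å.

    cell: 3x3 array whose rows are the real-space lattice vectors in Å.
    """
    # Rows of B = 2*pi * inv(A).T are reciprocal vectors b_i with a_i . b_j = 2*pi*delta_ij
    recip = 2.0 * math.pi * np.linalg.inv(cell).T
    lengths = np.linalg.norm(recip, axis=1)
    # Round up so the actual spacing is never coarser than requested; at least one point per axis
    return tuple(max(1, math.ceil(float(b) / kspacing)) for b in lengths)


print(kpoint_mesh(np.eye(3) * 10.0, 0.23))            # (3, 3, 3)
print(kpoint_mesh(np.diag([5.0, 10.0, 20.0]), 0.23))  # (6, 3, 2)
```

Larger cells need fewer k-points along each axis, which is why a single spacing value can be applied consistently across configurations of different sizes.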
Sn-SCAN_PRM_2023
Dataset Downloads Coming Soon Description: Approximately 6,500 configurations of Sn, including Sn8, Sn16 and Sn32, used in developing a deep potential that predicts the phase diagram of Sn.

ColabFit ID: Sn-SCAN_PRM_2023__Chen-Yuan-Liu-Geng-Zhang-Wang-Chen__DS_h1i3plesx7bc_0
Name: Sn-SCAN_PRM_2023
Authors: Tao Chen, Fengbo Yuan, Jianchuan Liu, Huayun Geng, Linfeng Zhang, Han Wang, Mohan Chen
Elements: Sn
Number of Configurations: 6,721
Number of Elements: 1
Number of Atoms: 113,584

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=Sn-SCAN
https://doi.org/10.1103/PhysRevMaterials.7.053603
TSFF_PLOS_2022
Dataset Downloads Coming Soon Description: One configuration of an enzyme: training data for a quantum-guided molecular mechanics model.

ColabFit ID: TSFF_PLOS_2022__Quinn-Patel-Koh-Haines-Norrby-Helquist-Wiest__DS_a0bxs66goqvv_0
Name: TSFF_PLOS_2022
Authors: Taylor R. Quinn, Himani N. Patel, Kevin H. Koh, Brandon E. Haines, Per-Ola Norrby, Paul Helquist, Olaf Wiest
Elements: C, H, N, O, S
Number of Configurations: 1
Number of Elements: 5
Number of Atoms: 117

Links:
https://doi.org/10.1371/journal.pone.0264960.s001
https://doi.org/10.1371/journal.pone.0264960
Ta_Linear_JCP2014
Dataset Downloads Coming Soon Description: This data set was originally used to generate a linear SNAP potential for solid and liquid tantalum, as published in Thompson, A.P. et al., J. Comp. Phys. 285 (2015) 316-330.

ColabFit ID: Ta_Linear_JCP2014__Thompson-Swiler-Trott-Foiles-Tucker__DS_rgtu2lkgv5rq_0
Name: Ta_Linear_JCP2014
Authors: Aidan P. Thompson, Laura P. Swiler, Christian R. Trott, Stephen M. Foiles, Garritt J. Tucker
Elements: Ta
Number of Configurations: 363
Number of Elements: 1
Number of Atoms: 4,224

Links:
https://github.com/FitSNAP/FitSNAP/tree/master/examples/Ta_Linear_JCP2014
https://doi.org/10.1016/j.jcp.2014.12.018
Ta_PINN_2021
Dataset Downloads Coming Soon Description: A dataset consisting of the energies of supercells containing from 1 to 250 atoms. The supercells represent energy-volume relations for 8 crystal structures of Ta, 5 uniform deformation paths between pairs of structures, vacancies, interstitials, surfaces with low-index orientations, 4 symmetrical tilt grain boundaries, γ-surfaces on the (110) and (211) fault planes, a [111] screw dislocation, liquid Ta, and several isolated clusters containing from 2 to 51 atoms. Some of the supercells contain static atomic configurations. However, most are snapshots of ab initio MD simulations at different densities, and temperatures ranging from 293 K to 3300 K. The BCC structure was sampled in the greatest detail, including a wide range of isotropic and uniaxial deformations.

ColabFit ID: Ta_PINN_2021__Lin-Pun-Mishin__DS_r6c6gt2s98xm_0
Name: Ta_PINN_2021
Authors: Yi-Shen Lin, Ganga P. Purja Pun, Yuri Mishin
Elements: Ta
Number of Configurations: 3,196
Number of Elements: 1
Number of Atoms: 136,037

Links:
https://doi.org/10.1016/j.commatsci.2021.111180
Ta_PRM2019
Dataset Downloads Coming Soon Description: This dataset was designed to enable machine-learning of Ta elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.

ColabFit ID: Ta_PRM2019__Byggmästar-Nordlund-Djurabekova__DS_40zw467dnc6d_0
Name: Ta_PRM2019
Authors: Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
Elements: Ta
Number of Configurations: 3,775
Number of Elements: 1
Number of Atoms: 45,439

Links:
https://gitlab.com/acclab/gap-data/-/tree/master
https://doi.org/10.1103/PhysRevMaterials.4.093802
TdS-PdV_Atari5200
Dataset Downloads Coming Soon Description: Approximately 45,000 configurations of metal oxides of Mg, Ag, Pt, Cu and Zn, with initial training structures taken from the Materials Project database.

ColabFit ID: TdS-PdV_Atari5200__Wisesa-Andolina-Saidi__DS_yk3t004l8dpd_0
Name: TdS-PdV_Atari5200
Authors: Pandu Wisesa, Christopher M. Andolina, Wissam A. Saidi
Elements: Ag, Cu, Mg, O, Pt, Zn
Number of Configurations: 44,404
Number of Elements: 6
Number of Atoms: 1,987,604

Links:
https://doi.org/10.5281/zenodo.7278341
https://doi.org/10.1021/acs.jpclett.2c03445
TiMoS_alloys_CMS2021
Dataset Downloads Coming Soon Description: Training set (DFT output) for cluster expansion (CE) models and Monte Carlo (MC) simulation output for the manuscript 'Phase behaviour of (Ti:Mo)S2 binary alloys arising from electron-lattice coupling'. The DFT calculations were performed using VASP 5.4.3, compiled with Intel MPI and Intel MKL support.

ColabFit ID: TiMoS_alloys_CMS2021__Silva-Polcar-Kramer__DS_jn819esw58ah_0
Name: TiMoS_alloys_CMS2021
Authors: Andrea Silva, Tomas Polcar, Denis Kramer
Elements: Mo, S, Ti
Number of Configurations: 259
Number of Elements: 3
Number of Atoms: 3,996

Links:
https://eprints.soton.ac.uk/443461/
https://doi.org/10.1016/j.commatsci.2020.110044
TiO2_CMS2016
Dataset Downloads Coming Soon Description: TiO2 dataset designed by Artrith et al. to build atomic neural network (ANN) potentials using the AENET package. This dataset includes various crystalline phases of TiO2 and MD data extracted from ab initio calculations. The dataset includes 7,815 structures with 165,229 atomic environments in the stoichiometric ratio of 66% O to 34% Ti.

ColabFit ID: TiO2_CMS2016__Artrith-Urban__DS_kvjft3au55qb_0
Name: TiO2_CMS2016
Authors: Nongnuch Artrith, Alexander Urban
Elements: O, Ti
Number of Configurations: 7,812
Number of Elements: 2
Number of Atoms: 165,114

Links:
https://github.com/DescriptorZoo/sensitivity-dimensionality-results/tree/master/datasets/TiO2
https://doi.org/10.1016/j.commatsci.2015.11.047
TiZrHfTa_APS2021
Dataset Downloads Coming Soon Description: A dataset used to fit machine-learning interatomic potentials (moment tensor potentials) for multicomponent alloys to ab initio data, in order to investigate the disordered body-centered cubic (bcc) TiZrHfTa_x system with varying Ta concentration.

ColabFit ID: TiZrHfTa_APS2021__Gubaev-Ikeda-Tasnádi-Neugebauer-Shapeev-Grabowski-Körmann__DS_ngso7es93qnj_0
Name: TiZrHfTa_APS2021
Authors: Konstantin Gubaev, Yuji Ikeda, Ferenc Tasnádi, Jörg Neugebauer, Alexander V. Shapeev, Blazej Grabowski, Fritz Körmann
Elements: Hf, Ta, Ti, Zr
Number of Configurations: 3,623
Number of Elements: 4
Number of Atoms: 223,984

Links:
other
https://doi.org/10.1103/PhysRevMaterials.5.073801
Ti_NPJCM_2021
Dataset Downloads Coming Soon Description: Approximately 7,400 configurations of titanium used for training a deep potential using the DeePMD-kit molecular dynamics package and DP-GEN training scheme.

ColabFit ID: Ti_NPJCM_2021__Wen-Wang-Zhu-Zhang-Wang-Srolovitz-Wu__DS_aaocsvp2x40m_0
Name: Ti_NPJCM_2021
Authors: Tongqi Wen, Rui Wang, Lingyu Zhu, Linfeng Zhang, Han Wang, David J. Srolovitz, Zhaoxuan Wu
Elements: Ti
Number of Configurations: 7,378
Number of Elements: 1
Number of Atoms: 143,856

Links:
https://www.aissquare.com/datasets/detail?pageType=datasets&name=Ti
https://doi.org/10.1038/s41524-021-00661-y
Transition1x-test
Dataset Downloads Coming Soon Description: The test split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets, in particular QM9 and ANI1x.

ColabFit ID: Transition1x-test__Schreiner-Bhowmik-Vegge-Busk-Winther__DS_zzfaosyakwom_0
Name: Transition1x-test
Authors: Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
Elements: C, H, N, O
Number of Configurations: 190,277
Number of Elements: 4
Number of Atoms: 2,106,770

Links:
https://doi.org/10.6084/m9.figshare.19614657.v4
https://doi.org/10.1038/s41597-022-01870-w
Transition1x-validation
Dataset Downloads Coming Soon Description: The validation split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million density functional theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. Compared with other benchmark datasets, in particular QM9 and ANI1x, the configurations in this dataset give a better representation of transition-state regions.

ColabFit ID: Transition1x-validation__Schreiner-Bhowmik-Vegge-Busk-Winther__DS_ktku4cml3al7_0
Name: Transition1x-validation
Authors: Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
Elements: C, H, N, O
Number of Configurations: 264,996
Number of Elements: 4
Number of Atoms: 3,743,476

Links:
https://doi.org/10.6084/m9.figshare.19614657.v4
https://doi.org/10.1038/s41597-022-01870-w
Transition1x_train
Dataset Downloads Coming Soon Description: The training split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million density functional theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. Compared with other benchmark datasets, in particular QM9 and ANI1x, the configurations in this dataset give a better representation of transition-state regions.

ColabFit ID: Transition1x_train__Schreiner-Bhowmik-Vegge-Busk-Winther__DS_sn0csplf32qj_0
Name: Transition1x_train
Authors: Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
Elements: C, H, N, O
Number of Configurations: 62,990
Number of Elements: 4
Number of Atoms: 536,010

Links:
https://doi.org/10.6084/m9.figshare.19614657.v4
https://doi.org/10.1038/s41597-022-01870-w
UNEP_v1_2023_test
Dataset Downloads Coming Soon Description: The test set for UNEP-v1 (version 1 of Unified NeuroEvolution Potential), a model implemented in GPUMD.

ColabFit ID: UNEP-v1_2023__Song-Zhao-Liu-Wang-Lindgren-Wang-Chen-Xu-Liang-Ying-Xu-Zhao-Shi-Wang-Lyu-Zeng-Liang-Dong-Sun-Chen-Zhang-Guo-Qian-Sun-Erhart-Ala-Nissila-Su-Fan__DS_jyklsju4h580_0
Name: UNEP_v1_2023_test
Authors: Keke Song, Rui Zhao, Jiahui Liu, Yanzhou Wang, Eric Lindgren, Yong Wang, Shunda Chen, Ke Xu, Ting Liang, Penghua Ying, Nan Xu, Zhiqiang Zhao, Jiuyang Shi, Junjie Wang, Shuang Lyu, Zezhu Zeng, Shirong Liang, Haikuan Dong, Ligang Sun, Yue Chen, Zhuhua Zhang, Wanlin Guo, Ping Qian, Jian Sun, Paul Erhart, Tapio Ala-Nissila, Yanjing Su, Zheyong Fan
Elements: Ag, Al, Au, Cr, Cu, Mg, Mo, Ni, Pb, Pd, Pt, Ta, Ti, V, W, Zr
Number of Configurations: 4,411
Number of Elements: 16
Number of Atoms: 318,910

Links:
https://zenodo.org/doi/10.5281/zenodo.10081676
https://doi.org/10.48550/arXiv.2311.04732
UNEP_v1_2023_train
Dataset Downloads Coming Soon Description: The training set for UNEP-v1 (version 1 of Unified NeuroEvolution Potential), a model implemented in GPUMD.

ColabFit ID: UNEP-v1_2023__Song-Zhao-Liu-Wang-Lindgren-Wang-Chen-Xu-Liang-Ying-Xu-Zhao-Shi-Wang-Lyu-Zeng-Liang-Dong-Sun-Chen-Zhang-Guo-Qian-Sun-Erhart-Ala-Nissila-Su-Fan__DS_14h4rvviya0k_0
Name: UNEP_v1_2023_train
Authors: Keke Song, Rui Zhao, Jiahui Liu, Yanzhou Wang, Eric Lindgren, Yong Wang, Shunda Chen, Ke Xu, Ting Liang, Penghua Ying, Nan Xu, Zhiqiang Zhao, Jiuyang Shi, Junjie Wang, Shuang Lyu, Zezhu Zeng, Shirong Liang, Haikuan Dong, Ligang Sun, Yue Chen, Zhuhua Zhang, Wanlin Guo, Ping Qian, Jian Sun, Paul Erhart, Tapio Ala-Nissila, Yanjing Su, Zheyong Fan
Elements: Ag, Al, Au, Cr, Cu, Mg, Mo, Ni, Pb, Pd, Pt, Ta, Ti, V, W, Zr
Number of Configurations: 104,799
Number of Elements: 16
Number of Atoms: 6,840,534

Links:
https://zenodo.org/doi/10.5281/zenodo.10081676
https://doi.org/10.48550/arXiv.2311.04732
V_PRM2019
Dataset Downloads Coming Soon Description: This dataset was designed to enable machine learning of the elastic, thermal, and defect properties of V, as well as its surface energetics, melting, and liquid-phase structure. It was constructed by starting from the dataset of J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), rescaling all of the configurations to the correct lattice spacing, and adding gamma-surface configurations.

ColabFit ID: V_PRM2019__Byggmästar-Nordlund-Djurabekova__DS_ouwlietscprn_0
Name: V_PRM2019
Authors: Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
Elements: V
Number of Configurations: 3,802
Number of Elements: 1
Number of Atoms: 46,466

Links:
https://gitlab.com/acclab/gap-data/-/tree/master
https://doi.org/10.1103/PhysRevMaterials.4.093802
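The V_PRM2019 description says the starting configurations (from Phys. Rev. B 100, 144105, the same reference listed for W_PRB2019 on this page) were rescaled to the correct lattice spacing. The entry does not say how; a minimal sketch, assuming a uniform isotropic scaling by the ratio of lattice constants, multiplies every cell vector and Cartesian position by a_new / a_old, which leaves fractional coordinates unchanged. The function name and the lattice constants in the usage line are hypothetical, not taken from the source.

```python
def rescale_configuration(cell, positions, a_old, a_new):
    """Uniformly scale cell vectors and Cartesian positions by a_new / a_old.

    Assumption: isotropic rescaling; fractional coordinates are preserved.
    """
    s = a_new / a_old
    new_cell = [[s * x for x in vec] for vec in cell]
    new_positions = [[s * x for x in pos] for pos in positions]
    return new_cell, new_positions


# Hypothetical 2-atom cubic bcc cell with lattice constant 3.0 Å rescaled to 3.3 Å
cell = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
positions = [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]  # body-centered site at (1/2, 1/2, 1/2)
new_cell, new_positions = rescale_configuration(cell, positions, a_old=3.0, a_new=3.3)
print(new_cell[0], new_positions[1])
```

Any strained or defected configuration is scaled the same way, so its relative distortion from the perfect lattice is preserved.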
W-14
Dataset Downloads Coming Soon Description: 158,000 diverse atomic environments of elemental tungsten. Includes DFT-PBE energies, forces, and stresses for tungsten in periodic unit cells of 1 to 135 atoms, including the bcc primitive cell, a 128-atom bcc cell, vacancies, low-index surfaces, gamma surfaces, and dislocation cores.

ColabFit ID: W-14__Szlachta-Bartók-Csányi__DS_kk4a8u1eo42m_0
Name: W-14
Authors: Wojciech J. Szlachta, Albert P. Bartók, Gábor Csányi
Elements: W
Number of Configurations: 9,693
Number of Elements: 1
Number of Atoms: 158,515

Links:
https://qmml.org/datasets.html
https://doi.org/10.1103/PhysRevB.90.104108
WBe_PRB2019
Dataset Downloads Coming Soon Description: This data set was originally used to generate a multi-component linear SNAP potential for tungsten and beryllium, as published in Wood, M. A., et al., Phys. Rev. B 99 (2019) 184305. It was developed for the purpose of studying plasma-material interactions in fusion reactors.

ColabFit ID: WBe_PRB2019__Wood-Cusentino-Wirth-Thompson__DS_yq2whjjndyq5_0
Name: WBe_PRB2019
Authors: Mitchell A. Wood, Mary Alice Cusentino, Brian D. Wirth, Aidan P. Thompson
Elements: Be, W
Number of Configurations: 25,120
Number of Elements: 2
Number of Atoms: 525,915

Links:
https://github.com/FitSNAP/FitSNAP/tree/master/examples/WBe_PRB2019
https://doi.org/10.1103/PhysRevB.99.184305
WS22_acrolein
Dataset Downloads Coming Soon Description: Configurations of acrolein from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_acrolein__Jr-Zhang-Dral-Barbatti__DS_wqezpu0pw9io_0
Name: WS22_acrolein
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, O
Number of Configurations: 120,000
Number of Elements: 3
Number of Atoms: 960,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_alanine
Dataset Downloads Coming Soon Description: Configurations of alanine from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_alanine__Jr-Zhang-Dral-Barbatti__DS_vxtsip8qgm7t_0
Name: WS22_alanine
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,000
Number of Elements: 4
Number of Atoms: 1,560,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_dmabn
Dataset Downloads Coming Soon Description: Configurations of dmabn from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_dmabn__Jr-Zhang-Dral-Barbatti__DS_srva1ami36vs_0
Name: WS22_dmabn
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N
Number of Configurations: 120,000
Number of Elements: 3
Number of Atoms: 2,520,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_nitrophenol
Dataset Downloads Coming Soon Description: Configurations of nitrophenol from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_nitrophenol__Jr-Zhang-Dral-Barbatti__DS_s3qsg4c3hyk6_0
Name: WS22_nitrophenol
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,000
Number of Elements: 4
Number of Atoms: 1,800,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_o-hbdi
Dataset Downloads Coming Soon Description: Configurations of o-hbdi from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_o-hbdi__Jr-Zhang-Dral-Barbatti__DS_ukw33o4tvaiy_0
Name: WS22_o-hbdi
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,000
Number of Elements: 4
Number of Atoms: 2,640,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_sma
Dataset Downloads Coming Soon Description: Configurations of sma from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_sma__Jr-Zhang-Dral-Barbatti__DS_jz8yny1m7xer_0
Name: WS22_sma
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,040
Number of Elements: 4
Number of Atoms: 2,280,760

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_thymine
Dataset Downloads Coming Soon Description: Configurations of thymine from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_thymine__Jr-Zhang-Dral-Barbatti__DS_ngd7bajt6nvx_0
Name: WS22_thymine
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,000
Number of Elements: 4
Number of Atoms: 1,800,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_toluene
Dataset Downloads Coming Soon Description: Configurations of toluene from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_toluene__Jr-Zhang-Dral-Barbatti__DS_ztubsom32o4f_0
Name: WS22_toluene
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H
Number of Configurations: 100,000
Number of Elements: 2
Number of Atoms: 1,500,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_urea
Dataset Downloads Coming Soon Description: Configurations of urea from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_urea__Jr-Zhang-Dral-Barbatti__DS_0upwmrh3apql_0
Name: WS22_urea
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,000
Number of Elements: 4
Number of Atoms: 960,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
WS22_urocanic
Dataset Downloads Coming Soon Description: Configurations of urocanic from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.

ColabFit ID: WS22_urocanic__Jr-Zhang-Dral-Barbatti__DS_w3sximoit8bj_0
Name: WS22_urocanic
Authors: Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
Elements: C, H, N, O
Number of Configurations: 120,000
Number of Elements: 4
Number of Atoms: 1,920,000

Links:
https://doi.org/10.5281/zenodo.7032333
https://doi.org/10.1038/s41597-023-01998-3
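The ten WS22 entries can be cross-checked against the shared description, which speaks of 1.18 million geometries in 10 datasets. A minimal Python sketch using only the counts shown on this page: summing configurations gives 1,180,040, and dividing atoms by configurations gives the number of atoms per molecular geometry (for example 15 for toluene, C7H8). It also shows that the split is only approximately equal (toluene has 100,000 geometries, sma 120,040). The dictionary and function names are illustrative.

```python
# Counts copied from the WS22 entries on this page:
# molecule -> (number of configurations, number of atoms)
WS22_COUNTS = {
    "acrolein": (120_000, 960_000),
    "alanine": (120_000, 1_560_000),
    "dmabn": (120_000, 2_520_000),
    "nitrophenol": (120_000, 1_800_000),
    "o-hbdi": (120_000, 2_640_000),
    "sma": (120_040, 2_280_760),
    "thymine": (120_000, 1_800_000),
    "toluene": (100_000, 1_500_000),
    "urea": (120_000, 960_000),
    "urocanic": (120_000, 1_920_000),
}


def atoms_per_geometry(counts):
    """Return atoms per molecular geometry, checking that each division is exact."""
    result = {}
    for name, (n_configs, n_atoms) in counts.items():
        per_geom, remainder = divmod(n_atoms, n_configs)
        if remainder:
            raise ValueError(f"{name}: atom count is not a multiple of the configuration count")
        result[name] = per_geom
    return result


total_configs = sum(n_configs for n_configs, _ in WS22_COUNTS.values())
print(f"{len(WS22_COUNTS)} molecules, {total_configs:,} geometries")  # 10 molecules, 1,180,040 geometries
for name, n in atoms_per_geometry(WS22_COUNTS).items():
    print(f"{name:12s} {n:3d} atoms per geometry")
```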
W_LML-retrain_bulk_MD_test
Dataset Downloads Coming Soon Description: Test set from the W_LML-retrain dataset, containing bulk tungsten calculations. The W_LML-retrain dataset contains DFT calculations used in testing a linear-in-descriptor machine learning potential that accounts for dislocation-defect interactions in tungsten. Density functional simulations were performed using VASP. The PBE generalised gradient approximation was used to describe effects of electron exchange and correlation, together with a projector augmented wave (PAW) basis set with a cut-off energy of 550 eV. Occupancies were smeared with a Methfessel-Paxton scheme of order one with a 0.1 eV smearing width. The Brillouin zone was sampled with a Monkhorst-Pack k-point grid for the 2D cluster simulations periodic along the dislocation line, and a single k-point was used for the calculations with 3D spherical QM regions. The values of these parameters were chosen after a series of convergence tests on forces with a tolerance of a few meV/Å.

ColabFit ID: W_LML-retrain_bulk_MD_test__Onat-Ortner-Kermode__DS_0od3fns1ap8a_0
Name: W_LML-retrain_bulk_MD_test
Authors: Berk Onat, Christoph Ortner, James R. Kermode
Elements: W
Number of Configurations: 8
Number of Elements: 1
Number of Atoms: 1,996

Links:
https://github.com/marseille-matmol/LML-retrain
https://doi.org/10.1016/j.actamat.2023.118734
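The W_LML-retrain description uses a Monkhorst-Pack grid for cells periodic only along the dislocation line and a single k-point for the 3D spherical QM regions. A minimal Python sketch of the standard Monkhorst-Pack formula, u_r = (2r - q - 1) / (2q) for r = 1..q along each reciprocal axis, shows both cases: a grid of 1 along the non-periodic directions collapses to u = 0, and a 1×1×1 grid is the single Γ point. The number of k-points along the line (4 below) is hypothetical; the source does not give it.

```python
from itertools import product


def monkhorst_pack(q1, q2, q3):
    """Fractional k-points of a q1 x q2 x q3 Monkhorst-Pack grid.

    Along each axis: u_r = (2r - q - 1) / (2q), r = 1..q.
    """
    def axis(q):
        return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]

    return list(product(axis(q1), axis(q2), axis(q3)))


# Hypothetical 1 x 1 x 4 grid for a cell periodic only along the dislocation line (z)
line_grid = monkhorst_pack(1, 1, 4)
print([k[2] for k in line_grid])  # [-0.375, -0.125, 0.125, 0.375]

# A single k-point, as used for the 3D spherical QM regions, is the Gamma point
print(monkhorst_pack(1, 1, 1))  # [(0.0, 0.0, 0.0)]
```

Note that for even q the grid does not contain Γ; whether the original calculations used a shifted or Γ-centered grid is not stated in the entry.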
W_PRB2019
Dataset Downloads Coming Soon Description: This dataset was originally designed to fit a GAP potential with a specific focus on properties relevant for simulations of radiation-induced collision cascades and the damage they produce, including a realistic repulsive potential for short-range many-body cascade dynamics and a good description of the liquid phase.

ColabFit ID: W_PRB2019__Byggmästar-Hamedani-Nordlund-Djurabekova__DS_gh9s2sopu064_0
Name: W_PRB2019
Authors: Jesper Byggmästar, Ali Hamedani, Kai Nordlund, Flyura Djurabekova
Elements: W
Number of Configurations: 3,528
Number of Elements: 1
Number of Atoms: 42,068

Links:
https://gitlab.com/acclab/gap-data/-/tree/master/W/2019-05-24
https://doi.org/10.1103/PhysRevB.100.144105
Yttrium-catalyzed_benzylic_C-H_alkylations_of_alkylpyridines_with_olefins
Dataset Downloads Coming Soon Description: This data was assembled to investigate rare-earth-catalyzed benzylic C(sp3)-H addition of pyridines to olefins. All calculations were performed with the Gaussian 09 software package. The B3PW91 functional was used for geometry optimization without any symmetry constraints. Harmonic vibrational frequencies were then computed for each optimized structure at the same level of theory, both to characterize it as a minimum (NImag = 0) or a transition state (NImag = 1) and to obtain thermodynamic data. The 6-31G(d) basis set was used for C, H, and N atoms, and Stuttgart/Dresden relativistic effective core potentials (RECPs), together with the associated valence basis sets, were used for the Y atom. To obtain more accurate energies, single-point energy calculations were performed with a larger basis set. In these single-point calculations, the M06-L functional, which often performs well for transition-metal systems, was used together with the CPCM solvation model to account for toluene solvation. For the Y atom, the same basis set and associated pseudopotentials as in the geometry optimization were used, and the 6-311+G(d,p) basis set was used for the remaining atoms.

ColabFit ID: Yttrium-catalyzed_benzylic_C-H_alkylations_of_alkylpyridines_with_olefins__Zhou-Luo-Kang-Hou-Luo__DS_xrfpxtaioi75_0
Name: Yttrium-catalyzed_benzylic_C-H_alkylations_of_alkylpyridines_with_olefins
Authors: Guangli Zhou, Gen Luo, Xiaohui Kang, Zhaomin Hou, Yi Luo
Elements: C, H, N, Y
Number of Configurations: 68
Number of Elements: 4
Number of Atoms: 4,110

Links:
https://doi.org/10.1021/acs.organomet.8b00397.s002
https://doi.org/10.1021/acs.organomet.8b00397
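The NImag criterion in the description above classifies each optimized structure by its number of imaginary vibrational modes. Gaussian prints imaginary frequencies as negative numbers, so a minimal Python sketch only needs to count negative entries. The function name and the frequency values in the usage lines are hypothetical, not data from this dataset.

```python
def classify_stationary_point(frequencies_cm1):
    """Classify an optimized structure by its number of imaginary modes (NImag).

    Assumes imaginary frequencies are reported as negative values, as Gaussian does.
    """
    n_imag = sum(1 for f in frequencies_cm1 if f < 0)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "transition state"
    return f"higher-order saddle point (NImag = {n_imag})"


# Hypothetical lowest frequencies (cm^-1) of two optimized structures
print(classify_stationary_point([35.2, 61.8, 104.9]))      # minimum
print(classify_stationary_point([-1240.3, 28.7, 55.1]))    # transition state
```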
ZIF-4_Amorphous_Zeolitic_Imidazolate_Frameworks_2023
Dataset Downloads Coming Soon Description: This dataset contains four trajectories of liquid amorphous zeolitic imidazolate framework (ZIF-4), calculated at four different volumes and at temperatures of 1500 K and 1750 K, and three trajectories of the ZIF-4 crystal: one at 300 K and two at 1500 K. Data were generated at the DFT-PBE-D3 level of theory.

ColabFit ID: ZIF-4_Amorphous_Zeolitic_Imidazolate_Frameworks_2023__Castel-Andre-Edwards-Evans-Coudert__DS_sh7jt3ptmde4_0
Name: ZIF-4_Amorphous_Zeolitic_Imidazolate_Frameworks_2023
Authors: Nicolas Castel, Dune Andre, Connor Edwards, Jack D. Evans, Francois-Xavier Coudert
Elements: C, H, N, Zn
Number of Configurations: 1,189,836
Number of Elements: 4
Number of Atoms: 323,635,392

Links:
https://doi.org/10.5281/zenodo.10015594
https://doi.org/10.26434/chemrxiv-2023-8003d
Zeo-1_SD_2022
Dataset Downloads Coming Soon Description: 130,000 zeolite configurations derived from the Database of Zeolite Structures. Calculations were performed using the Amsterdam Modeling Suite software.

ColabFit ID: Zeo-1_SD_2022__Komissarov-Verstraelen__DS_9in0wrvt6qg2_0
Name: Zeo-1_SD_2022
Authors: Leonid Komissarov, Toon Verstraelen
Elements: Al, Ba, Be, C, Ca, Cs, F, Ge, H, K, Li, N, Na, O, Si
Number of Configurations: 12,930
Number of Elements: 15
Number of Atoms: 1,841,742

Links:
https://doi.org/10.1038/s41597-022-01160-5
https://doi.org/10.24435/materialscloud:cv-zd
Zn_MTP_CMS2023
Dataset Downloads Coming Soon Description: A training dataset of diverse atomic configurations of Zn, varying in aggregation states, crystal structures, defect types, and sizes. The aim was to derive a potential capable of accurately describing a broad spectrum of local atomic configurations in Zn.

ColabFit ID: Zn_MTP_CMS2023__Mei-Cheng-Chen-Wang-Li-Kong__DS_58y020ce6b6j_0
Name: Zn_MTP_CMS2023
Authors: Haojie Mei, Luyao Cheng, Liang Chen, Feifei Wang, Jinfu Li, Lingti Kong
Elements: Zn
Number of Configurations: 13,552
Number of Elements: 1
Number of Atoms: 278,996

Links:
https://github.com/meihaojie/Zn_system/tree/main
https://doi.org/10.1016/j.commatsci.2023.112723
Zr_Sn_JNM_2024
Dataset Downloads Coming Soon Description: This dataset contains data from density functional theory calculations of various atomic configurations of pure Zr, pure Sn, and Zr-Sn alloys with different structures, defects, and compositions. Energies, forces, and stresses are calculated at the DFT level of theory. Includes 23,956 total configurations.

ColabFit ID: Zr_Sn_JNM_2024__Mei-Chen-Wang-Liu-Hu-Lin-Shen-Li-Kong__DS_6woak771jubv_0
Name: Zr_Sn_JNM_2024
Authors: Haojie Mei, Liang Chen, Feifei Wang, Guisen Liu, Jing Hu, Weitong Lin, Yao Shen, Jinfu Li, Lingti Kong
Elements: Sn, Zr
Number of Configurations: 23,611
Number of Elements: 2
Number of Atoms: 688,087

Links:
https://github.com/meihaojie/Zr_Sn_system
https://doi.org/10.1016/j.jnucmat.2023.154794
a-AlOx_JCP_2020
Dataset Downloads Coming Soon Description: This dataset was used to train an MLIP for amorphous alumina (a-AlOx). The two configuration sets correspond to i) the actual training data and ii) additional reference data. Ab initio calculations were performed with the Vienna Ab initio Simulation Package. The projector augmented wave method was used to treat the atomic core electrons, and the Perdew-Burke-Ernzerhof functional within the generalized gradient approximation was used to describe the electron-electron interactions. The cutoff energy for the plane-wave basis set was set to 550 eV. The resulting reference database includes the DFT energies of 41,203 structures. The supercell size of the AlOx reference structures varied from 24 to 132 atoms. K-point values are given for structures with: Al0, Al12, Al24, Al48, and Al192.

ColabFit ID: a-AlOx_JCP_2020__Li-Ando-Watanabe__DS_70btumen3361_0
Name: a-AlOx_JCP_2020
Authors: Wenwen Li, Yasunobu Ando, Satoshi Watanabe
Elements: Al, O
Number of Configurations: 123,586
Number of Elements: 2
Number of Atoms: 4,541,918

Links:
https://doi.org/10.24435/materialscloud:y1-zd
https://doi.org/10.1063/5.0026289
aC_JCP_2023
Dataset Downloads Coming Soon Description: The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt the carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During each melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states.

ColabFit ID: aC_JCP_2023__Minamitani-Obayashi-Shimizu-Watanabe__DS_bmjfal3bj4ah_0
Name: aC_JCP_2023
Authors: Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe
Elements: C
Number of Configurations: 20,195
Number of Elements: 1
Number of Atoms: 5,192,400

Links:
https://doi.org/10.5281/zenodo.7905585
https://doi.org/10.1063/5.0159349
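The data-point totals in the aC_JCP_2023 description can be reproduced from the snapshot counts. A worked check in Python, reading "six independent simulations" as six runs per density and the 80 diamond snapshots as 80 per diamond density (these readings are assumptions, but they are the ones that reproduce the stated 16,830 and 560): each melt-quench run yields 30 + 100 + 25 + 100 = 255 snapshots, and adding the three parts with 216 or 512 atoms per snapshot gives exactly the 20,195 configurations and 5,192,400 atoms listed for this entry.

```python
# Snapshots per melt-quench run: 9000 K hold, 9000->5000 K cool, 5000 K hold, 5000->300 K quench
SNAPSHOTS_PER_RUN = 30 + 100 + 25 + 100

AMORPHOUS_DENSITIES = [1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.5]  # g/cm^3
DIAMOND_DENSITIES = [2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.5]                      # g/cm^3

# Assumed readings: six runs per density (216 atoms), one run per density (512 atoms),
# and 80 diamond snapshots per diamond density
amorphous_216 = len(AMORPHOUS_DENSITIES) * 6 * SNAPSHOTS_PER_RUN
diamond_216 = len(DIAMOND_DENSITIES) * 80
amorphous_512 = len(AMORPHOUS_DENSITIES) * 1 * SNAPSHOTS_PER_RUN

n_configs = amorphous_216 + diamond_216 + amorphous_512
n_atoms = 216 * (amorphous_216 + diamond_216) + 512 * amorphous_512

print(amorphous_216, diamond_216, amorphous_512)  # 16830 560 2805
print(n_configs, n_atoms)                         # 20195 5192400
```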
aC_JCP_2023_test
Dataset Downloads Coming Soon Description: Test split from the 216-atom amorphous portion of the aC_JCP_2023 dataset. The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt the carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During each melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states.

ColabFit ID: aC_JCP_2023_test__Minamitani-Obayashi-Shimizu-Watanabe__DS_sany2behzwv8_0
Name: aC_JCP_2023_test
Authors: Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe
Elements: C
Number of Configurations: 3,366
Number of Elements: 1
Number of Atoms: 727,056

Links:
https://doi.org/10.5281/zenodo.7905585
https://doi.org/10.1063/5.0159349
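The melt-quench protocol shared by the aC_JCP_2023 entries can be written as a target-temperature schedule. A minimal Python sketch, assuming the ramps are linear in time (the description gives only start and end temperatures and durations, and does not name the thermostat): the five segments total 11 ps, or 11,000 steps at the stated 1 fs time step. The names SCHEDULE and target_temperature are illustrative.

```python
# (duration in ps, start temperature in K, end temperature in K), from the description
SCHEDULE = [
    (2.0, 300.0, 9000.0),   # heat to melt
    (3.0, 9000.0, 9000.0),  # equilibrate the liquid
    (2.0, 9000.0, 5000.0),  # cool
    (2.0, 5000.0, 5000.0),  # equilibrate at 5000 K
    (2.0, 5000.0, 300.0),   # quench to an amorphous structure
]
TIMESTEP_PS = 0.001  # 1 fs


def target_temperature(t_ps, schedule=SCHEDULE):
    """Target temperature at time t_ps, assuming linear ramps within each segment."""
    elapsed = 0.0
    for duration, t_start, t_end in schedule:
        if t_ps <= elapsed + duration:
            frac = (t_ps - elapsed) / duration
            return t_start + frac * (t_end - t_start)
        elapsed += duration
    return schedule[-1][2]  # hold the final temperature after the schedule ends


total_ps = sum(duration for duration, _, _ in SCHEDULE)
n_steps = round(total_ps / TIMESTEP_PS)
print(total_ps, n_steps)                               # 11.0 11000
print(target_temperature(1.0), target_temperature(6.0))  # 4650.0 7000.0
```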
aC_JCP_2023_train
Dataset Downloads Coming Soon Description: Training split from the 216-atom amorphous portion of the aC_JCP_2023 dataset. The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt the carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During each melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states.

ColabFit ID: aC_JCP_2023_train__Minamitani-Obayashi-Shimizu-Watanabe__DS_20h8o5d0ltx1_0
Name: aC_JCP_2023_train
Authors: Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe
Elements: C
Number of Configurations: 13,464
Number of Elements: 1
Number of Atoms: 2,908,224

Links:
https://doi.org/10.5281/zenodo.7905585
https://doi.org/10.1063/5.0159349
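The ColabFit IDs in this listing appear to follow the pattern `<name>__<author surnames>__DS_<id>_<version>`. This pattern is inferred from the entries shown here, not taken from a ColabFit specification, so the parser below is a hypothetical helper. It splits from the right so that dataset names ending in an underscore (such as glass-ceramic_lithium_thiophosphate_electrolytes_) stay intact. The author field is returned as a raw string: surnames are joined with hyphens and some surnames contain hyphens themselves (for example Lelièvre-Berna), so splitting it on "-" would be ambiguous.

```python
import re
from typing import NamedTuple


class ColabFitID(NamedTuple):
    name: str
    authors: str   # raw hyphen-joined surnames; not split (see lead-in)
    short_id: str
    version: int


# Inferred suffix format: "DS_" + lowercase alphanumeric id + "_" + version.
_DS_PART = re.compile(r"^DS_([a-z0-9]+)_(\d+)$")


def parse_colabfit_id(cf_id: str) -> ColabFitID:
    """Split an ID of the (inferred) form NAME__AUTHORS__DS_<id>_<version>.

    rsplit works from the right, so a name that itself ends in "_"
    keeps its trailing underscore.
    """
    parts = cf_id.rsplit("__", 2)
    if len(parts) != 3:
        raise ValueError(f"expected NAME__AUTHORS__DS_..., got {cf_id!r}")
    name, authors, ds_part = parts
    match = _DS_PART.match(ds_part)
    if match is None:
        raise ValueError(f"unrecognised dataset suffix: {ds_part!r}")
    return ColabFitID(name, authors, match.group(1), int(match.group(2)))


print(parse_colabfit_id(
    "aC_JCP_2023_train__Minamitani-Obayashi-Shimizu-Watanabe__DS_20h8o5d0ltx1_0"))
```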
adatoms_on_single-layer_graphene_PRR2021
Description: This dataset consists of graphene superlattices with transition-metal adatoms such as tungsten, with properties calculated at the DFT level of theory. The authors modeled the placement of adatoms on a graphene monolayer. The resulting superlattice structures were then used to calculate electronic band structures and phonon dispersion relations, in order to investigate the effect of adatom placement on these properties. The dataset was created in the following steps: 1. selection of the graphene monolayer as the starting point for the superlattice construction; 2. placement of the adatom at the center of the unit cell; 3. calculation of the electronic structure and other properties of the resulting superlattice using DFT; 4. generation of a set of reduced Brillouin zones representing the symmetry of the superlattice; 5. calculation of the electronic band structure and phonon dispersion relations for each superlattice structure in the dataset.

ColabFit ID: adatoms_on_single-layer_graphene_PRR2021__Skurativska-Tsirkin-Natterer-Neupert-Fischer__DS_08feepssbuq5_0
Name: adatoms_on_single-layer_graphene_PRR2021
Authors: Anastasiia Skurativska, Stepan S. Tsirkin, Fabian D Natterer, Titus Neupert, Mark H Fischer
Elements: C, Cr, Ir, Mo, Nb, Os, Re, Rh, Ru, Ta, W
Number of Configurations: 18
Number of Elements: 11
Number of Atoms: 774

Links:
https://doi.org/10.24435/materialscloud:bj-bh
http://doi.org/10.1103/PhysRevResearch.3.L032003
aleatoric_epistemic_error_AIC2023
Description: Dataset for H2CO, with and without added noise, for testing the effects of noise on the quality of fit. Configuration sets are included for clean energy values with different levels of Gaussian noise added to the atomic forces (including a set with no noise added), and for energies perturbed at different levels (including a set with no perturbation). Configuration sets correspond to individual files found at the data link.
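The noise-injection step can be sketched as follows. This is a minimal illustration assuming independent, zero-mean Gaussian noise on each Cartesian force component and a single Gaussian shift per configuration energy; the actual noise levels and procedure used by the authors are in the linked repository, and the function names here are hypothetical.

```python
import numpy as np


def add_force_noise(forces: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Return a copy of `forces` (shape [n_atoms, 3]) with i.i.d. Gaussian
    noise of standard deviation `sigma` added to every Cartesian component."""
    rng = np.random.default_rng(seed)
    return forces + rng.normal(loc=0.0, scale=sigma, size=forces.shape)


def perturb_energy(energy: float, sigma: float, seed: int = 0) -> float:
    """Return `energy` shifted by a single Gaussian draw of std `sigma`."""
    rng = np.random.default_rng(seed)
    return float(energy + rng.normal(0.0, sigma))


# One H2CO configuration: 4 atoms x 3 force components (placeholder values).
forces = np.zeros((4, 3))
noisy = add_force_noise(forces, sigma=0.05, seed=42)
print(noisy.shape)
```

With `sigma=0.0` both functions return their input unchanged, which corresponds to the clean (no-noise) configuration sets. Each H2CO configuration has 4 atoms (115,232 atoms / 28,808 configurations), hence the (4, 3) force array.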

ColabFit ID: aleatoric_epistemic_error_AIC2023__Goswami-Kaser-Bemish-Meuwly__DS_pms72aqfzc4v_0
Name: aleatoric_epistemic_error_AIC2023
Authors: Sugata Goswami, Silvan Käser, Raymond J. Bemish, Markus Meuwly
Elements: C, H, O
Number of Configurations: 28,808
Number of Elements: 3
Number of Atoms: 115,232

Links:
https://github.com/MMunibas/noise
https://doi.org/10.1016/j.aichem.2023.100033
alkali-metal_intercalation_in_disordered_carbon_anode_materials_JMCA2019
Description: A dataset created as part of a combined DFT-ML approach to study three alkali metals (K, Li, Na) in model carbon systems at a range of densities and degrees of disorder. The purpose of the study was to investigate the properties of alkali metals in hard (non-graphitising) and nanoporous carbons as potential anode materials for battery technology.

ColabFit ID: alkali-metal_intercalation_in_disordered_carbon_anode_materials_JMCA2019__Huang-Csanyi-Zhao-Cheng-Deringer__DS_mnt5vb22pdze_0
Name: alkali-metal_intercalation_in_disordered_carbon_anode_materials_JMCA2019
Authors: Jian-Xing Huang, Gábor Csányi, Jin-Bao Zhao, Jun Cheng, Volker L. Deringer
Elements: C, K, Li, Na
Number of Configurations: 1,365
Number of Elements: 4
Number of Atoms: 298,050

Links:
https://doi.org/10.17863/CAM.42087
https://doi.org/10.1039/C9TA05453G
alpha_brass_nanoparticles
Description: 53,841 structures of alpha-brass (less than 40% zinc), including atomic forces and total energies, calculated using VASP at the DFT level of theory.

ColabFit ID: alpha_brass_nanoparticles__Weinreich-Römer-Paleico-Behler__DS_agiti2oe5bqb_0
Name: alpha_brass_nanoparticles
Authors: Jan Weinreich, Anton Römer, Martín Leandro Paleico, Jörg Behler
Elements: Cu, Zn
Number of Configurations: 53,696
Number of Elements: 2
Number of Atoms: 2,956,679

Links:
https://doi.org/10.24435/materialscloud:94-aq
https://doi.org/10.1021/acs.jpcc.1c02314
cG-SchNet
Description: Configurations from a cG-SchNet model trained on a subset of the QM9 dataset. The model was trained with the intention of generating molecules with specified functional groups or motifs, relying on sampling of molecular fingerprint data. Relaxation data for the generated molecules were computed using ORCA. Configuration sets include raw data from cG-SchNet-generated configurations, with models trained on several different types of target data, and DFT relaxation data as a separate configuration set. Includes approximately 80,000 configurations.

ColabFit ID: cG-SchNet__Gebauer-Gastegger-Hessmann-Müller-Schütt__DS_xzaglubh0trq_0
Name: cG-SchNet
Authors: Niklas W.A. Gebauer, Michael Gastegger, Stefaan S.P. Hessmann, Klaus-Robert Müller, Kristof T. Schütt
Elements: C, F, H, N, O
Number of Configurations: 79,772
Number of Elements: 5
Number of Atoms: 1,467,492

Links:
https://github.com/atomistic-machine-learning/cG-SchNet/
https://doi.org/10.1038/s41467-022-28526-y
calcium_ferrites_as_cathodes_ca4fe9o17
Description: Dataset for "Appraisal of calcium ferrites as cathodes for calcium rechargeable batteries: DFT, synthesis, characterization and electrochemistry of Ca4Fe9O17" created to explore Fe-based cathode materials for Ca-ion batteries. Structures include CaFe(2+n)O(4+n), where 0 < n < 3.
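To see where a given phase sits in this family, n can be recovered from the Ca:Fe:O ratio after normalizing to one Ca atom. For the title compound, Ca4Fe9O17 corresponds to CaFe2.25O4.25, i.e. n = 1/4. The helper below is a hypothetical illustration using exact fractions.

```python
from fractions import Fraction


def ferrite_n(ca: int, fe: int, o: int) -> Fraction:
    """Return n such that Ca:Fe:O matches CaFe(2+n)O(4+n), normalised to one Ca."""
    fe_per_ca = Fraction(fe, ca)
    o_per_ca = Fraction(o, ca)
    n = fe_per_ca - 2
    # Both Fe and O must exceed the CaFe2O4 parent by the same amount n.
    if o_per_ca - 4 != n:
        raise ValueError("composition is not on the CaFe(2+n)O(4+n) line")
    return n


print(ferrite_n(4, 9, 17))   # Ca4Fe9O17
```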

ColabFit ID: calcium_ferrites_as_cathodes_ca4fe9o17__Dompablo-Casals__DS_w59koijcczkz_0
Name: calcium_ferrites_as_cathodes_ca4fe9o17
Authors: M. Elena Arroyo-de Dompablo, José Luis Casals
Elements: Ca, Fe, O
Number of Configurations: 345
Number of Elements: 3
Number of Atoms: 35,462

Links:
https://doi.org/10.24435/materialscloud:xk-sn
http://doi.org/10.1039/c9dt04688g
cathode_materials_for_rechargeable_Ca_batteries_CM2021
Description: Data from the publication "Enlisting Potential Cathode Materials for Rechargeable Ca Batteries". The development of rechargeable batteries based on a Ca metal anode demands the identification of suitable cathode materials. This work investigates the potential application of a variety of compounds selected from the Inorganic Crystal Structure Database (ICSD), considering 3d-transition-metal oxysulphides, pyrophosphates, silicates, nitrides, and phosphates with a maximum of four different chemical elements in their composition. The cathode performance of CaFeSO, CaCoSO, CaNiN, Ca3MnN3, Ca2Fe(Si2O7), CaM(P2O7) (M = V, Cr, Mn, Fe, Co), CaV2(P2O7)2, Ca(VO)2(PO4)2 and α-VOPO4 is evaluated through the calculation of operating voltages, volume changes associated with the redox reaction, and the mobility of Ca2+ ions. Some materials (CaFeSO, Ca2FeSi2O7 and CaV2(P2O7)2) exhibit attractive specific capacities and intercalation voltages combined with energy barriers for Ca migration of around 1 eV. Based on the DFT results, αI-VOPO4 is identified as a potential Ca cathode with a maximum theoretical specific capacity of 312 mAh/g, an average intercalation voltage of 2.8 V, and calculated energy barriers for Ca migration below 0.65 eV (GGA functional).

ColabFit ID: cathode_materials_for_rechargeable_Ca_batteries_CM2021__Dompablo-Casals__DS_s00e64z80ujy_0
Name: cathode_materials_for_rechargeable_Ca_batteries_CM2021
Authors: M. Elena Arroyo-de Dompablo, Jose Luis Casals
Elements: Ca, Co, Fe, Mn, N, Ni, O, P, S, Si, V
Number of Configurations: 10,840
Number of Elements: 11
Number of Atoms: 1,034,770

Links:
https://doi.org/10.24435/materialscloud:3n-e8
http://doi.org/10.1038/s41598-019-46002-4
datasets_for_magnetic_MTP_NatSR2024_training
Description: This dataset comprises a training dataset for magnetic multi-component machine-learning potentials for Fe-Al systems, covering different concentrations of Fe and Al (Al concentrations from 0% to 50%), with fully equilibrated as well as perturbed atomic positions, lattice vectors, and magnetic moments.

ColabFit ID: datasets_for_magnetic_MTP_NatSR2024_training__Kotykhov-Gubaev-Hodapp-Tantardini-Shapeev-Novikov__DS_mf8sn11cn6wa_0
Name: datasets_for_magnetic_MTP_NatSR2024_training
Authors: Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov
Elements: Al, Fe
Number of Configurations: 2,012
Number of Elements: 2
Number of Atoms: 11,440

Links:
https://gitlab.com/ivannovikov/datasets_for_magnetic_MTP
https://doi.org/10.1038/s41598-023-46951-x
datasets_for_magnetic_MTP_NatSR2024_verification
Description: This is the verification dataset (see companion training dataset: datasets_for_magnetic_MTP_NatSR2024_training) used in training a magnetic multi-component machine-learning potential for Fe-Al systems. The configurations from the verification set include different levels of magnetic moment perturbation than configurations from the training set. For this reason, the authors refer to this dataset as a "verification set", rather than a "validation set".

ColabFit ID: datasets_for_magnetic_MTP_NatSR2024_verification__Kotykhov-Gubaev-Hodapp-Tantardini-Shapeev-Novikov__DS_wu6xd9i8cf7i_0
Name: datasets_for_magnetic_MTP_NatSR2024_verification
Authors: Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov
Elements: Al, Fe
Number of Configurations: 336
Number of Elements: 2
Number of Atoms: 3,696

Links:
https://gitlab.com/ivannovikov/datasets_for_magnetic_MTP
https://doi.org/10.1038/s41598-023-46951-x
defected_phosphorene_ACS_2023
Description: This dataset contains pristine monolayer phosphorene as well as structures with monovacancies, which were used to train an artificial neural network (ANN) for high-dimensional neural network potential molecular dynamics (HDNNP-MD) simulations. The publication investigates the mechanisms and rates of defect diffusion and of monovacancy-to-divacancy defect coalescence.

ColabFit ID: defected_phosphorene_ACS_2023__Kývala-Angeletti-Franchini-Dellago__DS_k059wtxqsksu_0
Name: defected_phosphorene_ACS_2023
Authors: Lukáš Kývala, Andrea Angeletti, Cesare Franchini, Christoph Dellago
Elements: P
Number of Configurations: 5,091
Number of Elements: 1
Number of Atoms: 722,311

Links:
https://doi.org/10.5281/zenodo.8421094
https://doi.org/10.1021/acs.jpcc.3c05713
discrepencies_and_error_metrics_NPJ_2023_enhanced_validation_set
Description: Structures from the discrepencies_and_error_metrics_NPJ_2023 validation set, enhanced by the inclusion of rare events. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 structures in total: 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating interstitial or vacancy, respectively, from AIMD simulations at 1230 K.
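The composition of the enhanced validation set (20 + 11 + 19 = 50 structures) can be sketched as below. Structures are replaced by placeholder labels, and drawing the 20 replaced structures uniformly without replacement is an assumption about what "randomly selected" means; the function name is hypothetical.

```python
import random


def build_enhanced_validation_set(replaced, vacancy_re, interstitial_re, seed=0):
    """Combine 20 randomly chosen replaced training structures with all
    vacancy and interstitial rare-event (RE) snapshots."""
    rng = random.Random(seed)
    picked = rng.sample(replaced, 20)   # uniform, without replacement (assumed)
    return picked + list(vacancy_re) + list(interstitial_re)


# Placeholder labels standing in for real structures.
replaced = [f"replaced_{i}" for i in range(120)]
vacancy_re = [f"vac_re_{i}" for i in range(11)]
interstitial_re = [f"int_re_{i}" for i in range(19)]

validation = build_enhanced_validation_set(replaced, vacancy_re, interstitial_re)
print(len(validation))
```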

ColabFit ID: discrepencies_and_error_metrics_NPJ_2023_enhanced_validation_set__Liu-He-Mo__DS_q6e3bvq4y67a_0
Name: discrepencies_and_error_metrics_NPJ_2023_enhanced_validation_set
Authors: Yunsheng Liu, Xingfeng He, Yifei Mo
Elements: Si
Number of Configurations: 50
Number of Elements: 1
Number of Atoms: 3,198

Links:
https://github.com/mogroupumd/Silicon_MLIP_datasets
https://doi.org/10.1038/s41524-023-01123-3
discrepencies_and_error_metrics_NPJ_2023_interstitial_enhanced_training_set
Description: Structures from the discrepencies_and_error_metrics_NPJ_2023 training set, enhanced by the inclusion of interstitials. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 structures in total: 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating interstitial or vacancy, respectively, from AIMD simulations at 1230 K.

ColabFit ID: discrepencies_and_error_metrics_NPJ_2023_interstitial_enhanced_training_set__Liu-He-Mo__DS_nublbp38wse0_0
Name: discrepencies_and_error_metrics_NPJ_2023_interstitial_enhanced_training_set
Authors: Yunsheng Liu, Xingfeng He, Yifei Mo
Elements: Si
Number of Configurations: 218
Number of Elements: 1
Number of Atoms: 13,629

Links:
https://github.com/mogroupumd/Silicon_MLIP_datasets
https://doi.org/10.1038/s41524-023-01123-3
discrepencies_and_error_metrics_NPJ_2023_interstitial_re_testing_set
Description: Structures from the discrepencies_and_error_metrics_NPJ_2023 test set; each structure contains a single migrating interstitial. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 structures in total: 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating interstitial or vacancy, respectively, from AIMD simulations at 1230 K.

ColabFit ID: discrepencies_and_error_metrics_NPJ_2023_interstitial_re_testing_set__Liu-He-Mo__DS_dhe9aqs9q1wf_0
Name: discrepencies_and_error_metrics_NPJ_2023_interstitial_re_testing_set
Authors: Yunsheng Liu, Xingfeng He, Yifei Mo
Elements: Si
Number of Configurations: 100
Number of Elements: 1
Number of Atoms: 6,500

Links:
https://github.com/mogroupumd/Silicon_MLIP_datasets
https://doi.org/10.1038/s41524-023-01123-3
discrepencies_and_error_metrics_NPJ_2023_vacancy_enhanced_training_set
Description: Structures from the discrepencies_and_error_metrics_NPJ_2023 training set, including some structures with vacancies. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 structures in total: 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating interstitial or vacancy, respectively, from AIMD simulations at 1230 K.

ColabFit ID: discrepencies_and_error_metrics_NPJ_2023_vacancy_enhanced_training_set__Liu-He-Mo__DS_qxd7wv9yabtp_0
Name: discrepencies_and_error_metrics_NPJ_2023_vacancy_enhanced_training_set
Authors: Yunsheng Liu, Xingfeng He, Yifei Mo
Elements: Si
Number of Configurations: 218
Number of Elements: 1
Number of Atoms: 13,389

Links:
https://github.com/mogroupumd/Silicon_MLIP_datasets
https://doi.org/10.1038/s41524-023-01123-3
discrepencies_and_error_metrics_NPJ_2023_vacancy_re_testing_set
Description: Structures from the discrepencies_and_error_metrics_NPJ_2023 test set; each structure contains a single migrating vacancy. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 structures in total: 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating interstitial or vacancy, respectively, from AIMD simulations at 1230 K.

ColabFit ID: discrepencies_and_error_metrics_NPJ_2023_vacancy_re_testing_set__Liu-He-Mo__DS_la08goe2lz0g_0
Name: discrepencies_and_error_metrics_NPJ_2023_vacancy_re_testing_set
Authors: Yunsheng Liu, Xingfeng He, Yifei Mo
Elements: Si
Number of Configurations: 100
Number of Elements: 1
Number of Atoms: 6,300

Links:
https://github.com/mogroupumd/Silicon_MLIP_datasets
https://doi.org/10.1038/s41524-023-01123-3
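The atom counts of the two rare-event testing sets are consistent with each snapshot being a pristine Si supercell plus one extra atom (interstitial) or minus one atom (vacancy). The 64-atom pristine supercell size used below is an assumption inferred from these totals, not a value stated in the entries.

```python
from fractions import Fraction

PRISTINE_ATOMS = 64  # assumed pristine Si supercell size (inferred, not stated)


def atoms_per_configuration(n_atoms: int, n_configs: int) -> Fraction:
    """Mean number of atoms per configuration, as an exact fraction."""
    return Fraction(n_atoms, n_configs)


# Totals from the Number of Atoms / Number of Configurations fields above.
interstitial = atoms_per_configuration(6_500, 100)
vacancy = atoms_per_configuration(6_300, 100)
print(interstitial - PRISTINE_ATOMS, vacancy - PRISTINE_ATOMS)
```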
disordered_transition_metal_oxyfluorides_EA2021
Description: Data from "On-the-fly assessment of diffusion barriers of disordered transition metal oxyfluorides using local descriptors". The dataset contains the results of 48 nudged elastic band (NEB) calculations of Li(2-x)VO2F diffusion barriers. The NEB calculations were performed with VASP, using the projector augmented-wave (PAW) method to describe the electron-ion interaction. The disordered rock-salt cells were created using a 3 x 4 x 4 supercell containing 96 atoms (in the absence of vacancies). PBE was used as the exchange-correlation functional, and a rotationally invariant Hubbard U correction with U = 3.25 eV was applied to the V d orbitals.
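The 96-atom figure follows from counting rock-salt sites, assuming a two-atom primitive cell (one cation and one anion site). A 3 x 4 x 4 supercell then has 48 cation and 48 anion sites, i.e. 16 Li2VO2F formula units; removing Li lowers the atom count and sets x in Li(2-x)VO2F. The helper below is hypothetical.

```python
from fractions import Fraction


def li2vo2f_supercell(nx: int, ny: int, nz: int, li_vacancies: int = 0):
    """Element counts and x for Li(2-x)VO2F in an nx*ny*nz rock-salt supercell.

    Assumes a two-atom primitive cell (one cation site, one anion site).
    """
    cells = nx * ny * nz
    if cells % 3:
        raise ValueError("Li2VO2F needs the site count to be a multiple of 3")
    fu = cells // 3  # Li2VO2F: 3 cations (2 Li + V) and 3 anions (2 O + F)
    if not 0 <= li_vacancies <= 2 * fu:
        raise ValueError("invalid number of Li vacancies")
    counts = {"Li": 2 * fu - li_vacancies, "V": fu, "O": 2 * fu, "F": fu}
    return counts, Fraction(li_vacancies, fu)


counts, x = li2vo2f_supercell(3, 4, 4)
print(sum(counts.values()), counts, x)
```

With 8 Li vacancies, the same cell holds 88 atoms and x = 1/2.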

ColabFit ID: disordered_transition_metal_oxyfluorides_EA2021__Chang-Jørgensen-Loftager-Bhowmik-Lastra-Vegge__DS_k1k8iul6kgm2_0
Name: disordered_transition_metal_oxyfluorides_EA2021
Authors: Jin Hyun Chang, Peter Bjørn Jørgensen, Simon Loftager, Arghya Bhowmik, Juan María García Lastra, Tejs Vegge
Elements: F, Li, O, V
Number of Configurations: 233
Number of Elements: 4
Number of Atoms: 20,670

Links:
https://doi.org/10.24435/materialscloud:9v-3q
http://doi.org/10.1016/j.electacta.2021.138551
doped_CsPbI3_energetics_test
Description: The test set from the doped CsPbI3 energetics dataset. This dataset was created to explore the effect of Cd and Zn substitutions on the structural stability of the inorganic lead halide perovskite CsPbI3. CsPbI3 undergoes a direct-to-indirect band-gap phase transition at room temperature. The dataset contains configurations of CsPbI3 with low levels of Cd and Zn substitution, which were used to train a GNN model to predict the energetics of structures with higher levels of substitution.

ColabFit ID: doped_CsPbI3_energetics_test__Eremin-Humonen-Lazarev-Pushkarev-Budennyy__DS_a2ftwj9b873a_0
Name: doped_CsPbI3_energetics_test
Authors: Roman A. Eremin, Innokentiy S. Humonen, Alexey A. Kazakov, Vladimir D. Lazarev, Anatoly P. Pushkarev, Semen A. Budennyy
Elements: Cd, Cs, I, Pb, Zn
Number of Configurations: 60
Number of Elements: 5
Number of Atoms: 9,600

Links:
https://github.com/AIRI-Institute/doped_CsPbI3_energetics
https://doi.org/10.1016/j.commatsci.2023.112672
doped_CsPbI3_energetics_train_validate
Description: The training + validation set from the doped CsPbI3 energetics dataset. This dataset was created to explore the effect of Cd and Zn substitutions on the structural stability of the inorganic lead halide perovskite CsPbI3. CsPbI3 undergoes a direct-to-indirect band-gap phase transition at room temperature. The dataset contains configurations of CsPbI3 with low levels of Cd and Zn substitution, which were used to train a GNN model to predict the energetics of structures with higher levels of substitution.

ColabFit ID: doped_CsPbI3_energetics_train_validate__Eremin-Humonen-Lazarev-Pushkarev-Budennyy__DS_doj9b688juif_0
Name: doped_CsPbI3_energetics_train_validate
Authors: Roman A. Eremin, Innokentiy S. Humonen, Alexey A. Kazakov, Vladimir D. Lazarev, Anatoly P. Pushkarev, Semen A. Budennyy
Elements: Cd, Cs, I, Pb, Zn
Number of Configurations: 142
Number of Elements: 5
Number of Atoms: 22,720

Links:
https://github.com/AIRI-Institute/doped_CsPbI3_energetics
https://doi.org/10.1016/j.commatsci.2023.112672
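Both doped CsPbI3 splits work out to 160 atoms per configuration (9,600 / 60 and 22,720 / 142), which is 32 CsPbI3 formula units. Assuming Cd and Zn replace Pb one-for-one (so the atom count is unchanged), the substitution level of a configuration is the dopant count divided by the 32 Pb sites. The helper and the example dopant counts are hypothetical.

```python
from fractions import Fraction

ATOMS_PER_FORMULA_UNIT = 5  # Cs + Pb + 3 I in CsPbI3


def b_site_fraction(n_atoms: int, n_cd: int, n_zn: int) -> Fraction:
    """Fraction of Pb (B) sites occupied by Cd or Zn in a supercell, assuming
    dopants substitute Pb one-for-one so the total atom count is unchanged."""
    if n_atoms % ATOMS_PER_FORMULA_UNIT:
        raise ValueError("atom count is not a whole number of formula units")
    b_sites = n_atoms // ATOMS_PER_FORMULA_UNIT
    return Fraction(n_cd + n_zn, b_sites)


atoms_per_config = 9_600 // 60   # test split; 22_720 // 142 gives the same
print(b_site_fraction(atoms_per_config, n_cd=2, n_zn=2))  # hypothetical dopant counts
```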
electrode_materials_for_ca-based_rechargeable_batteries
Description: Dataset for "Analysis of minerals as electrode materials for Ca-based rechargeable batteries". Includes DFT structures of pyroxenes, garnets and carbonates. The dataset was produced to help identify Ca-based cathode materials with high specific energy.

ColabFit ID: electrode_materials_for_ca-based_rechargeable_batteries__Dompablo-Casals__DS_swl40hdnt479_0
Name: electrode_materials_for_ca-based_rechargeable_batteries
Authors: M. Elena Arroyo-de Dompablo, Jose Luis Casals
Elements: C, Ca, Cr, Mn, O, Si
Number of Configurations: 4,726
Number of Elements: 6
Number of Atoms: 550,074

Links:
https://doi.org/10.24435/materialscloud:3n-e8
http://doi.org/10.1038/s41598-019-46002-4
ferroelectricity_and_metallicity_in_BaTiO3_JMCC2021
Description: Dataset for "Interplay between ferroelectricity and metallicity in BaTiO3", exploring properties of ferroelectric barium titanate (BaTiO3), including the effects of electron and hole doping. Includes configuration sets for unit cells and supercells of BaTiO3.

ColabFit ID: ferroelectricity_and_metallicity_in_BaTiO3_JMCC2021__Michel-Esswein-Spaldin__DS_1t2xs8bzygtp_0
Name: ferroelectricity_and_metallicity_in_BaTiO3_JMCC2021
Authors: Veronica F. Michel, Tobias Esswein, Nicola A. Spaldin
Elements: Al, Ba, K, La, Nb, O, Sc, Ti, V
Number of Configurations: 1,127
Number of Elements: 9
Number of Atoms: 19,125

Links:
https://doi.org/10.24435/materialscloud:f4-94
http://doi.org/10.1039/D1TC01868J
flexible_molecules_JCP2021
Description: Configurations of azobenzene featuring cis-to-trans thermal isomerization through three channels (inversion, rotation, and rotation assisted by inversion), and configurations of glycine as a simpler comparison molecule. All calculations were performed in FHI-aims using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional, with the Tkatchenko-Scheffler (TS) method to account for van der Waals (vdW) interactions. The azobenzene sets contain calculations from several different MD simulations: two long simulations initialized at 300 K; short simulations (300 steps) initialized at 300 K with a shorter (0.5 fs) time step; four simulations at 750 K (initialized at 3000 K), two starting from each of the cis and trans isomers; and simulations at 50 K (initialized at 300 K). The glycine isomerization set was built using one MD simulation starting from each of two different minima. The initialization and simulation temperatures were 500 K.

ColabFit ID: flexible_molecules_JCP2021__Vassilev-Galindo-Fonseca-Poltavsky-Tkatchenko__DS_i23sbm1o45sj_0
Name: flexible_molecules_JCP2021
Authors: Valentin Vassilev-Galindo, Gregory Fonseca, Igor Poltavsky, Alexandre Tkatchenko
Elements: C, H, N, O
Number of Configurations: 69,182
Number of Elements: 4
Number of Atoms: 1,520,340

Links:
https://doi.org/10.1063/5.0038516
glass-ceramic_lithium_thiophosphate_electrolytes_
Description: This database contains computationally generated atomic structures of glass-ceramic lithium thiophosphates (gc-LPS) with the general composition (Li2S)x(P2S5)1-x. Total energies and interatomic forces from density-functional theory (DFT) calculations are included. The DFT calculations used projector-augmented-wave (PAW) pseudopotentials and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional as implemented in the Vienna Ab Initio Simulation Package (VASP), with a kinetic energy cutoff of 520 eV. The first Brillouin zone was sampled using VASP's fully automatic k-point scheme with a length parameter Rk = 25 Å. The gc-LPS structures were generated using a combination of different sampling methods. Initial amorphous structure models were generated with ab initio molecular dynamics (AIMD) simulations of supercells at 1200 K using a Nosé-Hoover thermostat with a time step of 1 fs. To obtain near-ground-state structures as references for the machine-learning potential, 150 evenly spaced snapshots were extracted from the AIMD trajectories and reoptimized with DFT geometry optimizations at 0 K. Additional structures were generated by scaling the lattice parameters of the crystalline LPS structures (see below) by ±15% and perturbing atomic positions in AIMD simulations as described above. The resulting database was used to train a specialized ANN potential for sampling structures along the Li2S-P2S5 composition line with a genetic algorithm (GA) as implemented in the atomistic evolution (ævo) package, following a previously reported protocol. Starting from supercells of the ideal crystal structures, either Li and S atoms were removed in a 2:1 ratio, or P and S atoms were removed in a 2:5 ratio, and low-energy configurations were determined with GA sampling. A population size of 32 trials and a mutation rate of 10% were employed.
The ANN potential was iteratively refined by including additional sampled structures in the training. For each composition, at least the 10 lowest-energy structure models identified with the ANN-GA approach were selected and fully relaxed with DFT. Also included in the present database are the XSF files of the previously reported crystalline phases LiPS3, Li2PS3, Li4P2S7, Li7P3S11, α-Li3PS4, β-Li3PS4, γ-Li3PS4, and Li48P16S61. The crystal structures were obtained from the Inorganic Crystal Structure Database (ICSD), the Materials Project (MP) database, the Open Quantum Materials Database (OQMD), and the AFLOW database. The configuration names indicate the journal reference and the database.
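Any Li-P-S composition on the Li2S-P2S5 line can be written as (Li2S)x(P2S5)1-x, which requires the S count to equal Li/2 + 5·P/2. The hypothetical helper below checks that condition and returns x. It also shows why the GA moves described above stay on the composition line: removing Li and S in a 2:1 ratio removes whole Li2S units, and removing P and S in a 2:5 ratio removes whole P2S5 units.

```python
from fractions import Fraction
from typing import Optional


def li2s_fraction(li: int, p: int, s: int) -> Optional[Fraction]:
    """Return x in (Li2S)x(P2S5)1-x, or None if (li, p, s) is off that line."""
    a = Fraction(li, 2)  # formula units of Li2S
    b = Fraction(p, 2)   # formula units of P2S5
    if a + b == 0 or s != a + 5 * b:
        return None
    return a / (a + b)


# (Li, P, S) counts per formula unit of the crystalline phases listed above.
phases = {"LiPS3": (1, 1, 3), "Li2PS3": (2, 1, 3), "Li4P2S7": (4, 2, 7),
          "Li7P3S11": (7, 3, 11), "Li3PS4": (3, 1, 4), "Li48P16S61": (48, 16, 61)}
for name, counts in phases.items():
    print(name, li2s_fraction(*counts))

# GA-style move: remove 2 Li + 1 S (one Li2S unit) from a Li14P6S22 supercell.
print(li2s_fraction(14 - 2, 6, 22 - 1))
```

LiPS3, Li4P2S7, Li7P3S11 and Li3PS4 fall on the line (x = 1/2, 2/3, 7/10 and 3/4), whereas Li2PS3 and Li48P16S61 return None because their S content does not match a Li2S + P2S5 combination.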

ColabFit ID: glass-ceramic_lithium_thiophosphate_electrolytes___Guo-Artrith__DS_cpznjcu51bvg_0
Name: glass-ceramic_lithium_thiophosphate_electrolytes_
Authors: Haoyue Guo, Nongnuch Artrith
Elements: Li, P, S
Number of Configurations: 6,055
Number of Elements: 3
Number of Atoms: 264,604

Links:
https://doi.org/10.24435/materialscloud:j5-tz
https://doi.org/10.1021/acs.chemmater.2c00267
linear_magnetic_coefficient_in_Cr2O3_JPCM2024
Description: We establish the sign of the linear magnetoelectric (ME) coefficient, α, in chromia, Cr₂O₃. Cr₂O₃ is the prototypical linear ME material, in which an electric (magnetic) field induces a linearly proportional magnetization (polarization), and a single magnetic domain can be selected by annealing in combined magnetic (H) and electric (E) fields. Opposite antiferromagnetic domains have opposite ME responses, and which antiferromagnetic domain corresponds to which sign of response has previously been unclear. We use density functional theory (DFT) to calculate the magnetic response of a single antiferromagnetic domain of Cr₂O₃ to an applied in-plane electric field at 0 K. We find that the domain with nearest-neighbor magnetic moments oriented away from (towards) each other has a negative (positive) in-plane ME coefficient, α⊥, at 0 K. We show that this sign is consistent with all other DFT calculations in the literature that specified the domain orientation, independent of the choice of DFT code or functional, the method used to apply the field, and whether the direct (magnetic field) or inverse (electric field) ME response was calculated. Next, we reanalyze our previously published spherical neutron polarimetry data to determine the antiferromagnetic domain produced by annealing in combined E and H fields oriented along the crystallographic symmetry axis at room temperature. We find that the antiferromagnetic domain with nearest-neighbor magnetic moments oriented away from (towards) each other is produced by annealing in (anti-)parallel E and H fields, corresponding to a positive (negative) axial ME coefficient, α∥, at room temperature. Since α⊥ at 0 K and α∥ at room temperature are known to be of opposite sign, our computational and experimental results are consistent. This dataset contains the input data to reproduce the calculation of the magnetoelectric effect as plotted in Fig. 3 of the manuscript, for Elk, VASP, and Quantum ESPRESSO.

ColabFit ID: linear_magnetic_coefficient_in_Cr2O3_JPCM2024__Bousquet-Lelièvre-Berna-Qureshi-Soh-Spaldin-Urru-Verbeek-Weber__DS_82x5bfiiyaij_0
Name: linear_magnetic_coefficient_in_Cr2O3_JPCM2024
Authors: Eric Bousquet, Eddy Lelièvre-Berna, Navid Qureshi, Jian-Rui Soh, Nicola Ann Spaldin, Andrea Urru, Xanthe Henderike Verbeek, Sophie Francis Weber
Elements: Cr, O
Number of Configurations: 165
Number of Elements: 2
Number of Atoms: 1,650

Links:
https://doi.org/10.24435/materialscloud:ek-fp
http://doi.org/10.1088/1361-648X/ad1a59
local_polarization_in_oxygen-deficient_LaMnO3_PRR2020
Description: This dataset contains structural calculations of LaMnO3 carried out in Quantum ESPRESSO at the DFT-PBEsol+U level of theory. The dataset was built to explore strained LaMnO3, both stoichiometric and oxygen-deficient.

ColabFit ID: local_polarization_in_oxygen-deficient_LaMnO3_PRR2020__Ricca-Niederhauser-Aschauer__DS_t8goqf2uglhj_0
Name: local_polarization_in_oxygen-deficient_LaMnO3_PRR2020
Authors: Chiara Ricca, Nicolas Niederhauser, Ulrich Aschauer
Elements: Ba, La, Mn, O, Ti
Number of Configurations: 4,514
Number of Elements: 5
Number of Atoms: 174,337

Links:
https://doi.org/10.24435/materialscloud:m9-9d
http://doi.org/10.1103/PhysRevResearch.2.042040
mbGDML_maldonado_2023
Description: Configurations of water, acetonitrile and methanol, simulated with ASE and modeled using a variety of software and methods: GAP, SchNet, GDML, ORCA and mbGDML. Forces and potential energies are included; metadata includes kinetic energies and velocities.

ColabFit ID: mbGDML_maldonado_2023__Maldonado__DS_e94my2wrh074_0
Name: mbGDML_maldonado_2023
Authors: Alex M. Maldonado
Elements: C, H, N, O
Number of Configurations: 24,543
Number of Elements: 4
Number of Atoms: 712,134

Links:
https://doi.org/10.5281/zenodo.7112197
https://doi.org/10.26434/chemrxiv-2023-wdd1r
mlearn_Cu_test
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main-group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Cu configurations from the test split.

ColabFit ID: mlearn_Cu_test__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_xmgn3ofqzon9_0
Name: mlearn_Cu_test
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Cu
Number of Configurations: 31
Number of Elements: 1
Number of Atoms: 3,178

Links:
https://github.com/materialsvirtuallab/mlearn
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Cu_train
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main-group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Cu configurations from the training split.

ColabFit ID: mlearn_Cu_train__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_3pv3hck35iy6_0
Name: mlearn_Cu_train
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Cu
Number of Configurations: 262
Number of Elements: 1
Number of Atoms: 27,416

Links:
https://github.com/materialsvirtuallab/mlearn/tree/master/data
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Ge_test
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Ge test configurations.

ColabFit ID: mlearn_Ge_test__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_pyrk84w3auvb_0
Name: mlearn_Ge_test
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Ge
Number of Configurations: 25
Number of Elements: 1
Number of Atoms: 1,568

Links:
https://github.com/materialsvirtuallab/mlearn
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Ge_train
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Ge training configurations.

ColabFit ID: mlearn_Ge_train__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_ot3m0rxle8fs_0
Name: mlearn_Ge_train
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Ge
Number of Configurations: 228
Number of Elements: 1
Number of Atoms: 14,072

Links:
https://github.com/materialsvirtuallab/mlearn/tree/master/data
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Li_test
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Li test configurations.

ColabFit ID: mlearn_Li_test__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_jf20mvwao5xi_0
Name: mlearn_Li_test
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Li
Number of Configurations: 29
Number of Elements: 1
Number of Atoms: 1,320

Links:
https://github.com/materialsvirtuallab/mlearn
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Li_train
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Li training configurations.

ColabFit ID: mlearn_Li_train__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_h7zao9ya9sd8_0
Name: mlearn_Li_train
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Li
Number of Configurations: 241
Number of Elements: 1
Number of Atoms: 11,576

Links:
https://github.com/materialsvirtuallab/mlearn/tree/master/data
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Mo_test
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Mo test configurations.

ColabFit ID: mlearn_Mo_test__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_l0b6iq3no012_0
Name: mlearn_Mo_test
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Mo
Number of Configurations: 23
Number of Elements: 1
Number of Atoms: 1,189

Links:
https://github.com/materialsvirtuallab/mlearn
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Mo_train
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Mo training configurations.

ColabFit ID: mlearn_Mo_train__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_ytoet4uyc32k_0
Name: mlearn_Mo_train
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Mo
Number of Configurations: 194
Number of Elements: 1
Number of Atoms: 10,087

Links:
https://github.com/materialsvirtuallab/mlearn/tree/master/data
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Ni_test
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Ni test configurations.

ColabFit ID: mlearn_Ni_test__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_zjkz9664bapl_0
Name: mlearn_Ni_test
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Ni
Number of Configurations: 31
Number of Elements: 1
Number of Atoms: 3,158

Links:
https://github.com/materialsvirtuallab/mlearn
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Ni_train
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Ni training configurations.

ColabFit ID: mlearn_Ni_train__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_lfyd4jv627cr_0
Name: mlearn_Ni_train
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Ni
Number of Configurations: 263
Number of Elements: 1
Number of Atoms: 27,420

Links:
https://github.com/materialsvirtuallab/mlearn/tree/master/data
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Si_test
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Si test configurations.

ColabFit ID: mlearn_Si_test__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_lvjng6g41bwy_0
Name: mlearn_Si_test
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Si
Number of Configurations: 25
Number of Elements: 1
Number of Atoms: 1,525

Links:
https://github.com/materialsvirtuallab/mlearn
https://doi.org/10.1021/acs.jpca.9b08723
mlearn_Si_train
Description: A comprehensive DFT dataset was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). This dataset comprises only the Si training configurations.

ColabFit ID: mlearn_Si_train__Zuo-Chen-Li-Deng-Chen-Behler-Csányi-Shapeev-Thompson-Wood-Ong__DS_eltotrjoqonr_0
Name: mlearn_Si_train
Authors: Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
Elements: Si
Number of Configurations: 214
Number of Elements: 1
Number of Atoms: 13,233

Links:
https://github.com/materialsvirtuallab/mlearn/tree/master/data
https://doi.org/10.1021/acs.jpca.9b08723
oxygen-vacancy_defects_in_Cu2O(111)
Description: This dataset investigates the effect of defects, such as copper and oxygen vacancies, in cuprous oxide films. Structures include oxygen vacancies formed near a reconstructed Cu2O(111) surface, where the outermost unsaturated copper atoms are removed, forming non-stoichiometric surface layers with copper vacancies. Surface and bulk properties are addressed by modelling a thick, symmetric slab of 8 atomic layers and 736 atoms. Configuration sets include bulk, slab, vacancy, and oxygen gas. Version: v1.

ColabFit ID: oxygen-vacancy_defects_in_Cu2O(111)__Dongfang-Iannuzzi-Al-Hamdani__DS_ots0ul69q842_0
Name: oxygen-vacancy_defects_in_Cu2O(111)
Authors: Nanchen Dongfang, Marcella Iannuzzi, Yasmine Al-Hamdani
Elements: Cu, O
Number of Configurations: 864
Number of Elements: 2
Number of Atoms: 606,270

Links:
https://doi.org/10.24435/materialscloud:3z-bk
http://doi.org/10.1088/2516-1075/ace0aa
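The slab, vacancy, and oxygen-gas configuration sets are the ingredients of the usual oxygen-vacancy formation energy, in which each removed O atom is referenced to half an O2 molecule. The sketch below assumes that textbook definition (O2 reference, no chemical-potential correction). The description does not state which reference the authors used, and the energies in the example are placeholders, not values from the dataset.

```python
"""Minimal sketch: oxygen-vacancy formation energy referenced to O2 gas (assumed convention)."""


def oxygen_vacancy_formation_energy(e_defective: float, e_pristine: float,
                                    e_o2: float, n_vacancies: int = 1) -> float:
    """Per-vacancy formation energy.

    E_f = [E(defective) + (n/2) * E(O2) - E(pristine)] / n
    All inputs are total energies in one unit (e.g. eV) from calculations
    with matching settings.
    """
    if n_vacancies < 1:
        raise ValueError("n_vacancies must be >= 1")
    return (e_defective + 0.5 * n_vacancies * e_o2 - e_pristine) / n_vacancies


if __name__ == "__main__":
    # Placeholder energies in eV, NOT values from the Cu2O(111) dataset.
    print(oxygen_vacancy_formation_energy(e_defective=-3990.0,
                                          e_pristine=-4000.0, e_o2=-9.86))
```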
pure_magnesium_DFT_PRM2020
Description: This dataset provides DFT calculations (performed with VASP) for pure magnesium. Configuration sets include bulk, generalized stacking fault energies, stable stacking faults, decohesion, relaxed surfaces, dimer, corner and rod, and vacancy configurations of Mg.

ColabFit ID: pure_magnesium_DFT_PRM2020__Yin-Stricker-Curtin__DS_48yn17885mdi_0
Name: pure_magnesium_DFT_PRM2020
Authors: Binglun Yin, Markus Stricker, W. A. Curtin
Elements: Mg
Number of Configurations: 405
Number of Elements: 1
Number of Atoms: 10,730

Links:
https://doi.org/10.24435/materialscloud:8f-1s
https://doi.org/10.1103/PhysRevMaterials.4.103602
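Stacking-fault and decohesion energies like those in these configuration sets are conventionally reported per unit area of the fault or cleavage plane. The sketch below applies the usual definition γ = (E_shifted − E_reference)/A to the generalized stacking fault case. The dataset description does not say how the authors normalized their energies, the helper names are hypothetical, and the vectors and energies in the example are placeholders.

```python
"""Minimal sketch: generalized stacking fault energy (GSFE) per unit fault area."""
from typing import Sequence, Tuple

# 1 eV/Å^2 = 1.602176634e-19 J / 1e-20 m^2 = 16.02176634 J/m^2
EV_PER_A2_TO_MJ_PER_M2 = 16021.76634


def cross(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def fault_plane_area(a: Sequence[float], b: Sequence[float]) -> float:
    """|a x b| for the two cell vectors spanning the fault plane, in Å^2."""
    cx, cy, cz = cross(a, b)
    return (cx * cx + cy * cy + cz * cz) ** 0.5


def gsfe_mj_per_m2(e_shifted: float, e_reference: float, area_a2: float) -> float:
    """gamma = (E_shifted - E_reference) / A; energies in eV, area in Å^2, result in mJ/m^2."""
    return (e_shifted - e_reference) / area_a2 * EV_PER_A2_TO_MJ_PER_M2


if __name__ == "__main__":
    # Placeholder in-plane vectors and energies, NOT taken from the dataset.
    area = fault_plane_area((3.2, 0.0, 0.0), (1.6, 2.77, 0.0))
    print(f"A = {area:.3f} Å^2, gamma = {gsfe_mj_per_m2(-149.90, -150.00, area):.1f} mJ/m^2")
```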
q-AQUA
Description: The q-AQUA dataset was generated to provide a training set for a water potential energy surface (PES) that includes 2-body, 3-body, and 4-body interactions calculated at the CCSD(T) level of theory. Structures were selected from the existing HBB2-pol and MB-pol datasets. For each water dimer structure, CCSD(T)/aug-cc-pVTZ calculations were performed with an additional 3s3p2d1f basis set, with exponents of (0.9, 0.3, 0.1) for s and p, (0.6, 0.2) for d, and 0.3 for f. This additional basis is placed at the center of mass (COM) of each dimer configuration. The basis set superposition error (BSSE) correction was determined with the counterpoise scheme. CCSD(T)/aug-cc-pVQZ calculations were then performed with the same additional basis set and BSSE correction. Final CCSD(T)/CBS energies were obtained by extrapolating the CCSD(T)/aug-cc-pVTZ and CCSD(T)/aug-cc-pVQZ 2-body energies to the complete basis set (CBS) limit. All ab initio calculations were performed with the Molpro package. Trimer structures were calculated at the CCSD(T)-F12a/aug-cc-pVTZ level with BSSE correction. Four-body structure calculations were performed at the CCSD(T)-F12 level.

ColabFit ID: q-AQUA__Yu-Qu-Houston-Conte-Nandi-Bowman__DS_p9qjlc9l8scx_0
Name: q-AQUA
Authors: Qi Yu, Chen Qu, Paul L. Houston, Riccardo Conte, Apurba Nandi, Joel M. Bowman
Elements: H, O
Number of Configurations: 120,372
Number of Elements: 2
Number of Atoms: 878,157

Links:
https://github.com/jmbowma/q-AQUA
https://doi.org/10.1021/acs.jpclett.2c00966
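The description names two standard ingredients: counterpoise-corrected 2-body energies and a TZ/QZ extrapolation to the complete-basis-set limit. It does not give the extrapolation formula. The sketch below therefore assumes the widely used two-point inverse-cubic form, E(X) = E_CBS + A·X⁻³, and applies it directly to the interaction energy. The authors' exact scheme may differ, and the energies in the example are placeholders.

```python
"""Minimal sketch: counterpoise-corrected 2-body energy and a two-point CBS extrapolation.

Assumption (not stated in the dataset description): the extrapolation uses the
inverse-cubic form E(X) = E_CBS + A * X**-3, applied to the interaction energy.
"""


def counterpoise_interaction(e_dimer: float, e_a_dimer_basis: float,
                             e_b_dimer_basis: float) -> float:
    """Boys-Bernardi counterpoise: all three energies are computed in the full dimer basis."""
    return e_dimer - e_a_dimer_basis - e_b_dimer_basis


def cbs_two_point(e_small: float, e_large: float,
                  x_small: int = 3, x_large: int = 4) -> float:
    """Extrapolate from cardinal numbers x_small/x_large (3 = aug-cc-pVTZ, 4 = aug-cc-pVQZ)."""
    if x_large <= x_small:
        raise ValueError("x_large must exceed x_small")
    w_small, w_large = x_small ** 3, x_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


if __name__ == "__main__":
    # Placeholder energies in hartree, NOT values from q-AQUA.
    e2_tz = counterpoise_interaction(-152.600, -76.290, -76.300)
    e2_qz = counterpoise_interaction(-152.705, -76.340, -76.350)
    print(f"2-body CBS estimate: {cbs_two_point(e2_tz, e2_qz):.5f} Eh")
```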
rMD17
Description: A dataset of 10 molecules (aspirin, azobenzene, benzene, ethanol, malonaldehyde, naphthalene, paracetamol, salicylic acid, toluene, uracil) with 100,000 structures for each, calculated at the PBE/def2-SVP level of theory using ORCA. It is based on the MD17 dataset, but with energies and forces recalculated using tighter numerical settings.

ColabFit ID: rMD17__Christensen-Lilienfeld__DS_8rafgy0ly6bt_0
Name: rMD17
Authors: Anders S. Christensen, O. Anatole von Lilienfeld
Elements: C, H, N, O
Number of Configurations: 999,988
Number of Elements: 4
Number of Atoms: 15,599,712

Links:
https://doi.org/10.6084/m9.figshare.12672038.v3
https://doi.org/10.48550/arXiv.2007.09593
reactive_hydrogen_ACS_2023
Description: This dataset contains structures of Cu, including the Cu(111), Cu(100), Cu(110), and Cu(211) surfaces. Slab settings are as follows: 3 x 3, 6-layer slabs for the Cu(111), (100), and (110) surfaces; 1 x 3, 6-layer slabs for the Cu(211) surface. It also includes structures representing the interaction of H2 with one of the Cu surfaces, and Cu structures sampled at different temperatures.

ColabFit ID: reactive_hydrogen_ACS_2023__Stark-Westermayr-Douglas-Gallardo-Gardner-Habershon-Maurer__DS_apdpxdjx082p_0
Name: reactive_hydrogen_ACS_2023
Authors: Wojciech G. Stark, Julia Westermayr, Oscar A. Douglas-Gallardo, James Gardner, Scott Habershon, Reinhard J. Maurer
Elements: Cu, H
Number of Configurations: 3,413
Number of Elements: 2
Number of Atoms: 191,104

Links:
https://dx.doi.org/10.17172/NOMAD/2023.05.03-2
https://pubs.acs.org/doi/full/10.1021/acs.jpcc.3c06648
reduced-perovskite_and_oxidized-marokite_oxides
Description: The dataset contains DFT calculations of oxygen-deficient perovskites derived from the Ca2Fe2O5 (brownmillerite) and Ca2Mn2O5 structures, and of tunnel-structured CaMn4O8, a derivative of CaMn2O4 (marokite) with Ca vacancies. The dataset was produced to investigate the effects of introducing oxygen or Ca vacancies into ternary transition metal oxides, as a means of assessing potential new Ca-ion battery materials.

ColabFit ID: reduced-perovskite_and_oxidized-marokite_oxides__Dompablo-Casals__DS_cl44a8aa0fgb_0
Name: reduced-perovskite_and_oxidized-marokite_oxides
Authors: M. Elena Arroyo-de Dompablo, José Luis Casals
Elements: Ca, Fe, Mn, O
Number of Configurations: 2,908
Number of Elements: 4
Number of Atoms: 386,438

Links:
https://doi.org/10.24435/materialscloud:x9-qr
http://doi.org/10.1016/j.ensm.2019.06.002
reduced-perovskite_and_oxidized-marokite_oxides
Description: The dataset contains DFT calculations of oxygen-deficient perovskites derived from the Ca2Fe2O5 (brownmillerite) and Ca2Mn2O5 structures, and of tunnel-structured CaMn4O8, a derivative of CaMn2O4 (marokite) with Ca vacancies. The dataset was produced to investigate the effects of introducing oxygen or Ca vacancies into ternary transition metal oxides, as a means of assessing potential new Ca-ion battery materials.

ColabFit ID: reduced-perovskite_and_oxidized-marokite_oxides__Dompablo-Casals__DS_u7fu1meyxvpc_0
Name: reduced-perovskite_and_oxidized-marokite_oxides
Authors: M. Elena Arroyo-de Dompablo, José Luis Casals
Elements: Ca, Fe, Mn, O
Number of Configurations: 2,919
Number of Elements: 4
Number of Atoms: 387,258

Links:
https://doi.org/10.24435/materialscloud:x9-qr
http://doi.org/10.1016/j.ensm.2019.06.002
sGDML_Aspirin_ccsd_NC2018_test
Description: The test set of a train/test pair from the sGDML aspirin dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were then recalculated with all-electron coupled cluster theory; for aspirin, coupled cluster with single and double excitations (CCSD) was used with the Dunning correlation-consistent basis set cc-pVDZ. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Aspirin_ccsd_NC2018_test__Chmiela-Sauceda-Müller-Tkatchenko__DS_4e7a9g2kav0a_0
Name: sGDML_Aspirin_ccsd_NC2018_test
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H, O
Number of Configurations: 500
Number of Elements: 3
Number of Atoms: 10,500

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Aspirin_ccsd_NC2018_train
Description: The training set of a train/test pair from the sGDML aspirin dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were then recalculated with all-electron coupled cluster theory; for aspirin, coupled cluster with single and double excitations (CCSD) was used with the Dunning correlation-consistent basis set cc-pVDZ. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Aspirin_ccsd_NC2018_train__Chmiela-Sauceda-Müller-Tkatchenko__DS_xy48avqcknnk_0
Name: sGDML_Aspirin_ccsd_NC2018_train
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H, O
Number of Configurations: 1,000
Number of Elements: 3
Number of Atoms: 21,000

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
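The MD settings fix the length of the underlying trajectory: 200 ps sampled every 0.5 fs is 200,000 fs / 0.5 fs = 400,000 time steps, so a set of 500 or 1,000 configurations keeps only a small fraction of the frames. A minimal sketch of that arithmetic follows; the helper name is hypothetical.

```python
"""Minimal sketch: number of MD steps implied by a run length and time step."""
from fractions import Fraction


def n_md_steps(total_ps, timestep_fs) -> int:
    """Integration steps in a run of total_ps picoseconds at timestep_fs femtoseconds."""
    # Fraction(str(...)) keeps values such as 0.5 exact instead of binary floats.
    total_fs = Fraction(str(total_ps)) * 1000
    steps = total_fs / Fraction(str(timestep_fs))
    if steps.denominator != 1:
        raise ValueError("run length is not a whole number of time steps")
    return int(steps)


if __name__ == "__main__":
    steps = n_md_steps(200, 0.5)  # 200 ps at 0.5 fs -> 400000
    print(f"{steps} steps; 1,000 configurations correspond to {1000 / steps:.2%} of them")
```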
sGDML_Benzene_DFT_NC2018
Description: The data used for training the DFT models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Forces and energies were computed with all-electron calculations at the generalized gradient approximation (GGA) level of theory with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional, treating van der Waals interactions with the Tkatchenko-Scheffler (TS) method. All calculations were performed with FHI-aims. The final training data were generated by subsampling the full trajectory while preserving the Maxwell-Boltzmann distribution of the energies.

ColabFit ID: sGDML_Benzene_DFT_NC2018__Chmiela-Sauceda-Müller-Tkatchenko__DS_q9y1aat05u42_0
Name: sGDML_Benzene_DFT_NC2018
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H
Number of Configurations: 49,863
Number of Elements: 2
Number of Atoms: 598,356

Links:
http://sgdml.org/
https://doi.org/10.1126/sciadv.1603015
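The description does not spell out the subsampling algorithm. One simple way to draw a subset whose energy histogram follows the full trajectory's (and so preserves whatever energy distribution the trajectory sampled) is stratified sampling over energy bins. The sketch below implements that generic technique; the function name, bin count, and seed are hypothetical choices, and this is not the sGDML authors' code.

```python
"""Minimal sketch: energy-stratified subsampling that preserves the energy histogram."""
import random
from collections import defaultdict
from typing import List, Sequence


def stratified_energy_subsample(energies: Sequence[float], n_samples: int,
                                n_bins: int = 50, seed: int = 0) -> List[int]:
    """Return sorted indices of a subsample whose energy histogram tracks the full set."""
    n = len(energies)
    if not 0 < n_samples <= n:
        raise ValueError("n_samples must be in 1..len(energies)")
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal energies
    bins = defaultdict(list)
    for i, e in enumerate(energies):
        bins[min(int((e - lo) / width), n_bins - 1)].append(i)

    # Largest-remainder allocation: each bin gets a share proportional to its size.
    quotas = {b: len(idx) * n_samples / n for b, idx in bins.items()}
    counts = {b: int(q) for b, q in quotas.items()}
    leftover = n_samples - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda b: quotas[b] - counts[b], reverse=True)
    for b in by_remainder[:leftover]:
        counts[b] += 1

    # Draw uniformly at random inside each bin.
    rng = random.Random(seed)
    chosen: List[int] = []
    for b, idx in bins.items():
        chosen.extend(rng.sample(idx, counts[b]))
    return sorted(chosen)
```

Sampling uniformly within each bin keeps the per-bin proportions of the full trajectory, so the subsample's energy distribution matches it up to the bin width.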
sGDML_Benzene_ccsdt_NC2018_test
Description: The test set of a train/test pair from the sGDML benzene dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for benzene. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Benzene_ccsdt_NC2018_test__Chmiela-Sauceda-Müller-Tkatchenko__DS_jol8dvjej92n_0
Name: sGDML_Benzene_ccsdt_NC2018_test
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H
Number of Configurations: 500
Number of Elements: 2
Number of Atoms: 6,000

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Benzene_ccsdt_NC2018_train
Description: The training set of a train/test pair from the sGDML benzene dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for benzene. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Benzene_ccsdt_NC2018_train__Chmiela-Sauceda-Müller-Tkatchenko__DS_8xk0v5v9dbx0_0
Name: sGDML_Benzene_ccsdt_NC2018_train
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H
Number of Configurations: 1,000
Number of Elements: 2
Number of Atoms: 12,000

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Ethanol_ccsdt_NC2018_test
Description: The test set of a train/test pair from the sGDML ethanol dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVTZ was used for ethanol. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Ethanol_ccsdt_NC2018_test__Chmiela-Sauceda-Müller-Tkatchenko__DS_iavvqpb14zqv_0
Name: sGDML_Ethanol_ccsdt_NC2018_test
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H, O
Number of Configurations: 1,000
Number of Elements: 3
Number of Atoms: 9,000

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Ethanol_ccsdt_NC2018_train
Description: The training set of a train/test pair from the sGDML ethanol dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVTZ was used for ethanol. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Ethanol_ccsdt_NC2018_train__Chmiela-Sauceda-Müller-Tkatchenko__DS_dalgmbg32lwz_0
Name: sGDML_Ethanol_ccsdt_NC2018_train
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H, O
Number of Configurations: 998
Number of Elements: 3
Number of Atoms: 8,982

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Malonaldehyde_ccsdt_NC2018_test
Description: The test set of a train/test pair from the sGDML malonaldehyde dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Malonaldehyde_ccsdt_NC2018_test__Chmiela-Sauceda-Müller-Tkatchenko__DS_yw3sqtaq40ll_0
Name: sGDML_Malonaldehyde_ccsdt_NC2018_test
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H, O
Number of Configurations: 500
Number of Elements: 3
Number of Atoms: 4,500

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Malonaldehyde_ccsdt_NC2018_train
Description: The training set of a train/test pair from the sGDML malonaldehyde dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Malonaldehyde_ccsdt_NC2018_train__Chmiela-Sauceda-Müller-Tkatchenko__DS_201vhvywll83_0
Name: sGDML_Malonaldehyde_ccsdt_NC2018_train
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H, O
Number of Configurations: 1,000
Number of Elements: 3
Number of Atoms: 9,000

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Toluene_ccsdt_NC2018_test
Description: The test set of a train/test pair from the sGDML toluene dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for toluene. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Toluene_ccsdt_NC2018_test__Chmiela-Sauceda-Müller-Tkatchenko__DS_ega58h2u9rwr_0
Name: sGDML_Toluene_ccsdt_NC2018_test
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H
Number of Configurations: 501
Number of Elements: 2
Number of Atoms: 7,515

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
sGDML_Toluene_ccsdt_NC2018_train
Description: The training set of a train/test pair from the sGDML toluene dataset. To create the coupled cluster datasets, the data used for training the models were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double, and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for toluene. All calculations were performed with the Psi4 software suite.

ColabFit ID: sGDML_Toluene_ccsdt_NC2018_train__Chmiela-Sauceda-Müller-Tkatchenko__DS_o8ssxj2re4kx_0
Name: sGDML_Toluene_ccsdt_NC2018_train
Authors: Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
Elements: C, H
Number of Configurations: 1,000
Number of Elements: 2
Number of Atoms: 15,000

Links:
http://sgdml.org/
https://doi.org/10.1038/s41467-018-06169-2
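Because every sGDML configuration is a single molecule, each entry's Number of Atoms should equal its Number of Configurations times the molecule's atom count. The script below checks that using the figures listed in these entries and standard molecular formulas. It is a sanity check on the catalog, not ColabFit code.

```python
"""Sanity check: Number of Atoms should equal configurations x atoms per molecule."""
import re
from typing import List, Tuple

# Standard molecular formulas for the sGDML molecules listed above.
FORMULAS = {
    "aspirin": "C9H8O4",
    "benzene": "C6H6",
    "ethanol": "C2H6O",
    "malonaldehyde": "C3H4O2",
    "toluene": "C7H8",
}

# (molecule, Number of Configurations, Number of Atoms) copied from the sGDML entries.
ENTRIES: List[Tuple[str, int, int]] = [
    ("aspirin", 500, 10_500), ("aspirin", 1_000, 21_000),
    ("benzene", 49_863, 598_356), ("benzene", 500, 6_000), ("benzene", 1_000, 12_000),
    ("ethanol", 1_000, 9_000), ("ethanol", 998, 8_982),
    ("malonaldehyde", 500, 4_500), ("malonaldehyde", 1_000, 9_000),
    ("toluene", 501, 7_515), ("toluene", 1_000, 15_000),
]


def atoms_in_formula(formula: str) -> int:
    """Count atoms in a flat formula such as 'C9H8O4' (no parentheses or charges)."""
    return sum(int(count) if count else 1
               for _, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def inconsistent_entries() -> List[Tuple[str, int, int]]:
    """Entries whose atom count is not configurations x atoms per molecule."""
    return [(mol, n_conf, n_atoms) for mol, n_conf, n_atoms in ENTRIES
            if n_conf * atoms_in_formula(FORMULAS[mol]) != n_atoms]


if __name__ == "__main__":
    print(inconsistent_entries() or "all entries consistent")
```

With the numbers listed here, all eleven sGDML entries pass the check.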
solute_strengthening_of_prism_edge_locations_in_Mg_alloys
Description: This dataset includes Mg and Mg-Zn alloy structures with solute atoms at sites around prism edge dislocations. The dataset was created to study the strengthening effect of solute atoms on prism edge dislocations in Mg alloys.

ColabFit ID: solute_strengthening_of_prism_edge_locations_in_Mg_alloys__Niazi-Curtin__DS_b4o9n4n8b7sp_0
Name: solute_strengthening_of_prism_edge_locations_in_Mg_alloys
Authors: Masoud Rahbar Niazi, W. A. Curtin
Elements: Mg, Zn
Number of Configurations: 94
Number of Elements: 2
Number of Atoms: 28,615

Links:
https://doi.org/10.24435/materialscloud:1e-c7
http://doi.org/10.1016/j.euromechsol.2023.105128
solvated_protein_fragments_JCTC_2019
Description: The solvated protein fragments dataset was generated as a partner benchmark dataset, along with the SN2 dataset, for measuring the performance of machine learning models, in particular PhysNet, at describing chemical reactions, long-range interactions, and condensed phase systems. The dataset contains structures for all possible "amons" (hydrogen-saturated covalently bonded fragments) of up to eight heavy atoms (C, N, O, S) that can be derived from chemical graphs of proteins containing the 20 natural amino acids connected via peptide bonds or disulfide bridges. For amino acids that can occur in different charge states due to (de)protonation (i.e., carboxylic acids that can be negatively charged or amines that can be positively charged), all possible structures with a total charge of up to ±2e are included. In total, the dataset provides reference energies, forces, and dipole moments for 2,731,180 structures calculated at the revPBE-D3(BJ)/def2-TZVP level of theory using ORCA 4.0.1.

ColabFit ID: solvated_protein_fragments_JCTC_2019__Unke-Meuwly__DS_ctjgc03xdauc_0
Name: solvated_protein_fragments_JCTC_2019
Authors: Oliver T. Unke, Markus Meuwly
Elements: C, H, N, O, S
Number of Configurations: 2,731,180
Number of Elements: 5
Number of Atoms: 58,395,272

Links:
https://doi.org/10.5281/zenodo.2605372
https://doi.org/10.1021/acs.jctc.9b00181
stable_and_metastable_phases_in_sputtered_CuInS2
Description: The chalcopyrite Cu(In,Ga)S2 has gained renewed interest in recent years due to its potential application in tandem solar cells. In this work, a combined theoretical and experimental approach is applied to investigate stable and metastable phases forming in sputtered CuInS2 (CIS) thin films. Ab initio calculations are performed to obtain formation energies, X-ray diffraction (XRD) patterns, and Raman spectra of various CIS polytypes and related compounds. Multiple low-energy CIS structures with zinc-blende- and wurtzite-derived lattices are identified, and their XRD/Raman patterns are shown to contain many overlapping features, which could lead to misidentification unless the techniques are properly combined and analyzed. The results are verified against experimental XRD/Raman spectra measured on a series of CIS films with different compositions and treated at different temperatures, revealing the formation of several CIS polymorphs and secondary phases. The characteristic features and the mechanisms behind the formation of different phases are discussed, with a focus on the thin-film photovoltaic application of CIS. The dataset contains the structures and VASP output files used to derive the discussed trends. Version 2.

ColabFit ID: stable_and_metastable_phases_in_sputtered_CuInS2__Larsen-Sopiha-Persson-Platzer-Björkman-Edoff__DS_qk4qfeka1e3m_0
Name: stable_and_metastable_phases_in_sputtered_CuInS2
Authors: Jes Larsen, Kostiantyn Sopiha, Clas Persson, Charlotte Platzer-Björkman, Marika Edoff
Elements: Cu, In, Na, S
Number of Configurations: 3,105
Number of Elements: 4
Number of Atoms: 117,948

Links:
https://doi.org/10.24435/materialscloud:5n-1e
https://doi.org/10.1002/advs.202200848
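Formation energies like those mentioned in this description are conventionally computed relative to elemental reference phases: ΔE_f = (E_total − Σᵢ nᵢ μᵢ)/N. The sketch below assumes that convention, with μᵢ the per-atom energy of element i's reference phase. The description does not say which references the authors chose, the function name is hypothetical, and the numbers in the example are placeholders.

```python
"""Minimal sketch: formation energy per atom relative to elemental reference energies."""
from typing import Dict


def formation_energy_per_atom(e_total: float, composition: Dict[str, int],
                              mu: Dict[str, float]) -> float:
    """(E_total - sum_i n_i * mu_i) / N, with mu_i the per-atom reference energy of element i."""
    n_atoms = sum(composition.values())
    if n_atoms == 0:
        raise ValueError("empty composition")
    e_ref = sum(count * mu[element] for element, count in composition.items())
    return (e_total - e_ref) / n_atoms


if __name__ == "__main__":
    # Placeholder numbers (eV) for a 16-atom CuInS2 cell (Cu4In4S8); NOT from the dataset.
    mu = {"Cu": -3.7, "In": -2.7, "S": -4.1}
    print(formation_energy_per_atom(-63.0, {"Cu": 4, "In": 4, "S": 8}, mu))
```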
tmQM_wB97MV
Description: tmQM_wB97MV contains configurations from the tmQM dataset, excluding several tmQM structures that were found to be missing hydrogen atoms; the energies of all remaining structures were recomputed at the wB97M-V/def2-SVPD level of DFT.

ColabFit ID: tmQM_wB97MV__Garrison-Heras-Domingo-Kitchin-Gomes-Ulissi-Blau__DS_qde4v70hhmmj_0
Name: tmQM_wB97MV
Authors: Aaron G. Garrison, Javier Heras-Domingo, John R. Kitchin, Gabriel dos Passos Gomes, Zachary W. Ulissi, Samuel M. Blau
Elements: Ag, As, Au, B, Br, C, Cd, Cl, Co, Cr, Cu, F, Fe, H, Hf, Hg, I, Ir, La, Mn, Mo, N, Nb, Ni, O, Os, P, Pd, Pt, Re, Rh, Ru, S, Sc, Se, Si, Ta, Tc, Ti, V, W, Y, Zn, Zr
Number of Configurations: 86,507
Number of Elements: 44
Number of Atoms: 5,710,877

Links:
https://github.com/ulissigroup/tmQM_wB97MV
https://doi.org/10.1021/acs.jcim.3c01226
vanadium_in_high_entropy_alloys_AM2020
Description: Dataset created for "Vanadium is an optimal element for strengthening in both fcc and bcc high-entropy alloys" to explore the effect of V in the high-entropy systems fcc Co-Cr-Fe-Mn-Ni-V and bcc Cr-Mo-Nb-Ta-V-W-Hf-Ti-Zr. Structures include pure V, as well as structures used to compute the misfit volumes of V in Ni and of Ni2V random alloys.

ColabFit ID: vanadium_in_high_entropy_alloys_AM2020__Yin-Maresca-Curtin__DS_4mjnowmrcqib_0
Name: vanadium_in_high_entropy_alloys_AM2020
Authors: Binglun Yin, Francesco Maresca, W. A. Curtin
Elements: Ni, V
Number of Configurations: 261
Number of Elements: 2
Number of Atoms: 22,196

Links:
https://doi.org/10.24435/materialscloud:2020.0020/v1
http://doi.org/10.1016/j.actamat.2020.01.062
water_and_Cu+_synergy_in_selective_CO2_hydrogenation_to_methanol_over_Cu/MgO_catalysts
Description: This dataset was created to investigate the role of surface water and hydroxyl groups in facilitating spontaneous CO₂ activation at Cu⁺ sites and the formation of monodentate formate species in the context of CO₂ hydrogenation to methanol.

ColabFit ID: water_and_Cu+_synergy_in_selective_CO2_hydrogenation_to_methanol_over_Cu/MgO_catalysts__Villanueva-Lustemberg-Zhao-Soriano-Concepción-Pirovano__DS_kl12pfupgv5e_0
Name: water_and_Cu+_synergy_in_selective_CO2_hydrogenation_to_methanol_over_Cu/MgO_catalysts
Authors: Estefanía Fernández Villanueva, Pablo Germán Lustemberg, Minjie Zhao, Jose Soriano, Patricia Concepción, María Verónica Ganduglia Pirovano
Elements: C, Cu, H, Mg, O
Number of Configurations: 14,962
Number of Elements: 5
Number of Atoms: 1,043,709

Links:
https://doi.org/10.24435/materialscloud:tz-pn
https://doi.org/10.1021/jacs.3c10685
water_ice_JCP_2020
Description: Starting from a single reference ab initio simulation, we use active learning to expand into new state points and to describe the quantum nature of the nuclei. The final model, trained on 814 reference calculations, yields excellent results under a range of conditions, from liquid water at ambient and elevated temperatures and pressures to different phases of ice and the air-water interface, all including nuclear quantum effects.

ColabFit ID: water_ice_JCP_2020__Schran-Brezina-Marsalek__DS_p2aaxa2vfnr6_0
Name: water_ice_JCP_2020
Authors: Christoph Schran, Krystof Brezina, Ondrej Marsalek
Elements: H, O
Number of Configurations: 8,000
Number of Elements: 2
Number of Atoms: 2,088,000

Links:
https://doi.org/10.5281/zenodo.4004590
https://doi.org/10.1063/5.0016004
water_ice_NEP_2023
Description: The main part of the dataset consists of structures of liquid water at 300 K from first-principles molecular dynamics (FPMD) simulations using a hybrid density functional with dispersion corrections. The dataset is expanded to include nuclear quantum effects by adding structures from path-integral molecular dynamics (PIMD) simulations. The final dataset contains 814 structures of liquid water at different temperatures and pressures, a water slab, ice Ih, and ice VIII. These systems cover a wide range of structural and dynamical properties of water and ice. This dataset builds on the dataset from Schran et al. (2020), https://doi.org/10.1063/5.0016004.

ColabFit ID: water_ice_NEP_2023__Chen-Berrens-Chan-Fan-Donadio__DS_wm2xs5bhyzcx_0
Name: water_ice_NEP_2023
Authors: Zekun Chen, Margaret L. Berrens, Kam-Tung Chan, Zheyong Fan, Davide Donadio
Elements: H, O
Number of Configurations: 814
Number of Elements: 2
Number of Atoms: 216,144

Links:
https://github.com/ZKC19940412/water_ice_nep
https://doi.org/10.26434/chemrxiv-2023-sr496
water_ice_PNAS_2021
Description: Dataset generated using a committee-based active learning strategy to build a training dataset for modeling complex aqueous systems.

ColabFit ID: water_ice_PNAS_2021__Schran-Thiemann-Rowe-Müller-Marsalek-Michaelides__DS_mz0vl4zs7bex_0
Name: water_ice_PNAS_2021
Authors: Christoph Schran, Fabian L. Thiemann, Patrick Rowe, Erich A. Müller, Ondrej Marsalek, Angelos Michaelides
Elements: B, C, F, H, Mo, N, O, S, Ti
Number of Configurations: 1,786
Number of Elements: 9
Number of Atoms: 681,912

Links:
https://doi.org/10.5281/zenodo.5235246
https://doi.org/10.1073/pnas.2110077118
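The listed configuration and atom counts can be combined into an average system size per dataset. The Python sketch below does this for the three water/ice entries above; the counts are copied from those entries, and the dictionary and helper names are hypothetical.

```python
# (number of configurations, number of atoms), copied from the entries above.
WATER_DATASETS = {
    "water_ice_JCP_2020": (8_000, 2_088_000),
    "water_ice_NEP_2023": (814, 216_144),
    "water_ice_PNAS_2021": (1_786, 681_912),
}


def mean_atoms_per_configuration(n_configs: int, n_atoms: int) -> float:
    """Average number of atoms per configuration in a dataset."""
    if n_configs <= 0:
        raise ValueError("n_configs must be positive")
    return n_atoms / n_configs


for name, (n_configs, n_atoms) in WATER_DATASETS.items():
    mean = mean_atoms_per_configuration(n_configs, n_atoms)
    print(f"{name}: {mean:.1f} atoms per configuration")
```

For water_ice_JCP_2020 the mean is exactly 261 atoms per configuration, which is consistent with (though not proof of) a fixed system size. The NEP and PNAS sets average about 265.5 and 381.8 atoms; their non-integer means show that their configurations vary in size.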
xxMD-CASSCF_test
Description: Test dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photosensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic structure theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.

ColabFit ID: xxMD-CASSCF_test__Pengmei-Shu-Liu__DS_sytg0385f79a_0
Name: xxMD-CASSCF_test
Authors: Zihan Pengmei, Yinan Shu, Junyu Liu
Elements: C, H, N, O, S
Number of Configurations: 65,100
Number of Elements: 5
Number of Atoms: 707,552

Links:
https://github.com/zpengmei/xxMD
https://doi.org/10.48550/arXiv.2308.11155
xxMD-CASSCF_train
Description: Training dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photosensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic structure theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.

ColabFit ID: xxMD-CASSCF_train__Pengmei-Shu-Liu__DS_qvljl3vtdp0f_0
Name: xxMD-CASSCF_train
Authors: Zihan Pengmei, Yinan Shu, Junyu Liu
Elements: C, H, N, O, S
Number of Configurations: 130,179
Number of Elements: 5
Number of Atoms: 1,411,656

Links:
https://github.com/zpengmei/xxMD
https://doi.org/10.48550/arXiv.2308.11155
xxMD-CASSCF_validation
Description: Validation dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photosensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic structure theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.

ColabFit ID: xxMD-CASSCF_validation__Pengmei-Shu-Liu__DS_e0nl35x6dl85_0
Name: xxMD-CASSCF_validation
Authors: Zihan Pengmei, Yinan Shu, Junyu Liu
Elements: C, H, N, O, S
Number of Configurations: 64,848
Number of Elements: 5
Number of Atoms: 701,516

Links:
https://github.com/zpengmei/xxMD
https://doi.org/10.48550/arXiv.2308.11155
xxMD-DFT_test
Description: Test dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photosensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic structure theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.

ColabFit ID: xxMD-DFT_test__Pengmei-Shu-Liu__DS_q007ninsdjvf_0
Name: xxMD-DFT_test
Authors: Zihan Pengmei, Yinan Shu, Junyu Liu
Elements: C, H, N, O, S
Number of Configurations: 21,661
Number of Elements: 5
Number of Atoms: 402,856

Links:
https://github.com/zpengmei/xxMD
https://doi.org/10.48550/arXiv.2308.11155
xxMD-DFT_train
Description: Training dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photosensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic structure theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.

ColabFit ID: xxMD-DFT_train__Pengmei-Shu-Liu__DS_fblez5twbr07_0
Name: xxMD-DFT_train
Authors: Zihan Pengmei, Yinan Shu, Junyu Liu
Elements: C, H, N, O, S
Number of Configurations: 43,395
Number of Elements: 5
Number of Atoms: 807,416

Links:
https://github.com/zpengmei/xxMD
https://doi.org/10.48550/arXiv.2308.11155
xxMD-DFT_validation
Description: Validation dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photosensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic structure theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.

ColabFit ID: xxMD-DFT_validation__Pengmei-Shu-Liu__DS_d1nmrzy4csx1_0
Name: xxMD-DFT_validation
Authors: Zihan Pengmei, Yinan Shu, Junyu Liu
Elements: C, H, N, O, S
Number of Configurations: 21,606
Number of Elements: 5
Number of Atoms: 402,151

Links:
https://github.com/zpengmei/xxMD
https://doi.org/10.48550/arXiv.2308.11155
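Each xxMD variant is listed as three separate train, test and validation entries. Summing the listed configuration counts gives the relative size of each split. The Python sketch below copies those counts from the six xxMD entries; the dictionary and function names are illustrative only.

```python
# Configuration counts copied from the xxMD entries above.
XXMD_SPLITS = {
    "xxMD-CASSCF": {"train": 130_179, "test": 65_100, "validation": 64_848},
    "xxMD-DFT": {"train": 43_395, "test": 21_661, "validation": 21_606},
}


def split_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of configurations."""
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


for dataset, counts in XXMD_SPLITS.items():
    fractions = split_fractions(counts)
    summary = ", ".join(f"{split} {frac:.1%}" for split, frac in fractions.items())
    print(f"{dataset} ({sum(counts.values()):,} configurations): {summary}")
```

With these counts, both xxMD-CASSCF (260,127 configurations in total) and xxMD-DFT (86,662 in total) come out at roughly 50% train, 25% test and 25% validation.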