The ColabFit Exchange: Data for Advanced Materials Science


Spotlight at NeurIPS AI4Mat Workshop 2024

December 2024:
E. G. Fuemmeler, G. P. Wolfe, A. Gupta, J. A. Vita, E. B. Tadmor, S. Martiniani, "Advancing the ColabFit Exchange towards a Web-scale Data Source for Machine Learning Interatomic Potentials", NeurIPS (2024). https://openreview.net/forum?id=b8qZpGJIkw


ColabFit Paper

Published October 2023:
J. A. Vita, E. G. Fuemmeler, A. Gupta, G. P. Wolfe, A. Q. Tao, R. S. Elliott, S. Martiniani, E. B. Tadmor, "ColabFit exchange: Open-access datasets for data-driven interatomic potentials", Journal of Chemical Physics, 159, 154802 (2023). doi:10.1063/5.0163882


The ColabFit Exchange is a KIM Initiative project. It is the largest database of its kind, providing open access to a collection of heterogeneous but systematically organized datasets designed specifically for data-driven interatomic potential (DDIP) development.

Each curated dataset is represented as a set of data points with selected training features and targets, and can be downloaded individually as a tar archive of Parquet files (see the documentation for guidance on using the Parquet files). Datasets are also exported as extended XYZ files, which can be read by tools such as the ASE Python library.
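To make the extended XYZ export more concrete, here is a minimal sketch of how one frame of that format is laid out, using only the Python standard library. The sample frame, the `parse_extxyz_frame` helper, and the `energy` and `forces` keys are illustrative assumptions, not taken from an actual ColabFit dataset; real exports may use different property names.

```python
import shlex


def parse_extxyz_frame(text):
    """Parse one extended XYZ frame into (info, atoms).

    Hypothetical helper for illustration; it handles only the simple
    layout shown in the sample frame.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # line 1: number of atoms in the frame

    # Line 2: space-separated key=value pairs; values may be quoted.
    info = dict(token.split("=", 1) for token in shlex.split(lines[1]))

    # "Properties" describes the per-atom columns as name:type:width triples,
    # e.g. "species:S:1:pos:R:3:forces:R:3".
    fields = info["Properties"].split(":")
    layout = [
        (fields[i], fields[i + 1], int(fields[i + 2]))
        for i in range(0, len(fields), 3)
    ]

    atoms = []
    for line in lines[2:2 + n_atoms]:
        cols = line.split()
        atom, start = {}, 0
        for name, kind, width in layout:
            raw = cols[start:start + width]
            # Only real-valued ("R") columns are converted; others stay strings.
            values = [float(v) for v in raw] if kind == "R" else raw
            atom[name] = values[0] if width == 1 else values
            start += width
        atoms.append(atom)
    return info, atoms


# Made-up two-atom frame, for illustration only.
frame = """2
Lattice="5.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 5.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-10.5
Si 0.0 0.0 0.0 0.1 0.0 0.0
Si 1.3 1.3 1.3 -0.1 0.0 0.0
"""

info, atoms = parse_extxyz_frame(frame)
print(info["energy"], atoms[0]["species"], atoms[1]["forces"])
```

In practice a hand-written parser like this is rarely needed: with ASE installed, `ase.io.read(path, index=":")` returns a list of `Atoms` objects, one per frame in the file.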

Datasets can also be accessed on the ColabFit organization page on HuggingFace, via the ColabFit CLI, or through the KLIFF DDIP fitting package.

The ColabFit Exchange, along with the other KIM Initiative projects, addresses a pressing need in the molecular simulation community: ensuring that science is findable, accessible, interoperable, and reproducible.