The ColabFit Exchange: Data for Advanced Materials Science
Spotlight at NeurIPS AI4Mat Workshop 2024
ColabFit Paper
The ColabFit Exchange is a KIM Initiative project. It is the largest database of its kind, providing open access to a collection of heterogeneous but systematically organized datasets designed specifically for data-driven interatomic potential (DDIP) development.
Each curated dataset is represented as a set of data points with selected training features and targets, and can be downloaded individually as a tar archive of Parquet files (see the documentation for guidance on using the Parquet files). Datasets are also exported as extended XYZ files, which can be read by, for example, the ASE Python library.
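To make the extended XYZ layout concrete, here is a minimal, dependency-free sketch that parses a single frame. The frame contents and the key names `energy` and `forces` are assumptions chosen for illustration; the keys used in a given ColabFit export may differ, so check the dataset's own files. The helper `parse_extxyz_frame` is hypothetical and is not part of ColabFit or ASE.

```python
import shlex

# Per-atom column types used in the Properties descriptor:
# S = string, R = real, I = integer, L = logical.
CASTS = {
    "S": str,
    "R": float,
    "I": int,
    "L": lambda s: s.upper().startswith("T"),
}


def parse_extxyz_frame(text):
    """Parse one extended XYZ frame into (info, atoms). Illustrative only."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # line 1: number of atoms in the frame

    # Line 2: whitespace-separated key=value pairs; values may be quoted.
    info = dict(token.split("=", 1) for token in shlex.split(lines[1]))

    # Properties=name:type:ncols:... describes the per-atom columns.
    fields = info.pop("Properties").split(":")
    layout = [
        (fields[i], fields[i + 1], int(fields[i + 2]))
        for i in range(0, len(fields), 3)
    ]

    atoms = []
    for line in lines[2 : 2 + n_atoms]:
        cols = line.split()
        atom, start = {}, 0
        for name, kind, width in layout:
            values = [CASTS[kind](c) for c in cols[start : start + width]]
            atom[name] = values[0] if width == 1 else values
            start += width
        atoms.append(atom)
    return info, atoms


# A made-up two-atom frame (not taken from any ColabFit dataset).
EXAMPLE_FRAME = """\
2
Lattice="5.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 5.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-10.5 pbc="T T T"
Si 0.00 0.00 0.00 0.01 -0.02 0.00
Si 1.35 1.35 1.35 -0.01 0.02 0.00
"""

info, atoms = parse_extxyz_frame(EXAMPLE_FRAME)
energy = float(info["energy"])
print(energy, atoms[0]["species"], atoms[1]["forces"])
```

For real work, a maintained reader is the better choice: ASE's `ase.io.read(path, index=":")` returns a list of `Atoms` objects, one per frame, and handles multi-frame files and edge cases that this sketch ignores.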
Datasets can also be accessed on the ColabFit organization page on HuggingFace, via the ColabFit CLI, or through the KLIFF DDIP fitting package.
The ColabFit MCP (Model Context Protocol) is also available to allow AI/LLM interactions with the ColabFit Exchange.  
The ColabFit Exchange, along with the other KIM Initiative projects, addresses a pressing need in the molecular simulation community—ensuring science is findable, accessible, interoperable, and reproducible.