Parquet File Schema

Dataset


A dataset is a collection of configurations together with metadata such as its creators, relevant links, a DOI, and aggregated data.



The following table describes the columns of the dataset parquet file, found at <dataset-id>/ds.parquet in an untarred dataset download.
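To load the file programmatically, one option is the pyarrow library. This is an assumption; any Parquet reader should work. The sketch below builds the <dataset-id>/ds.parquet path inside an extraction directory and reads selected columns into plain Python dicts. The function names and the `extract_dir` parameter are hypothetical, and `to_pylist()` requires a reasonably recent pyarrow release.

```python
from pathlib import Path


def dataset_parquet_path(extract_dir: str, dataset_id: str) -> Path:
    # Hypothetical helper: untarred downloads place the file at <dataset-id>/ds.parquet.
    return Path(extract_dir) / dataset_id / "ds.parquet"


def read_dataset_rows(extract_dir: str, dataset_id: str, columns=None) -> list:
    # pyarrow is an assumed dependency; importing it here keeps the path
    # helper above usable without it.
    import pyarrow.parquet as pq

    path = dataset_parquet_path(extract_dir, dataset_id)
    # Read only the requested columns (all columns when columns is None).
    table = pq.read_table(str(path), columns=columns)
    # One dict per row, keyed by the column names documented in this schema.
    return table.to_pylist()
```

For example, `read_dataset_rows("downloads", "DS_example", columns=["id", "name", "elements"])` would return only those three columns; `"downloads"` and `"DS_example"` are placeholder values.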

Key | Explanation | Column type
id | Unique identifier for the dataset. | string
hash | Hash over dataset values. | string
name | Name of the dataset. | string
last_modified | Date when this dataset was last modified in the database. | timestamp
software | Software used for property calculations. | array<string>
methods | Level(s) of theory used for property calculations. | array<string>
nconfigurations | Count of configurations in the dataset. | integer
nproperty_objects | Count of property objects in the dataset. | long
nsites | Sum of atomic site counts across all configurations. | long
nelements | Count of distinct elements in the dataset. | integer
elements | Elemental symbols of the distinct atomic species present in the dataset. | array<string>
total_elements_ratios | Ratio of each atomic species across the entire dataset, in the same order as elements. | array<double>
nperiodic_dimensions | Numbers of periodic dimensions present among the dataset's configurations. | array<integer>
dimension_types | Periodicity flags along each lattice vector present among the dataset's configurations. | array<array<integer>>
energy_count | Count of configurations with energy calculations. | long
energy_mean | Mean energy value across configurations. | double
energy_variance | Variance of energy values across configurations. | double
atomization_energy_count | Count of configurations with atomization energy calculations. | long
adsorption_energy_count | Count of configurations with adsorption energy calculations. | long
energy_above_hull_count | Count of configurations with energy above hull calculations. | long
formation_energy_count | Count of configurations with formation energy calculations. | long
atomic_forces_count | Count of configurations with atomic forces calculations. | long
electronic_band_gap_count | Count of configurations with electronic band gap calculations. | long
cauchy_stress_count | Count of configurations with Cauchy stress calculations. | long
authors | List of authors who contributed to the dataset. | array<string>
description | Description of the dataset. | string
extended_id | Extended identifier for the dataset. | string
license | License under which the dataset is distributed. | string
links | Links related to the dataset, including original publications and data repositories. | string
publication_year | Year the dataset was published to ColabFit. | string
doi | Digital Object Identifier for the dataset. | string
equilibrium | Whether the dataset contains only equilibrium structures. | boolean
colabfit_publication_date | Date when the dataset was published to ColabFit. | timestamp
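Because total_elements_ratios is ordered to match elements, a row can be checked for internal consistency once it has been read as a dict. The following is a minimal sketch with hypothetical helper names; the check that the ratios sum to 1 assumes they are fractions of all atomic sites, which the schema implies but does not state outright.

```python
import math


def element_ratio_map(row: dict) -> dict:
    # Pair each symbol in `elements` with the entry at the same position
    # in `total_elements_ratios` (the schema says the orders match).
    elements = row["elements"]
    ratios = row["total_elements_ratios"]
    if len(elements) != len(ratios):
        raise ValueError("elements and total_elements_ratios differ in length")
    return dict(zip(elements, ratios))


def check_row(row: dict, tol: float = 1e-6) -> list:
    # Return a list of consistency problems for one ds.parquet row (empty if none).
    problems = []
    elements = row["elements"]
    ratios = row["total_elements_ratios"]
    if row["nelements"] != len(elements):
        problems.append("nelements does not match len(elements)")
    if len(ratios) != len(elements):
        problems.append("total_elements_ratios and elements differ in length")
    elif elements and not math.isclose(sum(ratios), 1.0, abs_tol=tol):
        # Assumption: ratios are fractions of all sites, so they should sum to ~1.
        problems.append("total_elements_ratios do not sum to 1")
    return problems
```

With a row such as `{"nelements": 2, "elements": ["H", "O"], "total_elements_ratios": [2/3, 1/3]}`, `check_row` returns an empty list and `element_ratio_map` gives the ratio for each symbol.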
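Several columns are aggregates over the dataset's configurations. The sketch below shows one way they could be derived, assuming a hypothetical per-configuration record with a `symbols` list and an optional `energy`. The alphabetical ordering of elements and the use of population variance are assumptions that the schema does not specify; this is an illustration, not ColabFit's own implementation.

```python
from collections import Counter
from statistics import fmean, pvariance


def aggregate_configurations(configs: list) -> dict:
    # Each record is hypothetical: {"symbols": ["H", "H", "O"], "energy": float or None}.
    counts = Counter()
    for cfg in configs:
        counts.update(cfg["symbols"])
    # nsites: sum of atomic site counts across all configurations.
    nsites = sum(len(cfg["symbols"]) for cfg in configs)
    # Assumption: elements are listed alphabetically.
    elements = sorted(counts)
    energies = [cfg["energy"] for cfg in configs if cfg.get("energy") is not None]
    return {
        "nconfigurations": len(configs),
        "nsites": nsites,
        "nelements": len(elements),
        "elements": elements,
        # Share of all atomic sites held by each element, in `elements` order.
        "total_elements_ratios": [counts[e] / nsites for e in elements],
        "energy_count": len(energies),
        "energy_mean": fmean(energies) if energies else None,
        # Population variance is an assumption about the estimator used.
        "energy_variance": pvariance(energies) if energies else None,
    }
```

Configurations without an energy are excluded from energy_count, energy_mean, and energy_variance, matching the description of energy_count as a count of configurations with energy calculations.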