Parquet File Schema
Dataset
A dataset defines a collection of configurations and includes its creators, relevant links, a DOI, and aggregated data.
The following table provides a description of the columns found in untarred dataset parquet downloads.
Location: <dataset-id>/ds.parquet
Key | Explanation | Column Type |
---|---|---|
id | Unique identifier for the dataset. | string |
hash | Hash over dataset values. | string |
name | Name of the dataset. | string |
last_modified | Date when this dataset was last modified in the database. | timestamp |
software | Software used for property calculations. | Array of strings |
methods | Level(s) of theory used for property calculations. | Array of strings |
nconfigurations | Count of configurations in the dataset. | integer |
nproperty_objects | Count of property objects in the dataset. | long |
nsites | Sum of atomic site counts across all configurations. | long |
nelements | Count of distinct elements in the dataset. | integer |
elements | Elemental symbols of distinct atomic species present in the dataset. | Array of strings |
total_elements_ratios | Ratio of each atomic species across the entire dataset. Order matches elements. | Array of doubles |
nperiodic_dimensions | The set of integers indicating the number of periodic dimensions for all structures in the dataset, equivalent to the number of non-zero entries in dimension_types. | integer |
dimension_types | The set of arrays corresponding to dimension_types for all configurations contained in a dataset. | Nested arrays of integers |
energy_count | Count of configurations with energy calculations. | long |
energy_mean (eV) | Mean energy value across configurations. | double |
energy_variance (eV) | Variance of energy values across configurations. | double |
atomization_energy_count | Count of configurations with atomization energy calculations. | long |
adsorption_energy_count | Count of configurations with adsorption energy calculations. | long |
energy_above_hull_count | Count of configurations with energy above hull calculations. | long |
formation_energy_count | Count of configurations with formation energy calculations. | long |
atomic_forces_count | Count of configurations with atomic forces calculations. | long |
electronic_band_gap_count | Count of configurations with electronic band gap calculations. | long |
cauchy_stress_count | Count of configurations with Cauchy stress calculations. | long |
authors | List of authors who contributed to the dataset. | Array of strings |
description | Description of the dataset. | string |
extended_id | Extended identifier for the dataset. | string |
license | License under which the dataset is distributed. | string |
links | Links related to the dataset, including original publications and data repositories. | string |
publication_year | Year the dataset was published to ColabFit. | string |
doi | Digital Object Identifier for the dataset. | string |
equilibrium | Whether the dataset contains only equilibrium structures. | boolean |
colabfit_publication_date | Date when the dataset was published to ColabFit. | timestamp |
date_requested | Date when the dataset was requested. If not requested, defaults to colabfit_publication_date. | timestamp |