Parquet File Schema
Configuration
Configurations contain a structure and associated calculated property data. These function as the data points in a dataset.
The following table describes the columns found in the Parquet files of an untarred dataset download.
Location: <dataset-directory>/co/*.parquet
Key (units) | Explanation | Column Type |
---|---|---|
property_id | Truncated property_hash prepended by 'PO_'. | string |
property_hash | Hash over calculated property fields. | string |
last_modified | Date when this object was last modified in the database. | timestamp |
dataset_id | Unique ID of the dataset to which this structure and set of calculated properties belong. | string |
multiplicity | Number of occurrences of this structure and set of calculated properties in the dataset. | integer |
software | Software used to generate the calculated properties. | string |
method | Level of theory/method used to generate the calculated properties (e.g. DFT-PBE). | string |
energy (eV) | Calculated energy; the energy selected is the one conjugate with the atomic forces. | double |
atomic_forces (eV/Å) | Forces as implemented in, e.g., VASP: the negative first derivative of the total energy with respect to atomic positions. | array<array<double>> |
cauchy_stress (eV/Å³) | 3 x 3 stress tensor (9 components). | array<array<double>> |
cauchy_stress_volume_normalized | Whether stress has been normalized by cell volume. | boolean |
electronic_band_gap (eV) | Calculated electronic band gap of the system, defined as the energy difference between the highest occupied electronic state and the lowest unoccupied electronic state. | double |
electronic_band_gap_type | If known, the band gap type: direct or indirect. | string |
formation_energy (eV) | Calculated formation energy, defined as the energy difference between the total system and its reference elemental structures. | double |
adsorption_energy (eV) | Calculated adsorption energy, defined as the energy difference between the interacting surface + adsorbate system and the clean surface + isolated adsorbate. | double |
atomization_energy (eV) | Calculated atomization energy, defined as the energy needed to break the system into its (mono)atomic constituents. | double |
max_force_norm | The maximum norm of atomic forces. | double |
mean_force_norm | The mean norm of atomic forces. | double |
energy_above_hull (eV) | Calculated energy above hull. This is a measure of thermodynamic stability relative to a reference convex hull built from structures (of the same chemical system) and their energies. | double |
configuration_id | Truncated configuration_hash prepended by 'CO_'. | string |
configuration_hash | Hash over fields related to structure. | string |
structure_hash | Hash over a subset of fields related to structure, intended to allow comparison without reference to dataset-specific information. | string |
cell | 3 x 3 array representing the unit cell. | array<array<double>> |
positions | N x 3 array representing the Cartesian coordinates of the N atoms in a system. | array<array<double>> |
pbc | Periodic boundary conditions. | array<boolean> |
chemical_formula_hill | Chemical formula of the structure in Hill format. | string |
chemical_formula_reduced | Chemical formula of the structure in empirical format. | string |
chemical_formula_anonymous | Chemical formula in Hill format with chemical symbols anonymized to A, B, C...Aa, Ab, Ac... | string |
elements | Elemental symbols of the distinct atomic species in a structure. | array<string> |
elements_ratios | Ratio of each atomic species in a structure, given in the same order as elements. | array<double> |
atomic_numbers | Atomic number of each atom in a structure. | array<integer> |
nsites | Count of atoms in a structure. | integer |
nelements | Count of distinct atomic species in a structure. | integer |
nperiodic_dimensions | Number of periodic dimensions in the structure, equivalent to the number of non-zero entries in dimension_types. | integer |
dimension_types | List of three integers describing the periodicity of the boundaries of the unit cell. For each direction defined by the cell, this list indicates whether the direction is periodic (value 1) or non-periodic (value 0). | array<integer> |
names | ColabFit-internal names of a structure, allowing sorting and selection. | array<string> |
labels | Tags that may be used to describe characteristics of a structure. | array<string> |
property_metadata_path | ColabFit-internal path to the file containing property-related metadata. | string |
configuration_metadata_path | ColabFit-internal path to the file containing structure-related metadata. | string |
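
Several columns in the table are derivable from others, which makes them convenient for sanity checks after loading a file. The following is a minimal sketch, not ColabFit code: it assumes each row is available as a plain Python dict (for instance, one element of `pyarrow.parquet.read_table(path).to_pylist()`), and it assumes that "norm" in max_force_norm and mean_force_norm means the per-atom Euclidean norm. The function name `derived_fields` and the sample row are hypothetical.

```python
import math


def derived_fields(row):
    """Recompute derived columns from a row's raw structure and force data."""
    atomic_numbers = row["atomic_numbers"]
    # Assumption: the force norm is the Euclidean norm of each atom's force vector.
    force_norms = [math.sqrt(sum(c * c for c in f)) for f in row["atomic_forces"]]
    return {
        "nsites": len(atomic_numbers),
        "nelements": len(set(atomic_numbers)),
        # Number of non-zero entries in dimension_types.
        "nperiodic_dimensions": sum(1 for d in row["dimension_types"] if d != 0),
        "max_force_norm": max(force_norms),
        "mean_force_norm": sum(force_norms) / len(force_norms),
    }


# Hypothetical row with made-up values, for illustration only.
row = {
    "atomic_numbers": [8, 1, 1],
    "dimension_types": [1, 1, 0],
    "atomic_forces": [[3.0, 4.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
}
derived = derived_fields(row)
```

Comparing the recomputed values with the stored columns is one way to confirm that nested array columns were read with the expected shape.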
Fields included in property hash
adsorption_energy
atomic_forces
atomization_energy
cauchy_stress
cauchy_stress_volume_normalized
chemical_formula_hill
configuration_id
dataset_id
electronic_band_gap
electronic_band_gap_type
energy
energy_above_hull
formation_energy
metadata_id
method
software
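
To illustrate what "hash over fields" implies in practice, here is a hedged sketch. The actual hashing scheme is not described in this document; SHA-256 over canonical JSON and a truncation length of 25 characters are assumptions made purely for illustration, as are the function names. The point it shows is that only the fields listed above influence the hash, so changing a column outside the list (such as last_modified) leaves it unchanged.

```python
import hashlib
import json

# Field list taken from "Fields included in property hash" above.
PROPERTY_HASH_FIELDS = [
    "adsorption_energy", "atomic_forces", "atomization_energy",
    "cauchy_stress", "cauchy_stress_volume_normalized",
    "chemical_formula_hill", "configuration_id", "dataset_id",
    "electronic_band_gap", "electronic_band_gap_type", "energy",
    "energy_above_hull", "formation_energy", "metadata_id",
    "method", "software",
]


def sketch_property_hash(row, fields=PROPERTY_HASH_FIELDS):
    """Hypothetical hash: SHA-256 of a canonical JSON dump of the selected fields."""
    # Fields absent from the row are serialized as null.
    payload = {name: row.get(name) for name in fields}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sketch_property_id(property_hash, length=25):
    """Hypothetical ID: truncated hash with the 'PO_' prefix (length is assumed)."""
    return "PO_" + property_hash[:length]


# Made-up rows for illustration only.
base = {"energy": -1.5, "method": "DFT-PBE", "last_modified": "2024-01-01"}
touched = dict(base, last_modified="2024-06-01")  # column outside the hash list
changed = dict(base, energy=-1.6)                 # column inside the hash list

same_hash = sketch_property_hash(base) == sketch_property_hash(touched)
different_hash = sketch_property_hash(base) != sketch_property_hash(changed)
example_id = sketch_property_id(sketch_property_hash(base))
```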
Fields included in structure hash
atomic_numbers
cell
pbc
positions
Fields included in configuration hash
Same as fields included in structure hash above, with the addition of a ColabFit-internal ID representing configuration metadata.
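
Because structure_hash omits dataset-specific information, it can be used to spot the same structure appearing in more than one dataset. The sketch below assumes rows are plain dicts carrying the structure_hash and dataset_id columns; the function name and all hash and ID values are hypothetical placeholders, not real ColabFit identifiers.

```python
from collections import defaultdict


def datasets_by_structure(rows):
    """Map each structure_hash to the set of dataset_ids in which it appears."""
    groups = defaultdict(set)
    for row in rows:
        groups[row["structure_hash"]].add(row["dataset_id"])
    return dict(groups)


# Placeholder rows for illustration only.
rows = [
    {"structure_hash": "hash_1", "dataset_id": "DS_a"},
    {"structure_hash": "hash_1", "dataset_id": "DS_b"},
    {"structure_hash": "hash_2", "dataset_id": "DS_a"},
]

# Keep only structures that occur in more than one dataset.
shared = {h: ds for h, ds in datasets_by_structure(rows).items() if len(ds) > 1}
```

Grouping on configuration_hash instead would also distinguish rows by their configuration metadata, so identical geometries with different metadata would not be merged.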