Parquet File Schema

Configuration


Configurations contain a structure and associated calculated property data. These function as the data points in a dataset.

The following table describes the columns of the parquet files found in an untarred dataset download, at <dataset-id>/co/*.parquet.

Key Explanation Column Type
property_id Truncated property_hash prepended by 'PO_'. string
property_hash Hash over calculated property fields. string
last_modified Date when this object was last modified in the database. timestamp
dataset_id Unique id of the dataset to which this structure and set of calculated properties belongs. string
multiplicity Number of occurrences of this structure and set of calculated properties in the dataset. integer
software Software used for generating calculated properties. string
method Level of theory/method used when generating calculated properties (e.g., DFT-PBE). string
energy Calculated energy, where selected energy is conjugate with atomic forces. double
atomic_forces Forces as implemented in, e.g., VASP: the negative first derivative of the total energy with respect to atomic positions. array<array<double>>
cauchy_stress 3 x 3 stress tensor (9 components). array<array<double>>
cauchy_stress_volume_normalized Whether stress has been normalized by cell volume. boolean
electronic_band_gap undefined double
electronic_band_gap_type If known, type is direct or indirect. string
formation_energy undefined double
adsorption_energy undefined double
atomization_energy undefined double
max_force_norm The maximum norm of atomic forces. double
mean_force_norm The mean norm of atomic forces. double
energy_above_hull undefined double
configuration_id Truncated configuration_hash prepended by 'CO_'. string
configuration_hash Hash over fields related to structure. string
structure_hash Hash over subset of fields related to structure, intended to allow comparison without reference to dataset-specific information. string
cell 3 x 3 array representing the unit cell array<array<double>>
positions N x 3 array representing the Cartesian coordinates of the N atoms in a system. array<array<double>>
pbc Periodic boundary conditions array<boolean>
chemical_formula_hill Chemical formula of structure in Hill format string
chemical_formula_reduced Chemical formula of structure in empirical format string
chemical_formula_anonymous Chemical formula in Hill format with chemical symbols anonymized to A, B, C...Aa, Ab, Ac... string
elements Elemental symbols of distinct atomic species in a structure. array<string>
elements_ratios Ratio of each atomic species in a structure, given in the same order as elements. array<double>
atomic_numbers Atomic numbers of each atom in a structure. array<integer>
nsites Count of atoms in a structure. integer
nelements Count of distinct atomic species in a structure. integer
nperiodic_dimensions Count of dimensions for which PBC is true. integer
dimension_types undefined array<integer>
names ColabFit-internal name of a structure, allowing sorting and selection. array<string>
labels Tags that may be used to describe characteristics of a structure. array<string>
property_metadata_path ColabFit-internal path to file containing property-related metadata. string
configuration_metadata_path ColabFit-internal path to file containing structure-related metadata. string
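
Several columns in the table summarize other columns (counts of atoms, species and periodic dimensions, and the force norms). The sketch below shows one way to load the files and recompute those summaries from the array columns. It rests on assumptions: pyarrow is one possible reader and is imported only inside the hypothetical helper load_rows, derived_columns is likewise a hypothetical name, and "norm" is taken to mean the Euclidean norm of each atom's force vector. The example row is hand-made, not real data.

```python
import math
from collections import Counter
from pathlib import Path


def load_rows(dataset_dir):
    """Read every <dataset_dir>/co/*.parquet file into a list of dicts.

    pyarrow is imported lazily so the check below runs without it.
    """
    import pyarrow.dataset as ds  # third-party; assumed to be installed

    files = sorted(str(p) for p in Path(dataset_dir, "co").glob("*.parquet"))
    return ds.dataset(files, format="parquet").to_table().to_pylist()


def derived_columns(row):
    """Recompute the summary columns of one row from its array columns."""
    counts = Counter(row["atomic_numbers"])
    # Assumption: per-atom Euclidean norm of the force vector.
    norms = [math.sqrt(sum(c * c for c in f)) for f in row["atomic_forces"]]
    return {
        "nsites": len(row["atomic_numbers"]),
        "nelements": len(counts),
        "nperiodic_dimensions": sum(bool(p) for p in row["pbc"]),
        "max_force_norm": max(norms),
        "mean_force_norm": sum(norms) / len(norms),
    }


# Hand-made example row (not real data): three atoms in a fully periodic cell.
example_row = {
    "atomic_numbers": [8, 1, 1],
    "pbc": [True, True, True],
    "atomic_forces": [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]],
}
print(derived_columns(example_row))
```

With real data, the same function could be applied to each dict returned by load_rows and compared against the stored nsites, nelements, nperiodic_dimensions, max_force_norm and mean_force_norm values as a consistency check.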

Fields included in property hash

adsorption_energy
atomic_forces
atomization_energy
cauchy_stress
cauchy_stress_volume_normalized
chemical_formula_hill
configuration_id
dataset_id
electronic_band_gap
electronic_band_gap_type
energy
energy_above_hull
formation_energy
metadata_id
method
software
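
To illustrate what hashing over the listed fields implies, here is a minimal sketch. The actual hashing scheme is not described in this document, so the SHA-256 digest over canonical JSON, the truncation length of 16 characters, and the function names are all assumptions made for illustration. Only the field list and the 'PO_' prefix come from the text above.

```python
import hashlib
import json

# Field names copied from the list above.
PROPERTY_HASH_FIELDS = [
    "adsorption_energy", "atomic_forces", "atomization_energy",
    "cauchy_stress", "cauchy_stress_volume_normalized",
    "chemical_formula_hill", "configuration_id", "dataset_id",
    "electronic_band_gap", "electronic_band_gap_type", "energy",
    "energy_above_hull", "formation_energy", "metadata_id",
    "method", "software",
]


def sketch_property_hash(row):
    """Hash only the listed fields; every other column is ignored.

    Assumption: canonical JSON + SHA-256. The real scheme may differ.
    """
    payload = {name: row.get(name) for name in PROPERTY_HASH_FIELDS}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sketch_property_id(row, length=16):
    """Prefix a truncated hash with 'PO_' (the length is an assumption)."""
    return "PO_" + sketch_property_hash(row)[:length]


# Hand-made rows (not real data).
base = {"dataset_id": "DS_example", "configuration_id": "CO_example",
        "energy": -1.5, "method": "DFT-PBE", "software": "VASP",
        "last_modified": "2024-01-01"}
touched = dict(base, last_modified="2025-01-01")  # not a hashed field
changed = dict(base, energy=-1.6)                 # a hashed field

print(sketch_property_id(base) == sketch_property_id(touched))  # True
print(sketch_property_id(base) == sketch_property_id(changed))  # False
```

The point of the sketch is the field selection: columns outside the list, such as last_modified, do not affect the identifier, while any change to a listed field does.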

Fields included in structure hash

atomic_numbers
cell
pbc
positions

Fields included in configuration hash

Same as fields included in structure hash above, with the addition of a ColabFit-internal ID representing configuration metadata.
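
The difference between the two hashes can be sketched as follows. As before, the digest (SHA-256 over canonical JSON), the payload key metadata_id, the truncation length, and the function names are assumptions for illustration only. The text above specifies just which fields take part and the 'CO_' prefix.

```python
import hashlib
import json

# Field names from the structure hash list above.
STRUCTURE_HASH_FIELDS = ["atomic_numbers", "cell", "pbc", "positions"]


def _digest(payload):
    """Assumed digest: SHA-256 over key-sorted JSON."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sketch_structure_hash(row):
    """Depends only on the structure, not on dataset-specific information."""
    return _digest({name: row[name] for name in STRUCTURE_HASH_FIELDS})


def sketch_configuration_hash(row, metadata_id):
    """Structure fields plus an ID representing configuration metadata."""
    payload = {name: row[name] for name in STRUCTURE_HASH_FIELDS}
    payload["metadata_id"] = metadata_id  # hypothetical key name
    return _digest(payload)


# Hand-made structure (not real data): two hydrogen atoms in a cubic cell.
structure = {
    "atomic_numbers": [1, 1],
    "cell": [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]],
    "pbc": [True, True, True],
    "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
}
in_dataset_a = dict(structure, dataset_id="DS_a")
in_dataset_b = dict(structure, dataset_id="DS_b")

# The same structure hash regardless of the dataset the row came from.
print(sketch_structure_hash(in_dataset_a) == sketch_structure_hash(in_dataset_b))
# Different configuration hashes once the metadata ID differs.
hash_a = sketch_configuration_hash(in_dataset_a, "MD_a")
hash_b = sketch_configuration_hash(in_dataset_b, "MD_b")
print(hash_a == hash_b)
print("CO_" + hash_a[:16])  # assumed truncation length
```

Under these assumptions, structure_hash is the column to group on when looking for the same structure across datasets, while configuration_hash also distinguishes entries whose configuration metadata differ.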