Dataset

PropMolFlow_QM9_CNOFH_2025





Name :
PropMolFlow_QM9_CNOFH_2025
ColabFit ID :
Description :
This DFT dataset was curated in response to growing interest in property-guided molecule generation with generative AI models. The properties of generated molecules are typically evaluated using machine learning (ML) property predictors trained on datasets of fully relaxed structures. However, because generated molecules may deviate significantly from relaxed structures, these predictors can be highly unreliable for assessing them. This dataset provides DFT-evaluated properties, energies, and forces for generated molecules. The structures are unrelaxed and can serve as a validation set for ML property predictors used in conditional molecule generation. The dataset contains 10,773 molecules generated with PropMolFlow, a state-of-the-art conditional molecule generation model that uses a flow matching process parameterized by an SE(3)-equivariant graph neural network. The PropMolFlow models were trained on the QM9 dataset. Molecules were generated by conditioning on six properties (polarizability, HOMO-LUMO gap, HOMO energy, LUMO energy, dipole moment, and heat capacity at room temperature, 298 K) across two tasks: in-distribution and out-of-distribution generation. Full details are available in the corresponding paper.
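Using this dataset as a validation set amounts to comparing an ML predictor's outputs with the DFT reference values for the same unrelaxed molecules. The Python sketch below computes the mean absolute error (MAE) for that comparison. It assumes, hypothetically, that the predictions and DFT references have already been extracted into two aligned sequences of floats in the same units; `mean_absolute_error` is an illustrative helper, not part of any ColabFit or PropMolFlow API.

```python
"""Minimal sketch: score an ML property predictor against DFT reference values.

Assumption (hypothetical): predictions and DFT references for the same
molecules are already available as two aligned, equal-length sequences of
floats in the same units. Extracting them from the dataset is not shown.
"""
from statistics import fmean
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Return the mean absolute difference between predictions and DFT references."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("need at least one molecule to compute an error")
    # Pair each prediction with its DFT reference and average the absolute deviations.
    return fmean(abs(p - r) for p, r in zip(predicted, reference))
```

One plausible use is to compute the MAE of the same predictor on held-out relaxed QM9 molecules and on this set of unrelaxed generated molecules; a markedly larger error on the latter would reflect the unreliability described above.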
Authors :
Cheng Zeng, Jirui Jin, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Mingjie Liu
DOI :
10.60732/1f7cae3c
https://commons.datacite.org/doi.org/10.60732/1f7cae3c
https://doi.datacite.org/dois/10.60732%2F1f7cae3c
https://doi.org/10.60732/1f7cae3c
Cite as: Zeng, C., Jin, J., Karypis, G., Transtrum, M., Tadmor, E. B., Hennig, R. G., Roitberg, A., Martiniani, S., and Liu, M. "PropMolFlow QM9 CNOFH 2025." ColabFit, 2025. https://doi.org/10.60732/1f7cae3c.
For other citation formats, see the DataCite Fabrica page for this dataset.
Num. Configurations :
10,773
Num. Atoms :
205,304
Downloads :
323
Calculated Property Types :
atomic_forces, energy
Elements :
C (34.87%) F (0.18%) H (54.17%) N (4.48%) O (6.3%)
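The summary statistics above can be cross-checked with simple arithmetic: 205,304 atoms over 10,773 configurations gives roughly 19 atoms per molecule, and the element percentages sum to 100%. The Python sketch below performs these checks and converts the percentages into approximate per-element atom counts. Because the listed percentages are rounded, the derived counts are estimates only; the helper names are illustrative.

```python
"""Back-of-the-envelope check of the dataset summary statistics.

The element percentages are rounded, so per-element counts derived from
them are approximate.
"""
from __future__ import annotations

NUM_CONFIGURATIONS = 10_773
NUM_ATOMS = 205_304
ELEMENT_PERCENT = {"C": 34.87, "F": 0.18, "H": 54.17, "N": 4.48, "O": 6.3}


def mean_atoms_per_configuration(num_atoms: int, num_configs: int) -> float:
    """Average number of atoms in one configuration (molecule)."""
    return num_atoms / num_configs


def approximate_element_counts(num_atoms: int, percent: dict[str, float]) -> dict[str, int]:
    """Estimate atom counts per element from rounded percentage shares."""
    return {element: round(num_atoms * share / 100) for element, share in percent.items()}


if __name__ == "__main__":
    mean_atoms = mean_atoms_per_configuration(NUM_ATOMS, NUM_CONFIGURATIONS)
    print(f"mean atoms per molecule: {mean_atoms:.2f}")
    print(f"sum of element percentages: {sum(ELEMENT_PERCENT.values()):.2f}")
    for element, count in approximate_element_counts(NUM_ATOMS, ELEMENT_PERCENT).items():
        print(f"{element}: ~{count} atoms")
```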
Methods :
DFT-B3LYP
Software :
Gaussian 16
Configuration Sets by Name :
Configuration Sets by ID :

No uploaded content is transferred in ownership from the original creators to ColabFit. All content is distributed under the license specified by its contributor who has stated that he or she has the authority to share it under the specified license.