DFT-LR-Bayesian-Regression-Python3

A bayesian regression using molecular dataset based on the (F. A. Faber et al., J. Chem. Theory Comput 13, 5255-5264 2017.) literature

SupplementaryMaterials dataset is required for this program to run. The link I used was recently made unavailable.

*** Possible to find dataset (qm9) from this article: Ramakrishnan, Raghunathan; Dral, Pavlo; Dral, Pavlo O.; Rupp, Matthias; Anatole von Lilienfeld, O. (2017): Quantum chemistry structures and properties of 134 kilo molecules. figshare. Collection ***

Background

Bayesian Regression model is essentially a lienar model with a L2 penalty on the coefficients. Unlike ridge regression where the key to that penalty is a regularization hyperparameter which must be set, in BR the optimal regularizer is estimated from the data. In this case our CM vector or Coulomb Matrix is a 2D matrix. Each row or col represents an atom. The composition of the matrix is relatively simple, the Coulomb classic mutual exclusion between every two atoms. So then we are left with:

M_i,j = 0.5*Z^(2.4) for I = J Z_I * Z_J / abs(R_I - M_J) for I != J

Here off-diagonal elements correspond to the Coulomb repulsion between atoms I and J, while diagonal elements encode a polynomial fit of atomic energies to nuclear charge. (M. Rupp et al., Phys. Rev. Lett. 108, 058301, 2012.)

I used SK_learn algorithms for defining training and test sets as well as performing the Bayesian Ridge Regression on the CM vector. Code is commented generally in order establish where/how I split the data and apply skleearn and numpy libraries.

Results

My Bayesian Regression data was run for all 13 properties of the CM feature vector generated from feature.py file. My results compare my results to the Faber et al. paper where my inspiration came from to run this program. The error of my results does not exceed 3% which is very good, and most likely means I ran my program correctly. The CM-BR.txt file included in this repository is available for analysis, as well as a picture ResultsBayesRegression for easy understanding of waht is being calculated in the .txt file.

Overall, my Master's degree research involves DFT so this was a fun side project to run and see how close regression techniques could come to accurately determine DFT calculations done using traditional DFT methods.

Thanks to Yu Chieh Chi for the collaboration.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
BayeRegressionDFT.py		BayeRegressionDFT.py
CM-BR.txt		CM-BR.txt
README.md		README.md
ResultsBayesRegression.png		ResultsBayesRegression.png
acs.jctc.7b00577.pdf		acs.jctc.7b00577.pdf
featureCM.py		featureCM.py
fitBayesRegression.py		fitBayesRegression.py
folding.py		folding.py
prop.py		prop.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DFT-LR-Bayesian-Regression-Python3

SupplementaryMaterials dataset is required for this program to run. The link I used was recently made unavailable.

Background

Results

About

Releases

Packages

Languages

techieslayj/DFT-LR-Bayesian-Regression-Python3

Folders and files

Latest commit

History

Repository files navigation

DFT-LR-Bayesian-Regression-Python3

SupplementaryMaterials dataset is required for this program to run. The link I used was recently made unavailable.

Background

Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages