Material simulation is a vast research field that seeks to understand the optimal structure and the physical and chemical properties of a given material. Its foundation is the estimation of total energy with Density Functional Theory (DFT) calculations, which account for the many-body interactions among atoms. Despite tremendous advances [1, 2], DFT methods are computationally heavy, as their cost grows steeply with the number of atoms (standard Kohn-Sham implementations scale roughly cubically with system size). Hence, it has been a great challenge to reduce the computational cost while improving prediction accuracy.

To address this issue, recent machine learning approaches have been developed to build surrogate functions as an alternative to classical force field methods, which rely on physical principles and human intuition. As with DFT methods, the deep neural network must estimate the total energy of a given system, which in this research is an organic molecule. Following the emergence of Message Passing Neural Networks [3], there were groundbreaking advances in energy estimation [4, 5]. These were further improved by combining them with attention-based methods [6, 7]. On small molecules such as those in the QM9 dataset [8], the most advanced models can predict the total energy of a molecule with a mean absolute error (MAE) of 6 meV, less than a quarter of the thermal fluctuation at room temperature.
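The comparison between the 6 meV error and the thermal fluctuation can be checked with a short calculation. The sketch below assumes that "thermal fluctuation" means the thermal energy k_B·T and that room temperature is 298.15 K; both are assumptions, since the text does not define either.

```python
# Boltzmann constant in eV/K (fixed by the 2019 SI redefinition).
K_B_EV_PER_K = 8.617333262e-5

# Assumption: "room temperature" is taken as 298.15 K (25 degrees Celsius).
ROOM_TEMPERATURE_K = 298.15

# MAE quoted in the text, in meV.
MAE_MEV = 6.0


def thermal_energy_mev(temperature_k: float) -> float:
    """Return the thermal energy k_B * T in meV."""
    return K_B_EV_PER_K * temperature_k * 1000.0


kt_mev = thermal_energy_mev(ROOM_TEMPERATURE_K)
ratio = MAE_MEV / kt_mev
print(f"k_B*T = {kt_mev:.2f} meV, MAE / k_B*T = {ratio:.3f}")
```

Under these assumptions the thermal energy is a little under 26 meV, so the ratio stays below one quarter, in line with the statement above.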

In this research, Prof. Joonseok Lee’s team pointed out that these baselines are overly optimized for energy estimation alone. The team added small Gaussian noise to the atomic positions of a stable molecule, then optimized the structure using the predicted energy to see whether the perturbed molecule would recover its stable state. All baselines failed to recover the original structure, even for the simplest molecules. This indicates that the models do not capture the underlying physics of the given molecule.
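The perturb-and-relax test described above can be sketched as follows. This is a minimal illustration, not the team's code: `toy_energy` is a hypothetical stand-in for a learned energy model (harmonic springs on all atom pairs, which recovers the structure by construction), the water-like geometry, noise scale, and step sizes are illustrative choices, and recovery is measured on pairwise distances so that rigid rotations and translations do not count as errors.

```python
import numpy as np


def pairwise_distances(pos):
    """Return the matrix of distances between all atom pairs."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def make_toy_energy(ref_dist, stiffness=1.0):
    """Hypothetical energy model: harmonic springs around reference distances."""
    upper = np.triu_indices(ref_dist.shape[0], k=1)

    def toy_energy(pos):
        d = pairwise_distances(pos)
        return 0.5 * stiffness * np.sum((d[upper] - ref_dist[upper]) ** 2)

    return toy_energy


def numerical_gradient(energy_fn, pos, eps=1e-5):
    """Central-difference gradient, usable with any black-box energy function."""
    grad = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        shift = np.zeros_like(pos)
        shift[idx] = eps
        grad[idx] = (energy_fn(pos + shift) - energy_fn(pos - shift)) / (2 * eps)
    return grad


def relax(energy_fn, pos, step=0.1, n_steps=500):
    """Plain gradient descent on the predicted energy."""
    pos = pos.copy()
    for _ in range(n_steps):
        pos -= step * numerical_gradient(energy_fn, pos)
    return pos


# Illustrative water-like geometry (angstrom); not taken from the paper.
angle = np.deg2rad(104.52)
stable = np.array([
    [0.0, 0.0, 0.0],
    [0.9572, 0.0, 0.0],
    [0.9572 * np.cos(angle), 0.9572 * np.sin(angle), 0.0],
])
ref_dist = pairwise_distances(stable)
energy_fn = make_toy_energy(ref_dist)

# Inject small Gaussian noise into the atomic positions, then relax.
rng = np.random.default_rng(0)
perturbed = stable + rng.normal(scale=0.05, size=stable.shape)
relaxed = relax(energy_fn, perturbed)

perturbed_error = np.max(np.abs(pairwise_distances(perturbed) - ref_dist))
recovered_error = np.max(np.abs(pairwise_distances(relaxed) - ref_dist))
print(f"max distance error before: {perturbed_error:.4f}, after: {recovered_error:.2e}")
```

To probe an actual model, `energy_fn` would be replaced by the model's energy prediction; a model that fails the test would leave `recovered_error` large after relaxation.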

To address this problem, the team proposes a parametrized bond energy estimation, followed by a training scheme that embeds basic physical rules which apply in particular to stable-state structures. The model is given a stable structure and a slightly perturbed counterpart, and it is constrained by a simple inequality bound on energy and by a zero-force condition on simple molecules. The study further proposes a simple masked modeling framework capable of capturing the entire structure of the molecule, extending its applicability toward chemical reactions and complex systems.
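One way to read the two constraints is as penalty terms added to a training loss. The sketch below is an assumption-laden illustration, not the paper's formulation: it assumes the inequality means the stable structure's energy should not exceed that of its perturbed counterpart, it uses a hinge penalty and a mean squared force, and `stability_penalty`, `make_diatomic_energy`, and `force_weight` are hypothetical names.

```python
import numpy as np


def forces(energy_fn, pos, eps=1e-5):
    """Central-difference estimate of the forces F = -dE/dpos."""
    f = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        shift = np.zeros_like(pos)
        shift[idx] = eps
        f[idx] = -(energy_fn(pos + shift) - energy_fn(pos - shift)) / (2 * eps)
    return f


def stability_penalty(energy_fn, stable_pos, perturbed_pos, force_weight=1.0):
    """Penalty that vanishes when both assumed physical conditions hold."""
    # Inequality term: penalize only if E(stable) exceeds E(perturbed).
    energy_violation = max(0.0, energy_fn(stable_pos) - energy_fn(perturbed_pos))
    # Zero-force term: mean squared force at the stable structure.
    force_term = np.mean(forces(energy_fn, stable_pos) ** 2)
    return energy_violation + force_weight * force_term


def make_diatomic_energy(rest_length):
    """Hypothetical energy model for two atoms joined by a harmonic bond."""
    def energy(pos):
        bond = np.linalg.norm(pos[1] - pos[0])
        return 0.5 * (bond - rest_length) ** 2
    return energy


# Illustrative stable diatomic with bond length 1.0 (arbitrary units).
stable = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
perturbed = stable + rng.normal(scale=0.05, size=stable.shape)

# A model whose minimum matches the stable structure, and one whose minimum is shifted.
penalty_good = stability_penalty(make_diatomic_energy(1.0), stable, perturbed)
penalty_bad = stability_penalty(make_diatomic_energy(1.3), stable, perturbed)
print(f"consistent model: {penalty_good:.2e}, inconsistent model: {penalty_bad:.2e}")
```

The penalty is essentially zero for the model whose energy minimum sits at the stable structure and clearly positive for the shifted one, which is the behavior such a training signal would need in order to discourage physically inconsistent energy surfaces.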

Seunghoon Yi, Youngwoo Cho, Jinhwan Sul, Seung Woo Ko, Soo Kyung Kim, Jaegul Choo, Hongkee Yoon, Joonseok Lee.

Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI), 2023.

References
  1. W. Kohn and L. J. Sham. Self-consistent equations including exchange and correlation effects. Phys. Rev., 140:A1133–A1138, Nov. 1965.
  2. R. G. Parr. Density functional theory of atoms and molecules. In Horizons of Quantum Chemistry, pages 5–15. Springer, 1980.
  3. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proc. of the International Conference on Machine Learning (ICML), 2017.
  4. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller. SchNet – A deep learning architecture for molecules and materials. J. Chem. Phys., 148:241722, June 2018. ISSN 0021-9606, 1089-7690.
  5. O. T. Unke and M. Meuwly. PhysNet: A neural network for predicting energies, forces, dipole moments and partial charges. J. Chem. Theory Comput., 15:3678–3693, June 2019. ISSN 1549-9618, 1549-9626.
  6. Y. Cho, H. Yoon, S. Yi, J. Choo, M. J. Han, J. Lee, and S. Kim. Deep-DFT: A physics-ML hybrid approach to predict molecular energy using transformer. In Proc. of the Advances in Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning and the Physical Sciences, 2021.
  7. P. Thölke and G. De Fabritiis. Equivariant transformers for neural network based molecular potentials. In Proc. of the International Conference on Learning Representations (ICLR), 2022.
  8. L. Ruddigkeit, R. van Deursen, L. C. Blum, and J.-L. Reymond. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling, 52(11):2864–2875, 2012.