To make sense of a collection of atoms, organic and medicinal chemists adopted IUPAC nomenclature to categorize functional groups, whereas force fields describe the same atoms through atom types. In this study we present a lexical dictionary that bridges IUPAC nomenclature and CGenFF atom types, using SMILES/SMARTS patterns for common functional groups as the mid-level translation language. The bridge is implemented in Python.
A central component of any successful force field is the set of small molecules used to define the initial atom type engine. The CHARMM General Force Field (CGenFF) was built on selection criteria that cover a wide range of heterocycles and simple functional groups.
More recently, the study Rings in Drugs highlighted that each year 28% of new therapeutics contain a novel ring system (2). This percentage could be significantly higher if non-ring functional groups were included. This presents a problem: in a nearly infinite chemical space, how do we select the most important functional groups for time-consuming force field parameterization, so that we maximize our representation of the molecules most likely to be considered in drug design?
CGenFF is unique in its ability to quantify the quality of the charges and bonded parameters assigned to a compound, based on the compounds already in the force field and its decision tree. This penalty score allows the distribution of well versus poorly predicted compounds to be determined. Using the penalty values, clusters in the charge probability distribution were identified and denoted “No”, “Low”, “Mid”, and “High”. To visualize this classification scheme, we applied a sunburst plot to a variety of existing chemical databases that were of interest to us.
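The four penalty classes can be illustrated with a minimal Python sketch. The cut-off values below (0, 10, and 50) are placeholder assumptions chosen only to make the example concrete; they are not the cluster boundaries identified in this study, and the names `classify_penalty` and `summarize` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable, Sequence

# Class labels used in the text, ordered from best to worst prediction.
LABELS = ("No", "Low", "Mid", "High")

# ASSUMPTION: placeholder upper bounds (inclusive) for "No", "Low", and "Mid".
# The study derives its boundaries from clusters in the penalty distribution;
# those values are not reproduced here.
CUTOFFS = (0.0, 10.0, 50.0)


def classify_penalty(penalty: float, cutoffs: Sequence[float] = CUTOFFS) -> str:
    """Return the class label for one penalty score.

    A penalty p falls in class i when cutoffs[i-1] < p <= cutoffs[i];
    anything above the last cut-off is labelled "High".
    """
    return LABELS[bisect_left(cutoffs, penalty)]


def summarize(penalties: Iterable[float]) -> Dict[str, int]:
    """Count how many penalties fall in each class, keeping label order."""
    counts = Counter(classify_penalty(p) for p in penalties)
    return {label: counts.get(label, 0) for label in LABELS}


if __name__ == "__main__":
    # Made-up penalty scores, for demonstration only.
    print(summarize([0.0, 5.0, 25.0, 120.0, 3.0]))
```

Counts of this kind, computed per database, are the sort of input a sunburst plot would take; passing a different `cutoffs` sequence swaps in other class boundaries without changing the code.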
To visualize which atoms and associated atom types had the highest penalties, and therefore require parameter optimization, we translated the atom type language, through a series of key-value dictionaries, into something readable by medicinal chemists: IUPAC nomenclature.
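The chain of key-value dictionaries can be sketched in plain Python as follows. The three entries are hand-written illustrations, not the dictionaries built in this study: the pairing of each CGenFF atom type with a SMARTS pattern and an IUPAC-style group name is an assumption made for the example, and `translate` and `rank_by_penalty` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: illustrative entries only. Each CGenFF atom type is paired with
# one SMARTS pattern for a functional group in which it commonly occurs.
ATOM_TYPE_TO_SMARTS: Dict[str, str] = {
    "OG311": "[OX2H]",           # hydroxyl oxygen
    "CG2R61": "c1ccccc1",        # six-membered aromatic ring carbon
    "CG2O1": "[CX3](=O)[NX3]",   # amide carbonyl carbon
}

# ASSUMPTION: illustrative IUPAC-style names for the same SMARTS patterns.
SMARTS_TO_IUPAC: Dict[str, str] = {
    "[OX2H]": "hydroxy",
    "c1ccccc1": "benzene",
    "[CX3](=O)[NX3]": "carboxamide",
}


def translate(atom_type: str) -> str:
    """Map a CGenFF atom type to a functional group name via SMARTS."""
    smarts = ATOM_TYPE_TO_SMARTS.get(atom_type)
    if smarts is None:
        return "unknown"
    return SMARTS_TO_IUPAC.get(smarts, "unknown")


def rank_by_penalty(atoms: List[Tuple[str, float]]) -> List[Tuple[str, str, float]]:
    """Attach a group name to each (atom type, penalty) pair and sort
    from highest to lowest penalty."""
    named = [(atom_type, translate(atom_type), penalty) for atom_type, penalty in atoms]
    return sorted(named, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Made-up penalties, for demonstration only.
    for row in rank_by_penalty([("OG311", 2.5), ("CG2O1", 48.0), ("CG2R61", 0.0)]):
        print(row)
```

Keeping SMARTS as the middle key means either side can be extended independently: new atom types can point at existing patterns, and a pattern can be renamed without touching the atom type table.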