QSAR & Data Mining
This section of the course covers statistical methods for identifying correlations between an observed property (usually a biological activity) and a set of molecular properties. A common application is to derive an equation that reproduces the measured activity of the compounds in a particular data set, and then to use that equation to search a chemical library (which may have been constructed for a completely different purpose) for new active compounds.
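The fit-then-screen workflow described above can be sketched in a few lines. This is a minimal illustration only, assuming a single descriptor (logP) and ordinary least squares; the training values, compound names, and activity threshold are all made up for the example and do not come from any of the papers listed here. Real QSAR models normally use several descriptors and need proper validation.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical training set: one descriptor (logP) and a measured activity.
train_logp = [1.0, 2.0, 3.0, 4.0]
train_activity = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(train_logp, train_activity)

# Hypothetical chemical library: compound name -> logP.
library = {"cmpd_A": 1.5, "cmpd_B": 3.5, "cmpd_C": 2.5}

# Apply the fitted equation to every library compound.
predicted = {name: slope * logp + intercept for name, logp in library.items()}

# Keep compounds whose predicted activity meets an arbitrary cutoff.
threshold = 7.0
hits = sorted(name for name, value in predicted.items() if value >= threshold)

print(f"activity = {slope:.2f} * logP + {intercept:.2f}")
print("predicted hits:", hits)
```

The library compounds were deliberately chosen to lie inside the logP range of the training set, since a fitted equation says little about compounds far outside the data used to derive it.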
General Discussions
- A historical perspective on QSAR (here)
- A good general review (here)
- Errors associated with high-throughput screening data (here)
Descriptors
- Use of PCA to identify descriptors for the amino acids (here)
- Use of Genetic Algorithms to identify descriptor sets, applied to the prediction of antifungal activity (here)
Papers Used in Discussing QSAR Methods
- Linear regression methods, applied to passage of drugs into breast milk (here)
- Neural Networks, applied to antifungal activity (here)
- 3D QSAR: development of a pharmacophore model to identify transporters for various drugs (here)