Thread by @SHolderbach, Time to end the week with a thread on our recent @ChemRxiv [...]

Time to end the week with a thread on our recent @ChemRxiv preprint from @HITStudies:
https://doi.org/10.26434/chemrxiv.12636704.v1
1/
If you want fast protein-ligand binding predictions without using full docking, can we get away with simple, naive information and some machine learning magic?

RASPD+: Fast Protein-Ligand Binding Free Energy Prediction Using Simplified Physicochemical Features

The virtual screening of large numbers of compounds against target protein binding sites has become an integral component of drug discovery workflows. This screening is often done by computationally...

https://doi.org/10.26434/chemrxiv.12636704.v1

2/
General methods that predict for a chosen protein if a drug binds to it need to take into account information from both ligand and protein to model the physicochemical interactions.
But to model interactions we typically need to know how the two are arranged relative to each other - the pose.

3/
In docking this pose has to be sampled by running several steps of a function that assesses the likelihood of binding or interaction - the scoring function.
Thus the computational cost for getting a rough binding energy prediction for many molecules quickly gets quite high.

4/
So if we want a quick, good guess for not just a few hundred or thousand compounds but millions, we might want to simplify that.
If we can make an educated guess about the binding location in the protein we could ignore poses and hope to have enough info to predict!

5/
In our method RASPD+ we take just the most important features that contribute to interactions (e.g. H-bond donor/acceptors, logP, the molar refractivity) from the ligand and in a sphere around the binding site of the protein (thus pose invariant) and train ML models.
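A minimal sketch of the pose-invariant idea, under loud assumptions: the atom labels, the toy coordinates, the 8 Å radius and the function name `sphere_features` are all hypothetical and not taken from RASPD+. It only counts labelled protein atoms inside a sphere around an assumed binding-site centre, so no ligand pose enters the calculation.

```python
import math
from collections import Counter


def sphere_features(atoms, center, radius):
    """Count protein atoms of each label within `radius` of `center`.

    atoms:  iterable of (label, (x, y, z)) tuples; labels are illustrative only.
    center: assumed binding-site centre as (x, y, z).
    radius: sphere radius in the same units as the coordinates.
    """
    counts = Counter()
    for label, xyz in atoms:
        # Only the distance to the site centre matters, not any ligand pose.
        if math.dist(xyz, center) <= radius:
            counts[label] += 1
    return dict(counts)


# Toy protein atoms with made-up coordinates (angstrom).
protein_atoms = [
    ("hbond_donor", (1.0, 0.0, 0.0)),
    ("hbond_acceptor", (0.0, 2.0, 0.0)),
    ("hydrophobic", (3.0, 3.0, 3.0)),
    ("hbond_donor", (20.0, 0.0, 0.0)),  # far from the site, not counted
]

features = sphere_features(protein_atoms, center=(0.0, 0.0, 0.0), radius=8.0)
print(features)
```

Counts like these would sit next to the ligand descriptors in one feature vector per protein-ligand pair; how the actual 14 protein features are defined is described in the preprint, not here.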

6/
This was conceptually already demonstrated with simple linear regression by our coauthors Goutam Mukherjee and B. Jayaram in their initial RASPD approach:
https://doi.org/10.1039/C3CP44697B
While linear models capture general trends, a lot of the information needed for accurate prediction gets lost.

A rapid identification of hit molecules for target proteins via physico-chemical descriptors

We report here a novel computationally fast protocol (RASPD) for identifying good candidates for any target protein from any molecule/million molecule database. A QSAR-type equation sets up the...

https://doi.org/10.1039/C3CP44697B

7/
As the PDBbind dataset w/ bound protein structures and associated binding data is quite small, we tried several different machine learning methods on our 6 ligand and 14 protein features, found that random forests performed best for regression, and evaluated feature importance.

8/
The random forests outperformed the simple linear models when the goal was to predict binding free energy for known bound structures.
Yet when the goal was to tell binders apart from computationally generated non-binders, this trend curiously changed.
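To make the second task concrete, here is a hedged sketch of one common way to score "binders vs. non-binders" ranking: the ROC AUC, written out in plain Python. Whether the preprint uses exactly this metric is not stated in this thread, and the scores and labels below are made up. Higher score means predicted stronger binding, so a predicted binding free energy would be negated first.

```python
def roc_auc(scores, labels):
    """Probability that a random binder outranks a random non-binder.

    scores: higher means predicted stronger binding.
    labels: 1 for binder, 0 for non-binder.
    Ties between a binder and a non-binder count as half a win.
    """
    binders = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not binders or not decoys:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for b in binders:
        for d in decoys:
            if b > d:
                wins += 1.0
            elif b == d:
                wins += 0.5
    return wins / (len(binders) * len(decoys))


# Made-up example: three binders, two generated non-binders.
example_scores = [0.9, 0.8, 0.4, 0.7, 0.2]
example_labels = [1, 1, 1, 0, 0]
auc = roc_auc(example_scores, example_labels)
print(round(auc, 3))
```

A model can have a low regression error on known complexes and still rank decoys poorly, which is why the two tasks can favour different models.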

9/
While the accuracy of RASPD+ doesn't generally surpass more elaborate methods (and in some cases might not meet all requirements for your protein), it is faster than docking by a factor of >100, quickly generates guesses for further evaluation, and provides a strong baseline.

10/
If you want to try it for yourself and test it in your prefiltering pipeline for structure-based drug discovery, check it out on GitHub: https://github.com/HITS-MCM/RASPDplus

HITS-MCM/RASPDplus

Associated code and software for "RASPD+: Fast protein-ligand binding free energy prediction using simplified physicochemical features" - HITS-MCM/RASPDplus

https://github.com/HITS-MCM/RASPDplus
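As a rough illustration of what "prefiltering" could look like, the sketch below keeps only the best-scoring fraction of a library for a slower docking step. It does not call the RASPDplus code: the function `prefilter`, the compound IDs and the predicted energies are hypothetical placeholders, and the only assumption about the predictions is that a more negative binding free energy means stronger predicted binding.

```python
import math


def prefilter(predictions, keep_fraction):
    """Return the compound IDs with the most favourable predicted energies.

    predictions:   dict mapping compound ID -> predicted binding free energy
                   (more negative = stronger predicted binding).
    keep_fraction: share of the library to pass on, in (0, 1].
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Sort ascending so the most negative energies come first.
    ranked = sorted(predictions, key=predictions.get)
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Made-up predictions for four placeholder compounds (kcal/mol).
predicted = {"cpd_a": -9.1, "cpd_b": -5.3, "cpd_c": -7.8, "cpd_d": -6.0}
shortlist = prefilter(predicted, keep_fraction=0.5)
print(shortlist)
```

Only the shortlist would then go on to docking or other more elaborate methods.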

11/
RASPD+ wouldn't have been possible without Goutam Mukherjee and B. Jayaram, who started with RASPD at @iitdelhi, and Lukas Adam and @Rebecca_Wade_C from the MCM group @HITStudies where I had the pleasure to work on it.
