Section 2. Chapter 4
Challenge: Cluster a Compound Library
Task
Write a Python function using RDKit that takes a list of SMILES strings and groups them into clusters based on pairwise Tanimoto similarity. Within each cluster, every molecule must have a Tanimoto similarity above 0.6 with at least one other member of that cluster.
- Parse each SMILES string into an RDKit molecule.
- Generate Morgan fingerprints for each molecule.
- Compare fingerprints pairwise using Tanimoto similarity.
- Group molecules so that every member of a cluster has a Tanimoto similarity above 0.6 to at least one other member of that cluster.
- Return a list of clusters, where each cluster is a list of SMILES strings.
Solution
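One possible approach, offered as a minimal sketch rather than the course's reference solution: treat every pair with similarity above the threshold as linked, then merge linked molecules with a union-find structure (single-linkage grouping). The function names `group_by_threshold` and `cluster_smiles`, the fingerprint settings (radius 2, 2048 bits), the decision to skip SMILES that fail to parse, and the decision to return unmatched molecules as single-member clusters are all assumptions that the task statement does not specify.

```python
from typing import Callable, List


def group_by_threshold(n: int, similar: Callable[[int, int], bool]) -> List[List[int]]:
    """Group indices 0..n-1 so that linked indices share a cluster (union-find)."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similar(i, j):
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root so output order is stable.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def cluster_smiles(smiles_list: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Cluster SMILES strings by Morgan-fingerprint Tanimoto similarity."""
    # RDKit is imported here so the grouping helper above works without it.
    from rdkit import Chem, DataStructs
    from rdkit.Chem import rdFingerprintGenerator

    # Assumed settings: radius 2 and 2048 bits are common defaults, not task requirements.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

    kept, fingerprints = [], []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # Assumption: silently skip SMILES that RDKit cannot parse.
        kept.append(smiles)
        fingerprints.append(generator.GetFingerprint(mol))

    def similar(i: int, j: int) -> bool:
        return DataStructs.TanimotoSimilarity(fingerprints[i], fingerprints[j]) > threshold

    groups = group_by_threshold(len(kept), similar)
    return [[kept[i] for i in group] for group in groups]
```

Calling `cluster_smiles(["CCO", "CCN", "c1ccccc1"])` returns a list of lists of SMILES strings. Because two molecules only need a chain of above-threshold links to share a cluster, this single-linkage grouping can join molecules whose direct similarity is below 0.6, which matches the task's "at least one other member" wording. The pairwise loop is quadratic in the number of molecules, so larger libraries may call for a different strategy.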