Learn Molecular Clustering | Similarity, Clustering and Drug Discovery
Python for Chemoinformatics

Molecular Clustering

Clustering is a powerful approach in chemoinformatics that allows you to group molecules based on their similarity. This process is essential when analyzing large libraries of compounds, as it helps you identify patterns, reduce redundancy, and select representative molecules for further study. In drug discovery, clustering can be used to organize chemical space, prioritize compounds for screening, and ensure diversity in a chemical library.

Note
Definition

A similarity matrix is a table that shows the pairwise similarity scores between molecules, typically calculated with metrics like Tanimoto similarity on fingerprints.
Clustering is the process of grouping molecules so that those within the same group (cluster) are more similar to each other than to those in other groups.
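To make the definition concrete, here is a minimal sketch of the Tanimoto coefficient in plain Python. It assumes each fingerprint is represented as a set of on-bit indices, which is a simplification for illustration; the function name `tanimoto` and the example bit sets are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union_size = len(a | b)
    if union_size == 0:
        return 0.0  # assumed convention: two empty fingerprints score 0
    # Shared on-bits divided by all distinct on-bits
    return len(a & b) / union_size


# Made-up fingerprints, for illustration only
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8}
print(tanimoto(fp_x, fp_y))  # 2 shared bits / 5 distinct bits = 0.4
```

A score of 1.0 means the two fingerprints have exactly the same on-bits, and 0.0 means they share none.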

    from rdkit import Chem, DataStructs
    from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator

    # Input SMILES
    smiles_list = [
        "CCO",      # ethanol
        "CCCO",     # 1-propanol
        "CCCCO",    # 1-butanol
        "CCN",      # ethylamine
        "CC(=O)O"   # acetic acid
    ]

    # Convert SMILES to Mol objects
    mols = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        mols.append(mol)

    # Create Morgan fingerprint generator
    gen = GetMorganGenerator(radius=2, fpSize=1024)

    # Generate fingerprints
    fps = []
    for mol in mols:
        fp = gen.GetFingerprint(mol)
        fps.append(fp)

    # Compute similarity matrix
    n = len(fps)
    similarity_matrix = []

    for i in range(n):
        row = []
        for j in range(n):
            similarity = DataStructs.TanimotoSimilarity(fps[i], fps[j])
            row.append(similarity)
        similarity_matrix.append(row)

    # Print matrix
    for row in similarity_matrix:
        print(row)
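The nested loop above evaluates every pair twice, although Tanimoto similarity is symmetric. The sketch below computes only the upper triangle and mirrors it. The helper names `pairwise_similarity` and `set_tanimoto` are hypothetical, and the toy fingerprints are plain Python sets so that the example runs without RDKit. It also assumes that an item's similarity to itself is 1.0.

```python
def pairwise_similarity(items, similarity):
    """Build a symmetric similarity matrix, evaluating each pair only once."""
    n = len(items)
    # Assumption: every item has similarity 1.0 to itself
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = similarity(items[i], items[j])
            matrix[i][j] = value
            matrix[j][i] = value  # mirror into the lower triangle
    return matrix


def set_tanimoto(a, b):
    """Tanimoto coefficient on sets of on-bit indices (stand-in for real fingerprints)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up fingerprints, for illustration only
toy_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
for row in pairwise_similarity(toy_fps, set_tanimoto):
    print(row)
# [1.0, 0.5, 0.0]
# [0.5, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```

With RDKit fingerprints, the same helper should work if you pass `fps` as `items` and `DataStructs.TanimotoSimilarity` as `similarity`.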

Clustering relies on the idea that structurally similar molecules produce similar fingerprints and therefore higher similarity scores. The similarity matrix you just computed holds the pairwise similarity values for all molecules in your set. Clustering algorithms use this matrix to group molecules so that each cluster contains compounds that are more similar to each other than to those in other clusters. Clustering is often the first step before selecting representative compounds for further analysis or experimental testing.

    # Simple clustering: group molecules whose similarity to a cluster's
    # first member (its seed) meets a threshold
    threshold = 0.7
    clusters = []
    assigned = set()

    for i in range(n):
        if i in assigned:
            continue
        # Molecule i is still unassigned, so it seeds a new cluster
        cluster = [i]
        assigned.add(i)
        for j in range(i + 1, n):
            # Skip molecules already placed in an earlier cluster so that
            # each molecule ends up in exactly one cluster
            if j not in assigned and similarity_matrix[i][j] >= threshold:
                cluster.append(j)
                assigned.add(j)
        clusters.append(cluster)

    # Print each cluster as a list of SMILES strings
    for idx, cluster in enumerate(clusters):
        print(f"Cluster {idx + 1}: {[smiles_list[i] for i in cluster]}")
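The greedy loop above compares each candidate only with the first molecule of a cluster, so the result depends on input order. One alternative, sketched here as an illustration and not as the lesson's method, treats every pair at or above the threshold as an edge and returns the connected components, which is a single-linkage style grouping. The function name `threshold_components` and the toy similarity values are made up for the example.

```python
def threshold_components(sim, threshold):
    """Group indices into connected components of the graph whose edges
    join pairs with similarity >= threshold."""
    n = len(sim)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        # Depth-first walk over all molecules reachable from `start`
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


# Made-up similarity values for four molecules, for illustration only
toy_sim = [
    [1.0, 0.8, 0.5, 0.1],
    [0.8, 1.0, 0.8, 0.1],
    [0.5, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
print(threshold_components(toy_sim, 0.7))  # [[0, 1, 2], [3]]
print(threshold_components(toy_sim, 0.9))  # [[0], [1], [2], [3]]
```

Molecule 2 joins the first cluster through molecule 1 even though its similarity to molecule 0 is only 0.5. Raising the threshold to 0.9 leaves every molecule in its own cluster, which shows how strongly the threshold shapes the result.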

By clustering molecules based on their similarity, you can quickly spot groups of redundant compounds, that is, compounds that are very similar to each other. This helps you avoid screening or analyzing nearly identical molecules, saving both time and resources. At the same time, picking one representative from each cluster gives you a diverse subset of your library, which is crucial for exploring new chemical space and increasing the chances of finding novel active compounds.
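One common way to choose a representative is to take each cluster's medoid, the member with the highest total similarity to the other members. The sketch below assumes clusters are lists of row indices into a similarity matrix; the function name `pick_representatives` and the toy values are hypothetical.

```python
def pick_representatives(sim, clusters):
    """Return one index per cluster: the member with the highest summed
    similarity to all members of its cluster (the medoid)."""
    representatives = []
    for cluster in clusters:
        # On ties, max() keeps the first member it encounters
        best = max(cluster, key=lambda i: sum(sim[i][j] for j in cluster))
        representatives.append(best)
    return representatives


# Made-up similarity values and clusters, for illustration only
toy_sim = [
    [1.0, 0.8, 0.5, 0.1],
    [0.8, 1.0, 0.8, 0.1],
    [0.5, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
toy_clusters = [[0, 1, 2], [3]]
print(pick_representatives(toy_sim, toy_clusters))  # [1, 3]
```

Molecule 1 represents the first cluster because it is the most similar to both of its neighbors, and the singleton cluster represents itself.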

1. What is the purpose of clustering molecules?

2. What is a potential application of molecular clustering?



Section 2. Chapter 3
