Tanimoto Similarity and Molecular Comparison | Similarity, Clustering and Drug Discovery

Python for Chemoinformatics

Understanding how similar two molecules are can be a powerful guide in drug discovery. Molecular similarity helps you find compounds with related activity, predict properties, and suggest new drug candidates by comparing them to known molecules. This approach is central to tasks like lead optimization, virtual screening, and clustering compound libraries to identify promising chemical series.

Definition

The Tanimoto coefficient measures the similarity between two sets and is widely used to compare molecular fingerprints. It is calculated as the size of the intersection divided by the size of the union of the two sets. For fingerprints, this is the fraction of features present in either molecule that are present in both. A value of 1.0 means the two fingerprints are identical, which does not necessarily mean the molecules are, because distinct structures can map to the same fingerprint. A value of 0.0 means they share no features.
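As a minimal sketch of the definition above, the coefficient can be computed in plain Python by treating each fingerprint as a set of on-bit indices. The function name `tanimoto` and the example bit sets are illustrative assumptions, not part of RDKit, and returning 0.0 for two empty sets is a convention chosen here.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: size of intersection / size of union."""
    union = bits_a | bits_b
    if not union:
        # Both sets are empty, so the ratio is undefined.
        # Returning 0.0 is an assumed convention for this sketch.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical on-bit sets, chosen only to illustrate the arithmetic.
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}

print(tanimoto(fp_a, fp_b))  # 2 shared bits / 4 distinct bits = 0.5
```

Comparing a set with itself gives 1.0, and two sets with no common element give 0.0, matching the two extremes described in the definition.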


from rdkit import Chem
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator

# Define molecules
smiles1 = "CCO"    # Ethanol
smiles2 = "CCCO"   # Propanol

mol1 = Chem.MolFromSmiles(smiles1)
mol2 = Chem.MolFromSmiles(smiles2)

# Create Morgan fingerprint generator
morgan_gen = GetMorganGenerator(radius=2, fpSize=1024)

# Generate fingerprints
fp1 = morgan_gen.GetFingerprint(mol1)
fp2 = morgan_gen.GetFingerprint(mol2)

print("Fingerprint 1 (Ethanol):", list(fp1.GetOnBits())[:10])
print("Fingerprint 2 (Propanol):", list(fp2.GetOnBits())[:10])

Fingerprints are compact representations of molecular structure that make computational comparison possible. Once fingerprints are generated, you need a quantitative way to compare them, and the Tanimoto similarity coefficient is the most widely used metric for this purpose. It measures the overlap between two fingerprint bit vectors and ranges from 0 to 1: a value of 1 means the fingerprints are identical, and a value of 0 means they have no on bits in common.


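# Calculate Tanimoto similarity between two molecules using RDKit
from rdkit.DataStructs import FingerprintSimilarity, TanimotoSimilarity

# FingerprintSimilarity defaults to the Tanimoto metric; passing it explicitly makes the choice visible
similarity = FingerprintSimilarity(fp1, fp2, metric=TanimotoSimilarity)
print(f"Tanimoto similarity between ethanol and propanol: {similarity:.3f}")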
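For bit vectors, the same set definition is often written in terms of bit counts. The symbols below are notation introduced here for illustration: a and b are the numbers of on bits in each fingerprint, and c is the number of bits that are on in both. The numbers in the worked example are hypothetical and are not the actual counts for ethanol and propanol.

```latex
T(A, B) = \frac{c}{a + b - c}
\qquad\text{e.g. } a = 4,\ b = 5,\ c = 3
\ \Rightarrow\ T = \frac{3}{4 + 5 - 3} = 0.5
```

The denominator a + b - c is the size of the union, since bits set in both fingerprints would otherwise be counted twice.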

Similarity searching is a practical tool in chemoinformatics. By measuring how similar a new compound is to known drugs or active molecules, you can:

Prioritize which compounds to test;
Group compounds into clusters;
Predict biological activity.

High similarity to a known active compound may suggest similar function, while lower similarity can indicate novel chemical space worth exploring.
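The prioritization step described above can be sketched as a simple ranking of a compound library against a query. This is an illustration under stated assumptions, not a definitive implementation: the helper names `tanimoto` and `rank_by_similarity`, the compound names, the on-bit sets, and the 0.5 threshold are all hypothetical.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 0.0


def rank_by_similarity(query, library, threshold=0.0):
    """Return (name, score) pairs with score >= threshold, most similar first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: placeholder names and on-bit sets, not real molecules.
query_bits = {1, 2, 3, 4}
library = {
    "cmpd_A": {1, 2, 3, 4},  # same on bits as the query
    "cmpd_B": {1, 2, 3, 9},  # 3 shared bits out of 5 distinct bits
    "cmpd_C": {7, 8, 9},     # no bits in common with the query
}

# The 0.5 cutoff is an arbitrary value chosen for this example.
for name, score in rank_by_similarity(query_bits, library, threshold=0.5):
    print(f"{name}: {score:.2f}")
```

With RDKit fingerprints, the on-bit sets could be obtained with `set(fp.GetOnBits())`, as in the earlier listing, although in practice RDKit's own similarity functions on the bit vectors would normally be used instead.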

Tanimoto Similarity and Molecular Comparison

1. What does a Tanimoto similarity of 1.0 mean?

2. Which type of data is required to compute Tanimoto similarity?

3. Why is molecular similarity important in drug discovery?