Learn Tanimoto Similarity and Molecular Comparison | Similarity, Clustering and Drug Discovery

Tanimoto Similarity and Molecular Comparison

Understanding how similar two molecules are can be a powerful guide in drug discovery. Molecular similarity helps you find compounds with related activity, predict properties, and suggest new drug candidates by comparing them to known molecules. This approach is central to tasks like lead optimization, virtual screening, and clustering compound libraries to identify promising chemical series.

Definition

The Tanimoto coefficient (also known as the Jaccard index) measures the similarity between two sets and is widely used to compare molecular fingerprints. It is calculated as the size of the intersection divided by the size of the union of the sets. For fingerprints, the sets are the features (on bits) of each molecule, so the coefficient is the fraction of all features present in either molecule that are present in both. A value of 1.0 means the two fingerprints are identical, which does not guarantee that the molecules themselves are identical, while 0.0 means they share no common features.
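The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not RDKit's implementation: the function name `tanimoto` and the on-bit indices are made up for the example, and returning 0.0 for two empty sets is a convention chosen here (libraries may handle that case differently).

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity of two collections of feature identifiers."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty feature sets score 0.0.
        return 0.0
    # Size of the intersection divided by the size of the union.
    return len(a & b) / len(union)


# Hypothetical on-bit indices, invented for illustration (not real fingerprints).
bits_a = {1, 2, 3}
bits_b = {2, 3, 4}

print(tanimoto(bits_a, bits_b))  # 2 shared features out of 4 in total: 0.5
print(tanimoto(bits_a, bits_a))  # identical sets: 1.0
print(tanimoto(bits_a, {7, 8}))  # no shared features: 0.0
```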

    from rdkit import Chem
    from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator

    # Define molecules
    smiles1 = "CCO"   # Ethanol
    smiles2 = "CCCO"  # Propanol

    mol1 = Chem.MolFromSmiles(smiles1)
    mol2 = Chem.MolFromSmiles(smiles2)

    # Create Morgan fingerprint generator
    morgan_gen = GetMorganGenerator(radius=2, fpSize=1024)

    # Generate fingerprints
    fp1 = morgan_gen.GetFingerprint(mol1)
    fp2 = morgan_gen.GetFingerprint(mol2)

    print("Fingerprint 1 (Ethanol):", list(fp1.GetOnBits())[:10])
    print("Fingerprint 2 (Propanol):", list(fp2.GetOnBits())[:10])

To compare molecules computationally, fingerprints are used as compact representations of their structure. After generating fingerprints, you need a way to compare them quantitatively. The Tanimoto similarity coefficient is the most widely used metric for this purpose. It measures the overlap between two fingerprint bit vectors as the number of bits set in both divided by the number of bits set in at least one. The result lies between 0 and 1, where 1 means the fingerprints are identical and 0 means they have no on bits in common.

    # Calculate Tanimoto similarity between two molecules using RDKit
    from rdkit.DataStructs import FingerprintSimilarity

    similarity = FingerprintSimilarity(fp1, fp2)
    print("Tanimoto similarity between ethanol and propanol:", similarity)
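For bit vectors, the same quantity is often written in terms of counts: with a on bits in the first fingerprint, b in the second, and c on in both, the coefficient is c / (a + b - c). The sketch below works this through on two made-up 8-bit vectors stored as Python integers. The helper `tanimoto_bits` and the bit patterns are hypothetical and only illustrate the arithmetic; real fingerprints, such as the 1024-bit ones generated earlier, are much longer.

```python
def tanimoto_bits(bits_a: int, bits_b: int) -> float:
    """Tanimoto similarity of two bit vectors stored as Python integers."""
    a = bin(bits_a).count("1")           # on bits in the first vector
    b = bin(bits_b).count("1")           # on bits in the second vector
    c = bin(bits_a & bits_b).count("1")  # bits that are on in both
    denominator = a + b - c              # bits that are on in at least one
    if denominator == 0:
        # Assumption for this sketch: two all-zero vectors score 0.0.
        return 0.0
    return c / denominator


# Hypothetical 8-bit "fingerprints", invented for illustration.
fp_a = 0b10110100  # a = 4 on bits
fp_b = 0b10010110  # b = 4 on bits, c = 3 shared with fp_a

print(tanimoto_bits(fp_a, fp_b))  # 3 / (4 + 4 - 3) = 0.6
```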

Similarity searching is a practical tool in chemoinformatics. By measuring how similar a new compound is to known drugs or active molecules, you can:

  • Prioritize which compounds to test;
  • Group compounds into clusters;
  • Predict biological activity.

High similarity to a known active compound may suggest similar function, while lower similarity can indicate novel chemical space worth exploring.
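A similarity search along these lines can be sketched as ranking a small library against a query and keeping the compounds above a cutoff. Everything in this example is assumed for illustration: the compound names, the on-bit sets standing in for fingerprints, the helper `rank_by_similarity`, and the 0.5 threshold, which is arbitrary and should be chosen per fingerprint type and project.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query, library, threshold=0.0):
    """Return (name, score) pairs with score >= threshold, most similar first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical query and library; the sets stand in for fingerprint on bits.
query_bits = {1, 2, 3, 4}
library = {
    "cmpd_A": {1, 2, 3, 4},  # same bits as the query
    "cmpd_B": {1, 2, 3, 9},  # 3 shared bits out of 5 in total
    "cmpd_C": {1, 7, 8, 9},  # 1 shared bit out of 7 in total
    "cmpd_D": {10, 11},      # nothing in common
}

hits = rank_by_similarity(query_bits, library, threshold=0.5)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

With RDKit fingerprints, `DataStructs.BulkTanimotoSimilarity` can score one fingerprint against a list of fingerprints in a single call, which is usually preferable to a Python loop for larger libraries.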

1. What does a Tanimoto similarity of 1.0 mean?

2. Which type of data is required to compute Tanimoto similarity?

3. Why is molecular similarity important in drug discovery?
