Tanimoto Similarity and Molecular Comparison

Knowing how similar two molecules are is a useful guide in drug discovery. Molecular similarity helps you find compounds with related activity, predict properties, and propose new drug candidates by comparing untested compounds with known molecules. It is central to tasks such as lead optimization, virtual screening, and clustering compound libraries to identify promising chemical series.

Definition

The Tanimoto coefficient is a measure of similarity between two sets, often used to compare molecular fingerprints. It is calculated as the size of the intersection divided by the size of the union of the sets. For fingerprints, it is the fraction of features present in either molecule that are present in both. A value of 1.0 means the two fingerprints are identical (the molecules are indistinguishable by the fingerprint used, which does not guarantee they are the same compound), while 0.0 means they share no features.
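Written as a formula, the definition above looks like this. The symbols are notation chosen here for illustration, not taken from the lesson: A and B are the sets of on bits of the two fingerprints, a and b are their sizes, and c is the number of on bits they share.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c}
```

For example, two fingerprints with a = 4 and b = 5 on bits, of which c = 3 are shared, give T = 3 / (4 + 5 - 3) = 0.5.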

```python
from rdkit import Chem
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator

# Define molecules
smiles1 = "CCO"   # Ethanol
smiles2 = "CCCO"  # Propanol

mol1 = Chem.MolFromSmiles(smiles1)
mol2 = Chem.MolFromSmiles(smiles2)

# Create Morgan fingerprint generator
morgan_gen = GetMorganGenerator(radius=2, fpSize=1024)

# Generate fingerprints
fp1 = morgan_gen.GetFingerprint(mol1)
fp2 = morgan_gen.GetFingerprint(mol2)

print("Fingerprint 1 (Ethanol):", list(fp1.GetOnBits())[:10])
print("Fingerprint 2 (Propanol):", list(fp2.GetOnBits())[:10])
```

To compare molecules computationally, fingerprints serve as compact representations of their structure. Once you have fingerprints, you need a way to compare them quantitatively. The Tanimoto similarity coefficient is the most widely used metric for this purpose. It measures the overlap between two fingerprint bit vectors and returns a value between 0 and 1, where 1 means the fingerprints are identical and 0 means they have no on bits in common.

```python
# Calculate Tanimoto similarity between two molecules using RDKit
from rdkit.DataStructs import FingerprintSimilarity

similarity = FingerprintSimilarity(fp1, fp2)
print("Tanimoto similarity between ethanol and propanol:", similarity)
```
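To make the overlap calculation concrete, here is a minimal pure-Python sketch that computes the same ratio from sets of on-bit indices, such as those returned by `GetOnBits()`. The function name `tanimoto`, the toy bit sets, and the choice to return 0.0 for two empty fingerprints are assumptions made for this illustration; this is not how RDKit implements the calculation internally.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto coefficient of two fingerprints given as on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / len(union)


# Made-up on-bit indices, for illustration only (not real fingerprints)
toy_a = {1, 5, 9, 12}
toy_b = {5, 9, 12, 40, 77}

# 3 shared bits out of 6 distinct bits in total
print(tanimoto(toy_a, toy_b))  # 0.5
```

With real fingerprints, you could pass `fp1.GetOnBits()` and `fp2.GetOnBits()` to the same function and compare the result with the value RDKit reports.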

Similarity searching is a practical tool in chemoinformatics. By measuring how similar a new compound is to known drugs or active molecules, you can:

  • Prioritize which compounds to test;
  • Group compounds into clusters;
  • Predict biological activity.

High similarity to a known active compound may suggest similar function, while lower similarity can indicate novel chemical space worth exploring.
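The prioritization idea above can be sketched as a simple ranking: score every library compound against a query and keep the most similar ones. Everything named here is hypothetical (the helper `rank_by_similarity`, the compound names, the made-up on-bit sets, and the 0.5 cutoff, which is arbitrary rather than a recommended threshold).

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto coefficient of two fingerprints given as on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query_bits, library, threshold=0.0):
    """Return (name, score) pairs with score >= threshold, most similar first."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up on-bit sets, for illustration only (not real fingerprints)
query = {3, 8, 15, 21}
library = {
    "cmpd_A": {3, 8, 15, 21},  # same bits as the query
    "cmpd_B": {3, 8, 15, 40},  # 3 shared bits out of 5 distinct bits
    "cmpd_C": {50, 60},        # no bits shared with the query
}

ranked = rank_by_similarity(query, library, threshold=0.5)
print(ranked)  # [('cmpd_A', 1.0), ('cmpd_B', 0.6)]
```

For large libraries of real fingerprints, RDKit's `DataStructs.BulkTanimotoSimilarity` scores one fingerprint against a list in a single call and is likely a better fit than a Python loop.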

1. What does a Tanimoto similarity of 1.0 mean?

2. Which type of data is required to compute Tanimoto similarity?

3. Why is molecular similarity important in drug discovery?
