Tanimoto Similarity and Molecular Comparison | Similarity, Clustering and Drug Discovery
Python for Chemoinformatics


Understanding how similar two molecules are can be a powerful guide in drug discovery. Molecular similarity helps you find compounds with related activity, predict properties, and suggest new drug candidates by comparing them to known molecules. This approach is central to tasks like lead optimization, virtual screening, and clustering compound libraries to identify promising chemical series.

Definition

The Tanimoto coefficient (also called the Jaccard index) measures the similarity between two sets and is widely used to compare molecular fingerprints. It is the size of the intersection divided by the size of the union of the two sets. For fingerprints, it is the fraction of all features present in either molecule that are present in both. A value of 1.0 means the two fingerprints are identical, although the molecules themselves can still differ if the fingerprint does not capture the difference; 0.0 means they share no features.
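The intersection-over-union definition can be sketched in plain Python with sets. This is a minimal illustration, not RDKit's implementation: the function name `tanimoto`, the feature identifiers, and the choice to return 0.0 for two empty sets are all assumptions made for this example.

```python
def tanimoto(features_a: set, features_b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    union = features_a | features_b
    if not union:
        # Both sets are empty, so the ratio is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return len(features_a & features_b) / len(union)


# Hypothetical feature identifiers for two molecules
mol_a = {1, 5, 8, 13}
mol_b = {1, 5, 21}

score = tanimoto(mol_a, mol_b)
print(score)  # 2 shared features / 5 distinct features = 0.4
```

Comparing a set with itself gives 1.0, and two sets with nothing in common give 0.0, matching the two extremes described in the definition.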

```python
from rdkit import Chem
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator

# Define molecules
smiles1 = "CCO"   # Ethanol
smiles2 = "CCCO"  # Propanol
mol1 = Chem.MolFromSmiles(smiles1)
mol2 = Chem.MolFromSmiles(smiles2)

# Create Morgan fingerprint generator
morgan_gen = GetMorganGenerator(radius=2, fpSize=1024)

# Generate fingerprints
fp1 = morgan_gen.GetFingerprint(mol1)
fp2 = morgan_gen.GetFingerprint(mol2)

print("Fingerprint 1 (Ethanol):", list(fp1.GetOnBits())[:10])
print("Fingerprint 2 (Propanol):", list(fp2.GetOnBits())[:10])
```

To compare molecules computationally, fingerprints serve as compact representations of their structure. Once the fingerprints are generated, you need a quantitative way to compare them. The Tanimoto similarity coefficient is the most widely used metric for this purpose. It measures the overlap between two fingerprint bit vectors and returns a value between 0 and 1, where 1 means the fingerprints are identical and 0 means they have no on bits in common.
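For bit vectors, this overlap is commonly written in terms of bit counts. The symbols below are notation introduced here for illustration: $a$ and $b$ are the numbers of on bits in fingerprints $A$ and $B$, and $c$ is the number of bits that are on in both.

```latex
T(A, B) = \frac{c}{a + b - c}
```

The denominator $a + b - c$ is the number of bits set in at least one fingerprint, because the shared bits would otherwise be counted twice. For example, with $a = 6$, $b = 4$ and $c = 3$, the similarity is $3 / (6 + 4 - 3) = 3/7 \approx 0.43$.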

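```python
# Calculate Tanimoto similarity between two molecules using RDKit
# (fp1 and fp2 are the Morgan fingerprints generated above)
from rdkit.DataStructs import FingerprintSimilarity

# FingerprintSimilarity uses the Tanimoto metric by default
similarity = FingerprintSimilarity(fp1, fp2)
print("Tanimoto similarity between ethanol and propanol:", similarity)
```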

Similarity searching is a practical tool in chemoinformatics. By measuring how similar a new compound is to known drugs or active molecules, you can:

  • Prioritize which compounds to test;
  • Group compounds into clusters;
  • Predict biological activity.

High similarity to a known active compound may suggest similar function, while lower similarity can indicate novel chemical space worth exploring.
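A similarity search of this kind can be sketched as ranking a small library against a query and keeping the closest matches. The example below is a plain-Python illustration under stated assumptions: the compound names, the on-bit sets, the helper names `tanimoto` and `rank_by_similarity`, and the 0.5 cutoff are all made up for this sketch. In an RDKit workflow, sets like these could be built from `fp.GetOnBits()`.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query_bits, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy on-bit sets (made up for illustration, not real fingerprints)
query = {3, 17, 42, 80}
library = {
    "cmpd_A": {3, 17, 42, 80},  # same bits as the query
    "cmpd_B": {3, 17, 42, 99},  # 3 shared bits out of 5 distinct bits
    "cmpd_C": {5, 200},         # no shared bits
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")

# Keep only candidates at or above a cutoff; 0.5 is an arbitrary choice here
hits = [name for name, score in ranking if score >= 0.5]
```

The right cutoff depends on the fingerprint type and the goal of the search, so it is best treated as a tunable parameter. For larger libraries, RDKit's `DataStructs.BulkTanimotoSimilarity` compares one fingerprint against a list of fingerprints in a single call.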

1. What does a Tanimoto similarity of 1.0 mean?

2. Which type of data is required to compute Tanimoto similarity?

3. Why is molecular similarity important in drug discovery?
