Chemistry Compare

POST /science/v1/chemistry/compare

What it does

Send two molecules in any supported notation — SMILES, InChI, or MDL Molfile — and find out how they relate. Powered by RDKit Morgan fingerprints and substructure matching.

Comparison modes

| Mode | Tier | What it tells you |
| --- | --- | --- |
| identity | Free | Are these the same molecule? Compares canonical SMILES, which handles atom reordering, aromatic/Kekulé differences, and notation variants. |
| similarity | Free | How similar are they? Tanimoto coefficient (0–1) on Morgan circular fingerprints, with a human-readable interpretation. |
| substructure | Premium | Is molecule A contained in molecule B (or vice versa)? Graph-based substructure matching for scaffold/fragment detection. |
| full | Premium | All of the above in a single request. |

Invalid molecules are not errors

If one or both molecules fail to parse, the API still returns HTTP 200 with data.parseErrors describing what went wrong — consistent with how Chemistry Validate handles invalid inputs. No comparison is performed, but the response is structured and safe for pipelines.
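Because failures arrive as HTTP 200, a pipeline should branch on data.parseErrors rather than on the status code. A minimal Python sketch, assuming the response has already been decoded from JSON into a dict and that each parseErrors entry carries the index, hint, and hintCode fields mentioned in the tips section; split_result and describe_errors are hypothetical helper names.

```python
from __future__ import annotations

from typing import Any


def split_result(response: dict[str, Any]) -> tuple[dict[str, Any] | None, list[dict[str, Any]]]:
    """Return (data, parse_errors) for a decoded compare response.

    data is None when any molecule failed to parse, because the API
    performs no comparison in that case.
    """
    data = response.get("data", {})
    # Assumption: parseErrors is absent or empty when both molecules parse.
    errors = data.get("parseErrors") or []
    if errors:
        return None, errors
    return data, []


def describe_errors(errors: list[dict[str, Any]]) -> list[str]:
    # Field names index / hint / hintCode follow the tips section;
    # check them against the API Reference schema.
    return [
        f"molecule {e.get('index')}: {e.get('hint', 'no hint')} ({e.get('hintCode', 'n/a')})"
        for e in errors
    ]
```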

Why use it?

  • Dedup compound databases. Identity mode tells you if two differently-written SMILES are actually the same molecule. Use it as a batch dedup step before expensive downstream processing.
  • Rank drug candidates by similarity. Generate variants → compare each to a reference molecule → sort by Tanimoto score. The interpretation label ("very similar", "similar", "moderately similar", "dissimilar", etc.) is ready for reports without manual threshold logic.
  • Scaffold matching. Does this compound contain a benzene ring? A sulfonamide group? Substructure mode answers fragment-in-molecule questions that regular string matching can't.
  • Cross-format comparison. Compare an InChI string against a SMILES string without pre-conversion. The API normalizes both inputs before comparing.
  • No RDKit installation required. Molecular fingerprinting and substructure search without C++ builds, Python bindings, or WASM setup. One HTTP call.
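The "rank drug candidates" workflow can be sketched in Python as follows. To keep it testable offline, the HTTP call is injected as a compare callable that takes a request body and returns the decoded response; rank_by_similarity and compare are hypothetical names, and a real compare would POST the body to the endpoint with your API key. Candidates that fail to parse are skipped rather than ranked.

```python
from __future__ import annotations

from typing import Any, Callable

# A function that sends one request body and returns the decoded JSON response.
CompareFn = Callable[[dict[str, Any]], dict[str, Any]]


def rank_by_similarity(
    reference: str, candidates: list[str], compare: CompareFn
) -> list[tuple[str, float, str]]:
    """Return (smiles, tanimoto, interpretation) tuples, most similar first."""
    ranked = []
    for smiles in candidates:
        body = {"molecules": [{"input": reference}, {"input": smiles}], "mode": "similarity"}
        data = compare(body).get("data", {})
        if data.get("parseErrors"):
            continue  # invalid input: the API performed no comparison
        sim = data["similarity"]
        ranked.append((smiles, sim["tanimoto"], sim["interpretation"]))
    # Highest Tanimoto first; list.sort stays stable with reverse=True,
    # so ties keep their input order.
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```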

Examples

Identity check — same molecule, different notation

curl -X POST https://api.creatornode.io/science/v1/chemistry/compare \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "molecules": [
      { "input": "CC(=O)OC1=CC=CC=C1C(=O)O" },
      { "input": "CC(=O)Oc1ccccc1C(=O)O" }
    ],
    "mode": "identity"
  }'

Response — identical

{
  "success": true,
  "data": {
    "molecules": [
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" },
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" }
    ],
    "identity": { "identical": true }
  },
  "meta": { "requestId": "abc-123", "processingTimeMs": 8, "rdkitVersion": "2025.3.4" }
}
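For comparison, here is the same identity request built with Python's standard library. This is a sketch: build_compare_request is a hypothetical helper, YOUR_KEY is a placeholder, and the request is only constructed here; passing it to urllib.request.urlopen would send it.

```python
import json
import urllib.request

URL = "https://api.creatornode.io/science/v1/chemistry/compare"


def build_compare_request(smiles_a: str, smiles_b: str, mode: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the compare endpoint."""
    body = {"molecules": [{"input": smiles_a}, {"input": smiles_b}], "mode": mode}
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_compare_request("CC(=O)OC1=CC=CC=C1C(=O)O", "CC(=O)Oc1ccccc1C(=O)O", "identity", "YOUR_KEY")
# To send it:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```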

Similarity — Aspirin vs Ibuprofen

{
  "molecules": [
    { "input": "CC(=O)Oc1ccccc1C(=O)O" },
    { "input": "CC(C)Cc1ccc(cc1)C(C)C(=O)O" }
  ],
  "mode": "similarity"
}

Response — similarity score

{
  "success": true,
  "data": {
    "molecules": [
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" },
      { "canonicalSmiles": "CC(C)Cc1ccc(C(C)C(=O)O)cc1", "format": "smiles" }
    ],
    "similarity": {
      "tanimoto": 0.29,
      "fingerprintRadius": 2,
      "fingerprintBits": 2048,
      "interpretation": "unrelated"
    }
  },
  "meta": { "requestId": "def-456", "processingTimeMs": 10, "rdkitVersion": "2025.3.4" }
}

Full mode — everything at once (Premium)

{
  "molecules": [
    { "input": "c1ccccc1" },
    { "input": "c1ccc(cc1)O" }
  ],
  "mode": "full"
}

Returns identity, similarity, and substructure blocks in a single response. Requires a Premium API key.
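A sketch of reading a full-mode response in Python. It assumes the three blocks sit under data.identity, data.similarity, and data.substructure, matching the block names above and the output fields listed in the "Comparison modes in detail" table; summarize_full is a hypothetical helper.

```python
from typing import Any


def summarize_full(response: dict[str, Any]) -> str:
    """One-line summary of a decoded full-mode compare response."""
    data = response["data"]
    if data.get("parseErrors"):
        return "not compared: parse errors"
    identical = data["identity"]["identical"]
    sim = data["similarity"]
    # Assumption: the substructure block is keyed "substructure".
    relationship = data["substructure"]["relationship"]
    return (
        f"identical={identical}, tanimoto={sim['tanimoto']:.2f} "
        f"({sim['interpretation']}), substructure={relationship}"
    )
```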

Comparison modes in detail

| Mode | Algorithm | Output |
| --- | --- | --- |
| identity | Canonical SMILES string equality | identical: true/false |
| similarity | Morgan fingerprint → Tanimoto coefficient | tanimoto: 0.0–1.0 + interpretation label |
| substructure | RDKit get_substruct_match() graph search | aInB, bInA, relationship |
| full | All three combined | All output blocks in one response |
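The Tanimoto coefficient in the similarity row is the number of on-bits two fingerprints share divided by the number of bits set in either one: Tanimoto(A, B) = |A ∩ B| / |A ∪ B|. The Python sketch below computes it for fingerprints given as sets of on-bit indices. It illustrates the formula only and does not generate Morgan fingerprints; returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints in this sketch
    return len(a & b) / union


print(tanimoto({1, 5, 9, 12}, {1, 5, 7}))  # 2 shared bits / 5 distinct bits = 0.4
```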

Similarity interpretation thresholds

| Tanimoto range | Interpretation | Meaning |
| --- | --- | --- |
| ≥ 0.95 | identical | Essentially the same structure |
| ≥ 0.85 and < 0.95 | very similar | Close analogs, minor substituent differences |
| ≥ 0.70 and < 0.85 | similar | Same scaffold family, moderate variation |
| ≥ 0.50 and < 0.70 | moderately similar | Shared substructure elements |
| ≥ 0.30 and < 0.50 | dissimilar | Limited structural overlap |
| < 0.30 | unrelated | Different molecular classes |
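If you need the same labels client-side (for example, to bucket cached scores), the table translates into a descending threshold check. A Python sketch; interpret_tanimoto is a hypothetical name, and the API's own interpretation field remains the authoritative value.

```python
# (cutoff, label) pairs from the thresholds table, highest first.
THRESHOLDS = [
    (0.95, "identical"),
    (0.85, "very similar"),
    (0.70, "similar"),
    (0.50, "moderately similar"),
    (0.30, "dissimilar"),
]


def interpret_tanimoto(score: float) -> str:
    """Map a Tanimoto score (0 to 1) to the label from the thresholds table."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "unrelated"
```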

Fingerprint options

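| Option | Type | Default | What it does |
| --- | --- | --- | --- |
| fingerprintRadius | number (1–4) | 2 | Morgan fingerprint radius. Higher values capture larger atom neighborhoods (more extended features). |
| fingerprintBits | 1024 or 2048 | 2048 | Bit-vector length. 2048 is standard; 1024 is faster but folds features into fewer bits, so more unrelated features collide and scores become slightly less precise. |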
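A client-side check against the documented ranges can catch a bad option before it costs a request. This Python sketch only validates values; where fingerprintRadius and fingerprintBits go in the request body is defined in the API Reference, fingerprint_options is a hypothetical helper, and treating the radius as an integer is an assumption based on the 1 to 4 range.

```python
from __future__ import annotations


def fingerprint_options(radius: int = 2, bits: int = 2048) -> dict[str, int]:
    """Return fingerprint options after checking them against the documented ranges."""
    if not isinstance(radius, int) or not 1 <= radius <= 4:
        raise ValueError(f"fingerprintRadius must be an integer from 1 to 4, got {radius!r}")
    if bits not in (1024, 2048):
        raise ValueError(f"fingerprintBits must be 1024 or 2048, got {bits!r}")
    return {"fingerprintRadius": radius, "fingerprintBits": bits}
```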
💡 Auto-detection: You don't need to specify format on each molecule. The API detects it from the input: InChI strings start with InChI=, Molfiles contain V2000/V3000, everything else is treated as SMILES.
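The detection rule above fits in a few lines. This Python sketch mirrors the documented rule only; it is not the service's actual detector, which may handle edge cases differently. detect_format is a hypothetical name, and while "smiles" matches the format value shown in the responses, "inchi" and "molfile" are assumed spellings.

```python
def detect_format(text: str) -> str:
    """Guess the notation of a molecule string using the documented rule."""
    if text.startswith("InChI="):
        return "inchi"  # assumed label
    if "V2000" in text or "V3000" in text:
        return "molfile"  # assumed label
    return "smiles"  # everything else is treated as SMILES
```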

Tips & tricks

💡 Tip: For the full request/response schema, OpenAPI spec, and the interactive demo endpoint, see the API Reference.
  • Let format auto-detection work for you. Both molecules default to format: "auto". The API detects SMILES, InChI, and Molfile automatically — you can even compare an InChI against a SMILES.
  • Tanimoto interpretation guide. The interpretation field saves you from hardcoding thresholds:
    ≥ 0.95 — identical, ≥ 0.85 — very similar, ≥ 0.7 — similar, ≥ 0.5 — moderately similar, ≥ 0.3 — dissimilar, < 0.3 — unrelated.
  • Fingerprint radius matters. The default radius of 2 (roughly equivalent to ECFP4) is the common choice for similarity searching of drug-like molecules. Increase it to 3 or 4 to capture more extended molecular features, which can help distinguish closely related analogs.
  • Substructure for fragment screening. Use "mode": "substructure" to check if a pharmacophore fragment is present in a larger molecule. The relationship field tells you the direction: "a_in_b", "b_in_a", "mutual", or "none".
  • Parse errors don't crash your pipeline. If one molecule is invalid, you get a parseErrors array with the index of the failing molecule plus a hint and hintCode to help fix the input. No HTTP error, no exception, just structured data.
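  • Input limits are in bytes, not characters. Size is measured on the UTF-8 encoding: ASCII characters (the norm in SMILES, InChI, and Molfiles) take 1 byte each, non-ASCII characters take 2 to 4 bytes, and every newline in a Molfile counts. Molfiles are verbose, so even benzene needs more than 200 bytes in Molfile form. Free tier: 200 bytes per molecule, Premium: 4,000 bytes.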
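Because the limits count bytes, measure the UTF-8 encoding rather than the string length. A Python sketch using the documented per-molecule limits; fits_limit and the tier keys are hypothetical names, and treating the maximum as inclusive is an assumption.

```python
# Documented per-molecule limits, in UTF-8 bytes.
MAX_BYTES = {"free": 200, "premium": 4000}


def fits_limit(molecule: str, tier: str) -> bool:
    """True if the molecule's UTF-8 encoding fits the tier's per-molecule limit."""
    return len(molecule.encode("utf-8")) <= MAX_BYTES[tier]


print(len("c1ccccc1"), len("c1ccccc1".encode("utf-8")))  # 8 8: ASCII is 1 byte per character
print(len("é"), len("é".encode("utf-8")))                # 1 2: non-ASCII takes more bytes
```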

Cost & Limits

| Feature | Detail |
| --- | --- |
| Credit cost | 3 credits per request |
| Input formats | SMILES, InChI, MDL Molfile (V2000/V3000) |
| Free modes | identity, similarity |
| Premium modes | substructure, full |
| Engine | RDKit |
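At 3 credits per request, cost scales with the number of pairs you compare: ranking n candidates against one reference takes n requests, while an all-pairs dedup of n molecules takes n(n-1)/2 requests. A quick Python estimate, assuming one request per pair and the flat per-request cost in the table; the function names are hypothetical.

```python
CREDITS_PER_REQUEST = 3


def credits_vs_reference(n: int) -> int:
    """Credits to compare n candidates against a single reference molecule."""
    return CREDITS_PER_REQUEST * n


def credits_all_pairs(n: int) -> int:
    """Credits to compare every unordered pair among n molecules."""
    return CREDITS_PER_REQUEST * n * (n - 1) // 2


print(credits_vs_reference(100))  # 300
print(credits_all_pairs(100))     # 3 * 4950 = 14850
```

The quadratic growth of the all-pairs case is what dominates cost for large dedup jobs.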

Tier Limits

| Limit | Free | Premium |
| --- | --- | --- |
| Max input size (per molecule) | 200 bytes | 4,000 bytes |
| Identity mode | Yes | Yes |
| Similarity mode | Yes | Yes |
| Substructure mode | No | Yes |
| Full mode | No | Yes |
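A pre-flight check that mirrors the tier table can stop a Premium-only mode from being sent with a Free key. A Python sketch; the tier keys and mode_allowed are hypothetical, and the server's response stays the authority on what your key can do.

```python
# Modes available per tier, as listed in the tier table.
ALLOWED_MODES = {
    "free": {"identity", "similarity"},
    "premium": {"identity", "similarity", "substructure", "full"},
}


def mode_allowed(tier: str, mode: str) -> bool:
    """True if the tier table lists this mode as available for the tier."""
    return mode in ALLOWED_MODES.get(tier, set())
```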

Other Endpoints