Chemistry Validate

POST /science/v1/chemistry/validate

What it does

Send a molecule as SMILES, InChI, or an MDL Molfile (V2000/V3000) and get back a full validation report, typically in milliseconds (the example response below reports processingTimeMs: 12). The endpoint is powered by RDKit, a widely used open-source cheminformatics toolkit.

What you get back

| Output | Tier | What it gives you |
| --- | --- | --- |
| Canonical SMILES | Free | Normalized, deterministic SMILES string, safe for dedup, indexing, and downstream pipelines. |
| InChI | Free | IUPAC International Chemical Identifier for cross-database lookup. |
| Molecular descriptors | Free | Formula, exact weight, atom/bond counts, TPSA, cLogP, rotatable bonds, aromaticity, stereochemistry. |
| Lipinski Ro5 | Free | Drug-likeness check: pass/fail on all four Lipinski rules, with actual values. |
| InChI Key | Premium | 27-character hash for exact structure matching and registry dedup. |
| Molblock | Premium | Full MDL Molblock with 2D coordinates, ready for structure export and visualization. |

Invalid molecules are not errors

If the input is chemically invalid, the API still returns a successful response with data.valid: false. The reason for the failure is in data.error.message, and the recommendations array lists suggested fixes. This makes the endpoint safe to use in form validation, user input pipelines, and batch processing without special error handling for expected invalid inputs.
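As a minimal sketch of this pattern, the helper below splits a batch of already-parsed response bodies into valid and invalid groups by looking at data.valid only. The function name partition_by_validity is hypothetical; the field names follow the example responses in this document.

```python
from typing import Any


def partition_by_validity(
    responses: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split parsed response bodies into (valid, invalid) using data.valid."""
    valid: list[dict[str, Any]] = []
    invalid: list[dict[str, Any]] = []
    for body in responses:
        data = body.get("data", {})
        # An invalid molecule is still a successful response;
        # only the data.valid flag tells the two cases apart.
        if data.get("valid"):
            valid.append(body)
        else:
            invalid.append(body)
    return valid, invalid
```

Transport-level failures such as timeouts or rejected API keys are a separate concern and are not covered by this helper.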

Why use it?

  • Clean your data before it breaks downstream. Garbage SMILES in a database causes silent failures in docking, retrosynthesis, and ML pipelines. Validate early, fix cheap.
  • Canonical = deterministic. The same molecule written 10 different ways produces the same canonical SMILES every time. Use it as a primary key, dedup identifier, or cache key.
  • Real-time form validation. Build input fields that validate chemistry notation as users type — with human-readable error messages, not stack traces.
  • Drug-likeness screening. Lipinski Rule of Five check built in. Filter compound libraries before expensive experiments.
  • Actionable fix suggestions. Invalid SMILES? The API analyzes common mistakes (unbalanced parentheses, valence errors, unclosed rings) and tells you exactly what to fix.
  • No RDKit installation required. Skip the painful C++ build, Python bindings, and WASM setup. One HTTP call, any language.
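The "canonical = deterministic" point can be sketched as a small dedup step over responses you have already fetched. This is an illustration only: dedup_by_canonical_smiles is a hypothetical helper name, and it assumes each response body has the shape shown in the examples in this document.

```python
from typing import Any, Iterable


def dedup_by_canonical_smiles(responses: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Map each canonical SMILES to the first input string that produced it."""
    seen: dict[str, str] = {}
    for body in responses:
        data = body.get("data", {})
        key = data.get("canonicalSmiles")
        if not data.get("valid") or key is None:
            continue  # invalid molecules have no canonical form to key on
        # setdefault keeps the first input seen for each canonical key.
        seen.setdefault(key, data.get("input", ""))
    return seen
```

Two different spellings of the same molecule collapse to one entry, which is what makes the canonical string usable as a primary key or cache key.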

Examples

Validate Aspirin (SMILES)

    curl -X POST https://api.creatornode.io/science/v1/chemistry/validate \
      -H "Content-Type: application/json" \
      -H "X-API-Key: YOUR_KEY" \
      -d '{ "input": "CC(=O)OC1=CC=CC=C1C(=O)O" }'
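The same request can be sketched in Python with only the standard library. The URL, headers, and body mirror the curl example above; the function names and the 10-second timeout are assumptions made for illustration, not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.creatornode.io/science/v1/chemistry/validate"


def build_validate_request(molecule: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"input": molecule}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def validate(molecule: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON body."""
    request = build_validate_request(molecule, api_key)
    # The timeout value is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling validate("CC(=O)OC1=CC=CC=C1C(=O)O", "YOUR_KEY") should return a body shaped like the response shown below.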

Response — valid molecule

    {
      "success": true,
      "data": {
        "valid": true,
        "format": "smiles",
        "input": "CC(=O)OC1=CC=CC=C1C(=O)O",
        "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O",
        "inchi": "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
        "metadata": {
          "molecularFormula": "C9H8O4",
          "exactMolecularWeight": 180.042,
          "numAtoms": 21,
          "numBonds": 21,
          "numRings": 1,
          "numHeavyAtoms": 13,
          "numHBA": 4,
          "numHBD": 1,
          "TPSA": 63.6,
          "cLogP": 1.31,
          "numRotatableBonds": 3,
          "numAromaticRings": 1,
          "hasStereo": false
        },
        "drugLikeness": {
          "lipinskiRo5": true,
          "violations": 0,
          "rules": [
            { "rule": "MW ≤ 500", "passed": true, "value": 180.04 },
            { "rule": "cLogP ≤ 5", "passed": true, "value": 1.31 },
            { "rule": "HBD ≤ 5", "passed": true, "value": 1 },
            { "rule": "HBA ≤ 10", "passed": true, "value": 4 }
          ]
        }
      },
      "meta": {
        "requestId": "abc-123",
        "processingTimeMs": 12,
        "rdkitVersion": "2025.3.4"
      }
    }
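The four entries in drugLikeness.rules line up with descriptors in metadata, so the check can be reproduced client-side. The sketch below assumes the thresholds printed in the rule labels above and assumes exactMolecularWeight is the value the MW rule is checked against; lipinski_from_metadata is a hypothetical helper, not part of the API.

```python
def lipinski_from_metadata(metadata: dict) -> dict:
    """Recompute the four Lipinski checks from a metadata object."""
    # (label, observed value, upper limit) for each rule in the example response.
    checks = [
        ("MW ≤ 500", metadata["exactMolecularWeight"], 500),
        ("cLogP ≤ 5", metadata["cLogP"], 5),
        ("HBD ≤ 5", metadata["numHBD"], 5),
        ("HBA ≤ 10", metadata["numHBA"], 10),
    ]
    rules = [
        {"rule": label, "passed": value <= limit, "value": value}
        for label, value, limit in checks
    ]
    violations = sum(1 for rule in rules if not rule["passed"])
    return {"violations": violations, "rules": rules}
```

This is handy when you store only the descriptors and want to re-derive the rule results later without another request.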

Response — invalid molecule

    {
      "success": true,
      "data": {
        "valid": false,
        "format": "smiles",
        "input": "CC(=O",
        "error": {
          "message": "SMILES Parse Error: unclosed ring or branch"
        }
      },
      "recommendations": [
        {
          "type": "fix",
          "title": "Input fix",
          "message": "Unbalanced parentheses: 1 opening '(' without matching ')'.",
          "priority": "high"
        }
      ]
    }
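For form validation, an invalid response like the one above can be turned into a short list of messages for the user. This is a sketch under the assumption that recommendations sits at the top level next to data, as in the example; form_messages is a hypothetical helper name.

```python
def form_messages(body: dict) -> list[str]:
    """Return human-readable lines for an invalid result, or [] if valid."""
    data = body.get("data", {})
    if data.get("valid"):
        return []
    lines: list[str] = []
    error = data.get("error") or {}
    if error.get("message"):
        lines.append(error["message"])
    # In the example response, recommendations is a sibling of data.
    for rec in body.get("recommendations", []):
        title = rec.get("title", "Suggestion")
        lines.append(f"{title}: {rec.get('message', '')}")
    return lines
```

An empty list means the input parsed cleanly, so the form can clear any earlier error state.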

InChI input with premium outputs

    {
      "input": "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
      "format": "inchi",
      "options": {
        "inchiKey": true,
        "molblock": true
      }
    }

On the free tier, the inchiKey and molblock options are silently ignored: the request still succeeds, but those outputs are left out of the response. Upgrade to Premium to include them.
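Because ignored options do not raise an error, a client may want to check for them explicitly. The sketch below assumes premium outputs appear as data.inchiKey and data.molblock (the names come from the Cost & Limits table; their placement inside data is an assumption), and missing_premium_outputs is a hypothetical helper.

```python
PREMIUM_OPTIONS = ("inchiKey", "molblock")


def missing_premium_outputs(request_body: dict, response_body: dict) -> list[str]:
    """List premium options that were requested but are absent from data."""
    requested = request_body.get("options", {})
    data = response_body.get("data", {})
    # Assumption: a delivered premium output shows up as a key inside data.
    return [
        name
        for name in PREMIUM_OPTIONS
        if requested.get(name) and name not in data
    ]
```

A non-empty result on a valid molecule is a hint that the key in use is on the free tier.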

Supported input formats

| Format | Example | When to use |
| --- | --- | --- |
| SMILES | CC(=O)Oc1ccccc1C(=O)O | Most common. Compact, human-readable line notation. |
| InChI | InChI=1S/C9H8O4/... | Cross-database lookup. Standard identifier from IUPAC. |
| Molfile | (multi-line V2000/V3000) | Structure files from ChemDraw, Marvin, or database exports. |
💡 Auto-detection: You don't need to specify format. The API detects it from the input: InChI strings start with InChI=, Molfiles contain V2000/V3000, everything else is treated as SMILES.
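The detection rule described above is simple enough to mirror on the client, for example to label inputs in a UI before sending them. This is a sketch of the stated rule only, not the server's actual implementation; the return value "molfile" is an assumed label, since this document only shows "smiles" and "inchi" as format values.

```python
def detect_format(text: str) -> str:
    """Guess the input format using the rule stated in the docs."""
    if text.startswith("InChI="):
        return "inchi"
    # Molfile counts lines carry a V2000 or V3000 version tag.
    if "V2000" in text or "V3000" in text:
        return "molfile"  # assumed label, not confirmed by the docs
    return "smiles"
```

Anything that is neither InChI nor a Molfile falls through to SMILES, so malformed input is still sent as SMILES and reported as invalid by the API.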

Options reference

| Option | Type | Default | What it does |
| --- | --- | --- | --- |
| canonicalSmiles | boolean | true | Include canonical (normalized) SMILES in output. |
| inchi | boolean | true | Include InChI string in output. |
| inchiKey | boolean | false | Include InChI Key (27-char hash). Premium only. |
| molblock | boolean | false | Include MDL Molblock with 2D coords. Premium only. |
| descriptors | boolean | true | Include molecular descriptors (weight, TPSA, cLogP, etc.). |
| sanitize | boolean | true | Run RDKit sanitization (valence check). Set to false for molecules with non-standard valence. |
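One way to keep request bodies small and typo-free is to build them from the table above. The sketch below hard-codes the documented defaults and sends only options that differ from them; build_payload and DEFAULT_OPTIONS are hypothetical names, and rejecting unknown option names is a choice made here, not documented API behavior.

```python
DEFAULT_OPTIONS = {
    "canonicalSmiles": True,
    "inchi": True,
    "inchiKey": False,
    "molblock": False,
    "descriptors": True,
    "sanitize": True,
}


def build_payload(molecule: str, **overrides: bool) -> dict:
    """Build a request body, including only options that differ from defaults."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    options = {
        name: value
        for name, value in overrides.items()
        if value != DEFAULT_OPTIONS[name]
    }
    payload: dict = {"input": molecule}
    if options:
        payload["options"] = options
    return payload
```

For example, build_payload("CCO", sanitize=False) produces a body with "options": { "sanitize": false }, matching the tip on valence errors.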

Tips & tricks

💡 Tip: For the full request/response schema, OpenAPI spec, and the interactive demo endpoint, see the API Reference.
  • Let format auto-detection work for you. The format field defaults to "auto" and correctly identifies SMILES, InChI, and Molfile. Only set it explicitly if you know the format and want to skip detection.
  • Use canonical SMILES as your primary key. Multiple SMILES strings can represent the same molecule. Canonical SMILES eliminates duplicates — store it, index it, compare it.
  • Valence errors? Try sanitize: false. Some molecules (vancomycin fragments, metal complexes) have atoms that violate standard valence rules. Setting "options": { "sanitize": false } lets RDKit parse them anyway and produce a canonical SMILES. The API will suggest this automatically when it detects a valence error.
  • Batch with confidence. The API returns valid: false as a normal response, not an error, so a loop over hundreds of molecules needs no try/catch for invalid input. Just check data.valid.
  • Read the fix recommendations. When validation fails, the recommendations array contains specific, actionable fixes — unbalanced parentheses, unclosed rings, valence issues — not generic error messages.
  • Lipinski filtering at scale. Use drugLikeness.lipinskiRo5 and drugLikeness.violations to filter compound libraries. Under the Rule of Five, molecules with more than 1 violation are less likely to be orally bioavailable.
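  • Input limits are in bytes, not characters. Input is measured in UTF-8, where every non-ASCII character takes more than 1 byte, so a Molfile with special characters can exceed the limit even when its character count does not. Free tier: 200 bytes. Premium: 4,000 bytes.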
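Two of the tips above translate directly into small client-side checks. The sketch below measures input size in UTF-8 bytes against the tier limits and filters responses by violation count. It assumes the limit applies to the input string alone and is inclusive; the helper names and the "free"/"premium" keys are hypothetical.

```python
TIER_LIMITS_BYTES = {"free": 200, "premium": 4000}


def input_size_bytes(text: str) -> int:
    """Size of the input as UTF-8 bytes, which is what the limit counts."""
    return len(text.encode("utf-8"))


def fits_tier(text: str, tier: str = "free") -> bool:
    """True if the input is within the tier's byte limit (assumed inclusive)."""
    return input_size_bytes(text) <= TIER_LIMITS_BYTES[tier]


def passes_lipinski_filter(body: dict, max_violations: int = 1) -> bool:
    """Keep molecules with at most max_violations Lipinski violations."""
    drug = body.get("data", {}).get("drugLikeness")
    if drug is None:
        return False  # invalid molecules carry no drugLikeness block
    return drug.get("violations", 0) <= max_violations
```

Checking the size before sending avoids spending a request on input the tier would reject anyway.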

Cost & Limits

| Feature | Detail |
| --- | --- |
| Credit cost | 1 credit per request |
| Input formats | SMILES, InChI, MDL Molfile (V2000/V3000) |
| Free outputs | canonicalSmiles, inchi, metadata, drugLikeness |
| Premium outputs | inchiKey, molblock |
| Engine | RDKit |

Tier Limits

| Limit | Free | Premium |
| --- | --- | --- |
| Max input size | 200 bytes | 4,000 bytes |
| InChI Key | No | Yes |
| Molblock export | No | Yes |
