SMILES
• Simplified Molecular Input Line Entry System (SMILES)
• Widely used AND computationally efficient
• Uses atomic symbols and a set of intuitive rules
• Uses hydrogen-suppressed molecular graphs (HSMG)
SMILES Branches
• Represented by enclosure in parentheses
• Can be nested or stacked
• Examples:
– CC(O)CC is 2-Butanol
– OCC(C)C is iso-Butanol
– OC(C)(C)C is tert-Butanol
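The nesting and stacking of branches can be checked with a small sketch like the one below. It is only an illustration: the function name `branch_depth` is hypothetical, and the code looks at parentheses alone without validating the rest of the notation.

```python
def branch_depth(smiles: str) -> int:
    """Return the deepest level of nested parenthesised branches.

    Raises ValueError when the parentheses do not balance.
    """
    depth = 0
    deepest = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without an opening one")
    if depth != 0:
        raise ValueError("unclosed branch")
    return deepest


# Stacked branches stay at depth 1; a branch inside a branch reaches depth 2.
print(branch_depth("OC(C)(C)C"))   # stacked
print(branch_depth("CC(C(C)O)C"))  # nested
```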
SMILES Bonds
• Ethene: C=C
• Chloroethene: ClC=C
• 1,1-Dichloroethene: ClC(Cl)=C
• cis-1,2-Dichloroethene: ClC=CCl (cis/trans geometry is not encoded without / and \)
• Trichloroethene: ClC(Cl)=CCl
• Perchloroethene: ClC(Cl)=C(Cl)Cl
SMILES Atoms
• Use normal chemical symbols
• Add punctuation symbols if necessary
• No super- or subscripts
SMILES Symbols
• String of alphanumeric characters and certain punctuation symbols
• Terminates at the first space encountered when read left to right
• The ORGANIC SUBSET:
B, C, N, O, P, S, F, Cl, Br, I
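As a rough illustration of these rules, the sketch below reads a string up to the first space and picks out the organic-subset atoms. The names `ORGANIC_SUBSET` and `organic_atoms` are made up for this example. It assumes uppercase (aliphatic) atoms only: anything in square brackets is skipped, and lowercase aromatic atoms are ignored.

```python
# Two-letter symbols come first so that "Cl" is not read as "C" followed by "l".
ORGANIC_SUBSET = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")


def organic_atoms(smiles: str) -> list:
    """List the organic-subset atoms of a SMILES string, left to right."""
    text = smiles.split(" ", 1)[0]  # the notation ends at the first space
    atoms = []
    i = 0
    while i < len(text):
        if text[i] == "[":
            # Skip bracket atoms entirely; they are outside this sketch.
            end = text.find("]", i)
            i = len(text) if end == -1 else end + 1
            continue
        for symbol in ORGANIC_SUBSET:
            if text.startswith(symbol, i):
                atoms.append(symbol)
                i += len(symbol)
                break
        else:
            i += 1  # bond symbol, parenthesis, digit, or lowercase atom
    return atoms


print(organic_atoms("ClC(Cl)=C(Cl)Cl"))
```

Listing the two-letter symbols first is the one design choice that matters here; with single letters first, chlorine and bromine would be misread.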
Other SMILES Atoms
• Aliphatic or nonaromatic carbon: C
• Atom in aromatic ring: lowercase letter
• Designate ring closure with pairs of matching digits, e.g.
c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas
C1CCCCC1 is Cyclohexane
SMILES Charges
• Specify attached hydrogens and charges in square brackets
• Number of attached hydrogens is the symbol H followed by optional digit
SMILES Charges
• [H+]: proton
• [OH-]: hydroxide anion
• [OH3+]: hydronium cation
• [Fe++]: iron(II) cation
• [NH4+]: ammonium cation
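The bracket notation above can be read mechanically. The following sketch is one possible way to do it, not a full parser: `BRACKET_ATOM` and `parse_bracket_atom` are hypothetical names, and the pattern only covers an element symbol, an optional hydrogen count, and an optional charge written as repeated signs or as a sign plus a number.

```python
import re

# symbol, optional H count (H, H2, ...), optional charge (+, ++, -, +2, ...)
BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)(?P<hydrogens>H\d?)?(?P<charge>\++|-+|[+-]\d+)?\]"
)


def parse_bracket_atom(text: str):
    """Return (symbol, attached hydrogens, charge) for one bracket atom."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bracket atom this sketch understands: {text!r}")

    hydrogens = match.group("hydrogens")
    if hydrogens is None:
        h_count = 0
    elif hydrogens == "H":
        h_count = 1  # a bare H means exactly one hydrogen
    else:
        h_count = int(hydrogens[1:])

    charge_text = match.group("charge")
    if charge_text is None:
        charge = 0
    elif charge_text[-1].isdigit():
        charge = int(charge_text)  # forms such as "+2" or "-1"
    else:
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * len(charge_text)  # forms such as "++"

    return match.group("symbol"), h_count, charge


for atom in ("[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"):
    print(atom, parse_bracket_atom(atom))
```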
SMILES Cyclic Structures
• Break one single or one aromatic bond in each ring
• Number in any order
– Designate ring-breaking atoms by the same digit following the atomic symbol
Cyclic Structures
• Numbers indicate start and stop of ring
• Same number indicates start and end of the ring, entered immediately following the start/end atoms
• Only numbers 1 – 9 are used
• A number should appear only twice
• Atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
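To make the digit-pairing rule concrete, here is a minimal sketch that reports which atoms each pair of ring digits joins. The function name `ring_closures` is hypothetical. It assumes a notation without bracket atoms, counts atoms from zero in reading order, and does not enforce the convention that a digit appears only twice.

```python
def ring_closures(smiles: str) -> list:
    """Return (opening atom index, closing atom index) for each ring digit pair."""
    open_rings = {}   # digit -> index of the atom that opened the ring
    pairs = []
    atom_index = -1   # index of the most recently read atom
    i = 0
    while i < len(smiles):
        if smiles.startswith(("Cl", "Br"), i):
            atom_index += 1  # two-letter symbols count as a single atom
            i += 2
            continue
        ch = smiles[i]
        if ch.isalpha():
            atom_index += 1
        elif ch.isdigit():
            if atom_index < 0:
                raise ValueError("ring digit before any atom")
            if ch in open_rings:
                pairs.append((open_rings.pop(ch), atom_index))
            else:
                open_rings[ch] = atom_index
        i += 1
    if open_rings:
        raise ValueError("ring digit without a matching partner")
    return pairs


print(ring_closures("c1ccccc1"))        # benzene: one ring bond
print(ring_closures("c12ccccc1cccc2"))  # naphthalene: first atom opens both rings
```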
SMILES Conventions
• Avoid two consecutive left parentheses if possible
• Strive for the fewest number of possible branches
• Tautomeric bonds are not designated; enter the appropriate form
Further Restrictions
• A branch cannot begin a SMILES notation
• A branch cannot immediately follow a double- or triple-bond symbol
• Example: C=(CC)C is invalid, but
• C(=CC)C or C(CC)=C are valid SMILES
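Both restrictions are simple enough to test with plain string checks. The sketch below is illustrative only; `branch_rule_problems` is a made-up name, and it checks these two rules and nothing else.

```python
def branch_rule_problems(smiles: str) -> list:
    """Return a list of violated branch restrictions (empty if none)."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch begins the notation")
    for bond in ("=", "#"):
        # A "(" directly after a double- or triple-bond symbol is not allowed.
        if bond + "(" in smiles:
            problems.append(f"a branch immediately follows '{bond}'")
    return problems


for candidate in ("C=(CC)C", "C(=CC)C", "C(CC)=C"):
    print(candidate, branch_rule_problems(candidate) or "ok")
```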
SMILES Fragments
• Nitro: N(=O)(=O)
• Nitrate: ON(=O)(=O)
• Nitrite: ON(=O)
• Sulfonic acid: S(=O)(=O)O
• Cyanide/Nitrile: C#N
• Azide: N=N#N
• Azido: N+=N-
SMILES Metals
[Al] [As] [Au] [Be]
[Bi] [Cd] [Ca] [Fe]
[Hg] [K] [Li] [Mg]
[Na] [Ni] [Pt] [Sb]
[Sn] [Zn] [Zr]
Isomeric and Chiral SMILES
• Isomeric configuration indicated by forward and backward slashes: / \
• Examples:
– trans-1,2-dibromoethene: Br/C=C/Br (direction of the slash continues)
– cis-1,2-dibromoethene: Br/C=C\Br (direction of the slash reverses)
• Chirality indicated by the “@” symbol
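The slash rule can be sketched in a few lines. The example below is a simplification under stated assumptions: `double_bond_geometry` is a hypothetical name, and it only handles the linear form shown in the examples, with one slash directly before and one directly after a double bond between two unbracketed atoms. Substituents written as branches are not covered.

```python
import re

# One directional bond, an atom, "=", an atom, one directional bond.
SLASHED_DOUBLE_BOND = re.compile(r"([/\\])[A-Z][a-z]?=[A-Z][a-z]?([/\\])")


def double_bond_geometry(smiles: str):
    """Return "trans", "cis", or None if no slashed double bond is found."""
    match = SLASHED_DOUBLE_BOND.search(smiles)
    if match is None:
        return None
    before, after = match.groups()
    # Same direction on both sides: trans. Opposite directions: cis.
    return "trans" if before == after else "cis"


print(double_bond_geometry("Br/C=C/Br"))
print(double_bond_geometry("Br/C=C\\Br"))  # backslash escaped for Python
```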
Some Applications
• JMDraw/SMILESViewer (Christoph Steinbeck)
• JME Molecular Editor (Peter Ertl)
• STN Express (SMILES as output)
• Tripos (dbtranslate: SMILES to MOL)
• Marvin (Ferenc Csizmadia) http://chemaxon.com/marvin/
• CACTVS http://www2.ccc.uni-erlangen.de/cactvs/
Another Application
• SMILESCAS Database http://www.syrres.com/esc/smilecas.htm
Over 103,000 SMILES notations
• Input CAS Registry Number
• Leads to SMILES and thence to a structure search