
Upload: aleah-reder

Post on 14-Dec-2015


SMILES

• Simplified Molecular Input Line Entry System (SMILES)

• Widely used AND computationally efficient

• Uses atomic symbols and a set of intuitive rules

• Uses hydrogen-suppressed molecular graphs (HSMG)

SMILES Bonds

• Single: - (can be omitted)

• Double: =

• Triple: #

• Aromatic: : (can be omitted)

Butanols

• 2-Butanol

• iso-Butanol

• tert-Butanol

SMILES Branches

• Represented by enclosure in parentheses

• Can be nested or stacked

• Examples:

– CC(O)CC is 2-Butanol

– OCC(C)C is iso-Butanol

– OC(C)(C)C is tert-Butanol
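The branch rule maps naturally onto a stack: an opening parenthesis remembers the atom to return to, and a closing parenthesis goes back to it. The following is a minimal Python sketch of that idea, assuming an acyclic SMILES built only from organic-subset atoms. The names `heavy_atom_bonds` and `degrees` are illustrative and do not come from any SMILES toolkit.

```python
import re

# Organic-subset atoms, branch parentheses and bond symbols only (an assumption
# of this sketch: no rings, no bracket atoms, no aromatic atoms).
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[()=#-]")


def heavy_atom_bonds(smiles):
    """Return (atoms, bonds) for an acyclic SMILES that uses parentheses."""
    atoms = []       # atom symbols in reading order
    bonds = []       # (index_a, index_b) pairs between heavy atoms
    stack = []       # atoms to return to when a branch closes
    previous = None  # index of the atom the next atom attaches to
    for token in TOKEN.findall(smiles):
        if token == "(":
            stack.append(previous)      # remember the branch point
        elif token == ")":
            previous = stack.pop()      # go back to the branch point
        elif token in ("-", "=", "#"):
            continue                    # bond order is ignored here
        else:
            atoms.append(token)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current))
            previous = current
    return atoms, bonds


def degrees(smiles):
    """Pair each atom symbol with its number of heavy-atom neighbours."""
    atoms, bonds = heavy_atom_bonds(smiles)
    counts = [0] * len(atoms)
    for a, b in bonds:
        counts[a] += 1
        counts[b] += 1
    return list(zip(atoms, counts))


print(degrees("OC(C)(C)C"))  # tert-Butanol: the second atom has four neighbours
```

The two stacked branches in OC(C)(C)C both return to the same carbon, which is why that atom ends up with four heavy-atom neighbours.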

SMILES Bonds

• Ethene: C=C

• Chloroethene: ClC=C

• 1,1-Dichloroethene: ClC(Cl)=C

• cis-1,2-Dichloroethene: ClC=CCl (this form does not encode the cis geometry; see Isomeric and Chiral SMILES)

• Trichloroethene: ClC(Cl)=CCl

• Perchloroethene: ClC(Cl)=C(Cl)Cl
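Because SMILES uses hydrogen-suppressed graphs, the hydrogens on each atom have to be recovered from the bond orders. The sketch below shows one way to do this in Python, assuming the usual lowest valences for the organic subset (C 4, N 3, O 2, halogens 1, and so on) and an acyclic SMILES without bracket atoms. The `VALENCE` table and the function name `implicit_hydrogens` are assumptions made for illustration.

```python
import re

# Assumed "normal" valences for the organic subset (lowest common valence).
VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[()=#-]")


def implicit_hydrogens(smiles):
    """Return [(symbol, hydrogen_count), ...] for an acyclic SMILES."""
    atoms = []       # atom symbols in reading order
    used = []        # valence already consumed by written bonds
    stack = []       # branch points
    previous = None  # atom the next atom bonds to
    order = 1        # a bond with no symbol is a single bond
    for token in TOKEN.findall(smiles):
        if token == "(":
            stack.append(previous)
        elif token == ")":
            previous = stack.pop()
        elif token in BOND_ORDER:
            order = BOND_ORDER[token]
        else:
            atoms.append(token)
            used.append(0)
            current = len(atoms) - 1
            if previous is not None:
                used[previous] += order
                used[current] += order
            previous = current
            order = 1  # reset to the default single bond
    # Whatever valence is left over is filled with hydrogens.
    return [(sym, VALENCE[sym] - u) for sym, u in zip(atoms, used)]


for name, smiles in [("Ethene", "C=C"), ("Chloroethene", "ClC=C"),
                     ("Perchloroethene", "ClC(Cl)=C(Cl)Cl")]:
    total_h = sum(h for _, h in implicit_hydrogens(smiles))
    print(name, smiles, "hydrogens:", total_h)
```

The sketch does not handle higher valences (for example the pentavalent nitrogen used in some fragment notations), for which it would report a negative count.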

SMILES Atoms

• Use normal chemical symbols

• Add punctuation symbols if necessary

• No super- or subscripts

SMILES Symbols

• String of alphanumeric characters and certain punctuation symbols

• Terminates at the first space encountered when read left to right

• The ORGANIC SUBSET:

B, C, N, O, P, S, F, Cl, Br, I
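As an illustration of these rules, a SMILES string can be split into tokens by reading left to right and stopping at the first space. The following Python sketch assumes the symbol set described on these slides; `tokenize` is a hypothetical helper, not a function of any existing library.

```python
import re

TOKEN_RE = re.compile(
    r"\[[^\]]+\]"          # bracket atom such as [NH4+]
    r"|Cl|Br|[BCNOPSFI]"   # organic-subset atoms (two-letter symbols first)
    r"|[bcnops]"           # aromatic atoms written in lowercase
    r"|[-=#:/\\]"          # bond symbols
    r"|[()]"               # branch delimiters
    r"|[1-9]"              # ring-closure digits
    r"|\."                 # dot for disconnected structures
)


def tokenize(smiles):
    """Split a SMILES string into tokens, stopping at the first space."""
    text = smiles.split(" ", 1)[0]  # the notation ends at the first space
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


print(tokenize("ClC(Cl)=CCl trichloroethene"))
```

Listing `Cl` and `Br` before the single-letter symbols keeps `Cl` from being read as a carbon followed by an unknown character.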

Other SMILES Atoms

• Aliphatic or nonaromatic carbon: C

• Atom in aromatic ring: lowercase letter

• Designate ring closure with pairs of matching digits, e.g.

c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas

C1CCCCC1 is Cyclohexane

SMILES Charges

• Specify attached hydrogens and charges in square brackets

• Number of attached hydrogens is the symbol H followed by optional digit

SMILES Charges

• [H+]: proton

• [OH-]: hydroxide anion

• [OH3+]: hydronium cation

• [Fe++]: iron(II) cation

• [NH4+]: ammonium cation
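The bracket notation can be read mechanically: an element symbol, an optional H with an optional digit, then an optional charge. The Python sketch below covers only that simple pattern (no isotopes or chirality marks); the name `parse_bracket_atom` and the regular expression are assumptions for illustration.

```python
import re

BRACKET_ATOM = re.compile(
    r"^\[(?P<symbol>[A-Z][a-z]?)"       # element symbol, e.g. O or Fe
    r"(?:H(?P<hcount>\d?))?"            # optional H with optional digit
    r"(?P<charge>\++|-+|[+-]\d)?\]$"    # optional charge: +, ++, -, +2, ...
)


def parse_bracket_atom(text):
    """Return (symbol, attached_hydrogens, charge) for a bracket atom."""
    match = BRACKET_ATOM.match(text)
    if match is None:
        raise ValueError(f"not a simple bracket atom: {text!r}")

    hcount = match.group("hcount")
    if hcount is None:
        hydrogens = 0            # no H written inside the brackets
    elif hcount == "":
        hydrogens = 1            # H without a digit means one hydrogen
    else:
        hydrogens = int(hcount)

    charge_text = match.group("charge") or ""
    if charge_text == "":
        charge = 0
    elif charge_text[-1].isdigit():
        charge = int(charge_text)                 # forms such as +2 or -1
    else:
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * len(charge_text)          # forms such as ++ or -

    return match.group("symbol"), hydrogens, charge


for ion in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(ion, parse_bracket_atom(ion))
```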

SMILES Cyclic Structures

• Break one single or one aromatic bond in each ring

• Number in any order

– Designate ring-breaking atoms by the same digit following the atomic symbol

Cyclic Structures

• Numbers indicate start and stop of ring

• Same number indicates start and end of the ring, entered immediately following the start/end atoms

• Only numbers 1 – 9 are used

• A number should appear only twice

• Atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2

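The ring-closure rules can be checked with a small dictionary of currently open rings. The sketch below, in Python, follows the conventions on these slides (digits 1 to 9, each digit used exactly twice), which is stricter than what general SMILES parsers accept. The function name `ring_closures` is hypothetical.

```python
import re

# Bracket atoms are matched first so digits inside them are not ring numbers.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d|.")


def ring_closures(smiles):
    """Return (opening_atom, closing_atom) index pairs, one per ring digit."""
    open_rings = {}   # digit -> index of the atom that opened the ring
    used = set()      # digits that have already been opened once
    closures = []
    atom_index = -1   # index of the most recently read atom
    for token in TOKEN.findall(smiles):
        if token.isdigit():
            if token == "0" or atom_index < 0:
                raise ValueError(f"misplaced ring digit {token!r}")
            if token in open_rings:
                # Second appearance: close the ring on the current atom.
                closures.append((open_rings.pop(token), atom_index))
            elif token in used:
                # Slide convention: a number should appear only twice.
                raise ValueError(f"digit {token} appears more than twice")
            else:
                open_rings[token] = atom_index
                used.add(token)
        elif token[0] == "[" or token[0].isalpha():
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring digit(s): {sorted(open_rings)}")
    return closures


print(ring_closures("c12ccccc1cccc2"))  # both rings start on the first atom
```

In the naphthalene string the first atom carries two consecutive digits, so it appears as the opening atom of both closures.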

SMILES Conventions

• Avoid two consecutive left parentheses if possible

• Strive for the fewest possible branches

• Tautomeric bonds are not designated; enter the appropriate form

Further Restrictions

• A branch cannot begin a SMILES notation

• A branch cannot immediately follow a double- or triple-bond symbol

• Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
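These two restrictions are purely textual, so they can be tested without parsing the whole notation. A minimal Python sketch follows; `branch_rule_problems` is an illustrative name, and the check covers only the two rules listed above.

```python
def branch_rule_problems(smiles):
    """Return a list of violations of the two branch restrictions."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch begins the notation")
    for bond in ("=", "#"):
        if bond + "(" in smiles:
            problems.append(f"a branch immediately follows '{bond}'")
    return problems


for candidate in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    issues = branch_rule_problems(candidate)
    print(candidate, "invalid" if issues else "valid", issues)
```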

SMILES Fragments

• Nitro: N(=O)(=O)

• Nitrate: ON(=O)(=O)

• Nitrite: ON(=O)

• Sulfonic acid: S(=O)(=O)O

• Cyanide/Nitrile: C#N

• Azide: N=N#N

• Azido: N+=N-

SMILES Metals

[Al] [As] [Au] [Be]

[Bi] [Cd] [Ca] [Fe]

[Hg] [K] [Li] [Mg]

[Na] [Ni] [Pt] [Sb]

[Sn] [Zn] [Zr]

Disconnected Structures

• Indicated by a dot

• Tetramethylammonium bromide

C[N+](C)(C)C.[Br-]
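Since the dot separates components, a salt can be split into its ions and the charges summed as a sanity check. The Python sketch below assumes charges are written inside square brackets as shown on the charge slides; `split_components` and `net_charge` are hypothetical helper names. It uses the tetramethylammonium bromide notation C[N+](C)(C)C.[Br-], in which all four methyl groups are attached to the nitrogen.

```python
import re


def split_components(smiles):
    """Split a SMILES string into its disconnected parts."""
    return smiles.split(".")


def net_charge(component):
    """Sum the charges written inside the bracket atoms of one component."""
    total = 0
    for inner in re.findall(r"\[([^\]]*)\]", component):
        match = re.search(r"([+-]+)(\d*)$", inner)
        if match is None:
            continue  # bracket atom without a charge, e.g. [Na]
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        total += magnitude if signs[0] == "+" else -magnitude
    return total


salt = "C[N+](C)(C)C.[Br-]"
parts = split_components(salt)
print(parts, [net_charge(p) for p in parts])
```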

Isomeric and Chiral SMILES

• Isomeric configuration indicated by forward and backward slashes: / \

• Examples:

– trans-1,2-dibromoethene: Br/C=C/Br (direction of the slash continues)

– cis-1,2-dibromoethene: Br/C=C\Br (direction of the slash reverses)

• Chirality indicated by the “@” symbol
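The slash rule for double bonds can be expressed as a comparison of the two slash directions. The Python sketch below handles only the simple pattern used in the examples above, where the first substituent is written before the double bond and the second after it. It does not cover slashes inside branches, several double bonds, or the "@" chirality notation. The function name is illustrative.

```python
import re

# One slash before the double bond, one after it.
PATTERN = re.compile(r"([/\\])[^/\\=]*=[^/\\]*([/\\])")


def double_bond_geometry(smiles):
    """Classify a simple X/C=C/Y style SMILES as cis, trans or unspecified."""
    match = PATTERN.search(smiles)
    if match is None:
        return "unspecified"          # no pair of slashes around a double bond
    first, second = match.groups()
    # Same direction on both sides: trans. Reversed direction: cis.
    return "trans" if first == second else "cis"


for smiles in ["Br/C=C/Br", "Br/C=C\\Br", "BrC=CBr"]:
    print(smiles, double_bond_geometry(smiles))
```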

Some Applications

• JMDraw/SMILESViewer (Christoph Steinbeck)

• JME Molecular Editor (Peter Ertl)

• STN Express (SMILES as output)

• Tripos (dbtranslate: SMILES to MOL)

• Marvin (Ferenc Csizmadia) http://chemaxon.com/marvin/

• CACTVS http://www2.ccc.uni-erlangen.de/cactvs/

Another Application

• SMILESCAS Database: http://www.syrres.com/esc/smilecas.htm

– Over 103,000 SMILES notations

• Input CAS Registry Number

• Leads to SMILES and thence to a structure search