20130410 carbohydrates

structure representation and public chemistry databases Colin Batchelor, Ken Karapetyan, David Sharpe, Valery Tkachenko and Antony Williams [email protected] ACS New Orleans April 2013

Upload: royal-society-of-chemistry

Post on 11-May-2015


DESCRIPTION

The challenges in dealing with sugar stereochemistry

TRANSCRIPT

Page 1: 20130410 carbohydrates

Carbohydrate structure representation and public chemistry databases

Colin Batchelor, Ken Karapetyan, David Sharpe, Valery Tkachenko and Antony Williams

[email protected]

ACS New Orleans April 2013

Page 2: 20130410 carbohydrates

Overview

• Public chemistry databases and registration
• Why sugar rings are difficult
• Consequences
• Algorithms
• Future directions

Page 3: 20130410 carbohydrates

Some public chemistry databases

Page 4: 20130410 carbohydrates

How registration works

Structures are accepted in some machine-readable format and boiled down to some position-independent canonical form.

Drop exact coordinates and retain only relative coordinates, disregarding bond length.

Canonicalization based on depiction of bonds (wedges or hashes) rather than 3D positions around atoms.
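The idea of reading stereochemistry from the depiction while ignoring exact positions can be illustrated with a minimal Python sketch. It is not the registration code of any particular database. The wedge/hash convention (wedge towards the viewer, hash away), the function name `chirality_sign` and the helper `moved` are assumptions made for illustration only.

```python
import math

# Assumed convention (not stated in the slides): a wedge points towards the
# viewer (z = +1), a hash points away (z = -1), a plain bond lies in the page.
BOND_Z = {"wedge": 1.0, "hash": -1.0, "plain": 0.0}


def chirality_sign(centre, neighbours):
    """Return +1 or -1 for a centre with four neighbours given as (x, y, bond).

    Each bond is reduced to a unit direction in the page plus a z value taken
    from its depiction, so bond lengths and absolute positions drop out.
    Returns 0 when the depiction does not define a handedness.
    """
    vectors = []
    for x, y, bond in neighbours:
        dx, dy = x - centre[0], y - centre[1]
        length = math.hypot(dx, dy)
        vectors.append((dx / length, dy / length, BOND_Z[bond]))
    a, b, c, d = vectors
    u, v, w = ([p[i] - d[i] for i in range(3)] for p in (a, b, c))
    # Scalar triple product u . (v x w): its sign is the handedness.
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    if abs(det) < 1e-9:
        return 0
    return 1 if det > 0 else -1


def moved(centre, neighbours, angle, scale, shift):
    """Rotate, scale and translate a depiction (hypothetical helper)."""
    def f(x, y):
        c, s = math.cos(angle), math.sin(angle)
        return (scale * (c * x - s * y) + shift[0],
                scale * (s * x + c * y) + shift[1])
    return f(*centre), [f(x, y) + (bond,) for x, y, bond in neighbours]
```

Rotating, rescaling or translating the drawing leaves the sign unchanged, while mirroring the drawing or swapping wedge and hash flips it. This is the kind of position-independent quantity a canonical form can be built on.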

Page 5: 20130410 carbohydrates

Why sugars are difficult

Page 6: 20130410 carbohydrates

Why sugar rings are difficult

Page 7: 20130410 carbohydrates

Consequences

Page 8: 20130410 carbohydrates

Algorithm for hexagons

• Identify the perspective conformation (boat, chair, regular hexagon, and so on)
• Determine perspective stereo
• Assign wedge or hash to the bonds accordingly
• (tricky) Reconstruct the sugar ring so as to minimize disruption of the rest of the molecule
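As a rough illustration of the first step, the sketch below only separates a ring drawn as a flat regular polygon from one drawn in perspective. Telling a chair from a boat or a Haworth projection needs further tests that are not attempted here. The function name `is_regular_polygon` and the 5% tolerance are assumptions, not values from the talk.

```python
import math


def is_regular_polygon(ring, tol=0.05):
    """True if the ordered ring atoms (x, y) form a regular polygon.

    The check compares the spread of centroid-to-atom distances and of
    bond lengths against a relative tolerance (an arbitrary choice here).
    """
    n = len(ring)
    cx = sum(x for x, _ in ring) / n
    cy = sum(y for _, y in ring) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in ring]
    edges = [math.dist(ring[i], ring[(i + 1) % n]) for i in range(n)]

    def spread(values):
        mean = sum(values) / len(values)
        return (max(values) - min(values)) / mean

    return spread(radii) <= tol and spread(edges) <= tol
```

A ring that fails this test is a candidate for the perspective treatment described in the remaining steps.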

Page 9: 20130410 carbohydrates

Hexagons in the plane

Page 10: 20130410 carbohydrates

Assigning chair stereochemistry

• Take the x-axis as either the line through the top two ring atoms or the bottom two ring atoms.
• Substituents with Δy positive are up, Δy negative are down.
• Then remap chair to a regular hexagon (tricky).
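The Δy rule can be sketched in a few lines of Python. Here Δy is taken as the component of the ring-atom-to-substituent vector perpendicular to the chosen axis. That reading, the function name `substituent_directions`, the `eps` threshold and the "unclear" label are assumptions for illustration.

```python
import math


def substituent_directions(axis_a, axis_b, bonds, eps=1e-6):
    """Label each (ring_atom, substituent) bond as 'up', 'down' or 'unclear'.

    axis_a and axis_b are the two ring atoms that define the x-axis.
    Assumes the axis is not drawn vertically.
    """
    (ax, ay), (bx, by) = axis_a, axis_b
    if bx < ax:  # make the axis point left to right
        ax, ay, bx, by = bx, by, ax, ay
    length = math.hypot(bx - ax, by - ay)
    ux, uy = (bx - ax) / length, (by - ay) / length
    labels = []
    for (rx, ry), (sx, sy) in bonds:
        # Component of the bond vector perpendicular to the axis.
        delta_y = (sy - ry) * ux - (sx - rx) * uy
        if delta_y > eps:
            labels.append("up")
        elif delta_y < -eps:
            labels.append("down")
        else:
            labels.append("unclear")
    return labels
```

Because the axis is normalized to point left to right, the labels do not depend on the order in which the two axis atoms are supplied. A bond drawn parallel to the axis is reported as unclear instead of being guessed.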

Page 11: 20130410 carbohydrates

Assigning Haworth stereochemistry

• This works for both hexagons and pentagons.
• Remove any hashes or wedges within the ring.
• Take the x-axis as a line through one of the ring C–O bonds.
• Substituents with Δy positive are up, Δy negative are down.
• The Haworth LLLLLL/RRRRRR hexagon is unappealing, but can be tidied to a regular hexagon grid without too much disruption.
• The same goes for the Haworth pentagon.
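The tidying step for Haworth rings can be sketched as follows, covering only the ring atoms themselves. The sketch places them on a regular polygon that keeps the original centroid, mean radius, first-atom direction and winding order. Repositioning substituents and the rest of the molecule, which is the hard part, is not attempted, and the function name `tidy_ring` is hypothetical.

```python
import math


def tidy_ring(ring):
    """Map ordered ring atoms (x, y) onto a regular polygon.

    Works for any ring size, so it covers both pentagons and hexagons.
    """
    n = len(ring)
    cx = sum(x for x, _ in ring) / n
    cy = sum(y for _, y in ring) / n
    radius = sum(math.hypot(x - cx, y - cy) for x, y in ring) / n
    # Keep the first atom in the same direction from the centroid.
    start = math.atan2(ring[0][1] - cy, ring[0][0] - cx)
    # Shoelace formula: positive for anticlockwise atom order.
    area = sum(ring[i][0] * ring[(i + 1) % n][1]
               - ring[(i + 1) % n][0] * ring[i][1] for i in range(n)) / 2
    step = math.copysign(2 * math.pi / n, area)
    return [(cx + radius * math.cos(start + i * step),
             cy + radius * math.sin(start + i * step)) for i in range(n)]
```

Preserving the winding order matters because reversing it would mirror the ring and so invert every up/down assignment made beforehand.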

Page 12: 20130410 carbohydrates

Future work: integrate with CVSP

Structure validation
• Warn on query atoms, pseudo atoms, polymers, etc.
• Nonsensical stereo

Allows users to put together their own standardization workflow using modules provided:
• Apply default CVSP or user-defined SMIRKS rules
• Layout
• Neutralize
• Get canonical tautomer using ChemAxon’s algorithms
• Get biggest organic fragment

http://cv.beta.rsc-us.org/

Page 13: 20130410 carbohydrates

More future work

• Improve chair tidying
• Do not disrupt/flip/invert or move around the aglycone
• Fused rings
• Run over all of ChemSpider

Page 14: 20130410 carbohydrates

Questions?

E-mail: [email protected]