Please cite this article in press as: Ranzinger, R., et al., GLYDE-II: The GLYcan data exchange format. Perspect. Sci. (2016), http://dx.doi.org/10.1016/j.pisc.2016.05.013

ARTICLE IN PRESS
+Model PISC-392; No. of Pages 7

Perspectives in Science (2016) xxx, xxx–xxx

Available online at www.sciencedirect.com
ScienceDirect
journal homepage: www.elsevier.com/pisc

GLYDE-II: The GLYcan data exchange format☆

Rene Ranzinger a, Krys J. Kochut b, John A. Miller b, Matthew Eavenson b, Thomas Lütteke c, William S. York a,b,∗

a Complex Carbohydrate Research Center, University of Georgia, USA
b Computer Science Department, University of Georgia, USA
c Institute of Veterinary Physiology and Biochemistry, Justus-Liebig-University Giessen, Germany

Received 31 March 2016; accepted 13 May 2016

KEYWORDS Bioinformatics; Glycan; Structure; Representation; XML

Summary The GLYcan Data Exchange (GLYDE) standard has been developed for the representation of the chemical structures of monosaccharides, glycans and glycoconjugates using a connection table formalism formatted in XML. This format allows structures, including those that do not exist in any database, to be unambiguously represented and shared by diverse computational tools. GLYDE implements a partonomy model based on human language along with rules that provide consistent structural representations, including a robust namespace for specifying monosaccharides. This approach facilitates the reuse of data processing software at the level of granularity that is most appropriate for extraction of the desired information. GLYDE-II has already been used as a key element of several glycoinformatics tools. The philosophical and technical underpinnings of GLYDE-II and recent implementation of its enhanced features are described.
© 2016 Published by Elsevier GmbH. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

☆ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. This article is part of a special issue entitled Proceedings of the Beilstein Glyco-Bioinformatics Symposium 2015 with copyright © 2017 Beilstein-Institut. Published by Elsevier GmbH. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (W.S. York).

http://dx.doi.org/10.1016/j.pisc.2016.05.013
2213-0209/© 2016 Published by Elsevier GmbH. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Introduction

Glycobiology, broadly defined as the study of the structure, biosynthesis and biological functions of glycans and glycoconjugates, is an emerging field of research that has found increasing applications in diverse technologies ranging from medicine to biofuels (Varki and Sharon, 2009; Walt et al., 2012). Glycomics, which focuses on the structures and abundances of specific glycans in various biological samples, has been enabled by developments in molecular analysis that make it possible to detect, identify and quantify glycans as free molecules or as components of glycoconjugates (Cummings and Pierce, 2014). However, the development of glycomics has lagged behind genomics and proteomics, in large part due to analytical challenges that stem from the structural complexity of glycans. Glycan structures cannot be inferred directly from genomic data, as the structure of each glycan is the result of an often complex metabolic pathway whose individual steps are catalyzed by sundry glycosyltransferases and other glycan-modifying enzymes (Varki and Sharon, 2009).

Progress in glycomics has also been slowed by the lack of robust informatics tools to archive, retrieve, analyze, mine and transfer the large amounts of multifaceted data that are generated in the course of this research (York et al., 2014). The development of databases and ontologies that contain knowledge regarding glycan structures and their relationships to biological and physical phenomena is especially challenging. Nevertheless, several isolated databases that archive information about carbohydrate structure, biosynthesis and function have been developed. The integration of this diverse information is a major bioinformatics challenge that must be addressed if glycobiology is to reach its potential to address critical issues in biomedicine, biofuels and other domains.

The existing carbohydrate databases implement diverse structural representation protocols. One reason for the development of different glycan sequence formats is the diversity of molecular building blocks (monosaccharides) and the frequent existence of branches. A second reason is that few of the sequence formats used in the various databases have been published. Unfortunately, this lack of accessibility has led to the proliferation of formats rather than to the establishment of a single standard. More than a dozen sequence formats have been developed, and more than one can be used in a single database or software application. These sequence formats include linearized representations of the branched sequences, such as LINUCS (Bohne-Lang et al., 2001), the BCSDB format (Egorova and Toukach, 2014) and LinearCode® (Banin et al., 2002), as well as connection table representations, such as GlycoCT (Herget et al., 2008), WURCS (Tanaka et al., 2014) and KCF (Hashimoto et al., 2006), and XML representations, such as CabosML (Kikuchi et al., 2005). The development of GlycomeDB (Ranzinger et al., 2011), a meta-database for glycan structures, has been a major step towards the integration of this structural data. More recently, the GlyTouCan glycan structure repository (Aoki-Kinoshita et al., 2016) has been implemented to provide a robust and stable semantic basis for describing glycan structure. Nevertheless, communication between different data acquisition, storage and processing systems still requires well-defined methods for transferring structural information that may not be included in an existing database. The GLYcan Data Exchange format (GLYDE) has been developed as a standard XML (Extensible Markup Language) based format to address these concerns. GLYDE has been widely accepted by the glycoinformatics community (Packer et al., 2008) and is a frequently used format for exchanging glycan structure information (Eavenson et al., 2015). This paper describes the philosophical and technical background for GLYDE and the recently implemented enhancements of this standard.

Results

The GLYDE XML format is defined via two schema specifications: an XSD (XML Schema Definition) schema and a complementary but more flexible DTD (Document Type Definition) schema. These define a specification framework, which we call PARCHMENT (PARtonomy of CHeMical ENTities), which allows the structure of biological molecules (including complex glycans) to be completely and unambiguously specified at several levels of granularity. Notably, this provides a unified format for the concise and complete structural representation of each molecule in the vast, naturally occurring combinatorial set of glycoconjugates that arises from the attachment of a large number of distinct glycans to a large number of distinct non-carbohydrate moieties.

The GLYDE standard also includes a set of rules, naming conventions for the parts, and an enumeration of the chemical entities that are acceptable parts at various levels of granularity. These implementation rules are essential for representational consistency and disambiguation, because purely syntactic enforcement of these rules (e.g. solely by the GLYDE schema) is not possible.

History

The GLYDE format for the representation of glycan structure was developed to take advantage of the hierarchical syntax and extensibility of XML. The first version of GLYDE (Sahoo et al., 2005) used a hierarchical XML tree to intuitively mirror the tree-like structures of branched glycans. However, this approach was found to be insufficient to represent all of the structural variation observed in glycans and glycoconjugates.

GLYDE-II was developed to account for this structural variation by using XML to implement a connection table approach. In this context, GLYDE-II provides a consistent namespace for monosaccharides and the ability to represent repeating structural features. Notably, the GLYDE-II syntax enforces a partonomic model in which structures are specified as collections of parts (Fig. 1), which can themselves be collections of smaller parts, facilitating the representation of highly complex molecules such as glycoconjugates. GLYDE-II specifies molecular topology by defining explicit connections (i.e. links) between the parts of the molecule. Version 1.2 of GLYDE introduces two important modifications: (1) classification of molecular parts using semantics that are more consistent with the way biochemists view biopolymer structure and processing, and (2) optional specification of molecular geometry using a general approach that is consistent over all granularity levels and for all classes of molecular parts. These modifications make it possible for the glycoscientist to take advantage of the GLYDE partonomy model to create and interpret representations of glycoconjugate structure that are consistent with the way biochemists think about these complex molecules and their interactions with other molecules.

Figure 1 Some of the partonomy relationships implemented by GLYDE-II. Each object (rounded rectangles) is classified (as an atom, residue, etc.) according to its complexity. Arrows emanating from an object point to its parts. Each object can have many parts, only a few of which are shown. The structure of each part is defined by reference to an archetypal object. For example, the structure of each of the distinct carbon atom objects in a sugar residue is specified by reference to the archetypal carbon atom.
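To illustrate the connection-table idea described under History, the Python sketch below reads a tiny XML fragment in which a molecule lists its parts and, separately, the explicit links between them, and then derives the topology (each part's bonded neighbours) from the links alone. The element and attribute names (molecule, atom, atom_link, partid, ref, from, to) are assumptions chosen to echo the vocabulary of this article rather than copied from the GLYDE-II schema, and the ref values are short placeholders instead of real archetype URIs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical GLYDE-like fragment: tag and attribute names are illustrative
# assumptions, and the ref values are placeholders rather than real URIs.
FRAGMENT = """
<molecule id="example">
  <atom partid="C1" ref="C"/>
  <atom partid="C2" ref="C"/>
  <atom partid="O1" ref="O"/>
  <atom partid="O5" ref="O"/>
  <atom_link from="C1" to="C2"/>
  <atom_link from="C1" to="O1"/>
  <atom_link from="C1" to="O5"/>
</molecule>
"""


def read_connection_table(xml_text):
    """Return (parts, neighbours): part id -> archetype ref, part id -> bonded part ids."""
    root = ET.fromstring(xml_text.strip())
    parts = {atom.get("partid"): atom.get("ref") for atom in root.findall("atom")}
    neighbours = defaultdict(set)
    for link in root.findall("atom_link"):
        a, b = link.get("from"), link.get("to")
        # A link may only connect parts that the molecule actually declares.
        if a not in parts or b not in parts:
            raise ValueError(f"link {a}-{b} refers to an undeclared part")
        # For adjacency purposes the bond is recorded in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return parts, dict(neighbours)


parts, neighbours = read_connection_table(FRAGMENT)
print(sorted(neighbours["C1"]))  # ['C2', 'O1', 'O5']
```

Because the links are listed apart from the parts, rings and cross-links can be written as easily as branches, which a purely nested XML tree cannot express directly.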

Partonomy and archetyping

One aspect that distinguishes the different glycan structure representation formats that have been developed is the way different parts of these molecules are grouped and specified. This stems from the fact that glycans are often found as components of more complex structures commonly referred to as glycoconjugates, which include glycoproteins, glycolipids and chemically modified glycans, such as oligosaccharides that have a fluorescent tag attached to the reducing end. Furthermore, the complex, branched nature of glycans has led to semantic differences in the way glycans and non-glycan moieties are described. However, GLYDE is based on semantics that are common to diverse types of biopolymers, providing an integrated way to represent the different parts of a glycoconjugate structure. This approach allows the representation and processing of structures that are specified at different levels of granularity (e.g. to identify specific components or calculate molecular masses) to be performed using a common set of algorithms.
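The idea of a common algorithm across granularities can be sketched as a single recursive function that computes elemental composition and monoisotopic mass whether a molecule is listed atom by atom or residue by residue. This is a minimal sketch under stated assumptions: the Molecule class and the registry are hypothetical, atom archetypes are keyed by element symbol, atom-level bonds are left out because they do not change composition, and each residue link is assumed to be a condensation that releases one water molecule, as for the glycosidic bond shown in Fig. 4.

```python
from collections import Counter
from dataclasses import dataclass, field

# Monoisotopic masses (Da) of the atom archetypes used in this sketch.
ATOM_MASS = {"C": 12.0, "H": 1.00782503223, "O": 15.99491461957}


@dataclass
class Molecule:
    """Hypothetical archetype: each part is a ref to an atom or another molecule."""
    id: str
    parts: list
    residue_links: list = field(default_factory=list)  # (from_part, to_part) pairs


def composition(ref, registry):
    """Elemental composition of an archetype, found by walking its parts recursively."""
    if ref in ATOM_MASS:  # leaf of the partonomy: an atom archetype
        return Counter({ref: 1})
    molecule = registry[ref]
    total = Counter()
    for part_ref in molecule.parts:
        total.update(composition(part_ref, registry))
    # Assumption: every residue link is a condensation that releases H2O.
    n_links = len(molecule.residue_links)
    total.subtract({"H": 2 * n_links, "O": n_links})
    return total


def monoisotopic_mass(ref, registry):
    return sum(ATOM_MASS[element] * n for element, n in composition(ref, registry).items())


MAN = "a-dman-HEX-1:5"
registry = {
    # alpha-D-Manp listed atom by atom (C6H12O6)
    MAN: Molecule(MAN, ["C"] * 6 + ["H"] * 12 + ["O"] * 6),
    # the same archetype reused as two residues joined by one residue link
    "Man2": Molecule("Man2", [MAN, MAN], residue_links=[("2", "1")]),
}
print(round(monoisotopic_mass(MAN, registry), 4))     # 180.0634
print(round(monoisotopic_mass("Man2", registry), 4))  # 342.1162
```

The disaccharide never repeats the atom list; it only references the monosaccharide archetype twice, and the same function handles both levels of granularity.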

To make this practical, structural granularity must be rigorously defined using the concept of partonomy (Casati and Varzi, 1999), in which each object is described as a collection of parts (Fig. 1).

Along with partonomy, object archetyping is another core feature of GLYDE. For example, the glycan moiety of a glycoconjugate (e.g. a glycopeptide) is defined by reference to a glycan molecule (e.g. the glycan released from a glycopeptide by the enzyme PNGase-F). Such use of a small archetype molecule to describe a part of a larger molecule is consistent with conventional chemical language. For example, a glucose residue in a glycan is customarily denoted by reference to the molecule glucose (a monosaccharide). These semantics are based on concepts that are both historical (e.g. the characterization of small biomolecular building blocks such as purines, sugars and amino acids by Fischer; Kunz, 2002) and biochemical (e.g. the transformation of such small molecules into so-called ‘‘residues’’ when they are incorporated into biopolymers). This approach makes the structural representations much more concise by allowing detailed structural information (and/or references to external sources) to be specified once for the archetype and inferred thereafter whenever the archetype is referenced. Archetyping atoms makes the structural representation somewhat more concise, whereas archetyping larger structures (like sugar residues or glycan moieties) makes it substantially more concise. This is an important consideration when using a format such as XML that is verbose by nature.

Implementation of the GLYDE-II partonomy model

A key function of GLYDE is to provide a basis for the operational classification of molecular objects, using a vocabulary of object types (atom, bound atom, atom link, molecule, residue, residue link, moiety, moiety link) that is defined in the text below. At the most fundamental level, GLYDE specifies structures as collections of atoms. Fig. 2 illustrates a minimal GLYDE-II representation of three atom objects. By definition, an atom object is not connected to another atom object by a molecular bond. In some cases, an unattached atom can be specified as a part of a GLYDE structure, but more frequently an atom is used as an archetype for a bound atom, which is, by definition, connected to at least one other bound atom by a covalent bond in the context of a molecule. A molecule is a collection of molecular parts that are connected by covalent bonds; however, a molecule is defined such that it is not connected to any other molecule by covalent bonds.

Figure 2 GLYDE representation of a collection of three atom objects (hydrogen, carbon, oxygen). External sources of information about the structure of each atom are indicated by uri tags.

In general, the GLYDE specification of a molecule consists of a list of its parts and the links between those parts. A molecule is most directly represented in GLYDE as a collection of bound atom objects and the atom link objects that connect them, as illustrated for a monosaccharide molecule (α-D-Manp) in Fig. 3. Within such a molecule representation, the structure of each bound atom is specified by its ref attribute, which indicates the id attribute of the atom object (Fig. 2) that serves as the archetype for the bound atom. A molecule can also be specified as a collection of parts that are more complex than atoms. For example, a molecule can be composed of residue objects (Fig. 4), whose structures are themselves specified by reference to molecule objects (e.g. monosaccharides), which in turn are specified as collections of bound atom and atom link objects (Fig. 3). Connections between residue objects are specified by residue link objects, which encompass atom link objects that specify the atoms involved in the covalent bonds connecting the residue objects (Fig. 4).

Figure 3 Stick model (A) and abbreviated GLYDE-II representation (B) of an α-D-Manp molecule. (Several lines of XML code are omitted for brevity; the remaining lines are numbered.) The structure of each bound atom is specified by its ref attribute (lines 3, 6 and 33), which points to a GLYDE-II representation of the atom (Fig. 2) serving as the archetype for the bound atom. The molecular topology is fully specified by atom link objects (e.g. lines 60 and 69), which connect bound atom objects. The molecular configuration (stereochemistry) is specified explicitly by listing the coordinates of each bound atom. Alternatively, the stereochemistry of the molecule can be inferred from its id (line 2), which by rule corresponds to its representation using a format based on GlycoCT. The conventional orientation and position of the Cartesian axes in the atomistic GLYDE-II representation are defined by the alignment of three key bound atom objects: the anomeric carbon (C1 in this case) is at the origin, the ring oxygen (O5 in this case) is on the x-axis, and the highest-numbered carbon (C2 in this case) that is directly linked to the anomeric carbon is in the first or second quadrant of the x,y-plane (where y > 0).
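The axis convention in the Fig. 3 caption can be expressed as a small coordinate transform. The Python sketch below is an illustration under assumptions, not GLYDE tooling: it puts the anomeric carbon at the origin, the ring oxygen on the positive x-axis (the caption does not state the sign, so positive is assumed), places the named neighbouring carbon in the x,y-plane with y > 0, and completes a right-handed frame (also an assumption). The function name, the default atom labels and the sample coordinates are hypothetical; the numbers are arbitrary and do not describe a real mannose geometry.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    if n < 1e-12:
        raise ValueError("reference atoms are coincident or collinear")
    return (a[0] / n, a[1] / n, a[2] / n)


def to_canonical_frame(coords: Dict[str, Vec], anomeric: str = "C1",
                       ring_oxygen: str = "O5", neighbour: str = "C2") -> Dict[str, Vec]:
    """Re-express every coordinate in the frame defined by three key atoms."""
    origin = coords[anomeric]
    x = _unit(_sub(coords[ring_oxygen], origin))  # ring oxygen on +x
    v = _sub(coords[neighbour], origin)
    along = _dot(v, x)
    # Gram-Schmidt step: the part of C1->C2 perpendicular to x defines +y.
    y = _unit((v[0] - along * x[0], v[1] - along * x[1], v[2] - along * x[2]))
    z = _cross(x, y)  # right-handed frame (assumption)

    def project(p: Vec) -> Vec:
        d = _sub(p, origin)
        return (_dot(d, x), _dot(d, y), _dot(d, z))

    return {name: project(p) for name, p in coords.items()}


coords = {"C1": (1.0, 2.0, 3.0), "O5": (2.0, 3.0, 3.5),
          "C2": (0.5, 3.2, 2.0), "O1": (0.2, 1.5, 4.0)}
frame = to_canonical_frame(coords)
```

Because the new axes are orthonormal, bond lengths and angles are unchanged; only the position and orientation of the molecule are standardized.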

The structures of complex molecule objects such as glycoconjugates are specified using the same hierarchical approach (Fig. 5), in which larger structures are constructed as collections of smaller structures. Connections between the larger structures embody connections between the smaller structures that they contain. For example, moiety link objects embody residue link objects.

Figure 4 (A) GLYDE-II representation of the disaccharide molecule that is used as an archetype for the disaccharide moiety shown in Fig. 5. The properties of each of the residue objects in this molecule are specified by reference to the archetypal α-D-Manp molecule shown in Fig. 3. The residue link (line 7) formally indicates a directed bond from one residue to another. This residue link, in turn, encompasses an atom link (line 8) from C1 of one of the α-D-Manp residue objects (part id = ‘‘2’’) to O2 of the other α-D-Manp residue object (part id = ‘‘1’’). These directional semantics allow unambiguous interpretation of the attributes (e.g., from = ‘‘C1’’ and to = ‘‘O2’’) in the atom link. That is, the bond extends from ‘‘C1’’ of α-D-Manp residue (part id = ‘‘2’’) to ‘‘O2’’ of α-D-Manp residue (part id = ‘‘1’’). This code also implies that, in the context of this atom link, ‘‘to’’ is a synonym for ‘‘O2 of α-D-Manp residue #1’’ and ‘‘from’’ is a synonym for ‘‘C1 of α-D-Manp residue #2’’. By rule, an atom in one residue can replace an atom in the other residue when a residue link is formed (see Panel B). The atom link attribute to replaces = ‘‘O1’’ can thus be unambiguously interpreted as ‘‘O2 of α-D-Manp residue #1 replaces O1 of α-D-Manp residue #2’’. (B) Formation of the glycosidic bond connecting the two α-D-Manp residue objects in the disaccharide encoded by the text in (A). The GLYDE convention dictates that the geometry of each archetypal monosaccharide molecule is retained when it is transformed into a residue. During this transformation, the glycosidic oxygen of one residue (e.g. O1 of α-D-Manp residue #2) is released as a water molecule and replaced by an atom in the other residue (e.g. O2 of α-D-Manp residue #1).

Molecular geometry in GLYDE-II

Several different systematic nomenclatures (R/S, α/β, D/L, etc.) have been developed to specify molecular stereochemistry. However, assigning the stereochemistry of an asymmetric atom (or generating an explicit geometric interpretation of such an assignment) using a systematic nomenclature can be computationally challenging. GLYDE addresses this issue by allowing the stereochemistry of a molecule to be specified using two different methods: (1) explicitly defining the three-dimensional geometry of each of its parts using Cartesian coordinates; or (2) parsing the ref attribute of each residue component of a molecule. Method (1) provides a unified way to specify the configurational geometry of all the atoms in a complex biopolymer without having to parse a systematic stereochemical nomenclature (or a combination of different nomenclatures). Method (2) is much more efficient in contexts where an atomistic representation is not required. In the case of a monosaccharide residue, the value of the ref attribute is a string corresponding to its GlycoCT-based representation, which was developed as part of the EUROCarbDB initiative. This allows, for example, α-linked glucose residues to be distinguished from β-linked glucose residues in a trivial manner, without any need to generate and compare atomistic representations, which are rarely required for logical inference. Thus, GLYDE fully supports both atomistic and abstract representations of molecular geometry and, notably, provides a logical and consistent framework for both representations at different levels of granularity. The conventions defined in GLYDE facilitate the implementation of algorithms to interconvert the GLYDE representation with fully atomistic representations (see Supporting information).

Figure 5 The structure of a glycopeptide molecule illustrated as a hierarchical collection of GLYDE parts (bound atom, residue and moiety) connected by links. Residues in the glycan moiety are represented using the CFG graphical format for glycan structure. (A) The glycan moiety and peptide moiety are connected by a moiety link, which embodies a residue link connecting one of the α-D-Manp residue objects (green circle) in the glycan moiety to the Ser residue in the peptide moiety. (B) The two residues that connect the moieties are shown in atomic detail. The residue link connecting these residues embodies an atom link connecting C1 of the α-D-Manp residue to O3 of the Ser residue. (C) GLYDE-II representation of the glycopeptide. The moiety link indicates that the ‘‘O-Man-dimer’’ (Fig. 4) is covalently attached to the ‘‘O-Man-peptide’’, whose structure is specified in ‘‘http://someURI’’. The enclosed residue link indicates that the α-D-Manp residue (partid = ‘‘1’’ in the GLYDE-II specification of the ‘‘O-Man-dimer’’) is linked to the L-Ser residue (partid = ‘‘5’’ in the GLYDE-II specification of the ‘‘O-Man-peptide’’). The further enclosed atom link indicates that the bound atom C1 (partid = ‘‘C1’’ in the referenced GLYDE-II specification of α-D-Manp residue objects in the ‘‘O-Man-dimer’’ moiety) is covalently attached to the bound atom O3 (partid = ‘‘O3’’ in the referenced GLYDE-II specification of the L-Ser residue objects in the ‘‘O-Man-peptide’’).

Classification of parts

Judicious selection of the appropriate granularity in specifying or parsing a GLYDE representation will facilitate the development of algorithms to identify correlations between the structure of a biomolecule and its biological function or physical properties. Therefore, parts at each level of granularity are distinguished using chemical and biosynthetic criteria. For example, several different moiety types are defined, including glycan moiety, peptide moiety and lipid moiety. A glycan moiety is composed primarily of monosaccharide residues (typically connected by glycosidic bonds), while a peptide moiety is composed primarily of amino acid residues (typically connected by amide bonds). Although these semantics are not required to completely define the chemical structure of a biopolymer, they are useful for identifying chemical and computational contexts in order to facilitate the use of the structural information for data processing and knowledge discovery.

GLYDE-II rules

The GLYDE syntax, described above, is not in itself sufficient to ensure that structural representations of glycoconjugates will be consistent. Additional rules are required to standardize parameters such as the part id attribute of a bound atom and the id of an archetype molecule (Supporting information). As mentioned above, the id attribute of a monosaccharide molecule corresponds to the GlycoCT-based representation of that molecule. This provides a convenient way to compare monosaccharide residue compositions or stereochemistry without generating a fully atomistic representation of the glycoconjugate. This is possible because the GlycoCT namespace for monosaccharides is machine-readable, providing unique, unambiguous names that encode the salient chemical properties of these molecules. A large collection of GlycoCT representations is maintained and curated by MonosaccharideDB (http://www.monosaccharidedb.org/). By rule, the ref attribute of a GLYDE monosaccharide residue is a uniform resource identifier (URI) consisting of the following three parts: (1) the uniform resource locator (URL) of MonosaccharideDB (‘‘http://www.monosaccharideDB.org/’’); (2) a series of characters corresponding to an http GET request (‘‘GLYDE-II-1.2.jsp?G=’’); and (3) the GlycoCT string representing the molecule (e.g. ‘‘b-dglc-HEX-1:5’’).

Applications

The GLYDE-II format is currently supported in the widely used MS annotation software GlycoWorkbench as an option to export and import glycan structures. It is also being used as the standard format for common web service interfaces that are being developed to enable the exchange of carbohydrate structural information among databases and software applications. These include the GlycO ontology at the CCRC, the database of the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) via the RINGS portal at Soka University (http://rings.t.soka.ac.jp/), UniCarb-DB, EUROCarbDB, the carbohydrate structure meta-database GlycomeDB and the recently initiated GlyTouCan glycan structure registry. Support and utilization of GLYDE-II is a core feature of our Qrator software, which provides an intuitive interface that facilitates the human curation of glycan structures (Eavenson et al., 2015). We continue to develop software that fully supports the GLYDE-II format.
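The ref-attribute rule given under GLYDE-II rules, together with Method (2) for reading stereochemistry from that attribute, can be sketched in a few lines of Python. The three-part URI is assembled exactly as the rule describes: the MonosaccharideDB URL, then the GLYDE-II-1.2.jsp?G= request string, then the GlycoCT name. The rule does not say whether the GlycoCT part must be percent-encoded, so this sketch concatenates it verbatim. The basetype parser is a simplified assumption about GlycoCT names of the form anomer-stereo-superclass-ring (e.g. b-dglc-HEX-1:5); it ignores anything after a '|', where modifications are written. All function names are hypothetical.

```python
MONOSACCHARIDEDB_URL = "http://www.monosaccharideDB.org/"
GLYDE_REQUEST = "GLYDE-II-1.2.jsp?G="
ANOMER_NAMES = {"a": "alpha", "b": "beta", "o": "open chain", "x": "unknown"}


def monosaccharide_ref(glycoct: str) -> str:
    """Build the ref URI of a monosaccharide residue from its GlycoCT name."""
    return MONOSACCHARIDEDB_URL + GLYDE_REQUEST + glycoct


def glycoct_from_ref(ref: str) -> str:
    """Recover the GlycoCT name from a ref URI built by monosaccharide_ref."""
    prefix = MONOSACCHARIDEDB_URL + GLYDE_REQUEST
    if not ref.startswith(prefix):
        raise ValueError(f"not a MonosaccharideDB GLYDE-II reference: {ref}")
    return ref[len(prefix):]


def parse_basetype(glycoct: str) -> dict:
    """Split a simplified GlycoCT basetype such as 'b-dglc-HEX-1:5' into fields."""
    basetype = glycoct.split("|", 1)[0]  # drop modifications such as '|6:d'
    tokens = basetype.split("-")
    if len(tokens) < 3:
        raise ValueError(f"unexpected GlycoCT basetype: {glycoct}")
    anomer, *stereo, superclass, ring = tokens
    return {
        "anomer": ANOMER_NAMES.get(anomer, anomer),
        "stereo": stereo,
        "superclass": superclass,
        "ring": ring,
    }


alpha = monosaccharide_ref("a-dglc-HEX-1:5")
beta = monosaccharide_ref("b-dglc-HEX-1:5")
print(beta)  # http://www.monosaccharideDB.org/GLYDE-II-1.2.jsp?G=b-dglc-HEX-1:5
# Alpha and beta glucose differ only in the anomer field; no 3D model is needed.
print(parse_basetype(glycoct_from_ref(alpha))["anomer"],
      parse_basetype(glycoct_from_ref(beta))["anomer"])  # alpha beta
```

Comparing two residues this way is a string operation on their ref attributes, which is the efficiency argument made for Method (2).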

Acknowledgement

This work is supported by the NIH/NIGMS-funded National Center for Glycomics and Glycoproteomics (8P41GM103490).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.pisc.2016.05.013.

References

Aoki-Kinoshita, K., Agravat, S., Aoki, N.P., Arpinar, S., Cum-mings, R.D., Fujita, A., Fujita, N., Hart, G.M., Haslam,S.M., Kawasaki, T., Matsubara, M., Moreman, K., Okuda,W., Pierce, S., Ranzinger, M., Shikanai, R., Solovieva, T.,Suzuki, E., Tsuchiya, Y., Yamada, S., York, I., Zaia, W.S.,Narimatsu, J., 2016. GlyTouCan 1.0—–the international gly-can structure repository. Nucleic Acids Res. 44, D1237—D1242,http://dx.doi.org/10.1093/nar/gkv1041.

Banin, E., Neuberger, Y., Altshuler, Y., Halevi, A., Inbar, O.,Dotan, N., Dukler, A., 2002. A novel LinearCode nomenclaturefor complex carbohydrates. Trends Glycosci. Glycotechnol. 14,127—137, http://dx.doi.org/10.4052/tigg.14.127.

Bohne-Lang, A., Lang, E., Förster, T., von der Lieth, C.-W., 2001. LINUCS: linear notation for unique descriptionof carbohydrate sequences. Carbohydr. Res. 336, 1—11,http://dx.doi.org/10.1016/s0008-6215(01)00230-0.

Casati, R., Varzi, A.C., 1999. Parts and Places: The Structures of Spatial Representations. MIT Press.

Cummings, R.D., Pierce, J.M., 2014. The challenge and promise of glycomics. Chem. Biol. 21, 1–15, http://dx.doi.org/10.1016/j.chembiol.2013.12.010.

Eavenson, M., Kochut, K.J., Miller, J.A., Ranzinger, R., Tiemeyer, M., Aoki, K., York, W.S., 2015. Qrator: a web-based curation tool for glycan structures. Glycobiology 25, 66–73, http://dx.doi.org/10.1093/glycob/cwu090.

Egorova, K.S., Toukach, P.V., 2014. Expansion of coverage of carbohydrate structure database (CSDB). Carbohydr. Res. 389, 112–114, http://dx.doi.org/10.1016/j.carres.2013.10.009.


Hashimoto, K., Goto, S., Kawano, S., Aoki-Kinoshita, K.F., Ueda, N., Hamajima, M., Kawasaki, T., Kanehisa, M., 2006. KEGG as a glycome informatics resource. Glycobiology 16, 63R–70R, http://dx.doi.org/10.1093/glycob/cwj010.

Herget, S., Ranzinger, R., Maass, K., von der Lieth, C.-W., 2008. GlycoCT – a unifying sequence format for carbohydrates. Carbohydr. Res. 343, 2162–2171, http://dx.doi.org/10.1016/j.carres.2008.03.011.

Kikuchi, N., Kameyama, A., Nakaya, S., Ito, H., Sato, T., Shikanai, T., Takahashi, Y., Narimatsu, H., 2005. The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures. Bioinformatics 21, 1717–1718, http://dx.doi.org/10.1093/bioinformatics/bti152.

Kunz, H., 2002. Emil Fischer – unequalled classicist, master of organic chemistry research, and inspired trailblazer of biological chemistry. Angew. Chem. Int. Ed. 41, 4439–4451, http://dx.doi.org/10.1002/1521-3773(20021202)41:23<4439::ai-anie4439>3.0.co;2-6.

Packer, N.H., von der Lieth, C.-W., Aoki-Kinoshita, K.F., Lebrilla, C.B., Paulson, J.C., Raman, R., Rudd, P., Sasisekharan, R., Taniguchi, N., York, W.S., 2008. Frontiers in glycomics: bioinformatics and biomarkers in disease. An NIH white paper prepared from discussions by the focus groups at a workshop on the NIH campus, Bethesda MD (September 11–13, 2006). Proteomics 8, 8–20, http://dx.doi.org/10.1002/pmic.200700917.

Ranzinger, R., Herget, S., von der Lieth, C.-W., Frank, M., 2011. GlycomeDB – a unified database for carbohydrate structures. Nucleic Acids Res. 39, D373–D376, http://dx.doi.org/10.1093/nar/gkq1014 (Database issue).

Sahoo, S.S., Thomas, C., Sheth, A., Henson, C., York, W.S., 2005. GLYDE – an expressive XML standard for the representation of glycan structure. Carbohydr. Res. 340, 2802–2807, http://dx.doi.org/10.1016/j.carres.2005.09.019.

Tanaka, K., Aoki-Kinoshita, K.F., Kotera, M., Sawaki, H., Tsuchiya, S., Fujita, N., Shikanai, T., Kato, M., Kawano, S., Yamada, I., Narimatsu, H., 2014. WURCS: the Web3 unique representation of carbohydrate structures. J. Chem. Inf. Model. 54, 1558–1566, http://dx.doi.org/10.1021/ci400571e.

Varki, A., Sharon, N., 2009. Historical background and overview. In: Varki, A., Cummings, R.D., Esko, J.D., Freeze, H.H., Stanley, P., Bertozzi, C.R., Hart, G.W., Etzler, M.E. (Eds.), Essentials of Glycobiology, second ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor (NY).

Walt, D.A., Aoki-Kinoshita, A.F., Bendiak, B., Bertozzi, C.R., Boons, G.-J., Darvill, A., Hart, G., Kiessling, L.L., Lowe, J., Moon, R.J., Paulson, J.C., Sasisekharan, R., Varki, A.P., Wong, C.-H., Bowman, K., Friedman, D., Siddiqui, S., Yancey, R., 2012. Transforming Glycoscience: A Roadmap for the Future. The National Academies Press.

York, W.S., Agravat, S., Aoki-Kinoshita, K.F., McBride, R., Campbell, M.P., Costello, C.E., Dell, A., Feizi, T., Haslam, S.M., Karlsson, N., Khoo, K.-H., Kolarich, D., Liu, Y., Novotny, M., Packer, N.H., Paulson, J.C., Rapp, E., Ranzinger, R., Rudd, P.M., Smith, D.F., Struwe, W.B., Tiemeyer, M., Wells, L., Zaia, J., Kettner, C., 2014. MIRAGE – the minimum information required for a glycomics experiment. Glycobiology 24, 402–406, http://dx.doi.org/10.1093/glycob/cwu018.