XML Databases in BMI


Page 1: XML Databases in BMI

CDSS-1

CSE 300

XML Databases in BMI

UCONN Spring 2008, CSE 300: BMI
Taught by: Prof. Steve Demurjian
Presented by: James Lindsay

<ClinicalDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xmlns:mif="urn:hl7-org:v3/mif"
                  xmlns="urn:hl7-org:v3">
  <realmCode code="US"/>
  <typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>
  <!-- Conformant to NHSN Generic Constraints -->
  <templateId root="2.16.840.1.113883.3.117.1.1.1"/>
  <!-- Conformant to the NHSN Constraints for BSI Numerator Report -->
  <templateId root="2.16.840.1.113883.3.117.1.1.3.1"/>
  ...
</ClinicalDocument>

Page 2: XML Databases in BMI

Overview

What is XML: overview, tags, schema.
XML query languages: XPath, XQuery.
XML data models: data-centric, document-centric, biomedical data.
Storage strategy and XML DBMS: relational, CMS, native.
Native XML DBMS: pros and cons.
Biomedical information.
BMI databases: overview, XML.
HL7 and CDA: overview, examples.
Examples of BMI XML.
UCONN BMI XML.
Survey of technology.

Page 3: XML Databases in BMI

XML overview

eXtensible Markup Language.
Similar to HTML.
Meta-language that describes the content of the document (self-describing).
XML is primarily used as a data storage and interchange medium.
XML exists in plain text format; however, it may be compressed or altered for transfer.

Page 4: XML Databases in BMI

XML overview cont.

There are no predefined tags or grammar inherent in XML.
XML tags give an XML document structure and meaning.
Available tags are defined by a schema.
All tags in an XML document come in pairs, open and close (an empty element may use the self-closing form, e.g. <realmCode code="US"/>).
Tags are completely nested, and there is no ambiguity in their order.

Page 5: XML Databases in BMI

XML tags

XML tags may carry attributes, which store information (metadata) within the tag itself.
Plain text can be placed between tags as the element's content (parsed character data, PCDATA).
CDATA is character data: any string of non-markup characters is legal as the attribute's value.
The ENTITY attribute type indicates that the attribute's value names an external entity declared for the document.
The ID attribute type is used when you want to give each element a unique identifier.
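To make the difference between attributes and element content concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `patient` document, its `id` attribute, and the `name` element are hypothetical and not from the slides. The attribute types (CDATA, ID, ENTITY) would be declared in a DTD, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one attribute (metadata inside the tag)
# and one child element whose content is plain text.
doc = '<patient id="p1"><name>Jane Doe</name></patient>'

root = ET.fromstring(doc)

patient_id = root.get("id")     # attribute value
name = root.findtext("name")    # text placed between <name> and </name>

print(patient_id, name)
```

Attributes come back through `get`, while element content comes back through `text` or `findtext`, which mirrors the metadata versus content split described on the slide.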

Page 6: XML Databases in BMI

XML Schema

The structure of an XML document is defined by its schema.
There are dozens of languages for defining an XML schema:
  DTD
  W3C XML Schema (XSD)
  RELAX NG
The schema file can be used to validate any instance of an XML document against it.
The schema also defines the allowable tags.

Page 7: XML Databases in BMI

Schema Example (XSD)

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string" minOccurs="0"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
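As a rough illustration of what the shiporder schema constrains, the sketch below builds a sample instance and hand-checks a few of the rules (required `orderid`, at least one `item`, positive integer `quantity`, decimal `price`). Python's standard library has no XSD validator, so this is not real schema validation. The function name `check_shiporder` and all sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical instance document shaped like the shiporder schema.
SHIPORDER = """<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item>
    <title>Empire Burlesque</title>
    <note>Special Edition</note>
    <quantity>1</quantity>
    <price>10.90</price>
  </item>
  <item>
    <title>Hide your heart</title>
    <quantity>1</quantity>
    <price>9.90</price>
  </item>
</shiporder>"""


def check_shiporder(root):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if root.tag != "shiporder":
        problems.append("root element must be shiporder")
    if "orderid" not in root.attrib:
        problems.append("missing required attribute orderid")
    for required in ("orderperson", "shipto"):
        if root.find(required) is None:
            problems.append(f"missing element {required}")
    items = root.findall("item")
    if not items:
        problems.append("at least one item is required")
    for position, item in enumerate(items, start=1):
        if item.find("title") is None:
            problems.append(f"item {position}: missing title")
        quantity = (item.findtext("quantity") or "").strip()
        if not (quantity.isdecimal() and int(quantity) > 0):
            problems.append(f"item {position}: quantity must be a positive integer")
        try:
            Decimal((item.findtext("price") or "").strip())
        except InvalidOperation:
            problems.append(f"item {position}: price must be a decimal")
    return problems


problems = check_shiporder(ET.fromstring(SHIPORDER))
print(problems)
```

The second `item` has no `note`, which the schema allows through `minOccurs="0"`. A production system would hand the XSD to a validating parser rather than re-implement the rules by hand.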

Page 8: XML Databases in BMI

XML Structure

XML employs a tree-structured model for representing data (the schema on the previous slide):

shiporder
  orderid (attribute)
  orderperson
  shipto
    name
    address
    city
    country
  item
    title
    note
    quantity
    price

Page 9: XML Databases in BMI

Querying XML - XPath

There are many languages for querying XML. We'll focus on XPath and XQuery, as they are W3C standards.
XPath is a compact notation for traversing the tree on the previous slide.
Its path syntax is designed to be usable inside URLs/URIs.
  /shiporder/item/title ← selects every item's title
Extensible with user-defined behaviors.
Treats each tag as a node in the tree.
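A path expression over the shiporder tree can be tried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is a minimal sketch: the sample document and its values are hypothetical, and the path is written relative to the root element because `findall` starts from `<shiporder>` itself. In the shiporder schema, each `item` carries a `title` child.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down shiporder instance.
doc = """<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <item><title>Empire Burlesque</title><quantity>1</quantity><price>10.90</price></item>
  <item><title>Hide your heart</title><quantity>2</quantity><price>9.90</price></item>
</shiporder>"""

root = ET.fromstring(doc)

# Equivalent of /shiporder/item/title, relative to the root element.
titles = [node.text for node in root.findall("item/title")]

# A predicate: items whose <quantity> child has the text "2".
doubles = [item.findtext("title") for item in root.findall("item[quantity='2']")]

print(titles, doubles)
```

ElementTree implements only part of XPath 1.0; full XPath or XQuery support needs a dedicated engine.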

Page 10: XML Databases in BMI

Querying XML - XQuery

Functional extension of XPath.
The XML equivalent of SQL.
Navigates and manipulates document nodes.
Works on collections of documents, or even fragments.

for $b in doc("bib.xml")//book
where $b/publisher = "Morgan Kaufmann" and $b/year = "1998"
return $b/title
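For readers without an XQuery engine at hand, the same for/where/return logic can be sketched in Python. This is an analogy, not XQuery itself: the `bib` document, the book titles, and the helper name `titles_by` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = """<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>2000</year></book>
  <book><title>Book C</title><publisher>Another Press</publisher><year>1998</year></book>
</bib>"""


def titles_by(root, publisher, year):
    """Mimic the query: iterate over //book, filter, return the titles."""
    return [
        book.findtext("title")
        for book in root.iter("book")              # for $b in //book
        if book.findtext("publisher") == publisher  # where $b/publisher = ...
        and book.findtext("year") == year           #   and $b/year = ...
    ]                                               # return $b/title


matches = titles_by(ET.fromstring(BIB), "Morgan Kaufmann", "1998")
print(matches)
```

The list comprehension lines up clause by clause with the query, which is one way to see why XQuery is often compared to SQL.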

Page 11: XML Databases in BMI

XML Models

Naively, there are two models of XML use:
  Data-centric
  Document-centric
In reality, most XML use is a hybrid of the two.
More important is the database strategy used with XML:
  Relational
  Content management
  Native XML

Page 12: XML Databases in BMI

Data-centric model

Information is generally stored in a relational database.
XML is a transport medium, nothing more.
It is irrelevant to the application that the data exists as XML for some period of time.
Characteristics:
  Fine-grained data.
  Data relationship is insignificant.
  Need to transfer relational information.
  Means of storing new information.

Page 13: XML Databases in BMI

Document-centric Model

XML is used solely as a document (for example, this presentation in OpenOffice).
The documents are stored and retrieved in part or in full.
Does not originate from a relational database.
The document is meant for human consumption.
The information is usually written by hand in a format like PDF or RTF, then converted to XML.

Page 14: XML Databases in BMI

Reality: Hybrid Model

Most documents, like a PDF, also contain fine-grained information (last-edited date, character set).
Data from a relational DB may itself be a document, or require self-description.
Various database technologies support all models.
It is important to understand your data and choose the database technology that is most compatible with it.

Page 15: XML Databases in BMI

Medical Data Model

Medical data is non-homogeneous.
But there are general trends in medical data:
  Fine-grained data such as dates, times, images.
  Documents and human-generated descriptions and observations.
  Human interaction creates semi-structured data.
The ability to transfer information is essential.
Medical data fits the hybrid model.

Page 16: XML Databases in BMI

Data-centric Comparison

Advantages:
  Utilizes existing database software (IBM, Oracle, MS).
  Quick (existing DBs are already fast).
  Dual role (not limited to XML).
  Many even support XQuery.
Disadvantages:
  More configuration (mapping relational -> XML).
  Slower when creating complex XML files, due to the intermediate mapping step.

Page 17: XML Databases in BMI

Document-centric Comparison

Advantages:
  Good integration into workflow.
  Document management made easy.
  Collaboration and web publishing.
Disadvantages:
  Not able to extract data from the document directly.
  Not designed for high-availability, high-load systems.
  Non-uniformity in implementations.

Page 18: XML Databases in BMI

Storage Strategy: Relational

Utilizing a relational database to store XML documents and data is very popular.
In a very data-centric application this approach is intuitive.
Most top-tier database products support XML in some way.
  Oracle, SQL Server, IBM, etc.
The software is highly supported and well developed.

Page 19: XML Databases in BMI

XML Schema mapping

Using a relational DB requires mapping the XML schema to the DB schema.
Table-based:
  Often implemented as a middleware layer.
  Schema structure must follow the row-column convention.
Object-relational:
  XML is a tree of objects.
  Mapped to the DB using well-established OR methods.
  Natively supported in some DB products.
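A table-based mapping can be sketched with Python's built-in `sqlite3`: each repeating element becomes a row and each child element becomes a column. The table names, column names, and sample document are assumptions made for illustration, not a mapping described in the presentation.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical shiporder instance; the second item has no <note>.
ORDER = """<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <item><title>Empire Burlesque</title><note>Special Edition</note>
        <quantity>1</quantity><price>10.90</price></item>
  <item><title>Hide your heart</title>
        <quantity>1</quantity><price>9.90</price></item>
</shiporder>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shiporder (orderid TEXT PRIMARY KEY, orderperson TEXT)")
conn.execute(
    "CREATE TABLE item (orderid TEXT REFERENCES shiporder(orderid), "
    "title TEXT, note TEXT, quantity INTEGER, price TEXT)"
)

root = ET.fromstring(ORDER)
orderid = root.get("orderid")

# The root element maps to one row of the parent table.
conn.execute(
    "INSERT INTO shiporder VALUES (?, ?)",
    (orderid, root.findtext("orderperson")),
)

# Each <item> maps to one row of the child table; a missing optional
# element is stored as NULL.
for item in root.findall("item"):
    conn.execute(
        "INSERT INTO item VALUES (?, ?, ?, ?, ?)",
        (
            orderid,
            item.findtext("title"),
            item.findtext("note"),
            int(item.findtext("quantity")),
            item.findtext("price"),
        ),
    )

rows = conn.execute(
    "SELECT title, note FROM item WHERE orderid = ? ORDER BY rowid", (orderid,)
).fetchall()
print(rows)
```

The NULL in the `note` column of the second row is a small instance of the empty space that optional or semi-structured XML content leaves behind when forced into rows and columns.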

Page 20: XML Databases in BMI

Storage Strategy: CMS

Used exclusively in the document-centric model.
Various programs allow indexing, storage, manipulation, and publication of XML documents.
Application specific.
Numerous implementations, most recently OpenOffice and MS Word 2007.
Not very interesting or useful in the context of biomedical information.

Page 21: XML Databases in BMI

Storage Strategy: Native

Semi-structured data.
  Mapping to a relational DB causes inflation and null space.
  Need more functionality and granularity than a CMS.
Performance increase over a relational DB by avoiding joins.
  Assuming the data is in the appropriate order on disk.
Only returns XML; results need to be converted for non-XML manipulation.
Development still in its infancy as of winter 2007.

Page 22: XML Databases in BMI

Native XML Databases

Definition:
  "A database that has an XML document as its fundamental unit of (logical) storage and defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order."
Data types: no support in XML, so a mapping is needed.
  Document or database schema can be used.
  External user-defined mapping.
  Not necessary when only transferring data.
No requirement on the underlying medium or implementation.
Two architectures: text-based and model-based.

Page 23: XML Databases in BMI

Native: Text-based

Use any DB.
Rather than mapping schemas, store entire XML documents.
Usually involves saving the entire document as a BLOB or character LOB (CLOB).
Use various text-field searches to retrieve information from the XML document.
Some DB text-search facilities are being made XML-aware.
Speed: because the document is located together on disk, full or partial document retrieval is favored.
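The text-based approach can be sketched with `sqlite3` by storing each document whole in a text column and finding it again with a plain text search. The `docs` table, the sample documents, and the `LIKE` pattern are assumptions for illustration; a real product would use an XML-aware index instead of substring matching.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical documents, stored unchanged as text.
DOCS = [
    "<shiporder orderid='1'><shipto><city>Stavanger</city></shipto></shiporder>",
    "<shiporder orderid='2'><shipto><city>Storrs</city></shipto></shiporder>",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs (body) VALUES (?)", [(d,) for d in DOCS])

# A plain text search locates candidate documents without any schema mapping.
hits = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ?", ("%<city>Stavanger</city>%",)
).fetchall()

# The whole document comes back in one read and is parsed by the application.
order_ids = [ET.fromstring(body).get("orderid") for (body,) in hits]
print(order_ids)
```

Whole-document retrieval is a single row fetch, which is the speed advantage noted on the slide. The cost is that any finer-grained access has to parse the text again.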

Page 24: XML Databases in BMI

Native: Model-based

Internal object model of the document schema.
Store this model in a database.
  Relational / object-oriented database.
  Proprietary.
Performance similar to the chosen DB engine.
Still limited by the hierarchy of the XML data.
  Retrieving all orderids from hundreds of documents is slow.
Support for common XML query languages.
  XPath, XQuery, etc.

Page 25: XML Databases in BMI

Native XML: TLC

In the traditional database world, transactions, locking, and concurrency (TLC) are paramount.
Native XML databases aren't mature enough to support everything.
Most support transactions, but what about locking and concurrency?
  Document-level locking is easy, but too coarse.
  Only a few implementations support node-level locking.
Commercial products generally support ACID; free ones are just starting to (2008).

Page 26: XML Databases in BMI

Native XML: APIs

Ubiquity of ODBC interfaces.
  Still applies to native XML databases.
Most implementations provide their own interface for a variety of languages.
Industry standardization:
  XML:DB API from XML:DB.org, programming-language neutral.
  JSR 225: XQuery API for Java (XQJ). IBM and Oracle.

Page 27: XML Databases in BMI

Native XML: The Rest

Referential integrity is supported in an ad hoc manner at best.
The database cannot enforce user-defined (via schema) integrity.
  Some standard mechanisms allow it.
Eventually both mechanisms will be supported.
Currently relies heavily on the application for normalization and integrity.
Certainly a drawback for medical applications.

Page 28: XML Databases in BMI

Native XML: Scalability

The limitation of any DB is the time spent seeking on the hard disk.
An XML DB only needs to find the pointer to the head of the document.
Therefore an XML DB should scale well in the context of retrieving data.
The only caveat is when the retrieval breaks the document hierarchy.
More pointers must be followed, potentially slowing retrieval greatly.
Where there is money, there is a way.

Page 29: XML Databases in BMI

Biomedical Information

Overview of the field.
Data storage and transfer problem.
XML as a solution.
BMI XML examples.
Next section: Choosing a native DB.

Page 30: XML Databases in BMI

BMI Overview

The convergence of computation and biomedicine.
The NIH BMI Science and Tech Initiative:
  Define biomedical computing as a science.
  Many sources of information:
    Clinical, surgical, genetics, drug design, biology.
  Standardization in software.
  Algorithm development, high-speed computing.
All of this relies on efficient storage and transfer of information.

Page 31: XML Databases in BMI

BMISTI: Databases

"Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses." ~NIH Director
Problems:
  Standards. Terminology, syntax and semantics need to be defined and agreed upon to allow integration of data.
  Curation. Database submissions need to be checked and cross-referenced to avoid the transitive propagation of error.
  Interoperability. Data should be as consistent as possible across databases so that researchers can compare and contrast it.
Computational and systems issues:
  Utilize and manipulate information.
  Process large volumes of information.

Page 32: XML Databases in BMI

BMI: XML

Data sharing and semantic interoperability.
Case study: Electronic Health Record.
  The development and use of an integrated health record for a patient.
  Heterogeneous data, e.g. clinical, clinical-trial, genomic data.
Primary obstacle: proprietary data formats.
Uniformity on the technical level: text file.
A step towards the semantic goal.

Page 33: XML Databases in BMI

XML in Clinical Data

HL7 standards organization.
  V2: ASCII bar format. Example:

HL7V3|1|2.02Message|2.16.840.1.113883.1122^CNTRL-3456|2002081614303516^- ---> 06:00||3.0|2.16.840.1.113883^POLB_IN004410||P|I|ER|ERrespondTo|RSP|tel:555-555-5555^^WP entit yRsp|||{FAM^^Hippocrates~GIV^^Harold~GIV^^H~SFX^AC^MD}|tel:555-555-5555^^WPsender|SND|nfs:127.127.127.255 device||2.16.840.1.113883.1122^GHH LAB|{GIV^^An Entit y Name}^L|||tel:555-555-2005^^H agencyFor representedOrganization||\NOTH\ location|||2.16.840.1.113883.1122^ELAB-3|{^^GHH Lab}^TNreceiver|RCV|nfs:127.127.127.0 device|||2.16.840.1.113883.1122^GHH O E|{GIV^^An Entit y Name}^L|||tel:555-555-2005^^H agencyFor representedOrganization|||2.16.840.1.113883.19.3.1001|{^^GHH Outpatient Clinic}^TN location|||2.16.840.1.113883.1122^BLDG4|{^^GHH Outpatient Clinic}^TN

Awkward, inflexible, unclear meaning of values.
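To show why positional bar-delimited data is hard to read, the sketch below splits a short fragment taken from the sample above. It assumes that "|" separates fields and "^" separates components within a field, as the sample suggests. The function name `parse_bar_fields` is hypothetical.

```python
def parse_bar_fields(message):
    """Split a bar-delimited message into fields, and each field into components."""
    return [field.split("^") for field in message.split("|")]


# Fragment of the sample message: an identifier field and a telephone field.
sample = "2.16.840.1.113883.1122^CNTRL-3456|tel:555-555-5555^^WP"

fields = parse_bar_fields(sample)
print(fields)
```

Nothing in the parsed output says what each position means. That meaning lives in external documentation, which is the gap that self-describing XML tags are meant to close.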

Page 34: XML Databases in BMI

HL7 V3 Specification

Built around the Reference Information Model (RIM):
  Entity, Role, Participation, and Act.
  Utilizes dedicated vocabularies and data types.
  Every specification must begin from the RIM.
Clinical Document Architecture:
  Utilizes XML with tags like "observation", "code", "value" and "id".

<observation classCode="OBS" moodCode="EVN">
  <id root="10.23.4573.15879"/>
  <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
        codeSystemName="SNOMED CT" displayName="Peak flow"/>
  <effectiveTime value="20000407"/>
  <value xsi:type="RTO_PQ_PQ">
    <numerator value="260" unit="l"/>
    <denominator value="1" unit="min"/>
  </value>
</observation>
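A CDA observation like the one above can be read with Python's standard `xml.etree.ElementTree`. In this sketch the namespace declarations on `<observation>` are added so the fragment parses on its own. They are assumed to match the `ClinicalDocument` root shown on the title slide.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# The observation from the slide, with namespace declarations added
# (an assumption) so that it is a standalone, well-formed fragment.
OBSERVATION = f"""<observation xmlns="{HL7_NS}" xmlns:xsi="{XSI_NS}"
             classCode="OBS" moodCode="EVN">
  <id root="10.23.4573.15879"/>
  <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
        codeSystemName="SNOMED CT" displayName="Peak flow"/>
  <effectiveTime value="20000407"/>
  <value xsi:type="RTO_PQ_PQ">
    <numerator value="260" unit="l"/>
    <denominator value="1" unit="min"/>
  </value>
</observation>"""

ns = {"hl7": HL7_NS}
root = ET.fromstring(OBSERVATION)

code = root.find("hl7:code", ns)
value = root.find("hl7:value", ns)
numerator = value.find("hl7:numerator", ns)
denominator = value.find("hl7:denominator", ns)

# Namespaced attributes are addressed as {namespace-uri}local-name.
value_type = value.get(f"{{{XSI_NS}}}type")

reading = (
    code.get("displayName"),
    float(numerator.get("value")) / float(denominator.get("value")),
    f'{numerator.get("unit")}/{denominator.get("unit")}',
)
print(value_type, reading)
```

Unlike the bar format, every value here is labeled by its element and attribute names, so the peak flow reading and its units can be located by name instead of by position.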

Page 35: XML Databases in BMI

XML in Clinical Trials

Example: drug studies.
  Utilizing XML would eliminate manual transcription when moving data from one system to another.
XML is a universal data type, as it stores everything in text.
  Therefore it can handle new technology seamlessly.
Clinical Data Interchange Standards Consortium (CDISC).
  Industry standardization.

Page 36: XML Databases in BMI

CDISC: ODM

Operational Data Model:
  XML based.
  Facilitates moving data from any collection system to the clinical trial sponsor.
  Addresses real-world issues:
    Incomplete data.
    Partial data transfer.
    Versioning and branching.
ODM 1.1 is the current version.

Page 37: XML Databases in BMI

ODM: Layout

Page 38: XML Databases in BMI

XML in Genomic Data

Various groups export their data in XML.
  NCBI, EBI.
They do not follow the same schema, which allows only partial semantic interoperability.
The Microarray Gene Expression Group (MAGE) publishes a schema.
  MAGE files are often several gigabytes.
  This illustrates the overhead of XML; however, researchers still use it because of interoperability.

Page 39: XML Databases in BMI

XML Complexity

Clinical Genomics Special Interest Group (HL7).
  Use genomic data in a clinical environment.
Utilizes several models, such as MAGE and BSML (for DNA sequences).
Not all of the information in the raw models is necessary.
  "Bubbling up" analyzes large raw data sets and extracts useful information.
  Transfers the useful information to a new schema / model.
Bottom line: complex workflows exist to extract usable information.

Page 40: XML Databases in BMI

XML BMI Issues

Clinical information such as a verbal description or advice is unstructured.
  How do you query this?
Schemas and models are extremely complex, with nesting, recursion and compound data types.
  Difficult to map to relational databases.
XML instances may be gigabytes in size.
  What database solutions exist to handle such large files?

Page 41: XML Databases in BMI

XML BMI Examples

A closer look at the Clinical Document Architecture.
  Mayo Clinic's implementation of CDA.
Case study using a native XML database to facilitate research based upon clinical texts.
  Tamino XML DB.
  Querying a native DB.
UCONN BMI, CSE 300 Spring 2008.

Page 42: XML Databases in BMI

XML BMI: CDA
www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf

A clinical document has the following characteristics:
  Persistence: exists for a defined time period.
  Stewardship: maintained by a designated caretaker.
  Potential for authentication: may be legally authenticated.
  It must be human readable in a standard web browser.
  Utilizes standard XML syntax.

Page 43: XML Databases in BMI

XML BMI: CDA
www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf

Mayo Clinic's use of CDA:

Page 44: XML Databases in BMI

A Native XML Database Design for Clinical Document Research
Johnson, Campbell, et al.

Facilitate research, especially research on clinical text.
User needs to be accounted for:
  Process queries against text.
  Process queries against annotations.
  Standard method for querying.
  Non-hierarchical document selection (by patient, date, ...).
  Return varying levels of document granularity.
  A schema which adapts to new information without breaking old query formulations.
  A schema which adapts to new annotations.

Page 45: XML Databases in BMI

cont.

Tamino XML DBMS: a commercial product.
  Supports XQuery and text search, which address many of the querying needs.
Utilizes the CDA for structuring meta-information.
A schema structures documents on a sentence-by-sentence level.
  Allows a high level of granularity.
Tags link words to a semantic and vocabulary library.

Page 46: XML Databases in BMI

UCONN BMI

Utilize a native XML DB to store documents.
Documents could be PHRs, health data / statistics, or system metadata (registration).
Our goal is to provide secure submission and retrieval of a variety of XML data.
For spring 2008, we are focusing only on submitting registration data.

Page 47: XML Databases in BMI

UCONN BMI: Overview

Current state:

User -> Browser: HTML form -> Java server -> Create XML document -> Submit to DB
Data domains: HTML -> Java -> XML

Data exists in three different domains:
  It is in HTML, a text data type, when the user enters it.
  The server maps the HTML to Java strings to create the XML.
  The XML is written to a file on the server and submitted to the database via a Java API.
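The form-to-XML step can be sketched as follows. The project's server is written in Java; Python is used here only to keep the example short. The element name `registration`, the field names, and the function `registration_to_xml` are hypothetical, and the sketch assumes every form field name is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def registration_to_xml(form):
    """Map submitted form fields (name -> string value) to an XML document."""
    root = ET.Element("registration")
    for field, value in form.items():
        child = ET.SubElement(root, field)
        child.text = value  # special characters are escaped on serialization
    return ET.tostring(root, encoding="unicode")


# Hypothetical form submission.
xml_text = registration_to_xml({"username": "jdoe", "email": "a&b@example.org"})
print(xml_text)
```

Even this small mapping is hand-coded per form, which is the kind of transformation step the current design has to repeat each time the data changes domain.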

Page 48: XML Databases in BMI

UCONN BMI: Problems

There are 2 transformations of data.
  Each requires a hand-coded mapping.
  This leads to sloppy code and wasted resources.
The system only handles XML as input; what about output?
The database (Sedna) is obtuse; what other options exist?
Do we want to store / transmit application data?

Page 49: XML Databases in BMI

UCONN BMI: Model (potential)

Utilize client-side JS to create XML.
Use a Java API to manipulate XML.
Problems:
  Document verified through schema and XQuery.
  Awkward to cross-reference input with any other data.
Advantages:
  No server-side data type conversion.
  This model applies to both user-driven input and system interactions.

Diagram: User / System -> Browser: HTML form (JS -> XML) -> Java server (XQuery) -> Submit to DB
Data domains: HTML, Java, XML

Page 50: XML Databases in BMI

UCONN BMI: Model retrieval

The client queries in XQuery, or uses a predefined query on the server.
The server uses the API to execute the XQuery against the DB.
The Java server is given an XML document; it can:
  Apply Java-based XSLT and return the result to the requestor (more reliable).
  Return the raw document; client-side JS applies XSLT (less server load).
  Both.

Diagram: User / System <-> Java server <-> DB
Diagram labels: Query, XQuery, XML, XSLT, Java, JS, HTML

Page 51: XML Databases in BMI

UCONN BMI: Retrieval Problems

There is still no method of performing business logic outside the scope of XSLT or XQuery.
What types of data should be retrieved in XML:
  Data that does not require complex logic, like login credential validation or registration.
  Health records and data which follow a defined schema.
  Education, treatment, and research information which follow a defined schema.

Page 52: XML Databases in BMI

UCONN BMI: XML Future

Focus on implementing XML features for the appropriate data.
Choose an XML database which offers high reliability and ease of use.
Develop XSLT templates for transforming XML data to the appropriate format.

Page 53: XML Databases in BMI

Survey of Native XML DBMS

Comprehensive list:
  http://www.rpbourret.com/xml/XMLDatabaseProds.htm#native
Commercial:
  Tamino XML Server. Well developed, supported, many tools available.
Open source:
  Sedna: fully supports ACID, XQuery.
  eXist: great management, documentation, indexing.

Page 54: XML Databases in BMI

eXist
http://www.rpbourret.com/xml/ProdsNative.htm#exist

Proprietary data store (B+ trees).
Supports XQuery/XPath 2.0.
Full-text searches.
XML:DB API.
Document-level concurrency.
Complete documentation.
Incomplete transaction support.

Page 55: XML Databases in BMI

Sedna
http://www.rpbourret.com/xml/ProdsNative.htm#sedna

Underlying data storage based on DataGuide.
Supports XQuery/XPath 2.0.
Full-text searches.
Custom API for various languages.
Command-line admin.
Transaction support.

Page 56: XML Databases in BMI

Questions?

Thank you.

Page 57: XML Databases in BMI

References

"Canonical XML Version 1.0", John Boyer. 15 March 2001. W3C.
"XML Path Language (XPath) 2.0". W3C Working Draft. 2 May 2003. W3C.
"XML Schema". XML Schema Working Group. 1 January 2008. W3C. <http://www.w3.org/XML/Schema>
"XML Schema: Formal Description". Brown, Fuchs, et al. 25 September 2001. W3C. <http://www.w3.org/TR/xmlschema-formal/>
"Extensible Markup Language (XML)". 1 January 2008. W3C. <http://www.w3.org/XML/>
http://www.25hoursaday.com/StoringAndQueryingXML.html
http://www.nih.gov/about/director/060399.htm
http://www.research.ibm.com/journal/sj/452/shabo.html
"Overview of the CDISC Operational Data Model". 26 April 2002. CDISC.