
SHARPn Face to Face, June 30 – July 1, 2011
University of Minnesota Rochester Campus

DAY ONE

Dr. Christopher Chute welcomed everyone. The meeting was held at the University of Minnesota Rochester campus because UMN and Mayo have deep connections.

The vice chancellor of the University of Minnesota Rochester, Jon Hesley, welcomed the group to the UMR campus.

Introduction and Thank You to Chuck Friedman – Dr. Chute

Dr. Chute thanked Chuck Friedman, who will be leaving ONC. He has shown tremendous support for Beacon and SHARP and made these accomplishments in health IT possible.

SHARP Program Overview & Vision – Chuck Friedman

On Nov 17, 2009, ONC staff were called into a meeting. The strong expectation was that most, if not all, of the money under the HITECH Act would be obligated by the end of March 2010. ONC staff worked to define many programs, one of which was SHARP. This meeting is an expression of a key feature instilled in the program: four funded sites that are profoundly collaborative. There are huge opportunities for connection and collaboration among the four sites, and all four projects have significant momentum, speed, and direction.

A year ago, Dr. Julian Goldman was added to the SHARP family as the fifth member, a program affiliate. Dr. Goldman leads the NIH-funded Quantum grant program on breakthrough device interoperability and safety, and was therefore offered affiliate membership in the SHARP program. Chuck also took time to acknowledge Will Yu, the SHARP program officer, who has done a phenomenal job.

July 11-12 will be the second annual SHARPfest in Washington, DC. Leadership from all five SHARPs will get together to:

- Catch up across all five sites
- Interact with the federal advisory program and work groups
- Refine plans and identify new opportunities
- Discuss issues that have come up. For example, Josh Mandel from Harvard (SHARP 3) identified a need for patient data to use for testing and development – data that is population-representative, with valid samples. This topic will be pursued at SHARPfest.

Future of health IT research – the new national coordinator is very supportive, and Chuck is very optimistic about future funding. Life after SHARP – we have support at ONC and at the White House.

Another federal organization, NITRD (established in 1991 under the High-Performance Computing Act), coordinates networking and IT research and development across the government, to ensure important issues are addressed and to minimize redundant effort. The HITECH Act says that NITRD shall have a program in health IT (an application domain). A senior steering group has been formed with members from multiple government agencies, and ONC is an official member of NITRD. The steering group has been charged with planning a coherent program, coordinated across federal agencies, for health IT research and development.


SHARP Area 4 Overview & Vision – Dr. Chute

Dr. Chute thanked the team for the huge accomplishment of reaching the milestones that made this meeting happen, and he looks forward to establishing and reaching the next goals.

SHARPn really has two fundamental activities: the notion of normalization and the notion of phenotyping. Within normalization there is both syntactic and semantic normalization; consider NLP normalization as well. The team is deeply indebted to Stan Huff for sharing the 30 years of work that he and his team have put into CEMs and infrastructure – a HUGE thank you.

Highly specific models vs. highly general models – this raises collaboration with SHARP 3 on APIs, tools, and SMART platforms.

Phenotyping logic – currently "cheating" by using a SQL database; in the future we need to use smart access API logic.

The security framework will create collaboration across the SHARPs to ensure world-class data security. How do we manipulate and visualize this data, and how usable are the tools we generate? What makes this relevant? To what extent will this influence healthcare?

There is ample opportunity to demonstrate this in the real world – close relationships with the SE MN Beacon, Utah Beacon, Indiana Beacon, and the Indiana Health Exchange. Making sure it works – using these collaborations to successfully engineer the units.

Dr. Chute introduced Ross Martin from Deloitte.

Ross Martin will be the facilitator for the meeting. He reviewed housekeeping items. Tomorrow there will be a breakfast at 7:30 for the executive committee.

Proof of Concept – Calvin Beebe and Les Westberg

Tracer bullet – follow the process end to end with a small subset of functionality, to prove and verify that all the functions work. Demonstrate the new tools being put together and new technologies working together that never have before (e.g., Mirth).

- 4,000 information models were put together by Stan and team; various methods were developed for transforming and persisting them
- Get information from point A to point B using open source technologies – NwHIN Exchange and its reference implementations
- Chose 3 of those 4,000 models for this prototype (HL7 2.x to CEM): standard lab, medication, and admin diagnosis
- Use NLP to create the normalized models
- Persist the results – using SQL for this prototype
- Run the phenotyping process across the CEM database

Take a message in HL7 through the internet and through the UIMA pipeline to transform and normalize it, persist it, and use it for phenotyping.

High-level picture of the flow:


- Started with data from IHC (Intermountain Healthcare)
- Run through Mirth Connect (take data in one format and transform/move it to another)
- NwHIN document submission (firewall on both sides)
- Into the cloud set up for SHARP – the NwHIN gateway – then back to Mirth (work done to move from UIMA back to Mirth)
- Persist the data
- Phenotyping

The work done on security and getting certificates in place will be discussed below.

Why did we choose Mirth Connect?
- Linkages to the NwHIN Exchange/CONNECT had already been done
- It provided functionality that made sense
- It provides connectors – we don't want to be constrained to one model/one format
- Open source
- Connectivity for document retrieve had been done
- We store CEM instances to the database, and Mirth is used to do that right now
- Routing

Calvin Beebe – pipelines established to process this data:
- Flows of data from existing sources
- Why pull out of your enterprise data trust? We want to do phenotyping, so we need a flow that contains three years of patient data – realistic volumes of data on the right patients
- Workflows developed at IHC and Mayo

Configurable UIMA pipeline
- Created tabular data modules for the admin diagnosis processor

NLP-based solution
- Created a new modular annotator flow
- Took the cTAKES solution and added more processing capabilities
- Ran 10,000 patients, which generated 360,000 CEMs (Mayo)
- Over 3.4 million CEM instances for medication (3 years of data)

Process HL7 records and labs
- Lab messages run through the pipeline to create CEMs
- The pipeline calls upon a vocabulary service interactively, and NLP
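A minimal sketch of what one of these pipeline annotators could look like under Apache UIMA, assuming a hypothetical vocabulary-service interface and a placeholder code lookup; the actual SHARPn annotators and type system are not reproduced here.

```java
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;

// Hypothetical pipeline annotator: reads a raw HL7 v2.x lab message stored
// as the CAS document text, looks up each local observation code in a
// vocabulary service, and reports the normalized (e.g., LOINC) code.
public class LabNormalizationAnnotator extends JCasAnnotator_ImplBase {

    /** Assumed shape of the interactive vocabulary service mentioned above. */
    interface VocabularyService {
        String toLoinc(String localCode);
    }

    private final VocabularyService vocab = localCode -> "2345-7"; // stub lookup

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        String hl7 = jcas.getDocumentText();
        for (String segment : hl7.split("\r")) {
            String[] fields = segment.split("\\|");
            if (fields.length > 3 && "OBX".equals(fields[0])) {
                String localCode = fields[3].split("\\^")[0]; // OBX-3 observation id
                String loinc = vocab.toLoinc(localCode);
                // A real annotator would add a typed annotation to the CAS here.
                System.out.println(localCode + " -> " + loinc);
            }
        }
    }
}
```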

Les Westberg
Lightweight database – when you want a production DB you need lots of indexes for the data:

- Patient identifier information
- A handful of demographics
- Keep indexes – extract some data out of each instance of the model
- Additional info, such as source information
- Again, this could be something other than SQL

Phenotyping activity – used the Drools environment


Conceptual flow – transformation layer → inference workflow → a service for creating output (a list)
Business logic

Completed work
- Cloud system set up
- Tools installed on the cloud
- Instances installed and configured on the IHC and Mayo sides to speak to each other
- Tracer message sent all the way through the cloud to the other side
- 30 de-identified patients on the IHC side – 134,000 CEM instances
- 10,000 patients – over 15 million CEMs generated

- Meds, labs, billing
- Models can generate multiple representations
- Three schemas – did not contain demographic data from IHC
- Mirth enhancements – worked closely with Mirth to do the following:

- Implemented the NwHIN XDR connector capability – push directly from Mirth to the pipeline
- Created and worked on an adaptor to make the final push from the gateway to Mirth
- Created 3 persistent channels

Dual security certificate exchange – NwHIN requires several layers of security:

- Two-way TLS (both sides) – had to get approval that this was secure enough
- NwHIN has local policy
- Get certificates created (two options): from the NwHIN certificate authority, or self-exchange of certificates – a more closed version
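A minimal sketch of the two-way (mutual) TLS setup using standard Java JSSE, for illustration only: the keystore/truststore file names and passwords are placeholders, and the actual CONNECT gateway configuration is not reproduced here. The keystore holds this gateway's own certificate; the truststore holds the partner's certificate, obtained by either of the two options above.

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class MutualTlsContext {
    public static SSLContext build() throws Exception {
        // Our side's identity certificate and private key.
        KeyStore keyStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("gateway-keystore.jks")) {
            keyStore.load(in, "changeit".toCharArray());
        }
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, "changeit".toCharArray());

        // The partner's certificate (from the NwHIN CA or self-exchange).
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("partner-truststore.jks")) {
            trustStore.load(in, "changeit".toCharArray());
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        // Both key managers and trust managers set: each side authenticates the other.
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }
}
```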

Large team effort – thank you all!

Questions: Why do all three models for the tracer message, not just medication?

Really driven by the phenotyping use case – if you want to do diabetes, you need labs, medications, etc.

On lab panels – normally one observation per panel.

Chuck Friedman: Jumping up a couple of levels of abstraction – what are the problems with this pipeline?

Calvin Beebe: Silos in healthcare – individual coding. Patients move. We can clean the data and standardize.

Chris Chute: Imagine meaningful use version 47 – it will take more than normalizing data. Once you have comparable data, what do you want to do with it? Put information into a comparable, consistent format; then you need to define coherent buckets of patient types.

Les Westberg: The internet in general is an example – people clamoring for information also have to sift through the data. We used to look at vocabulary for computability – now we need both the vocabulary and an information model.


Josh: What does the medication representation look like with that many instances – 3 years of data, refills, etc.? Referring to the architecture diagram, which box did the 30-some patients run through?

Calvin Beebe: Boxes 1-10 / target shot. The logical set came through box 6; the larger set came in through box 6a. The intent is to process multiple sets at the same time.

Les: Black-box each one of these systems and isolate it – there are places we can insert messages into each, which gives the modularity you'd want.

Ram Sriram: What is the end goal?

Chris Chute: Hopefully by the end of the day we'll answer your question. SHARP's deliverable, as he sees it, is a library of open source software that can work in a coherent framework – NLP modules and components. How would all this work in concert? The ultimate deliverables are twofold:

1. A body of tools to take information and put it into CEMs – a persistence layer of normalized data. Usable widgets.

2. Tools to generate phenotypes as executable algorithms – a toolkit for people to tailor to their own data.

Ross Martin: Certification of EMRs presumes the bar will be set higher over time. What are examples of what an electronic health record vendor can do to take part?

Calvin Beebe: Small partners in Beacon and other small health care clinics need tools to share data – they can leverage the same tools.

Les Westberg: A vendor doesn't care to build models – it wants to consume them. We need something that runs behind a consistent API.

Wendy Chapman: For the normalized data from IHC and Mayo – what is the vision of where the data will be stored and made available to researchers?

Chris Chute: As Chuck alluded to this morning, SHARP needs to be able to generate data that can be used. The NLP team is finding ways to de-identify data to make it publicly available. The intention of SHARP was not to make a large dataset available to the health community; instead, the goal is to enable academic research institutions to take their own data and achieve the kind of data normalization and persistence layers that can sustain instance queries – to enable people to generate consistent and comparable data. Think of a national federated data query: pose a nationally based question, and if every healthcare organization in the country has access to the SHARP tools (cloud, technologies), each can contribute cell numerators and denominators, and the question can be answered with power. To make that federated query model work, you need questions posed the same way against comparable and consistent data. Those two anchoring points are what SHARP is focusing on.

Ross Martin: Because of the data use and sharing issues – keep consistent data internally, in anticipation of the federated goal.

Chuck Friedman: Wanted to also mention this morning the connection between SHARPn, the other SHARPs, and what's going on in the national program – two important connections to remember:

1. The Standards and Interoperability Framework – Doug Fridsma
2. Federated real data integration – the digital infrastructure of a rapid learning health system


Chris Chute introduced Will Yu – a superb program officer – and encouraged everyone to meet him. Will Yu presented a certificate of recognition to Lacey Hart for outstanding support to this grant program (one of two recipients for the first half of 2011). Congratulations, Lacey!

Introductions
See the attendee list; also Sue Bakken – Columbia University (on the phone).

Break

Ross Martin welcomed everyone back from break. Pete Svoltis recommended that everyone update their wiki profiles; the registration table outside will have staff to assist with this, including picture taking. This will help people connect.

Pulse of the room – measure your absorption and understanding of this information. The hope is that the meter improves by the end of the conference.

Data Normalization – CEM Background – Stan Huff

CEM / information model – Stan's piece. Detailed clinical model:

A model that contains the detailed information we collect in medicine, making the connection between standard names and codes and the structure of the information: names, data types, the associations between them, and the structure of their relationships.

Why do we need this? Consider SNOMED codes (a coding system):

Example: numbness, right, arm, left, leg – these elements need organization (the numbness is in my right leg vs. my left arm).
Example: manual, automated, or estimated hematocrit.

HL7 v2.x – a single name/code and value vs. two names/codes and values
Pre-coordinated vs. post-coordinated representation
LOINC code in XML

Iso-semantic models – all in one code vs. two in one

The reason for these models: if I have a phenotype that I want to define, I'll need to choose a model. Example: if the patient's hematocrit is <=35, then…. Unless you have a logical model, you can't share computable rules and can't execute the same query. CEMs make the data computable and shareable.
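To make the point concrete, here is a minimal, hypothetical sketch of a shareable rule written against a normalized hematocrit observation. The field names are illustrative, not the actual CEM definitions; the point is that once every site normalizes to the same logical model, this one predicate can be shared and executed anywhere.

```java
public class HematocritRule {
    // Hypothetical flattened view of a normalized hematocrit CEM instance.
    static class HematocritObservation {
        final String loincCode;   // e.g. "4544-3" (hematocrit by automated count)
        final double value;       // percent
        HematocritObservation(String loincCode, double value) {
            this.loincCode = loincCode;
            this.value = value;
        }
    }

    // "If the patient's hematocrit is <= 35, then ..." as one shareable predicate.
    static boolean matches(HematocritObservation obs) {
        return obs.value <= 35.0;
    }

    public static void main(String[] args) {
        System.out.println(matches(new HematocritObservation("4544-3", 33.2))); // true
    }
}
```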

What do we model? There are 4,500 models on our website – 90% of the coded data at IHC. All of health care – everything we can say about patients, population data, including financial patient data.


How are the models used? In the EMR – to help set up data entry screens, flow sheets, and reports; and for data normalization.

Example: make all my hematocrits conform to that second model – choosing one form as the normalized form;

output of NLP in a structured and coded way; phenotype algorithms / decision logic.

Note – this does NOT dictate the physical storage strategy.

A formal language (CDL) is used for expressing the model. Compiler: start with a clinical element source file, then run it through a translator/compiler. There are tools to go from the logical model to the artifacts that implement a system – create XML, Java classes, HTML, UML, HL7, OWL, etc., and regenerate them automatically.

Artifacts used: the CDL model definition, then the CEM XML schema; an HL7 data source represented in this model ends up as an XML instance, and instance data records are compared against the XML schema.

Models change over time; those issues will be discussed tomorrow.
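For illustration, a hypothetical shape of a compiler-generated Java class for a hematocrit CEM – a coded key, a physical-quantity value, and a post-coordinated "method" qualifier (manual vs. automated, per the example above). The names are invented to show the pattern; actual generated artifacts differ.

```java
// Sketch of a generated clinical element class: data value plus qualifiers.
public class HematocritMeas {
    public static final String KEY_CODE = "HematocritMeas_KEY"; // placeholder id

    public static class PhysicalQuantity {
        public double value;
        public String unit;   // e.g. "%"
    }

    public static class MethodQualifier {
        public String code;   // e.g. "Manual" vs "Automated" (post-coordinated)
    }

    public PhysicalQuantity data = new PhysicalQuantity();
    public MethodQualifier method = new MethodQualifier();
}
```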

Chris Chute – CEMs: what to do with them and how to create them. Medication and laboratory data were added – two UIMA pipelines for this demo. Start with an HL7 message, initialize it into a CAS (common analysis structure), parse it into components within the UIMA pipeline, run it through normalization, and then transform it back into a CEM. One of the UIMA pipelines converted HL7 v2.x lab messages into CEM XML instances.

Within the annotators – normalization, anatomy, and lab annotators.
Architectural opportunities: UIMA has branching logic – could do a case statement.

We have a kind of payload – now where do we send it? Unwrap these information packets into CEM format.

Staying away from downstream normalization. CEMs are not proprietary to IHC – work is under way to harmonize with other international variants. Over 4,000 models have been populated in the library, available via Google or (as Lacey stated) the URL on our wiki.

Les Westberg – Persistence
Mirth is used to do the persistence – a channel was created for each datatype/model, taking the entire XML instance. Need to post the XML schema on the wiki – Les to do.

General channel design:

- Drop an XML instance into a directory
- Mirth picks it up and processes it into the persistence store; if not successful, it errors out into an error directory
- Persistence store (SQL for this prototype)
- General demographics that we need for the database model
- Each CEM message carries the demographics along with it


For each instance, the demographics are checked to see if we have seen that patient before – normalizing patients to a single identifier within the DB.

Source data – intended to be the original source data; right now the entries are created after the data has been through the UIMA pipeline.

After UIMA, each instance gets one row in a table called payload. For medications, the orderable item is what we keyed on.

Database tables – demographics, patient cross-reference, source data, patient data, and index data. Only the index data table has multiple rows per message.
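A minimal JDBC sketch of the persistence step just described – resolving the source patient to a single internal identifier, then storing one payload row per CEM instance. Table and column names (patient_xref, payload) are guesses for illustration; the real schema is the one to be posted on the wiki.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class CemPersister {
    private final Connection conn;

    public CemPersister(Connection conn) { this.conn = conn; }

    /** Resolve a source patient id to a single internal id, creating one if new. */
    long resolvePatient(String sourceSystem, String sourcePatientId) throws Exception {
        PreparedStatement find = conn.prepareStatement(
            "SELECT internal_id FROM patient_xref WHERE source_system=? AND source_id=?");
        find.setString(1, sourceSystem);
        find.setString(2, sourcePatientId);
        try (ResultSet rs = find.executeQuery()) {
            if (rs.next()) return rs.getLong(1);       // patient seen before
        }
        PreparedStatement ins = conn.prepareStatement(
            "INSERT INTO patient_xref (source_system, source_id) VALUES (?, ?)",
            Statement.RETURN_GENERATED_KEYS);
        ins.setString(1, sourceSystem);
        ins.setString(2, sourcePatientId);
        ins.executeUpdate();
        try (ResultSet keys = ins.getGeneratedKeys()) { // new single identifier
            keys.next();
            return keys.getLong(1);
        }
    }

    /** One row per CEM instance in the payload table; the whole XML is kept intact. */
    void persist(long internalPatientId, String cemXml) throws Exception {
        PreparedStatement ins = conn.prepareStatement(
            "INSERT INTO payload (patient_id, cem_xml) VALUES (?, ?)");
        ins.setLong(1, internalPatientId);
        ins.setString(2, cemXml);
        ins.executeUpdate();
    }
}
```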

Running in a cloud – Calvin
Ubuntu cloud – a 240-core infrastructure, an open source Linux system.

Then installed these open source components: NwHIN gateway, Mirth Connect interface engine, UIMA pipelines, MySQL database, JBoss/Drools rules engine.

SHARP hardware infrastructure slide.

Data normalization discussion summary:

This worked well as a tracer bullet – the focus now shifts to new problems and solutions. The challenge will be to look at new opportunities:

- New annotators for the pipelines
- Widen the vocabulary services
- Need to look at switching from the EDT environment to real live flows, and add HOSS for the capability it has

Questions:

Ram Sriram: What is the relationship between CEMs and HL7 version 3?

Stan: The RIM is a very abstract model; CEMs describe the details of information. From the CEMs you can generate RIM instances or convert to them – they are compatible in that way. The RIM is not an easy syntax to use for sharing these models. If people have ongoing version 3 message exchanges, we'd be able to support that.

Pete S: On the number of CEM instances in the test datasets – if you are trying to move phenotyping forward, you don't want the phenotyping algorithms to have to look at 300 medication CEMs. Some abstraction level is missing between the individual instances. What you'd really like to see at the semantic level is: the patient takes this medication at this time, at this dose, and it decreased at this point because of side effects. Who is responsible for designing that exact fit?

Chris Chute: Jyoti and his team. How we prioritize all the opportunities is very important as well.

Pete S: I was trying to point out the need to consider the construction of those abstractions – are we responsible, or is the user?

Stan: We can possibly handle the very common ones. We have all the instance data as the starting point. We should do them – some are common enough abstractions to incorporate into the infrastructure. There will be some that are very purpose-specific.

Marshall: How do you manage changing data (e.g., blood pressure up one day due to an environmental situation)?

Stan: Take one attribute and break it into two pieces. Family models – an association between old and new; the user selects whether to run against the data or not. A consortium (the IHC repository) has a formal process with an editorial team and a board to


decide how things should be done. You get the value of having a shared model, yet you are not limited – you can still customize the models.

Clinical Natural Language Processing Part 1 – Guergana

Two live demos:

- Extracting data – live demo
- One on an iPad shared with the audience – cTAKES, enter your own text
- Technical demo live from James Masanz

Guergana acknowledged all the investigators.

Part I – Overview

Background / objectives / year 1 achievements / cTAKES / year 2 goals / cTAKES demo.

Aims for the NLP team – extract all clinically relevant information from free text and convert it to a normalized form: clinical events, relations between events, and populated templates of those events. Build a high-throughput, general-purpose phenotyping system – not case-specific.

Take clinical text, find the information nuggets (cTAKES discovers bits of notes and highlights them), and translate the text into CEMs.

Comparative effectiveness – compare effectiveness against other data. Meaningful use – who can use these templates and CEMs. Clinical practice – another use case. Applications – annotation of clinical text can open doors extensively, e.g., meaningful use of the EMR.

Year 1 Accomplishments

- Developed gold standard corpus technology and methodology
- Created de-identification tools that enable sharing data
- Generated the corpus (seed corpus / de-identified corpus)
- Annotation schema, guidelines, pilot
- Gold standard annotations used
- Type system for software development (every contributor conforms to this type system)
- Development of the evaluation workbench (a common evaluation of methods, used by all)
- Deliverables – dependency parser (9/2010), which enables relation extraction; drug profile module (12/2010); smoking status classifier (3/2011); CEM medication (5/2011); full-cycle pipeline v1 (6/2011)

A common network cloud was needed to work in. A security roundtable was held on handling cloud-deployed clinical NLP; David Carrell is working on a white paper with guidelines from the roundtable's outcome.

SHARP collaborations – SHARP 1 around security in a cloud; SHARP 3 on extraction of data, providing clinical nuggets to consume. Other partnerships – i2b2, the VA in Boston and Salt Lake City, other R01-funded projects, eMERGE, and PGRN.

cTAKES Overview

The JAMIA manuscript (12/2010) has all the detail. cTAKES is a generic, general-purpose tool for extracting information from clinical narrative.


Accept the Apache v2.0 license and one can download cTAKES from its source page (released almost two years ago – 2,300 downloads thus far). cTAKES is being used in other programs – Mayo Clinic, i2b2 cell integration, eMERGE, PGRN, extensions created at Yale (YTEX), and MITRE.

Technical details of cTAKES:
- UIMA framework
- Open source (Apache license, Java 1.5)
- Natural language methods
- High-throughput system

Reviewed the cTAKES components – see slide for details. Modules are trained on clinical data.

Conversion to CEMs:

The CAS is transformed to a CEM (transform the representation in the CAS – FreeMarker is used to translate the CAS into a CEM). Very modular and very flexible.
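A minimal sketch of that FreeMarker step: values pulled out of the CAS become a data model that is merged with a CEM XML template. The template name (medication-cem.ftl) and the data-model keys are invented for illustration; the actual SHARPn templates are not reproduced here.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import freemarker.template.Configuration;
import freemarker.template.Template;

public class CasToCemTransform {
    public static String render(Map<String, Object> fromCas) throws Exception {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_23);
        cfg.setClassForTemplateLoading(CasToCemTransform.class, "/templates");
        Template template = cfg.getTemplate("medication-cem.ftl"); // hypothetical
        StringWriter out = new StringWriter();
        template.process(fromCas, out); // merge CAS-derived values into the CEM XML
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // In the real pipeline these values would be read out of the CAS.
        Map<String, Object> model = new HashMap<>();
        model.put("rxNormCode", "197361");  // e.g., a drug mention found by cTAKES
        model.put("patientId", "12345");
        System.out.println(render(model));
    }
}
```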

Year 2 and Forward
Goal: clinically relevant events, connections between the events, then normalizing to the CEMs.
The co-reference problem – e.g., "his".
Capture the uncertainty of predictions about the future.

Proposed deliverables for Y2 – see slide for details and timeline. The big hurdle is IRB/DUA agreements. Need to have cTAKES deployed in a cloud environment. Collaboration with the other SHARPs.

Graphical User Interface (GUI) for cTAKES: a prototype (Pei), with an iPad being passed around. The technology used already existed (open source).

Prototype considerations: deployment model (cloud), security, performance, license.

Marshall: NLP is approximate (not exact) – how do you handle that issue?
Levels of confidence / the level at which the notes are entered.

Kent: Does it recognize ASA (aspirin)? No, since it's not in the RxNorm dictionary.

Right now the focus is just on what's in the text, not on inferences – inference could be a SHARP function beyond NLP.

NLP – James Masanz
See slides for the list of components and a description of the code. High-level view of what the GUI looks like. Update the gold standard corpus at Mayo with new models to match the Wall Street Journal corpus. Clinical data is very specialized – the models are trained on a mixture of data, including Wall Street Journal text – and one can argue that clinical language is its own language. Example: was "trace" an entity or a modifier – is "trace" in the dictionary or not?

High-throughput Phenotyping (HTP) – Jyoti


Acknowledged all who made this possible. Clinical phenotyping is more difficult than GWAS – cost and function. A platform for performing population studies (biobanks/bio-repositories, for example). How good are the essential EMR data for phenotyping? There are many issues:

- Coding practices
- Non-standardized data across multiple health care facilities
- Measured vs. non-measured population differences

EMR-derived phenotyping: ~15 phenotyping algorithms have been developed for eMERGE.

PPV expectations are high – 95-98%.

Three main threads of HTP projects:

1. Identification of CEMs (Cui Tao)

Three algorithms in the use case:
- Type 2 diabetes mellitus (T2DM)
- Peripheral arterial disease (PAD)
- Hypothyroidism

Specify computed mappings between CEMs and algorithms. Classify into two categories: general (consistent data across all) and phenotype-specific EHR data. Semantic classification types – see slides for details. General models for scalability.

Mapping Issues (Susan Welch)

- Maintain a record of issues encountered, for future reference
- Availability of data is sometimes an issue (e.g., age of onset for a specific disease)
- Other mapping considerations – getting, in a computable manner, from a clinical element input over to the input specified by the secondary use algorithms. We are starting to identify the patterns we see in these mappings (extractions/abstractions): native, generalized, computed/transformed, or selected content.

Common constraints: source of data, allowable codes, temporal bounds, relationships across separate observations. Make a mapping of what it looks like and make it as general as we can.

CEMs using OWL (Cui Tao)
Three-layer architecture – meta-ontology, detailed CEMs, patient instances. A manuscript on the OWL work was just accepted at AMIA.

2. Phenotyping execution logic (Jyoti)

What existing technologies can we leverage (to specify inclusion/exclusion criteria)?

a. Drools-based phenotyping architecture
b. Phenotyping is at the end of the pipeline cycle – because we are a user of the data
c. Implementing / representing


d. Flow of the logic implemented thus far – populate the CEM database; a data access layer and a transformation layer feed business logic into the inference engine (Drools)
e. Done so far on diabetes – see slides
f. Looking at GELLO expressions / QDM measure criteria

3. Data quality, validation, cost effectiveness

a. Now that a phenotyping algorithm is in place, let's do some qualitative analysis
b. Centerphase project – a comparative analysis: a manual pipeline (study coordinator reviews the record) vs. the algorithm pipeline, tracking time, cost, accuracy, and results

Year 2 objectives:

- National library of clinical phenotyping algorithms
- Leverage the work with Drools – develop web-based phenotyping algorithms

- Machine learning and phenotyping
- Just-in-time phenotyping: can we apply this so the work is done as the data comes in, rather than completing work on data that is already sitting there?

- Phenotyping workbench: experts collaborate and possibly reuse – achieve a plug & play workbench

Drools Deep Dive – Jeff Ferraro, Peter Haug (see slides for details)

Peter Haug introduction: Herman Post has studied Drools the most but was not able to attend. Ask questions as we go along – Jeff will run the slides, and Darin Wilcox is on the phone as well.

Jeff Ferraro reviewed the outline of the presentation and discussion points – see slides for details.

Drools framework – a business logic integration platform.

Drools history – the documentation comes along with the open source project; see slides for details.

Benefits of a rules engine:
- We are really interested in the what, not the how
- Ability to share knowledge/rules across organizations

The inference engine within Drools:
- Clinicians working with business analysts working with programmers
- Can allow human intervention
- Establishing the knowledge base is a very expensive operation – but it only needs to be done once
- Determines which rules will be fired, and in what order
- Invoke the fire command
- Design the rule architecture to ensure no infinite loops

Basic rule semantics:
- A rule has a title/name that the user specifies
- It has a condition over a number of facts (or it can retrieve data from external sources)


We have decided not to allow sourcing of data from within Drools – an outside process should supply the data, to ensure we don't lose shareability. If the rule's condition is true, then the consequence fires. There is a data access layer within the architecture – if we set up a few common fact models, then the rules (knowledge) can be shared. A transformation layer could take the CEMs and flatten them into facts for the rules. A minimal sketch of these semantics follows below.
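The sketch uses the Drools 5 API current at the time. The fact class and the rule itself are invented for illustration (LOINC 1558-6 is fasting glucose; ≥126 mg/dL is the conventional diabetes threshold), and, per the decision above, facts are inserted from outside the rules by the data access / transformation layers.

```java
package sharpn.demo;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

// Fact: a flattened lab CEM handed to the engine by the transformation layer.
class LabResult {
    private final String patientId;
    private final String loincCode;
    private final double value;
    LabResult(String patientId, String loincCode, double value) {
        this.patientId = patientId; this.loincCode = loincCode; this.value = value;
    }
    public String getPatientId() { return patientId; }
    public String getLoincCode() { return loincCode; }
    public double getValue() { return value; }
}

public class PhenotypingRuleDemo {
    // A rule has a user-specified name, a condition over inserted facts,
    // and a consequence that fires when the condition is true.
    private static final String DRL =
        "package sharpn.demo\n" +
        "import sharpn.demo.LabResult\n" +
        "rule \"Elevated fasting glucose\"\n" +
        "when\n" +
        "  $lab : LabResult(loincCode == \"1558-6\", value >= 126)\n" +
        "then\n" +
        "  System.out.println(\"T2DM candidate: \" + $lab.getPatientId());\n" +
        "end\n";

    public static void main(String[] args) {
        KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        kbuilder.add(ResourceFactory.newByteArrayResource(DRL.getBytes()),
                     ResourceType.DRL);
        if (kbuilder.hasErrors()) {
            throw new IllegalStateException(kbuilder.getErrors().toString());
        }
        // Building the knowledge base is the expensive step; do it once.
        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
        StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
        ksession.insert(new LabResult("12345", "1558-6", 131.0)); // fact from outside
        ksession.fireAllRules(); // explicit fire command
        ksession.dispose();
    }
}
```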

Diabetes workflow – Darin Wilcox via phone / Herman Post
Any change in the workflow changes the underlying code (see slide for graphic).

Lessons learned:

- Rules vs. workflow, and what is the right balance between them
- Reuse data – use it as a knowledge base
- Brings collaborators together
- Integrate rules and workflows – it doesn't have to be one or the other

Future directions:
- Define a fact model for all to use
- Domain-specific language – what one person means a word to be may not match another's opinion
- See slides

A collaboration group to share experiences with Drools (a "Drools knitting group").

Peter S: What about machine learning approaches – software sharing modules?

- Traditional inferencing models
- This framework does support plug-ins
- He tends to like machine-learning-based approaches better

Chuck: What are the similarities to phenotyping – the current headlining topic?
- It is a way of defining something – you define a population
- Any rule you write runs against a person

Peter has the EDT – see if we can configure Drools to run rules against raw data (rules to create registries).

Why not just use a SQL query instead of these rules?
- Drools maintains state over time, as data comes in for a patient
- Handles asynchronous events over time
- Makes the logic more transparent

Have you found that domain experts can take this and model against it with minimal training?
- Personally, skeptical that one day doctors will be authoring rules
- Creates an opportunity for knowledge engineers who work with clinicians
- Capturing knowledge that we can reuse over time is the overarching goal

Data Quality – Type 2 Diabetes – Kent Bailey

Could use some thoughts on what to do with the comparability studies data. At what point do you audit raw data – initially or after? Methods of collecting data.

In two years – collect all relevant data / modifiers fit parameters.


Analyze the differences between institutions – codes, demographic data, specialty, geography, frequency of doctor visits, lab values.

What can we learn by doing specific algorithms? A hypothesis-generation idea.
- Diabetes – own algorithm
- Smoking – used cTAKES

The data did not look right – so this automated pull needs rechecking.
- Intermountain pull
- Mayo pull – limited to Olmsted and surrounding counties (14,000 cases)

Future directions:

Inter-institution comparison; study effects for various categories that may conflict; implement chart review to obtain a gold standard for T2DM; use of lab values/meds for the definition of the phenotype; generalization to other diseases for the widgets; etc.

Data Quality – Data Comparison Discussion / Machine Learning Methods to Discover Cohorts – Susan Welch

The idea of machine learning – a proposal for one potential usage: use a knowledge base to develop rules from attributes consistent with the algorithms for diabetes/asthma, and contribute new knowledge that could be used for cohort identification.

Consistent and novel rules for diabetes:

Elevated lab glucose level – however, not as accurate when measured by glucometer alone (the mere fact that the test was ordered was more predictive). Meds were very predictive – some medication classifications were more predictive than others (some paired up rather than solo).

Consistent and novel rules for asthma:
Medications again (specific kinds had different levels of prediction, as with diabetes); meds combined with female gender ranked as a high rule.

Tested against a gold standard set of patients defined by a group of expert clinicians. No inferencing was used – straightforward machine learning.

Advantages: did not require any domain expertise; not affected by missing data; proven accuracy; understandable and independent rules.

Moving forward – data quality work: we've been told what the data is for eMERGE, so we will run the association methodology for each institution, using the eMERGE algorithm; explore differences in the clusters of associated information and understand why; improve the algorithms.

When using ICD-9 codes, be careful: the provider may enter a code just to get a test done – the code may be present in the EMR even when a disease is being ruled out.

Chuck: It will be important to expand the population geographically – there is currently a community-based collaborator in Texas on board.

Data Quality – Centerphase Study – Jeff Tarlow

Validation of using phenotyping algorithms in real-world situations. Background on the company and its relationship with Mayo.


- Started collaboration with Mayo in 2010, centered on the Enterprise Data Trust
- Take the technology and provide a series of opportunities to improve healthcare
- Initially focused on clinical trials
- Perform cost-effectiveness analysis

Approach:
- Choose a use case (T2DM)
- Create a phenotyping methodology (flowchart)
- Generate a random sample based on ICD-9 codes
- Compare machine and manual processes
- Analysis

Develop a phenotype algorithm (eMERGE and Beacon) vs. manual review – compare the cost, time, and accuracy of the results.

- Two use cases – care management and a clinical trial
- eMERGE for the algorithm and Beacon for categorizing patient risk (used their criteria)
- The process has three screen-outs (age, medications, labs & vitals)
- Using the EDT and 11 counties – ran an ICD-9 query
- Study coordinator (manual process) vs. the algorithm-driven process
- Phenotyping (based on SAS and SQL queries)
- Had a validation and evaluation process

Began with a dry run of 20 charts to identify issues in both processes, refined features based on the issues identified, then went live with 500 charts. Started with 50 charts and extrapolated to the 500 for today's purposes. See the initial-results slide for details.

Algorithms are most effective when searching for a large number of patients: the larger the pool you are searching, the more beneficial the algorithm.

Ultimately this will use the EDT and normalization. The phenotyping team will be a consumer of what the NLP team comes up with.

End of day one.