share

Simon Bishop’s slides for the ESUG TC on SHARE, 22-March-2011

SHARE

ESUG Teleconference on 22-Mar-2011


What am I going to cover?
• GSK’s current approach to standards and the need for change
• Our plans / ongoing work and the similarity to CDISC SHARE
• Information Model Technicalities (will probably skip)
• Making the information model real …
• SHARE content versus GSK content
• What do you have to do in order to gain maximal benefits from SHARE?
• Flexibility in practice
• Creating an eCRF
• Slide pack on BRIDG and ISO21090 [included in the slide pack but will not be covered]

GSK’S CURRENT APPROACH TO STANDARDS AND THE NEED FOR CHANGE


What is the current GSK approach?
• Current approach to standards is based on standard dataset definitions which combine terminology, rules and structure
• The standards processes are managed through a Lotus Notes database solution and are made available to teams through multiple electronic solutions (an in-house Dataset Manager tool, an in-house study specification tool, InForm libraries etc.)
• Standards are available at both the global (core – all therapy areas) and therapy area levels. Some standards have been defined at indication level within the therapy area standards
• We align standard objects (CRFs, data extraction programs, statistical displays, algorithms etc.) to standard dataset definitions – a general rule is one eCRF module/page per dataset
• Lots of documentation, but not integrated with the standards
• Study teams are required to apply for changes or exemptions when they need to do something different for captured data


Some of our current issues
• GSK standards, based on SDS 2.1 (the predecessor to SDTM), have limitations
– duplicate variables and datasets
– ambiguity (what is this? how am I meant to use this?)
– different datasets employ different structures … hard to become familiar with them
– Data Management and Stats want different data structures in order to do their work
– little opportunity for automation
– hard to aggregate and reuse data other than the core standards (AEs, labs, vitals etc.)
• Lots of problems mapping our standards to SDTM
– extra variables which don’t fit the domains
– multiple different uses for an individual variable (some subtle differences, but others not so subtle)
• SDTM seen as an add-on deliverable … we don’t want to build our standards and tools around it
– not an operational standard
– doesn’t fit with our current complex toolset
– doesn’t seem to fit with ADaM or our reporting process/macros
– doesn’t do much to help with data aggregation
• Standards too tied to our toolset
– hard to automate across the study process
– painful whenever a tool is replaced


Drivers for change
• Regulatory requirements for clinical data are changing
– new FDA requirements (i.e. CDISC) on their way
– uncertainty about the future (e.g. HL7 v3)
• Need to be able to share data more easily with development partners
• We need more flexibility in using standards (from the study & project team perspective) whilst maintaining/increasing the benefits of standardisation
• Want to minimise the effort associated with transforming data to standards, or using more than one standard
• Need a less complex clinical computing environment/toolset
• Need to be able to do more work with fewer resources
• Currently replacing most of our clinical trial toolset … if we are going to change our standards, we have to change them now


GSK Long term vision
Regulatory/legal/public mandate:
• GSK is well prepared to provide regulators and others with the data they require, in the format required
• always able to respond to regulatory queries quickly
Operational efficiency:
• increase operational efficiency through the implementation of a metadata driven approach
• provide study teams with the flexibility to capture and process the data in an optimal way (study teams to have the ability to decide on structure and grouping of their data)
• variables much more clearly defined: less ambiguity, less confusion
Data Reuse:
• ability to combine and analyse data across studies, indications and more broadly, with little effort
Traceability:
• ability to trace all the way back from a result in a clinical report (e.g. a mean value or a p-value) to the value that was first entered in the CRF/eCRF … with an understanding, at each step, of what data/variables were used and what algorithms were applied

OUR PLANS / ONGOING WORK AND THE SIMILARITY TO CDISC SHARE


So what are we doing?
• Long term, we want to use SHARE content
• Cannot wait for SHARE before changing our standards, as we’re replacing systems now
• Developed an Information Model which all our standards will follow, together with an implementation plan for this
– standards being developed independently of our systems
– new systems built to work with / take advantage of the new standards
• Critically, our information model is based on the same industry standards as the SHARE information model


So what are we doing?
• Metadata driven approach to developing, executing and reporting clinical trials
– eProtocol tool
– metadata repository
– many systems consuming the metadata: eCRF tool, reporting tools …
• Metadata Repository
– structured according to our information model
– houses all the clinical data definitions
– houses operational metadata (information needed to create eCRFs, datasets, SDTM datasets etc.)


Information Model Technicalities


Information Model Details

• The information model is a combination of three industry standards:
– the BRIDG model (a collaborative piece of work between CDISC, HL7, FDA and the US National Cancer Institute (NCI))
– the ISO21090 datatype standard (applicable across healthcare, not just regulated clinical research) … very similar to the HL7 abstract datatypes
– the ISO11179 metadata registry standard


Simple explanation of these 3 standards
BRIDG is a standard way of representing the world of clinical research
– it doesn’t take us right down to variables, but it does take us down to meaningful objects such as “anatomic location”, “result”, “date” etc.
ISO21090 datatypes are a standard way of representing particular types of data
– these take us from the BRIDG meaningful objects such as “result” to individual variables like “value”, “unit”, “code”
The link between BRIDG and ISO21090 is that all the BRIDG meaningful objects have an ISO21090 datatype
ISO11179 is a standard way of recording metadata in a metadata registry
– we want to be compliant with it, but it isn’t something that operational folk need to understand or worry about


Sources of Information

• BRIDG site: http://www.bridgmodel.org/ (we are using 3.0.3)

• ISO21090 standard: http://gforge.hl7.org/svn/hl7v3/trunk/dt/iso/index.htm (log on with username = anonymous and a blank password) … the 2011 published version is on the ISO website

• Enterprise Architect is the modelling software used by BRIDG. Here is a link to a free viewer: http://www.sparxsystems.com/bin/EALite.exe

• I have included a simple-to-understand slide set on BRIDG and ISO21090 (15 easy slides) at the end of this slide pack for those who want to understand more



Making the information model real …


What does this Information Model approach give us?

• A well developed model of clinical research … there shouldn’t be anything missing
– so we model clinical data in a consistent and formalised manner
• A templated approach to the development of our standards
– we end up selecting variables from a short list rather than manually creating them


And the usual inevitable downsides?

• BRIDG model is complicated
– but this is because clinical research and clinical data are complicated
– use of a templated approach to implementation removes much of the complexity
– you do need to train people (as always)
– you need to take advantage of the capabilities to reap the biggest benefits
• ISO21090 datatype standard has been accused of being too complicated
– without tools to help you, I’m sure that is true
– but it is the complexity that allows the development of a templated approach to standards creation
– you need to train people … but mainly with regard to the choices they have to make


So what does content look like?
Concepts: BRIDG based modelling of the clinical data
[Figure: four concepts linked by BRIDG based associations (wording in blue describes things from the bottom up)]
– Haemoglobin Result is a result of Haemoglobin Test
– Haemoglobin Test is a test performed on Blood Specimen
– Blood Specimen is a result of Blood Specimen Collection
Example content:
– Blood Specimen Collection: Fasting status indicator value = true; Date Range low value = 23-Apr-2010
– Blood Specimen: Accession Number Text value = 01876288485; Condition Code item code = CC51, display name value = haemolysed
– Haemoglobin Test: Category Code code = HAEM, display name value = Haematology
– Haemoglobin Result: Result value = 151, unit = g/L


So what does content look like?
[Figure: the same four concepts and example values, annotated with the three levels of decomposition]
– Concept attributes from BRIDG (e.g. Date Range, Condition Code, Category Code, Result)
– ISO21090 decomposition: “pre-variable attributes” (e.g. low within Date Range; item and display name within Condition Code)
– ISO21090 decomposition: variables, shown with example values (e.g. value = 23-Apr-2010; code = CC51; value = 151, unit = g/L)


What we get from the metadata
• Concepts – clear definitions of clinical information (e.g. height, systolic blood pressure, weight result)
• Associations – how the concepts connect together; rules for the use of concepts
• BRIDG attributes – meaningful attributes for a piece of clinical data (e.g. method, date, anatomic site, result) … some may have codelists

• ISO21090 decomposition: “pre-variable attributes” – various levels of clumping of variables; some may have codelists

• Variables – clear, model based, unambiguous variables
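To make these layers concrete, here is a minimal Python sketch that walks the haemoglobin example from concept, through BRIDG attribute and “pre-variable attributes”, down to variables. It assumes a plain nested-dictionary representation of our own devising (not the SHARE or GSK repository format), and the camelCase attribute names (fastingStatusIndicator, conditionCode and so on) are hypothetical renderings of the slide labels.

```python
from typing import Any, Dict

# Hypothetical representation: concept -> BRIDG attribute -> nested ISO21090
# decomposition. Intermediate keys play the role of "pre-variable attributes";
# leaves are the variables, holding the example values from the slide.
HAEMOGLOBIN_EXAMPLE: Dict[str, Dict[str, Any]] = {
    "Blood Specimen Collection": {
        "fastingStatusIndicator": {"value": True},
        "dateRange": {"low": {"value": "23-Apr-2010"}},
    },
    "Blood Specimen": {
        "accessionNumberText": {"value": "01876288485"},
        "conditionCode": {"item": {"code": "CC51",
                                   "displayName": {"value": "haemolysed"}}},
    },
    "Haemoglobin Test": {
        "categoryCode": {"code": "HAEM", "displayName": {"value": "Haematology"}},
    },
    "Haemoglobin Result": {
        "result": {"value": 151, "unit": "g/L"},
    },
}


def flatten(path: str, node: Any) -> Dict[str, Any]:
    """Recursively decompose a node into {dotted.path: variable value}."""
    if not isinstance(node, dict):
        return {path: node}  # reached a variable
    variables: Dict[str, Any] = {}
    for key, child in node.items():
        variables.update(flatten(f"{path}.{key}", child))
    return variables


def concept_variables(concept: Dict[str, Any]) -> Dict[str, Any]:
    """Return every variable of one concept, keyed by its decomposition path."""
    variables: Dict[str, Any] = {}
    for attribute, node in concept.items():
        variables.update(flatten(attribute, node))
    return variables


if __name__ == "__main__":
    for name, concept in HAEMOGLOBIN_EXAMPLE.items():
        print(name, concept_variables(concept))
```

The dotted paths (e.g. conditionCode.item.displayName.value) show how each variable keeps its link back to the BRIDG attribute it came from.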


Steps needed to create that information?

• Choose which clinical scenario template we need (in this case, one containing specimen, lab test & lab result)

• Enter information about each concept (a name, a description, a definition …)

• Choose which of the BRIDG attributes we will need
• Choose which associations are needed
• Choose which bits of the ISO21090 decomposition we need
• Enter the name of codelists when prompted (and select the set of codes in that codelist that you want to make available for this concept)
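These steps amount to “copy a template, then choose from it”. The sketch below illustrates that under stated assumptions: ScenarioTemplate, ConceptDefinition and all attribute and code names are hypothetical, and the two checks (only attributes offered by the template may be chosen; selected codes must come from the named codelist) are our reading of the steps, not a description of any GSK or SHARE tool.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class ScenarioTemplate:
    """Hypothetical clinical scenario template: the BRIDG attributes it offers."""
    name: str
    offered_attributes: FrozenSet[str]


@dataclass
class ConceptDefinition:
    """A concept made by copying a template and choosing from what it offers."""
    name: str
    definition: str
    template: ScenarioTemplate
    chosen_attributes: Set[str] = field(default_factory=set)
    codelist_subsets: Dict[str, Set[str]] = field(default_factory=dict)

    def choose_attribute(self, attribute: str) -> None:
        # Attributes are selected from the template's short list, never invented.
        if attribute not in self.template.offered_attributes:
            raise ValueError(f"{attribute!r} is not offered by {self.template.name!r}")
        self.chosen_attributes.add(attribute)

    def bind_codelist(self, attribute: str, codelist: Set[str], selected: Set[str]) -> None:
        # The codes made available for this concept must be a subset of the codelist.
        if attribute not in self.chosen_attributes:
            raise ValueError(f"choose {attribute!r} before binding a codelist to it")
        if not selected <= codelist:
            raise ValueError(f"codes {sorted(selected - codelist)} are not in the codelist")
        self.codelist_subsets[attribute] = set(selected)


# Illustrative template; the attribute names are hypothetical.
SPECIMEN_COLLECTION = ScenarioTemplate(
    "Specimen Collection",
    frozenset({"method", "anatomicSite", "dateRange", "fastingStatusIndicator"}),
)
```

For example, calling choose_attribute("colour") on a concept built from SPECIMEN_COLLECTION raises ValueError, which is the point of the templated approach: variables are selected from a short list rather than created by hand.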

SHARE CONTENT VERSUS GSK CONTENT


What we do expect from SHARE

• We expect SHARE to provide us with these model based definitions (the concepts, concept attributes and decomposition together with the associations between concepts and the terminology)

• We expect SHARE to provide us with the information needed to represent these definitions in the form of SDTM domains

• There will be a SHARE metadata repository
• GSK expect to import all the SHARE metadata into the GSK metadata repository


What we don’t expect from SHARE

• We do not expect SHARE to provide us with all the rules that GSK will want to apply

• We do not expect SHARE to provide us with all the operational metadata we need to create study objects (GSK datasets, GSK eCRFs)

• GSK expect to add additional metadata to the GSK repository … we want to augment the SHARE content, not change it


Choices
• Just use the SHARE variables and forget about the rest of the metadata
– you get consistent industry standard variables
– you can keep your own processes
– but you may not use the variables in such a way that you can aggregate your data with that of others
– you miss out on the additional benefits
• Use the SHARE metadata to the full and augment with additional company metadata [the GSK approach]
– you get all the benefits of using the SHARE metadata
– you get additional capability to automate downstream processes


Creating a GSK standard using SHARE content

• Rules …
– define which variables are mandatory, optional or conditional in a study specification
– define the conditionality rules, e.g. a study has to include either the variables for total daily dose/dose units or those for dose/dose unit and frequency

– define which variables have to be populated if used in a study

– (in fact, we may apply rules to associations, BRIDG based attributes, “pre-variable attributes” and codes as well as to variables)
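A conditionality rule like the dose example can be checked mechanically against a study specification. This sketch reads “either … or” as exclusive (the eCRF slide later in the pack says “but not both”) and requires the chosen group to be complete; the variable names TOTAL_DAILY_DOSE, DOSE_FREQUENCY etc. are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set

# Hypothetical variable names for the two alternative dose groups.
TOTAL_DAILY_DOSE_GROUP = frozenset({"TOTAL_DAILY_DOSE", "TOTAL_DAILY_DOSE_UNIT"})
PER_DOSE_GROUP = frozenset({"DOSE", "DOSE_UNIT", "DOSE_FREQUENCY"})


def check_exclusive_groups(selected: Set[str],
                           groups: Sequence[FrozenSet[str]]) -> List[str]:
    """Exactly one group must be used, and the group used must be complete."""
    used = [group for group in groups if group & selected]
    if not used:
        return ["no dose group included"]
    if len(used) > 1:
        return ["more than one dose group included"]
    missing = used[0] - selected
    return [f"missing {sorted(missing)}"] if missing else []
```

Variables outside both groups (a route of administration, say) are ignored, so the same check can run over a whole study specification.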


Rules example: Subject Disposition

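[Figure: example subject disposition eCRF page, annotated with the rules]
Tick this (a primary reason) … and you MAY tick none, one or many of these (its sub-reasons)
Tick this (the “other specify” sub-reason) … and you MUST enter text here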

If the study includes pre-specified subreasons, an “other specify” subreason MUST be included and, if ticked, MUST be populated

If the study does not include subreasons, the “specify” MUST be included and populated

We should not expect SHARE to deliver these company specific rules
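Rules of this kind are company-specific metadata that can still be made computable. A minimal sketch, assuming a hypothetical OTHER_SPECIFY code and simple record structures of our own; the “exactly one primary reason” check comes from the eCRF slide later in this pack.

```python
from dataclasses import dataclass, field
from typing import List

OTHER_SPECIFY = "OTHER, SPECIFY"  # hypothetical code for the "other specify" subreason


@dataclass
class DispositionDesign:
    """Study-level design: the pre-specified subreasons (may be empty)."""
    subreasons: List[str] = field(default_factory=list)


@dataclass
class DispositionRecord:
    """One subject's disposition as entered on the eCRF."""
    primary_reasons: List[str]
    subreasons: List[str] = field(default_factory=list)
    specify_text: str = ""


def check_design(design: DispositionDesign) -> List[str]:
    # Pre-specified subreasons must include an "other specify" option.
    if design.subreasons and OTHER_SPECIFY not in design.subreasons:
        return ["pre-specified subreasons must include an 'other specify' subreason"]
    return []


def check_record(design: DispositionDesign, record: DispositionRecord) -> List[str]:
    problems: List[str] = []
    if len(record.primary_reasons) != 1:
        problems.append("exactly one primary reason must be ticked")
    # None, one or many subreasons may be ticked, but only from the study design.
    unknown = sorted(set(record.subreasons) - set(design.subreasons))
    if unknown:
        problems.append(f"subreasons not in the study design: {unknown}")
    has_text = bool(record.specify_text.strip())
    if design.subreasons:
        if OTHER_SPECIFY in record.subreasons and not has_text:
            problems.append("'other specify' is ticked, so its text must be populated")
    elif not has_text:
        problems.append("no subreasons in this study, so 'specify' must be populated")
    return problems
```

check_design validates the study set-up once; check_record runs per subject at data entry.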


What extra metadata would we add?

• Mappings from other standards to concepts & concept variables
– legacy data
– development partners
• Mappings from SHARE terminology to GSK terminology and vice versa (mapping codes)
– we want to use SHARE terminology as much as we can, but there are always going to be cases where, for some reason, we need to deviate
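One way to hold the SHARE/GSK mapping codes is a one-to-one, bidirectional lookup. The sketch assumes that any code without an explicit entry is identical in both terminologies (reflecting the aim of using SHARE terminology wherever possible); the codes HEMOLYZED/HAEMOLYSED are hypothetical examples.

```python
from typing import Dict, Iterable, Tuple


class TerminologyMap:
    """Bidirectional SHARE <-> GSK code mapping for the cases where GSK deviates.

    Codes without an entry are assumed to be the same in both terminologies.
    """

    def __init__(self, pairs: Iterable[Tuple[str, str]]) -> None:
        self._to_gsk: Dict[str, str] = {}
        self._to_share: Dict[str, str] = {}
        for share_code, gsk_code in pairs:
            # Keep the mapping one-to-one so that it round-trips.
            if share_code in self._to_gsk or gsk_code in self._to_share:
                raise ValueError(f"duplicate mapping for {share_code!r}/{gsk_code!r}")
            self._to_gsk[share_code] = gsk_code
            self._to_share[gsk_code] = share_code

    def to_gsk(self, share_code: str) -> str:
        return self._to_gsk.get(share_code, share_code)

    def to_share(self, gsk_code: str) -> str:
        return self._to_share.get(gsk_code, gsk_code)
```

Keeping the map one-to-one is a design choice: it guarantees that converting to GSK terminology and back returns the original SHARE code.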


Central role of concept metadata
[Figure: Concept Definitions at the centre. Non-GSK metadata and GSK legacy metadata are brought in by mapping to concepts; the concept definitions are rendered as an eCRF, rendered as SDTM, and rendered in registry form.]


Central role of concept aligned data
[Figure: Concept aligned data at the centre. Non-GSK data and GSK legacy data are mapped in using metadata; the concept aligned data can be represented as an SDTM dataset or as a registry format dataset, and used for aggregations (“aggregate anything”).]


So what operational metadata would we add?

• Metadata needed to render the definitions in a particular form, e.g. an eCRF, a GSK dataset
– length and precision for variables
– whether a coded field should be represented as a drop down box or a radio button
– and more
• A study specification


Setting Up Studies

• For each study, we will produce a fully detailed study specification
• We will be doing this using the BRIDG modelling
– key to taking full advantage of the concept metadata
• This will be done at a fully detailed level
– including which variables will be collected at which visits/timepoints
– including which set of codes is available for use at that visit/timepoint (when codelisted)
– all the inherent structure of the metadata will be utilised to the full
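A study specification at this level of detail can be queried directly at data entry. The sketch below uses a hypothetical structure (visit, then variable, then the codes available at that visit, with None meaning “not codelisted”) and invented visit and variable names; it is not the GSK specification format.

```python
from typing import Dict, Optional, Set

# Hypothetical study specification: visit -> variable -> available codes
# (None when the variable is not codelisted).
StudySpec = Dict[str, Dict[str, Optional[Set[str]]]]

STUDY_SPEC: StudySpec = {
    "SCREENING": {"SEX": {"M", "F"}, "HAEMOGLOBIN_RESULT": None},
    "WEEK_4": {"HAEMOGLOBIN_RESULT": None,
               "SPECIMEN_CONDITION": {"HAEMOLYSED", "CLOTTED"}},
}


def check_entry(spec: StudySpec, visit: str, variable: str, value: str) -> Optional[str]:
    """Return a problem description, or None if the entry fits the specification."""
    variables = spec.get(visit)
    if variables is None:
        return f"unknown visit {visit!r}"
    if variable not in variables:
        return f"{variable} is not collected at {visit}"
    codes = variables[variable]
    if codes is not None and value not in codes:
        return f"{value!r} is not an available code for {variable} at {visit}"
    return None
```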


Setting Up Studies

• Benefits of utilising the BRIDG trial design modelling
– the study time and events are modelled using study design concepts¹ and data collection concepts, which makes for a fully integrated approach
– BRIDG modelling provides metadata/data driven navigation capability, guiding study investigators through sometimes very complex study procedures
• We can use the richness of the metadata included in the study specification to help with the creation of operational objects
¹ Study design concepts include visits, timepoints, cycles, arms, epochs, treatment strategies & elements


SDTM

• We expect to get totally consistent SDTM “for free”
– concepts are associated with SDTM domains
– concept variables are generated from BRIDG and ISO21090
– we expect there to be a mapping from BRIDG attributes/ISO decomposition to SDTM variables
– we expect to standardise/eliminate the inherent SDTM wiggle-room through this process
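To illustrate “SDTM for free”, this sketch renders the haemoglobin example’s variables into one LB record through a lookup table. LBFAST, LBDTC, LBREFID, LBSPCCND, LBCAT, LBORRES and LBORRESU are SDTM LB variables, but the mapping from concept paths to them is our assumption for illustration, not SHARE content, and a complete LB record would need further variables (e.g. test code and name) that the example does not provide.

```python
from datetime import datetime
from typing import Any, Dict, Tuple

# Hypothetical mapping from (concept, decomposition path) to SDTM LB variables.
LB_MAPPING: Dict[Tuple[str, str], str] = {
    ("Blood Specimen Collection", "fastingStatusIndicator.value"): "LBFAST",
    ("Blood Specimen Collection", "dateRange.low.value"): "LBDTC",
    ("Blood Specimen", "accessionNumberText.value"): "LBREFID",
    ("Blood Specimen", "conditionCode.item.displayName.value"): "LBSPCCND",
    ("Haemoglobin Test", "categoryCode.displayName.value"): "LBCAT",
    ("Haemoglobin Result", "result.value"): "LBORRES",
    ("Haemoglobin Result", "result.unit"): "LBORRESU",
}


def to_sdtm_value(value: Any) -> str:
    """Character representation: booleans as Y/N, dd-Mon-yyyy dates as ISO 8601."""
    if isinstance(value, bool):
        return "Y" if value else "N"
    if isinstance(value, str):
        try:
            return datetime.strptime(value, "%d-%b-%Y").date().isoformat()
        except ValueError:
            return value  # not a date: keep the text as it is
    return str(value)


def render_lb_record(concept_variables: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Build one LB record from {concept: {path: value}} using LB_MAPPING."""
    record: Dict[str, str] = {}
    for (concept, path), sdtm_variable in LB_MAPPING.items():
        value = concept_variables.get(concept, {}).get(path)
        if value is not None:
            record[sdtm_variable] = to_sdtm_value(value)
    return record
```

Because every study uses the same concept paths, the same table yields the same SDTM choices every time, which is where the reduced wiggle-room would come from.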

WHAT DO YOU HAVE TO DO IN ORDER TO GAIN MAXIMAL BENEFITS FROM SHARE?


Important Actions

• Always maintain a link back from operational objects to the SHARE definitions

• Use the SHARE objects right from the design stage of a study

• Augment the SHARE metadata with company specific metadata, for example
– rules (e.g. use this object or that object but not both)
– additional metadata to permit automation of eCRF screens (somewhat tool dependent)

FLEXIBILITY IN PRACTICE


System independent standards which are not tied to specific objects (e.g. datasets)

This is the GSK standard for a dataset …

Any variation requires an exemption or a new standard

In the new standards each coloured block is a “standard” or “building block” and they can be combined in different ways to make objects (e.g. datasets).


Flexibility In Practice – Dataset Content

An AE eCRF screen may look like this …

With the new standards it can also look like this …

There will still be standard objects (e.g. datasets) to provide the benefits of standardisation but also more flexibility (fewer exemptions required)


Flexibility In Practice – Datasets
An existing GSK dataset may look like this …

With the new standards the same data can also look like this …

Or this …

Or this …

Or this …

It all comes from the same building blocks (no exemptions required)


Flexibility In Practice – Transforming Non-Standard Data
[Figure: three source standards (GSK standard, partner standard, Vendor X) transformed into three target formats (CDISC SDTM datasets, CDISC ADaM datasets, GSK operational datasets), shown without and with building blocks]
Without building blocks … 9 mappings required (each source mapped directly to each target)
With building blocks … 6 mappings required (each source mapped once to the building blocks, and each target produced once from them)
With building blocks, an in-licensed compound … 1 new mapping; a new regulatory requirement … 1 new mapping
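The mapping counts follow from simple arithmetic, assuming (as in the diagram) three source standards and three target formats: direct mapping needs one mapping per source/target pair, whereas building blocks need one mapping per source plus one per target.

```python
def point_to_point_mappings(sources: int, targets: int) -> int:
    """Every source standard mapped directly to every target format."""
    return sources * targets


def building_block_mappings(sources: int, targets: int) -> int:
    """Each source mapped once to the building blocks; each target produced once from them."""
    return sources + targets


# GSK standard, partner standard, Vendor X -> SDTM, ADaM, GSK operational datasets
assert point_to_point_mappings(3, 3) == 9
assert building_block_mappings(3, 3) == 6

# Adding one new source (e.g. an in-licensed compound):
assert point_to_point_mappings(4, 3) - point_to_point_mappings(3, 3) == 3
assert building_block_mappings(4, 3) - building_block_mappings(3, 3) == 1
```

With more sources and targets the gap widens: 5 sources and 5 targets would need 25 direct mappings but only 10 via building blocks.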

CREATING AN E-CRF


Creating a smart eCRF
• SHARE will provide metadata about clinical information
• SHARE will provide multiple levels of clumping of objects, e.g.
– value and unit
– test and test result
– albumin test is done using serum specimen
• Your company will add additional metadata to create company-specific standard combinations of the SHARE content, e.g.
– either the total daily dose object will be used or the single dose object + dose frequency object will be used (but not both)
• Your company will add additional metadata to indicate whether repeat values are allowed, e.g.
– only one primary reason for discontinuation is allowed (and must be provided), but multiple sub-reasons are permitted (and it is OK not to choose any)
• Your company will add additional metadata and/or define rules to facilitate the automation of eCRF creation, e.g.
– represent this codelist as a radio button if it has fewer than 6 possible values and as a drop-down if it has 6 or more possible values
• Some metadata will need to be created at a study level, e.g.
– is this a collected field or a hard coded field?
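The radio-button/drop-down rule is simple enough to express directly. A sketch, assuming the threshold of 6 from the slide and a hypothetical study_override parameter standing in for study-level metadata that overrides the company rule.

```python
from typing import Optional, Sequence

RADIO_BUTTON_LIMIT = 6  # company-level rule: fewer than 6 values -> radio buttons


def widget_for(codelist: Sequence[str], study_override: Optional[str] = None) -> str:
    """Pick an eCRF widget for a codelisted field."""
    if study_override is not None:  # hypothetical study-level override
        return study_override
    return "radio" if len(codelist) < RADIO_BUTTON_LIMIT else "dropdown"
```

For example, widget_for(["Y", "N"]) gives "radio", while a six-value codelist gives "dropdown".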


Creating a smart eCRF
• Two component parts
– creating individual pages … need metadata to:
  • differentiate between hard coded information and collected information [study level metadata]
  • drive pop-ups (e.g. pregnancy test details if subject is female) [company and/or study level metadata]
  • allow repeat fields (e.g. medical history) [study may deviate from company level rule]
  • apply rules (e.g. get the investigator to confirm values that are outside certain limits) [company and/or study level metadata]
– navigation through the complete eCRF
  • general flow
  • exceptional flows, e.g. if a particular event occurs, additional tests/visits are necessitated [BRIDG contains functionality to record this as computable metadata]
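Pop-ups and limit checks can be driven from rule metadata rather than hard coded into each page. A minimal sketch; the PopUpRule/RangeRule structures, the PREGNANCY_TEST page, the SEX/SYSBP variables and the 80 to 200 limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class PopUpRule:
    page: str
    applies: Callable[[Record], bool]


@dataclass(frozen=True)
class RangeRule:
    variable: str
    low: float
    high: float


# Hypothetical company-level rules; page names, variables and limits are illustrative.
POP_UPS = [PopUpRule("PREGNANCY_TEST", lambda r: r.get("SEX") == "F")]
RANGE_CHECKS = [RangeRule("SYSBP", 80.0, 200.0)]


def pages_to_show(record: Record) -> List[str]:
    """Extra pages triggered by values already entered."""
    return [rule.page for rule in POP_UPS if rule.applies(record)]


def values_to_confirm(record: Record) -> List[str]:
    """Variables whose values fall outside limits and need investigator confirmation."""
    flagged = []
    for rule in RANGE_CHECKS:
        value = record.get(rule.variable)
        if value is not None and not (rule.low <= value <= rule.high):
            flagged.append(rule.variable)
    return flagged
```

Because the rules are data, a study could add or replace entries in POP_UPS or RANGE_CHECKS without touching page code.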


Will not cover the following slides during the training

They are for people to view after the meeting


Two industry standards: BRIDG and ISO21090

A simple explanation of what these are and what they provide


Information Model?

• An information model is a combination of structure and nomenclature
– modelling the structure of data
– employing a set of terms to describe the objects

• A good information model will ensure that nothing is glossed over and that similar things will be described in a similar manner


GSK’s rationale for using BRIDG and ISO21090
• We developed GSK standards with no underlying information model
– these have the right content (the info we need in GSK’s clinical trials)
– but consistency of approach, and avoidance of duplication and ambiguity, are not as good as we would like
• In 2009 we started to develop an information model based approach to representing GSK’s clinical trial standards, in order to gain bigger benefits from standardisation
– our original intention was not to implement BRIDG, but rather to use it as a tool … to guide us
– we ran into various issues requiring solutions … some of these we addressed using our own solutions
– at year end, we came to recognise that within BRIDG lies all the functionality we need to provide solutions to all our issues
– in January 2010, we took the decision to implement BRIDG and an ISO datatype standard, as we felt this is the optimal approach
  • using these we can address all our issues
  • and we can develop a solution that will be at least similar to that of SHARE
  • and we will be using standards employed in the healthcare world


CDISC SHARE Project

• In the early days of the SHARE project, it was agreed that SHARE would use the BRIDG model, the ISO21090 datatype standard and the ISO11179 metadata registry standard as its information model

• Although SHARE could decide to implement these differently from GSK, currently the GSK and SHARE information models are very similar


BRIDG

• An information model
• Targeted at protocol driven research
• Reasonably mature
• Key collaborators: CDISC, HL7, NCI, FDA


Key Features

• BRIDG is a model of protocol driven research
– entities (animal, person, organisation, material)
– activities (any action that can, in the context of a study, be planned, scheduled or performed, e.g. a surgical procedure, a laboratory test, or the administration of a drug)
– participation or functional role of an entity in an activity
– relationships between activities (both simple and complex relationships)


Example: Tissue Samples
[Figure: a tissue sample workflow. Activities: Tissue Specimen Collection, Freeze Specimen, Preserve Specimen, Embed Specimen in Paraffin, Cut Slide from Block, Stain Slide, Test. Resulting objects: Fresh Tissue Specimen, Frozen Tissue Specimen, Preserved Tissue Specimen, Paraffin Block Specimen, Slide, Stained Slide, Result.]
Here we have a diagrammatic representation of a clinical procedure, in which a specimen is collected from a subject, some processing of the specimen may occur, and then the specimen is tested and a result obtained.
We may need data about some or all of the steps in this process, as well as about the test and its result.


Same example, but indicating BRIDG classes
[Figure: the tissue sample diagram, with each object or activity marked with its BRIDG class]
Key to BRIDG classes:
– Performed Specimen Collection (e.g. Tissue Specimen Collection)
– Performed Specimen Procedure (e.g. Freeze Specimen, Stain Slide)
– Biologic Specimen (e.g. Fresh Tissue Specimen, Paraffin Block Specimen)
– Performed Observation (the Test)
– Performed Observation Result (the Result)



How does BRIDG help us?
[Figure: the tissue sample diagram, marked with BRIDG classes]
• BRIDG has templates (classes) for all the different objects and activities that are needed to describe protocol driven research. We do not need to create these from scratch each time.
• In this diagram, we have shown which template is appropriate for each object or activity
• In effect, we have taken copies of BRIDG templates and made these specific to our clinical process



How does BRIDG help us?
[Figure: the tissue sample diagram, marked with BRIDG classes]
• One of the things that BRIDG gives us is a framework (the classes and the relationships between these classes) by which we can document the information generated through a clinical process. For example:
– a specimen collection results in specimen(s)
– a test can have more than one result, but a result can only have one test
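The two example relationships can be enforced in code. This sketch borrows the BRIDG class names from the key above but is drastically simplified: the attributes and the add_result method are hypothetical, not BRIDG’s own definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BiologicSpecimen:
    identifier: str


@dataclass
class PerformedSpecimenCollection:
    """A specimen collection results in one or more specimens."""
    specimens: List[BiologicSpecimen] = field(default_factory=list)


@dataclass
class PerformedObservationResult:
    value: str
    # At most one owning test; excluded from repr/eq to avoid cycles.
    test: "Optional[PerformedObservation]" = field(default=None, repr=False, compare=False)


@dataclass
class PerformedObservation:
    """A test can have more than one result."""
    name: str
    results: List[PerformedObservationResult] = field(default_factory=list)

    def add_result(self, result: PerformedObservationResult) -> None:
        # A result may belong to only one test.
        if result.test is not None and result.test is not self:
            raise ValueError("a result can only have one test")
        result.test = self
        self.results.append(result)
```

Attaching a second result to a test succeeds, whereas attaching an already-owned result to a different test raises ValueError.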


That isn’t enough … what else?
Here is one of those templates/classes:
[Figure: Tissue Specimen Collection class, with attributes Collection Method; Site from which the specimen was taken; Date on which the specimen was taken; …]
• Each BRIDG class has its own defined set of attributes (placeholders) … these give us a way of documenting the detail
• For example, the Specimen Collection class includes attributes for the collection method, the site from which the specimen was taken, the date on which the specimen was taken … plus another 14 attributes
• We choose the attributes we want to use for a given piece of information
• Sometimes we associate these attributes with a specific codelist (or even give the attribute a value)
• The intention is that there are attributes for ALL the information we might want to record


What does that give us?
[Figure: Tissue Specimen Collection class, with attributes Collection Method; Site from which the specimen was taken; Date on which the specimen was taken; …]
• The key thing here is that every time you copy a particular template, and make it specific to your situation, you choose from the same set of attributes
• In some cases, you may wish to associate a specific codelist with all your uses of a specific attribute, e.g. the anatomic location attribute
• This “copy, choose and make specific” process makes it easier for both computers and humans, as it enforces a consistent approach


But this is still not enough!
[Figure: Tissue Specimen Collection class with datatypes: Collection Method (CD); Site from which the specimen was taken (CD); Date on which the specimen was taken (IVL<TS.DATETIME>); …]
• We need to know what sort of information we have: Is Collection Method text? Or is it coded? Is it in English?
• We need datatypes to answer these questions
• BRIDG uses an ISO standard (ISO21090) for the datatypes
• These datatypes are complex – not like SAS datatypes (character, numeric, date etc.)
• Examples are shown in black in the diagram: CD and IVL<TS.DATETIME>
• Each datatype has a number of attributes … it is through these attributes that we get down to the variable level


So what do these datatypes do for us?
Here is one of these datatypes:
Selected attributes of the CD datatype:
nullFlavor : NullFlavor <used if original text cannot be coded>
code : characterstring, CODE
codeSystem : characterstring
codeSystemName : characterstring
codeSystemVersion : characterstring
valueSet : characterstring, CODELIST
valueSetVersion : characterstring
displayName : ST, DECODE
originalText : ED <original text>
The CD datatype is for coded information (though CD stands for “Concept Descriptor”). You can see that it has all the attributes you need for coded data. You can also see that some of the datatype attributes are themselves datatypes (e.g. originalText is of datatype ED). So attributes of a datatype can have attributes too … we have to “decompose” all these levels to get down to what we know as variables.
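The “decompose all these levels” step is a short recursion. The sketch uses the CD attributes listed on the slide (nullFlavor omitted) and, as an assumption for brevity, cuts ST and ED down to a single value attribute; the full ISO21090 ED has many more.

```python
from typing import Dict, List

PRIMITIVE = "characterstring"

# Selected attributes only; ST and ED are cut down to "value" for illustration.
DATATYPES: Dict[str, Dict[str, str]] = {
    "CD": {
        "code": PRIMITIVE,
        "codeSystem": PRIMITIVE,
        "codeSystemName": PRIMITIVE,
        "codeSystemVersion": PRIMITIVE,
        "valueSet": PRIMITIVE,
        "valueSetVersion": PRIMITIVE,
        "displayName": "ST",
        "originalText": "ED",
    },
    "ST": {"value": PRIMITIVE},
    "ED": {"value": PRIMITIVE},
}


def decompose(datatype: str, prefix: str = "") -> List[str]:
    """Expand datatype attributes level by level until only primitives remain."""
    variables: List[str] = []
    for attribute, attribute_type in DATATYPES[datatype].items():
        path = prefix + attribute
        if attribute_type in DATATYPES:  # a datatype inside a datatype: go down a level
            variables.extend(decompose(attribute_type, path + "."))
        else:
            variables.append(path)
    return variables
```

Here decompose("CD") yields six direct variables plus displayName.value and originalText.value, i.e. the two nested datatypes each contribute one more level.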


So what do these datatypes do for us?
And here is another of these datatypes:
Selected attributes of the PQ datatype:
originalText : ED.TEXT
uncertainty : QTY
uncertaintyType : UncertaintyType
uncertainRange : IVL<QTY>
value : Decimal, VALUE
codingRationale : CodingRationale
unit : characterstring, UNIT
The PQ datatype is for physical quantities (things like sodium concentration, systolic blood pressure, number of lesions). You can see that it has the attributes you need.


Why complex datatypes rather than nice simple ones?

• Because we get extra benefit!
• The datatype we use for a physical quantity (e.g. the sodium concentration in the blood) has several component parts including Value and Unit

• When we use a sodium concentration result e.g. in a SAS dataset, we always keep Value and Unit together as the value is meaningless without a unit

• Use of these ISO datatypes gives us the facility to keep these sort of things together
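Keeping value and unit together lets code refuse meaningless operations. A cut-down sketch, assuming a PQ class of our own with only value and unit (no uncertainty or unit conversion, unlike the full ISO21090 PQ) and sodium values in mmol/L chosen for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class PQ:
    """Cut-down physical quantity: value and unit always travel together."""
    value: Decimal
    unit: str

    def __add__(self, other: "PQ") -> "PQ":
        if not isinstance(other, PQ):
            return NotImplemented
        if other.unit != self.unit:
            raise ValueError(f"cannot add {other.unit} to {self.unit} without conversion")
        return PQ(self.value + other.value, self.unit)


def mean(quantities: Iterable[PQ]) -> PQ:
    """Mean of quantities that share a unit; the unit is carried into the result."""
    items = list(quantities)
    if not items:
        raise ValueError("mean of no quantities")
    total = items[0]
    for quantity in items[1:]:
        total = total + quantity
    return PQ(total.value / len(items), total.unit)
```

Here mean() only succeeds when all units agree; mixing mmol/L with mg/dL raises an error instead of producing a number without meaning.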
