clinical genomics joint with rcrim amnon shabo joyce hernandez mukesh sharma
TRANSCRIPT
Clinical Genomics Joint with RCRIM
Amnon Shabo
Joyce Hernandez
Mukesh Sharma
AGENDA
• Gene Expression CMET Overview
• Genetic Reports CDA Ballot Overview
• Gene Expression DAM Update
• Generic Assay Overview
• Specimen Model
Gene Expression CMET Overview
Gene Expression DAM Update
• Currently reviewing results of the last ballot
• Next steps:
– Finish NCI Generic Assay (IRWG)
– Changes to GE DAM
  • Add “generic” classes from Generic Assay
  • Bring over additional BRIDG classes
  • Apply suggested changes from the ballot (use case, BRIDG compatibility)
Clinical Genomics DAM (50,000 foot level view)
class Complete Diagram
A-Phenotype
AminoAcid
+ name: String
ArrayDataType
+ name: String
+ version: String
HL7 CG Elements
Joyce Additions
NCI Model Elements
BRIDG 2.1
Modified for CG
MIAME-MAGE
MAGE-TAB
Legend
Expression
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- text: String
- effectiveTime: String
- uncertaintycode: String
- value: String
- interpretationCode: String
- methodCode: String
This is the Domain Information Model for the HL7 Clinical Genomics Work Group.
It consists of the following topics:
1. Gene Expression
2. Genetic Variation
3. Genotype
4. Sequence
5. Proteomics
6. Links to Clinical Phenotypes
Entry point for the Gene Expression CMET POCG_RM000031UV
AssociatedProperty
- classCode: String
- id: Integer
- code: String
- text: String
- effectiveTime: String
- value: String
- methodCode: String
Gene
+ symbol: String
+ fullName: StringColumn
+ genbankAccession: String
+ genbankAccessionVersion: String
+ ensemblgeneID: String
+ unigeneclusterID: String
+ entrezgeneID: String
Chromosome
+ chormosomeNumber: Integer
DNA
- name
Need definitions
Use this class for inherent data about the locus, e.g. chromosome no.
RNA
- name
Nucleotides
+ nucleotideName: String
Intron
+ length: Integer
+ intronClass: String
Exon
+ length: Integer
+ intronClass: String
Nucleobases
+ shortName: String
Phosphate
+ name: String
Ester
+ name: String
Sugar
+ name: String
Codon
+ codonId: Integer
Usha: Relationship should be from Gene to DNA. Portions of DNA correspond to a Gene. Chromosomes would have a bunch of genes.
GeneticLocus
- id: Integer
- text: String
- methodCode: String
- chromosomePosition: Integer
- cellType: String
This class is a placeholder for specifying a locus on the genome, i.e., a position of a particular given sequence in the subject’s genome. Note that the semantics of the locus (e.g., gene) are defined by data assigned in the code & value attributes of this class, and also by placing additional data relating to this locus into the classes (and CMETs) associated with this class.
Genome
- classCode: String
- id: Integer
- code: String
- text: String
- effectiveTime: String
- confidentialityCode: String
- value: String
- interpretationCode: String
- methodCode: String
GeneticLoci
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- titile: String
- text: String
- statusCode: String
- effectiveTime: String
- confidentialityCode: String
- reasonCode: String
- value: String
- interpretationCode: String
- methodCode: String
LargeDuplicaiton
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- text: String
- effectiveTime: String
- uncertaintycode: String
- confidentialitycode: String
- value: String
- interpretationCode: String
- methodCode: String
GeneticDocument
- classCode: String
- id: Integer
- code: String
- title: String
- text: String
- statusCode: String
- effectiveTime: String
- confidentialityCode: String
- languageCode: String
- setId: Integer
- versionNumber: Integer
LargeDeletion
- classCode: String
- id: Integer
- code: String
- text: String
- effectiveTime: String
- confidentialitycode: String
- uncertaintycode: String
- value: String
- interpretationCode: String
- methodCode: String
Sequence
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- titile: String
- text: String
- statusCode: String
- effectiveTime: String
- reasonCode: String
- value: String
- interpretationCode: String
- methodCode: String
Need definitions.
Need definitions.
Need definitions.
Use the value attribute to encapsulate raw data relating to the entire set of loci. For example, SNP genotyping of a large number of genes/markers.
Cytogenetics
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- text: String
- effectiveTime: String
- confidentialitycode: String
- uncertaintycode: String
- value: String
- interpretationCode: String
- methodCode: String
???OtherNonLocusData
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- text: String
- effectiveTime: String
- confidentialitycode: String
- uncertaintycode: String
- value: String
- interpretationCode: String
- methodCode: String
Need definitions.
Need definitions.
Need definitions.
Need definitions.
GenotypeFinding
- normalizedXIntensity: float
- normalizedYIntensity: float
- rawXIntensity: float
- rawYIntensity: float
- call: String
IndividualAllele
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- titile: String
- text: String
- statusCode: String
- effectiveTime: String
- reasonCode: String
- value: String
- interpretationCode: String
- methodCode: String
SequenceVariation
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- titile: String
- text: String
- statusCode: String
- effectiveTime: String
- reasonCode: String
- value: String
- interpretationCode: String
- methodCode: String
DeterminantPepetide
- classCode: String
- id: Integer
- text: String
- effectiveTime: String
- value: String
- methodCode: String
AssociatedObservation
- id: Integer
- name: String
- copyNumber: Integer
- zygosity: String
- dominancy: String
- geneFamily: String
Polypeptide
- classCode: String
- id: Integer
- text: String
- effectiveTime: String
- value: String
- methodCode: String
Need definitions.
Need definitions.
Should we leave this out and just add classes as needed?
Entry path to the broadest path of the genetic variation model.
ViralGenetics
- classCode: String
- id: Integer
- code: String
- negationInd: Boolean
- text: String
- effectiveTime: String
- confidentialitycode: String
- uncertaintycode: String
- value: String
- interpretationCode: String
- methodCode: String
Added as a placeholder. > Future Expansion <
Need definitions
SNPAssay
- designAlleles: String
- designScore: Float
- designSequence: String
- designStrand: String
- id: Long
- status: String
- vendorAssayId: String
- version: String
SNPPanel
- assayCount: Integer
- description: String
- id: Long
- name: String
- technology: String
- vendor: String
- vendorPanelId: String
- version: String
SNP Design classes
Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
ExtractedNon-GeneticSample
- extractedSsampleId: Integer
- extractedAmount: Integer
- extractedAmountUOM: String
- extractionMethod: String
::Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
ExtractedGeneticSample
+ geneticSampleId: Integer
+ geneticSampleType: String
+ extractedAmount: Integer
+ extractionMethod: String
+ GeneticSampleType: int
+ hybridization: int
+ authorizationLink: url
::Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
OriginalBioSpecimen
+ amount: int
+ unitofMeasure: String
+ statusCode: String
+ statusDateRange: String
::Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
::Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
HandlingDocument
- Id: int
- text: String
SpecimenCharacteristics
+ Id: int
+ color: String
+ clarity: String
+ condition: String
Collection
- collectionMethod: int
- id: int
::HandlingDocument
- Id: int
- text: String
Storage
+ id: Integer
+ flashFrozenMethod: String
+ temp: Integer
+ storageMethod: String
::HandlingDocument
- Id: int
- text: String
Transportation
- id: Integer
::HandlingDocument
- Id: int
- text: String
Assume this is a generic list of all material. Specific material used and tracked within the conduct of a study and/or clinical care would be uniquely identified via other classes (i.e. extracted or resectioned samples). The identifier is used only for the original biological specimen.
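The specimen classes above repeat the Material attributes under a "::Material" compartment, which reads as inheritance. A minimal Python sketch of that reading follows. The snake_case field names are adaptations of the diagram attributes, the `source` link from an extracted sample back to its specimen is a hypothetical stand-in for a "sourced from" association whose endpoints the transcript does not show, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Material:
    """Generic material; attributes taken from the Material class box."""
    id: int
    name: str
    description: str = ""
    formcode: str = ""


@dataclass
class OriginalBioSpecimen(Material):
    """Original biological specimen; inherits the Material attributes."""
    amount: int = 0
    unit_of_measure: str = ""
    status_code: str = ""


@dataclass
class ExtractedGeneticSample(Material):
    """Sample extracted from a specimen; also inherits from Material."""
    genetic_sample_id: int = 0
    genetic_sample_type: str = ""
    extracted_amount: int = 0
    extraction_method: str = ""
    # Assumption: hypothetical back-reference to the specimen of origin.
    source: Optional[OriginalBioSpecimen] = None


# Illustrative values only; none of these come from the slides.
biopsy = OriginalBioSpecimen(id=1, name="tumor biopsy", amount=50,
                             unit_of_measure="mg")
rna = ExtractedGeneticSample(id=2, name="total RNA", genetic_sample_id=10,
                             genetic_sample_type="RNA", source=biopsy)
```

Under this reading, only the original specimen carries the tracked identifier, and derived samples reach it through the back-reference.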
ArrayGroup
- arraySpacingX: float
- arraySpacingY: float
- barcode: String
- length: float
- numArrays: Integer
- orientationMark: enum(top,bottom,left,right)
- width: float
Array
+ arrayIdentifier: String
+ arrayXOrigin: Integer
+ arrayYOrigin: Integer
+ originRelativeTo: String
ArrayDesign
+ id: Integer
+ version: String
+ comment: String
+ substrateType: String
+ surfaceType: String
+ sequecnePolymerType: String
+ contactId: Integer
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
ArrayManufacture
- manufacurungDate: String
- tolerance: Integer
Gene expression Design classes
Do we need separate classes for Array Design (GE versus Genetic variation)? The attributes I have added are from the new MAGE-TAB model. Do we still need number of features (this came from the old version of the model)?
LabeledExtract
- flourescentLabelingSubstance: String
- flourescentLabelingSubstanceAmount: float
- flourescentLabelingSubstanceUnits: float
Hybridization
- name: String
- amountOfMaterial: float
ArrayManufactureDeviation
This area of the MAGE model seems to be placeholders. There are relationships to both FeatureDefect and ZoneDefect, neither of which has attributes.
FeatureDefect: Stores the defect information for a feature. This class points to PositionDelta, which has coordinate information (delta X,Y). PositionDelta points to DistantUnit, which contains additional measurement data. FeatureDefect points to an OntologyEntry, which contains controlled vocabulary. The link constrains the vocabulary entries to represent only "defectType".
ZoneDefect: Stores the defect information for a zone. This class points to the Zone class, which does have lower-right X,Y and upper-left X,Y coordinates, plus a row identifier.
Channel
- channel_no: Integer
BioAssayTreatment
+ bioAssayProcess: String
::Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
ImageAcquistion
+ imageAcquistionMethod
Image
+ name: String
+ url: String
::DataFile
+ id: String
+ name: String
+ dataFileType: String
+ dataFormat: String
::Data
+ uri: String
+ datatype: String
Should we embed the image as blob, rather than point to it? Or provide both options? ANS: Too huge to store in the database. It is rare to go back to them. But some folks want to keep the images. They could be kept in secondary.
DerivedBioAssayData
Need Array Manufacturing control data. Not chip but chip by overall.
NOTE: Image is scanned at different wavelengths.
FactorValue
- value: String
::Measurement
+ value: String
+ minvalue: String
+ maxvalue: String
+ unit: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Lab_Experiment
+ id: Integer
+ title: String
+ description: String
+ date: date
+ assayType: String
+ experimentalDesigns: String
+ formatVersion: String
+ publicIdentifier: String
+ sdrfFile: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Factor
+ type: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
ExperimentalDesigns
+ type: String
+ description: String
QualityControl
+ type: String
+ qualityControlDescription: String
Replicatetypes
+ replicateType: String
+ replicateDescription: String
NormalizationTypes
+ normalizationType
+ normalizationDescription: String
These classes need to be harmonized to the Study Design portion of the BRIDG model.
GenomicProtocol
+ id: Integer
+ name: String
+ type: String
+ description: String
+ hardware: String
+ software: String
+ contact: String
+ url: String
+ publicProtocolUrl: String
Assume there can be multiple "experiments" for complex studies. ???
Also the new version calls this an Investigation. When talking about genomics testing, a lot of SMEs use the term "Experiment". Investigation can also be connected more easily to the term study, which already has a broader scope since it represents the "clinical trial" used in the research context.
Another factor is that the MGED ontology makes references to "ExperimentalProtocol" in a number of places, so it might be better to keep a known term.
Which terms does the team prefer? Is there a term that could fit both research and healthcare use?
This class will need specific harmonization to the Study class in BRIDG. Hardware/software requirements for the arrays need to be added. These should probably be normalized into separate classes.
For CG DAM model reviewers: we need more examples. Is there other software required other than the Reporter?
ProtocolApplications
+ edgeId: Integer
+ order: Integer
+ notes: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Is this the proper way to identify an individual channel?
Germline/Somatic needs to be validated by a lab test. Should it just be represented as part of a test and taken out of the bio-specimen?
ImageFile
+ name: String
+ status: String
+ type: String
RawArrayData
DataFile
+ id: String
+ name: String
+ dataFileType: String
+ dataFormat: String
::Data
+ uri: String
+ datatype: String
Need definition on what type of data is carried here and which function in the process populates it.
Validate that "ordered" means sequenced and does not represent "ordered" from lab.
Feature
+ featureId: Integer
+ blockCol: Integer
+ blockRow: Integer
+ col: Integer
+ row: Integer
+ reporterid: Integer
::DesignElement
+ id: Integer
+ compositid: Integer
+ arrayDesignId: Integer
+ featureID: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Reporter
+ id: Integer
+ controlType: String
+ sequence: String
+ databaseEntry: String
::DesignElement
+ id: Integer
+ compositid: Integer
+ arrayDesignId: Integer
+ featureID: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
ReporterGroup
+ reporterGroupId: Integer
+ name: String
::DesignElement
+ id: Integer
+ compositid: Integer
+ arrayDesignId: Integer
+ featureID: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Need to differentiate between frozen and fixed. For breast cancer. Containers need to be added. Example: Non-frozen and frozen tissue samples need to be included. Unfixed tissue sections (slide type and slide mount. In healthcare)
Add class to handle the change of state of the material.
Typically called protocol of treatment.
In MAGE this is the actual Image and everything that was done to get it.
Can do another treatment and get another image. Actual steps are not kept for all images. Usually only recorded for the last image.
JH: MAGE-TAB model conflicts with these statements. It has an Assay class as part of the sdrf package and an Image class as part of the data package. I will rename this Bio-Assay class to just Assay.
NOTE: Mollie: Wants to constrain model for clinical environment at a later point.
Hardware
+ description: String
+ version: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Software
+ description: String
+ version: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Edge
+ id: Integer
+ experimentIdentifier: String
+ input: String
+ output: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Need examples for EDGE data, primarily for input and output. Couldn't find any at the magetab and tab2mage sites.
Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Performer
+ id: Integer
+ personID: Integer
+ protocolID: Integer
Person
+ firstname: String
+ lastname: String
+ midinitials: String
+ affiliation: String
::Contact
+ Address: String
+ phone: String
+ email: String
+ fax: String
+ tollFreePhone: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
PersonRole
+ role: String
::Person
+ firstname: String
+ lastname: String
+ midinitials: String
+ affiliation: String
::Contact
+ Address: String
+ phone: String
+ email: String
+ fax: String
+ tollFreePhone: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Publication
+ id: Integer
+ pubMedID: String
+ title: String
+ publicationDOI: String
+ authorlist: String
+ status: String
Contact
+ Address: String
+ phone: String
+ email: String
+ fax: String
+ tollFreePhone: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Assay
+ arrayIdentifier: String
+ arrayDataFiles: String
+ derivedArrayDataFiles: String
+ arrayDataMatricFiles: String
+ derivedArrayDataMatrixFiles: String
TechnologyType
+ technologyType: String
CompositElement
+ id: Integer
+ reporterID: Integer
+ databaseEntry: String
::DesignElement
+ id: Integer
+ compositid: Integer
+ arrayDesignId: Integer
+ featureID: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
DesignElement
+ id: Integer
+ compositid: Integer
+ arrayDesignId: Integer
+ featureID: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Question on EBI example: http://www.ebi.ac.uk/miamexpress/help/array_designs.html#ADF
Should the CompositeSequenceComment be represented as a database entry (Ontology Term / Value pair) or as a variable?
Data
+ uri: String
+ datatype: String
DataElement
+ id: Integer
+ datamatrixId: Integer
+ col: Integer
+ row: Integer
+ rowQuantitationType: String
+ index_: Integer
+ secondayKey: String
DataMatrix
Need more information on how this is implemented. Description seems to indicate calculation. See MGED section below:
class QuantitationType
definition: The QuantitationType provides a method for calculating a single datum of the BioAssayData matrix.
superclasses: QuantitationTypePackage
properties:
  unique_identifier MO_67
  class_role abstract
  class_source mage
constraints:
  restriction: has_scale has-class Scale
  restriction: has_type has-class DataType
Name: Complete Diagram
Author: hernajoy
Version: 1.0
Created: 2006-01-11 12:00:00 AM
Updated: 2010-06-17 7:10:15 PM
DATA MATRIX EXAMPLE from: http://tab2mage.sourceforge.net/docs/magetab_docs.html#datamatrix
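Since the notes ask how DataElement and DataMatrix would be implemented, here is one possible reading as a minimal Python sketch, not the MAGE-TAB or MGED definition. It assumes a DataElement is one cell addressed by (row, col) and tagged with its quantitation type. The `value` field, the `add` and `get` methods, and the sample numbers are assumptions that do not appear on the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DataElement:
    """One cell of a data matrix; names adapted from the DataElement box."""
    id: int
    datamatrix_id: int
    row: int
    col: int
    row_quantitation_type: str
    value: Optional[float] = None  # assumption: not on the diagram


@dataclass
class DataMatrix:
    """Hypothetical container that indexes elements by (row, col)."""
    id: int
    elements: Dict[Tuple[int, int], DataElement] = field(default_factory=dict)

    def add(self, element: DataElement) -> None:
        # Reject cells that claim to belong to a different matrix.
        if element.datamatrix_id != self.id:
            raise ValueError("element belongs to another matrix")
        self.elements[(element.row, element.col)] = element

    def get(self, row: int, col: int) -> Optional[DataElement]:
        return self.elements.get((row, col))


# Illustrative values only.
matrix = DataMatrix(id=1)
matrix.add(DataElement(id=100, datamatrix_id=1, row=0, col=0,
                       row_quantitation_type="signal", value=812.5))
cell = matrix.get(0, 0)
```

If the quantitation type turns out to describe a calculation rather than a label, as the MGED definition suggests, the string field would likely need to become a reference to a separate class.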
Bio-Specimen-Characteristics
+ term: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
This is equivalent to the Material class in the MAGE-TAB model.
Material in this model applies to BRIDG and HL7 expanded scope, which goes beyond biologic material.
Assume this class needs to represent the many-to-many associations between the following MGED concepts. These associations attempt to group mathematical functions into nodes.
1. Nodes
2. Node Values
3. Node Value Types
4. BioAssays
5. BioAssayDataCluster
Normalization
+ derviedArrayDataFile: String
::Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Scan
+ arrayDataFiles: String
+ derivedArrayDataFiles: String
+ arrayDataMatricFiles: String
+ derivedArrayDataMatrixFiles: String
::Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Measurement
+ value: String
+ minvalue: String
+ maxvalue: String
+ unit: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Need sample data for this class.
ProtocolParameter
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
ParameterValue
+ protocolParameterId: Integer
+ protocolApplication: Integer
::Measurement
+ value: String
+ minvalue: String
+ maxvalue: String
+ unit: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
Source
+ contactid: Integer
::OriginalBioSpecimen
+ amount: int
+ unitofMeasure: String
+ statusCode: String
+ statusDateRange: String
::Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
::Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
TreatedSample
::ExtractedGeneticSample
+ geneticSampleId: Integer
+ geneticSampleType: String
+ extractedAmount: Integer
+ extractionMethod: String
+ GeneticSampleType: int
+ hybridization: int
+ authorizationLink: url
::Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
NameValueType
+ id: Integer
+ name: String
+ type: String
Definition of Experiment
SPECIMEN HANDLING
ARRAY DESIGN
RELATIONSHIPS BETWEEN: (Samples, Arrays and Data)
GENE EXPRESSION DATA
Usha: May not need sugar and phosphate data.
Specimen Handling
+ type: String
+ name: String
+ amount: Integer
::HandlingDocument
- Id: int
- text: String
Shipper
- dateShipped: String
- senderType: String
- senderName: String
::Transportation
- id: Integer
::HandlingDocument
- Id: int
- text: String
Receiver
- dateRecieved: String
- receiverType: String
- receiverrName: String
::Transportation
- id: Integer
::HandlingDocument
- Id: int
- text: String
SpecimenContainer
+ containerType: String
+ risk: String
+ handling: String
+ capacityQuantity: Integer
+ heightQuantity: Integer
+ diameterQuantity: Integer
+ capType: String
+ separatorType: String
+ barrierQuantity: Integer
+ bottomDeltaQuantity: Integer
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
CellSource
+ Type: String
::OriginalBioSpecimen
+ amount: int
+ unitofMeasure: String
+ statusCode: String
+ statusDateRange: String
::ResultInterpretation
+ id: Integer
::Material
+ id: Integer
+ description: String
+ name: String
+ formcode: String
::Node
+ experimentId: Integer
::DimensionElement
+ node: String
::Identifiable
+ id: URI
+ name: String
+ properties: String
+ description: String
ResultInterpretation
+ id: Integer
Association labels from the Complete Diagram (bare multiplicity markers and the classes each association connects are not recoverable from this transcript):
Contain / about
contains * / coded by 1
made up of * / is part of 1
binds * / bound by 1
sourced from / derived collection store
Sourced from / produces
produces / produced by
represented by / represents
contains 1..* / part of 1
binds * / bound by 1
* done on an / 0..1 undergo
makes * / made by 1
specified by 1 / specifies *
contain 1..* / part of 1
defined by 1 / defines *
may have 0..* / defined by 1
contains 1..* / part of 1
0..* created by 1 / 1 results in 0..*
coded by 1..* / codes 0..1
1 may have * / * can be associated to 1
arrayDataMatrixFilesLink
appears in * / 1 represents a section of
ships to 0..*
+usedfor
derivedArrayDataMatrixFilesLink
contains * / part of 1
may contain 1..* / belongs to
arrayDataFilesLink
derivedArrayDataFilesLink
contains / is described by
printingProtocol
belongs to / contains
0..* used in / 0..* performed on
labelling produces 1 / results from labelling 1
may produce / produced from
processing described by / describes processing for
GeneticVariation
Bio-Specimen
Gene Expression
Color Coding Scheme
class Gene Expression
HL7 CG Elements
Joyce Additions
NCI Model Elements
BRIDG 2.1
Modified for CG
MIAME-MAGE
MAGE-TAB
Legend
CG DAM Views
• Process Models
– Specimen Handling and Collection (based on NCI public protocol)
– Genomics Testing Process (high level)
– Future – interaction diagrams for message flows per Use Case
• Gene Expression
– Whole Model
– Bio-specimen
– Experiment Definition (gene expression-specific protocol, not entire study)
– Array Design
– Common Classes
– Data
– Relationships
Study
Experiment
Data
Protocol
Equipment
Software
ExperimentalItem
- Study may include other Studies
- Study may be composed of many Experiments
- Experiment may include other Experiments
- Experiment may involve multiple ExperimentalItems
- Experiment may be based on multiple Protocols
- Experiment may be performed using multiple Equipment
- Experiment may be performed using multiple Software
- Experiment may produce multiple Data (Output)
Generic Assay Overview
Experiment:
- Affymetrix U133P2 Gene Expression
- Affymetrix U133P2 Analysis
- Specimen definition information entry (might be a component of Affymetrix U133P2 Gene Expression)
- Total RNA extraction and QC (might be a component of Affymetrix U133P2 Gene Expression)
- cDNA synthesis and cleanup
- U133P2 array scan (GCOS: create *.dat and create (.dat to) *.cel)
- GCOS U133P2 Gene Expression Analysis (might be a component of Affymetrix U133P2 Analysis)
Study
Experiment
Data
Protocol
Equipment
Software
ExperimentalItem
Study:
- Gene expression analysis of tumor/non-tumor sample pair
Examples of Data (i.e., Output):
- A_U133P2_cDNA
- A_U133P2_cDNA_gel_tif
- A_U133P2_cDNA_gel_doc
- A_U133P2_SpecimenHybChipWashed (ready for stain and wash)
- A_U133P2_Specimen_dat
- A_U133P2_Specimen_cel
- A_U133P2_Specimen_chp (data file with genotypes)
ExperimentalItem:
- Project-specific specimen set
Equipment:
- Thermocycler
- gel apparatus
- camera/image system
- Affymetrix Fluidics WashStation 450
- Affymetrix GS3000 scanner
Protocol:
- Affymetrix Cytogenetics Assay Protocol
- Affymetrix Protocol for One-Cycle cDNA Synthesis
Software:
- image acquisition application
- Agilent 2100 Operating Software
- GCOS application
Generic Assay Overview
Study
Experiment
Data
Protocol
Equipment
Software
ExperimentalItem
- Study may include other Studies
- Study may be composed of many Experiments
- Study may be performed according to multiple Protocols
- Experiment may include other Experiments
- Experiment may involve multiple ExperimentalItems
- Experiment may be performed according to multiple Protocols
- Experiment may be performed using multiple Equipment
- Experiment may be performed using multiple Software
- Experiment may produce multiple Data (Output)
- Experiment may be performed on Data (data as input for an analytical experiment)
- Protocol may include other Protocols
- Protocol may specify Equipment
- Protocol may specify Software
- Equipment may specify Software
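The associations listed above can be sketched as plain Python dataclasses, with each "may ... multiple" statement becoming a list-valued field. This is only one illustrative reading of the diagram. The field names and the `all_outputs` helper are hypothetical, and the assignment of example outputs to particular experiments is an assumption based on the Affymetrix example names given elsewhere in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Software:
    name: str


@dataclass
class Equipment:
    name: str
    software: List[Software] = field(default_factory=list)  # may specify Software


@dataclass
class Protocol:
    name: str
    sub_protocols: List["Protocol"] = field(default_factory=list)
    equipment: List[Equipment] = field(default_factory=list)
    software: List[Software] = field(default_factory=list)


@dataclass
class ExperimentalItem:
    name: str


@dataclass
class Data:
    name: str


@dataclass
class Experiment:
    name: str
    sub_experiments: List["Experiment"] = field(default_factory=list)
    items: List[ExperimentalItem] = field(default_factory=list)
    protocols: List[Protocol] = field(default_factory=list)
    equipment: List[Equipment] = field(default_factory=list)
    software: List[Software] = field(default_factory=list)
    inputs: List[Data] = field(default_factory=list)   # performed on Data
    outputs: List[Data] = field(default_factory=list)  # produces Data

    def all_outputs(self) -> List[Data]:
        """Outputs of this experiment followed by those of nested ones."""
        collected = list(self.outputs)
        for child in self.sub_experiments:
            collected.extend(child.all_outputs())
        return collected


@dataclass
class Study:
    name: str
    sub_studies: List["Study"] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)
    protocols: List[Protocol] = field(default_factory=list)


dat = Data("A_U133P2_Specimen_dat")
cel = Data("A_U133P2_Specimen_cel")
chp = Data("A_U133P2_Specimen_chp")

scan = Experiment("U133P2 array scan",
                  equipment=[Equipment("Affymetrix GS3000 scanner")],
                  software=[Software("GCOS application")],
                  outputs=[dat, cel])
expression = Experiment("Affymetrix U133P2 Gene Expression",
                        items=[ExperimentalItem("Project-specific specimen set")],
                        sub_experiments=[scan])
# Assumption: the analysis experiment takes the .cel file and yields the .chp file.
analysis = Experiment("Affymetrix U133P2 Analysis", inputs=[cel], outputs=[chp])
study = Study("Gene expression analysis of tumor/non-tumor sample pair",
              experiments=[expression, analysis])
```

Keeping `inputs` and `outputs` as separate lists is one way to express that an analytical experiment can be performed on Data produced by an earlier experiment.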
Study: A detailed examination or analysis designed to discover facts about a system under investigation. Systems may include intact organisms, biologic specimens, and natural or synthetic materials.
Experiment: A coordinated set of actions and observations designed to generate data, with the ultimate goal of discovery or hypothesis testing.
Protocol: A rule which guides how an activity should be performed.
Equipment: An object intended for use whether alone or in combination for diagnostic, prevention, monitoring, therapeutic, scientific, and/or experimental purposes. For example, ….mass spectrometer, PCR machine, microscope, pH meter
ExperimentalItem: Items used in the execution of an experiment: specimens - samples either taken from nature or created for the purpose of study and which are to be the subject of an experiment, and reagents and supplies which will be used in the execution of an experiment. It is not instruments, analysis tools, and general-purpose resources (common reagents, lab equipment, personnel).
Data: A collection or single item of factual information, derived from measurement or research, from which conclusions may be drawn. For example, an image, a .DAT, or .CEL file.
ProcessedData: Data derived from other data. For example, image annotations derived from an image, or the outcome of running a .CEL file through an analytical tool.
The notion of what is data (vs. processed data) is defined by community consensus and may be mutable. Some may consider the .DAT file to be data, and that the .CEL file is processed, while others may consider the .CEL file itself to be data (unprocessed).
= Proposed last week
Generic Assay Overview
Notes:
1. ProcessedData has an association to Finding; not included on the diagram to keep things focused
   1. Isn’t the result of an analytical experiment what we’ve called ProcessedData?
   2. Do we need to have a distinction between Data and ProcessedData? Can we have a self-association on Data to handle both in the DAM?
2. Software needs to be defined
3. What about an association from ExperimentalItem to ExperimentalStudy?
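To make the self-association question concrete, here is a minimal Python sketch of that option, offered as an illustration and not as a decision of the work group. The `derived_from` field and the `is_processed` and `lineage` helpers are hypothetical names. The file names reuse the Affymetrix examples from the deck, and the derivation chain between them is assumed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Data:
    name: str
    # Self-association: the Data items this one was derived from.
    derived_from: List["Data"] = field(default_factory=list)

    @property
    def is_processed(self) -> bool:
        """Data counts as processed when it is derived from other Data."""
        return bool(self.derived_from)

    def lineage(self) -> List[str]:
        """Names of all upstream Data, nearest ancestor first."""
        names: List[str] = []
        for parent in self.derived_from:
            names.append(parent.name)
            names.extend(parent.lineage())
        return names


dat = Data("A_U133P2_Specimen_dat")
cel = Data("A_U133P2_Specimen_cel", derived_from=[dat])
chp = Data("A_U133P2_Specimen_chp", derived_from=[cel])
```

With this shape, whether the .CEL file counts as data or processed data depends only on whether its link to the .DAT file is recorded, which matches the remark that the distinction is set by community consensus and may be mutable.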
Generic Assay Overview