ProSanos Corporation Confidential and Proprietary
Modeling and clustering disease progression for correlation with genetic and demographic factors
Robert Kingan
What is SSIFT?
“To address […] common diseases, which include schizophrenia, depression, and breast cancer, it is essential to incorporate observations of the clinical progression of the disease to refine the definition of phenotype.” – Michael N. Liebman, U. Penn.
Yes, but what is SSIFT?
– SSIFT = Stratification and Synchronization Inference Technology
What is SSIFT?
Stratification: Dividing a patient population into groups which are meaningful for diagnosis, prognosis, treatment selection, or genotype-phenotype correlation.
Synchronization: Recognizing a pattern of disease progression, regardless of disease stage for a particular patient.
SSIFT overview
Assumptions: what is SSIFT-able
Other constraints on data selection
Outline of technique
– Identifying variables
– Modeling disease progression
– Parameterizing different models
– Clustering patients by progression patterns
– Interpreting the results
Pattern of disease progression
[Figure: disease marker plotted against time, annotated with the initial value, the final value, and the period of change]
SSIFT workflow
Survey the data
Select useful variables
Fit disease progression models
Construct feature vectors
Assign feature weights
Cluster weighted feature vectors
Evaluate the clustering results
Complete? (No: iterate again; Yes: proceed)
SSIFT workflow
[Figure: marker-level time series for Patients 1 to 10, one panel per patient, each over time points 1 to 10]
[Figure: after SSIFT, the patients fall into two groups, each plotted as Marker Level against Time (years). Group A: Patients 1, 3, 7, 10. Group B: Patients 2, 4, 5, 6, 8, 9.]
SSIFT curve types
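Logistic: $\hat{y}(t) = a + (b - a)\,\dfrac{e^{\beta(t - \tau)}}{1 + e^{\beta(t - \tau)}}$
Constant: $\hat{y}(t) = c$
Linear: $\hat{y}(t) = y_0 + m t$
Early stable and late stable: $\hat{y}(t)$ is built from $y_0$, $t_0$ and a term $\ln\left(1 + e^{\ldots}\right)$; the coefficients are illegible in this transcript.
($\beta$ and $\tau$ stand in for the Greek symbols of the logistic exponent, which are also illegible.)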
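As a rough illustration of the curve types, the sketch below evaluates a logistic curve and fits the constant and linear types by least squares, then compares their squared error on one series. It assumes the usual logistic form; the names `rate` and `midpoint` are stand-ins, the data are made up, and nothing here is taken from the SSIFT implementation, whose fitting procedure the slides do not describe.

```python
import math

def logistic(t, a, b, rate, midpoint):
    """Sigmoid moving from a to b; rate and midpoint are stand-in names."""
    z = rate * (t - midpoint)
    # Numerically stable evaluation of e^z / (1 + e^z).
    s = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
    return a + (b - a) * s

def fit_constant(ys):
    """Least-squares constant curve: c is the mean of the observations."""
    return sum(ys) / len(ys)

def fit_linear(ts, ys):
    """Ordinary least squares for y = y0 + m*t; needs at least two distinct times."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    m = sxy / sxx
    return y_bar - m * t_bar, m

def sse(ys, preds):
    """Sum of squared errors between observations and fitted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

# Made-up marker series for one hypothetical patient.
ts = [1, 2, 3, 4, 5]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]

c = fit_constant(ys)
y0, m = fit_linear(ts, ys)
sse_constant = sse(ys, [c] * len(ts))
sse_linear = sse(ys, [y0 + m * t for t in ts])
```

For this series the linear type fits exactly while the constant type leaves residual error, so a selection rule based on squared error would pick the linear curve. A logistic fit would need a nonlinear optimizer, which is left out of this sketch.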
Converting parameters
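Each curve type is converted to a common parameter vector $(a, b, m, \tau)$; $\tau$ stands in for the fourth symbol, which is illegible in this transcript, as is the logistic rate written $\beta$ here. Entries shown as $\ldots$ could not be recovered.
Logistic: $(a, b, m, \tau) = \left(a,\ b,\ \beta(b - a)/4,\ \ldots\right)$, where the last entry involves $\left(y^* - (a + b)/2\right)/m$
Constant: $(a, b, m, \tau) = (c,\ c,\ \text{NULL},\ \text{NULL})$
Linear: $(a, b, m, \tau) = \left(\hat{y}(t_1),\ \hat{y}(t_n),\ m,\ (y^* - y_0)/m\right)$
Early stable: $(a, b, m, \tau) = \left(\ldots,\ \hat{y}(t_n),\ \ldots,\ \ldots\right)$
Late stable: $(a, b, m, \tau) = \left(\hat{y}(t_1),\ \ldots,\ \ldots,\ \ldots\right)$
$y^*$ = population mean, $t_1$ = first time point, $t_n$ = last time point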
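The conversion can be sketched for the two curve types whose mappings are legible: constant, which maps to (c, c, NULL, NULL), and linear, assumed here to map to (ŷ(t1), ŷ(tn), m, (y* − y0)/m). The function names are hypothetical, `None` plays the role of NULL, and the logistic helper only reflects the standard fact that a logistic curve with rate `rate` has slope rate·(b − a)/4 at its midpoint.

```python
def constant_to_common(c):
    """Constant curve: both endpoints equal c; slope and timing are undefined."""
    return (c, c, None, None)

def linear_to_common(y0, m, t_first, t_last, y_star):
    """Linear curve y0 + m*t mapped to (a, b, m, fourth entry).

    a and b are the fitted values at the first and last time points; the
    fourth entry is (y_star - y0) / m, the time at which the line reaches
    the population mean y_star. Requires m != 0.
    """
    a = y0 + m * t_first
    b = y0 + m * t_last
    return (a, b, m, (y_star - y0) / m)

def logistic_midpoint_slope(a, b, rate):
    """Slope of a logistic curve from a to b at its midpoint."""
    return rate * (b - a) / 4.0

# Made-up example: line y = 0.5 + 0.5*t observed from t = 1 to t = 5,
# with a hypothetical population mean of 2.0.
features = linear_to_common(0.5, 0.5, 1, 5, 2.0)
```

Putting every curve type into the same four slots is what lets patients with different best-fitting curves be compared in one feature space; a flat line (m = 0) has no crossing time and would need separate handling.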
Modified Mahalanobis distance
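$d(p, q) = (v_p - v_q)\,\Sigma^{-1}\,(v_p - v_q)^T$
$d(p, q) = (v_p - v_q)\left(\tfrac{1}{2}\left(\Sigma_p^{-1} + \Sigma_q^{-1}\right)\right)(v_p - v_q)^T$
$d(p, q) = (v_p - v_q)\,Q\left(\tfrac{1}{2}\left(g(\Sigma_p) + g(\Sigma_q)\right)\right)Q^T\,(v_p - v_q)^T$
($\Sigma$ stands in for the covariance symbol, which is illegible in this transcript; $Q$ and $g$ are not defined on the slide.)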
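A minimal sketch of the first two distance forms, restricted to two-dimensional feature vectors so that the matrix inverse can be written out by hand. It assumes the distance is the quadratic form without a square root and that the second form averages the two inverse covariance matrices; the third form is not sketched because Q and g are not defined on the slide. All function names are hypothetical.

```python
def inv2(mat):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = mat
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(diff, mat):
    """Compute diff * mat * diff^T for a row vector diff."""
    n = len(diff)
    return sum(diff[i] * mat[i][j] * diff[j] for i in range(n) for j in range(n))

def mahalanobis(vp, vq, cov):
    """Quadratic-form Mahalanobis distance with one shared covariance."""
    diff = [x - y for x, y in zip(vp, vq)]
    return quad_form(diff, inv2(cov))

def averaged_mahalanobis(vp, vq, cov_p, cov_q):
    """Variant that averages the inverse covariances of the two points."""
    inv_p, inv_q = inv2(cov_p), inv2(cov_q)
    avg = [[0.5 * (inv_p[i][j] + inv_q[i][j]) for j in range(2)] for i in range(2)]
    diff = [x - y for x, y in zip(vp, vq)]
    return quad_form(diff, avg)

# Made-up feature vectors and covariances.
identity = [[1.0, 0.0], [0.0, 1.0]]
wide = [[4.0, 0.0], [0.0, 4.0]]
d_shared = mahalanobis((1.0, 2.0), (0.0, 0.0), identity)
d_avg = averaged_mahalanobis((1.0, 2.0), (0.0, 0.0), identity, wide)
```

With the identity covariance the shared form reduces to squared Euclidean distance. Giving one point a wider covariance shrinks the averaged distance, since differences along uncertain directions count for less.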
SSIFT workflow
Survey the data
Select useful variables
Fit disease progression models
Construct feature vectors
Assign feature weights
Cluster weighted feature vectors
Evaluate the clustering results
Complete? (No: iterate again; Yes: proceed)
Correlate results with:
• demographic data
• genetic data
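The slides do not say which statistic is used for the correlation step. One common choice for testing whether cluster membership is associated with a categorical factor such as genotype is Pearson's chi-square test on a contingency table; the sketch below computes that statistic on made-up counts and is an assumption, not the method used by SSIFT.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a cluster-by-category count table.

    table[i][j] is the number of patients in cluster i with category j.
    Every row and column total must be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if cluster and category were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def degrees_of_freedom(table):
    """Degrees of freedom for the chi-square test of independence."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Made-up counts: two clusters (rows) by two genotypes (columns).
perfectly_separated = [[10, 0], [0, 10]]
unrelated = [[5, 5], [5, 5]]
stat_separated = chi_square_statistic(perfectly_separated)
stat_unrelated = chi_square_statistic(unrelated)
```

A large statistic relative to the chi-square distribution with the given degrees of freedom indicates that the clusters and the factor are associated; small expected counts call for an exact test instead.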
Application of SSIFT to NIDDK
About NIDDK
SSIFT and transplant data
Variable selection
Modeling
Results
Candidate variables
α-Fetoprotein, Albumin, Alkaline phosphatase (AP), Bicarbonate, Blood urea nitrogen (BUN), Calcium, Creatinine clearance, Cholesterol, Chlorine, Corrected PT control, Creatinine, Direct bilirubin, FK506 level, Glomerular filtration rate, Gamma GTP, Glucose, Hematocrit (HCT), Hemoglobin, CSA HPLC level, Potassium, CSA monoclonal level, Sodium, Platelet count, Prothrombin time, Part. thromboplastin CT, Part. thromboplastin PT, CSA RIA level, SGOT (AST), SGPT (ALT), Total bilirubin, CSA TDX level, Total protein, White blood cells (WBC), Weight in KG
Selected variables
Variable          Log?   Ŝu     Weights         Ŝw
                                a  b  m  …
AST               Yes    0.19   0  1  1  0     0.32
AP                Yes    0.18   0  1  1  0     0.28
Hemoglobin        No     0.13   0  1  0  1     0.24
Total bilirubin   Yes    0.15   0  1  0  1     0.21
Potassium         No     0.20   1  1  1  1     0.20
Hematocrit        No     0.19   1  1  1  1     0.19
WBC               Yes    0.17   1  1  1  1     0.17
BUN               Yes    0.14   1  1  1  1     0.14
Creatinine        Yes    0.12   1  1  1  1     0.12
Sodium            No     0.11   1  1  1  1     0.11
Evaluating Kaplan-Meier curves
[Figure: Kaplan-Meier curves, with Ŝ indicated]
Final selected variables
Best pair: AST + AP, Ŝ=0.34
Best triple: AST + AP + hematocrit, Ŝ=0.42
No set of four variables exceeded Ŝ=0.42
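The search for the best pair and triple can be sketched as an exhaustive scan over variable subsets of increasing size, keeping the best-scoring subset at each size. The scoring function below is a made-up toy standing in for Ŝ, whose definition is not given on the slides, and the variable names `x1` to `x3` are placeholders rather than the clinical variables.

```python
from itertools import combinations

def best_subsets(variables, score, max_size):
    """Return {size: best-scoring subset of that size} by exhaustive search."""
    best = {}
    for k in range(1, max_size + 1):
        best[k] = max(combinations(variables, k), key=score)
    return best

# Toy score: each variable contributes a made-up gain, and every extra
# variable costs a fixed penalty, so larger subsets eventually stop helping.
TOY_GAIN = {"x1": 0.3, "x2": 0.2, "x3": 0.1}

def toy_score(subset):
    return sum(TOY_GAIN[v] for v in subset) - 0.15 * (len(subset) - 1)

best = best_subsets(sorted(TOY_GAIN), toy_score, max_size=3)
best_size = max(best, key=lambda k: toy_score(best[k]))
```

In this toy case the pair beats both the best single variable and the full triple, mirroring the situation on the slide where adding a fourth variable brought no improvement. Exhaustive search is practical only for small candidate sets, since the number of subsets grows combinatorially.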
Survival by clustered SSIFT AST, AP and HCT parameters
Ŝ = 0.42
Cluster mean curves
[Figure panels: Best cluster; Worst cluster]
SSIFT in Gene Discovery: Simulation
[Diagram linking Markers over Time, SSIFT™, Disease Progression Pattern and Disease Genes, with steps labeled Determine, Analyze and Discover]
Simulated data
[Figure: Marker Value (relative scale) plotted against Time (years)]
Clustering Results
[Figure: Dendrogram of agnes(x = distance.matrix, diss = TRUE, method = "average"); vertical axis: Height]
Agglomerative Coefficient = 0.99
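The dendrogram caption shows R's `agnes` run on a precomputed dissimilarity matrix with average linkage. The sketch below is a plain Python rendering of average-linkage agglomerative clustering on such a matrix, written for clarity rather than speed; it is not the `agnes` implementation, and the four-point example is made up.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a distance matrix.

    dist is a symmetric list of lists; returns the final clusters (lists of
    item indices) and the merge history as (height, merged members) pairs.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, sorted(clusters[i] + clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges

# Made-up example: four points on a line forming two obvious groups.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
clusters, merges = average_linkage(dist, n_clusters=2)
```

The merge heights recorded here correspond to the Height axis of a dendrogram. Cutting the tree at a chosen number of clusters, as this function does, is one way to turn the hierarchy into patient groups.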
Nearest-Neighbor Analysis
Gene   Genotype for Nearest Neighbors, based on SSIFT Pattern
       123456789012345678901234567890123456789012345678901234567890123456789
C9254  352131423364133346331543231331365461513311265564413451642314514456463
D7562  323136554461643162336542432246213615526446315262641541251463645444165
J1789  122552224555422461565335314552346323516442534343355552444332245543456
A2109  451422556633426215561542335632532551544111436664632366416662551652621
J2602  261323412652652223665466252216452111435542263542444161536324633341322
W4147  321143532244464333634436443621115464641422635644662235654536525633252
C2353  555555566666666666645526566351142222222222224412114442333333333344344
L9800  336242634156534231126616432343525335453144614443526334516552645522411
P3336  134335463316364241553225312351666146252445354642364643361143152565441
R2489  333645353163166613452462363523625226142335415145144513456124654144534
K805   125622234521326136152541524635445125324132524235224536261424613321411
K8420  121246521534522166135433555544321611651615366165466361564321116551155
D7336  656426432634424564226153331645113553652653122613653166613412212454536
S4207  336631132522113543146663466524336526152322153355236454211112421554435
S9560  612133524211556321334441342343121422445113565223663146264256263231422
B3833  111451165325313114512436245536622531545565455124463645111525655366466
B1192  641336314611531121361246426232236435651132226313322353452353441515446
S939   235312511132633313662343516122256413432554515433213166261326465216115
T3285  441652143434363443164114445154263532135434413455513346363354553424142
C2353 is related to the SSIFT pattern of disease progression (p < 10^-41).
SSIFT: Stratification and Synchronization Inference Technology
Discussion