![Page 1: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/1.jpg)
Previous Lecture: Exploring Data
![Page 2: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/2.jpg)
Introduction to Biostatistics and Bioinformatics
This Lecture: Descriptive Statistics
![Page 3: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/3.jpg)
Process of Statistical Analysis
Population
Random Sample
Sample Statistics
Describe
Make Inferences
![Page 4: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/4.jpg)
Distributions
[Figure: four example distributions: Complex, Normal, Skewed, Long tails.]
![Page 5: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/5.jpg)
Randomly Sample from any Distribution
1. Generate a pair of random numbers within the range.
2. Assign them to x and y.
3. Keep x if the point (x, y) is within the distribution, that is, if y lies under the distribution curve at x.
4. Repeat 1-3 until the desired sample size is obtained.
5. The values x obtained in this way will be distributed according to the original distribution.
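The steps above describe what is commonly called rejection sampling. A minimal Python sketch follows; the function name `rejection_sample`, its parameters and the triangular target density are illustrative assumptions, not part of the lecture.

```python
import random


def rejection_sample(pdf, x_min, x_max, y_max, size, rng=None):
    """Draw `size` values from the (possibly unnormalized) density `pdf`.

    Assumes pdf(x) <= y_max for every x in [x_min, x_max].
    """
    rng = rng or random.Random()
    samples = []
    while len(samples) < size:
        # Steps 1-2: a random point (x, y) inside the bounding box.
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(0.0, y_max)
        # Step 3: keep x only if the point lies under the curve.
        if y <= pdf(x):
            samples.append(x)
    return samples


def triangle(x):
    """Hypothetical target density: rises linearly from 0 at x=0 to 2 at x=1."""
    return 2.0 * x


sample = rejection_sample(triangle, 0.0, 1.0, 2.0, 10_000, random.Random(0))
sample_mean = sum(sample) / len(sample)
print(f"sample mean = {sample_mean:.3f} (theoretical mean of this density: 2/3)")
```

The tighter the bounding box fits the curve, the fewer points are rejected, so `y_max` is best chosen close to the maximum of the density.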
![Page 6: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/6.jpg)
Mean
Sample: $x_1, x_2, \ldots, x_n$
Mean:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
![Page 7: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/7.jpg)
Mean
[Figure: four panels (Complex, Normal, Skewed, Long tails); x-axis: Sample Size.]
![Page 8: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/8.jpg)
Median, Quartiles and Percentiles
Sample: $x_1, x_2, \ldots, x_n$
Quartiles:
$Q_1$: $x_i < Q_1$ for 25% of the sample
$Q_2$: $x_i < Q_2$ for 50% of the sample (median)
$Q_3$: $x_i < Q_3$ for 75% of the sample
Percentiles:
$P_m$: $x_i < P_m$ for m% of the sample
Inter Quartile Range:
$$\mathrm{IQR} = Q_3 - Q_1$$
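As a rough illustration of these definitions, the sketch below computes quartiles, one percentile and the IQR with Python's standard `statistics` module. The sample is simulated (an assumption, not lecture data), and `statistics.quantiles` interpolates between data points, so results on small samples can differ slightly between software packages.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical sample: 1001 values drawn from a standard normal distribution.
sample = [rng.gauss(0.0, 1.0) for _ in range(1001)]

# n=4 returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(sample, n=4)

# n=100 returns the 99 cut points P1..P99; index m-1 holds the m-th percentile.
percentiles = statistics.quantiles(sample, n=100)
p90 = percentiles[89]

iqr = q3 - q1
print(f"Q1={q1:.3f}  median={q2:.3f}  Q3={q3:.3f}  P90={p90:.3f}  IQR={iqr:.3f}")
```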
![Page 9: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/9.jpg)
Median and Mean
[Figure: four panels (Complex, Normal, Skewed, Long tails); x-axis: Sample Size; the median is shown in gray.]
![Page 10: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/10.jpg)
Quartiles and Mean
[Figure: four panels (Complex, Normal, Skewed, Long tails); x-axis: Sample Size; Q3 is shown in purple and Q1 in gray.]
![Page 11: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/11.jpg)
Central Limit Theorem
For many distributions, the sum of a large number of drawn values converges to a normal distribution if:
• The values are drawn independently;
• The values are from the same distribution; and
• The distribution has a finite mean and variance.
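A small simulation can make the theorem concrete. The sketch below sums independent uniform values, which are clearly not normal themselves, and checks two properties a normal distribution would have. The choice of the uniform distribution, 50 values per sum and 5000 sums are all assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(2)
n_values = 50    # values added together in each sum
n_sums = 5000    # number of independent sums

# Each value is uniform on [0, 1): mean 1/2, variance 1/12.
sums = [sum(rng.random() for _ in range(n_values)) for _ in range(n_sums)]

expected_mean = n_values * 0.5
expected_sd = math.sqrt(n_values / 12.0)

# For a normal distribution about 68.3% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(abs(s - expected_mean) <= expected_sd for s in sums) / n_sums

print(f"mean of sums: {statistics.fmean(sums):.3f} (expected {expected_mean:.3f})")
print(f"s.d. of sums: {statistics.pstdev(sums):.3f} (expected {expected_sd:.3f})")
print(f"fraction within one s.d.: {within_one_sd:.3f}")
```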
![Page 12: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/12.jpg)
Variance
Sample: $x_1, x_2, \ldots, x_n$
Mean:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variance:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
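A short worked example of the mean and variance on this slide, using Python's standard library. The eight data values are made up for illustration. Note that `statistics.pvariance` divides by n, as the slide does, whereas `statistics.variance` divides by n - 1.

```python
import statistics

# Hypothetical sample of eight measurements.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(sample)
# Divides the sum of squared deviations by n.
variance_n = statistics.pvariance(sample)
# Divides by n - 1 instead (often used when estimating a population variance).
variance_n_minus_1 = statistics.variance(sample)
standard_deviation = statistics.pstdev(sample)

print(mean, variance_n, standard_deviation)   # 5.0 4.0 2.0
print(round(variance_n_minus_1, 3))           # 4.571
```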
![Page 13: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/13.jpg)
Variance
[Figure: four panels (Complex, Normal, Skewed, Long tails); x-axis: Sample Size.]
![Page 14: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/14.jpg)
Inter Quartile Range and Standard Deviation
[Figure: four panels (Complex, Normal, Skewed, Long tails); x-axis: Sample Size; IQR/1.349 is shown in gray.]
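The factor 1.349 comes from the normal distribution, where Q3 - Q1 equals about 1.349 standard deviations, so IQR/1.349 estimates the standard deviation of normal data. The sketch below checks this on simulated data and on a hypothetical long-tailed mixture; the helper name `robust_sd` and both samples are assumptions made for illustration.

```python
import random
import statistics
from statistics import NormalDist

# For a normal distribution, Q3 - Q1 = 2 * z(0.75) * sigma, about 1.349 * sigma.
iqr_factor = 2.0 * NormalDist().inv_cdf(0.75)


def robust_sd(values):
    """Estimate the standard deviation as IQR / 1.349."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / iqr_factor


rng = random.Random(3)
normal = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
# Hypothetical long-tailed sample: about 5% of values come from a much wider normal.
long_tailed = [
    rng.gauss(0.0, 10.0) if rng.random() < 0.05 else rng.gauss(0.0, 1.0)
    for _ in range(20_000)
]

for name, values in (("normal", normal), ("long-tailed", long_tailed)):
    print(f"{name}: s.d. = {statistics.pstdev(values):.3f}, "
          f"IQR/1.349 = {robust_sd(values):.3f}")
```

On the long-tailed sample the standard deviation is inflated by the rare extreme values, while IQR/1.349 stays close to the spread of the bulk of the data.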
![Page 15: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/15.jpg)
Uncertainty in Determining the Mean
[Figure: four groups of panels (Complex, Normal, Skewed, Long tails) showing the Average for samples labeled n=3, n=10 and n=100; the last group is labeled n=10, n=100 and n=1000.]
![Page 16: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/16.jpg)
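Standard Error of the Mean
Sample: $x_1, x_2, \ldots, x_n$
Mean:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variance:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
Standard Error of the Mean:
$$\mathrm{s.e.m.} = \frac{\sigma}{\sqrt{n}}$$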
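One way to see what the standard error of the mean describes is to draw many samples of the same size and look at how much their means scatter. The sketch below does this for three sample sizes; the normal population with mean 10 and standard deviation 2, and the 2000 repetitions, are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(4)
sigma = 2.0          # standard deviation of the hypothetical population
repetitions = 2000   # number of samples drawn for each sample size

results = {}
for n in (3, 10, 100):
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(10.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    observed = statistics.pstdev(means)   # scatter of the sample means
    predicted = sigma / math.sqrt(n)      # standard error of the mean
    results[n] = (observed, predicted)
    print(f"n={n:>3}: observed {observed:.3f}, sigma/sqrt(n) = {predicted:.3f}")
```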
![Page 17: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/17.jpg)
Error bars
M. Krzywinski & N. Altman, Error Bars, Nature Methods 10 (2013) 921
In 2012, error bars appeared in Nature Methods in about two-thirds of the figure panels in which they could be expected (scatter and bar plots). The type of error bars was nearly evenly split between s.d. and s.e.m. bars (45% versus 49%, respectively). In 5% of cases the error bar type was not specified in the legend. Only one figure used bars based on the 95% CI.
None of the error bar types is intuitive. An alternative is to select a value of CI% for which the bars touch at a desired P value (e.g., 83% CI bars touch at P = 0.05).
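The 83% figure in the quotation can be checked with a short calculation. The sketch assumes two independent means with equal standard errors and a large-sample normal approximation; these assumptions are mine, not stated in the quoted text.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()
p_value = 0.05

# Two-sided critical value for P = 0.05 (about 1.96).
z_crit = standard_normal.inv_cdf(1.0 - p_value / 2.0)

# The difference of two independent means with equal standard error se has
# standard error sqrt(2) * se, so P = 0.05 when the means are
# z_crit * sqrt(2) * se apart. Bars that just touch at that distance each
# have a half-width of half the gap, measured in units of se.
half_width = z_crit * math.sqrt(2.0) / 2.0

# Confidence level of an interval of +/- half_width standard errors.
ci_level = 2.0 * standard_normal.cdf(half_width) - 1.0

print(f"half-width = {half_width:.3f} s.e.m., CI level = {ci_level:.1%}")
```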
![Page 18: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/18.jpg)
Box Plot
M. Krzywinski & N. Altman, Visualizing samples with box plots, Nature Methods 11 (2014) 119
![Page 19: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/19.jpg)
Box Plots
[Figure: box plots for the Complex, Normal, Skewed and Long tails distributions at n=5, n=10 and n=100.]
![Page 20: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/20.jpg)
Box Plots with All the Data Points
[Figure: box plots with all data points overlaid for the Complex, Normal, Skewed and Long tails distributions at n=5, n=10 and n=100.]
![Page 21: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/21.jpg)
Box Plots, Scatter Plots and Bar Graphs: Normal Distribution
[Figure: box plots, scatter plots and bar graphs; panels with standard deviation error bars and panels with standard error error bars.]
![Page 22: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/22.jpg)
Box Plots, Scatter Plots and Bar Graphs: Skewed Distribution
[Figure: box plots, scatter plots and bar graphs; panels with standard deviation error bars and panels with standard error error bars.]
![Page 23: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/23.jpg)
Box Plots, Scatter Plots and Bar Graphs: Distribution with Fat Tail
[Figure: box plots, scatter plots and bar graphs; panels with standard deviation error bars and panels with standard error error bars.]
![Page 24: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/24.jpg)
Application: Analytical Measurements
[Figure: x-axis: Theoretical Concentration; y-axis: Measured Concentration.]
![Page 25: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/25.jpg)
A Few Characteristics of Analytical Measurements
Accuracy: Closeness of agreement between a test result and an accepted reference value.
Precision: Closeness of agreement between independent test results.
Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature).
Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.
Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.
Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
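Three of these characteristics can be put into numbers with a few lines of Python. In the sketch below accuracy is summarized as bias against a reference value, precision as the standard deviation of replicates, and linearity as a least-squares line of measured against theoretical concentration. All values are hypothetical and the summaries chosen are one common option, not the only one.

```python
import statistics

# Hypothetical replicate measurements of a sample with an accepted reference value.
reference_value = 50.0
replicates = [49.1, 50.4, 48.7, 49.9, 49.4]

bias = statistics.fmean(replicates) - reference_value   # accuracy
spread = statistics.stdev(replicates)                   # precision

# Hypothetical dilution series for checking linearity.
theoretical = [1.0, 2.0, 5.0, 10.0, 20.0]
measured = [1.1, 2.0, 5.2, 9.8, 20.3]

# Ordinary least-squares fit: measured = slope * theoretical + intercept.
x_mean = statistics.fmean(theoretical)
y_mean = statistics.fmean(measured)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(theoretical, measured))
s_xx = sum((x - x_mean) ** 2 for x in theoretical)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

print(f"bias = {bias:.2f}, s.d. of replicates = {spread:.2f}")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A slope near 1 and an intercept near 0 would indicate that measured values are directly proportional to the theoretical concentration over the tested range.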
![Page 26: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/26.jpg)
Measuring Blanks
![Page 27: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/27.jpg)
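Coefficient of Variation
Sample: $x_1, x_2, \ldots, x_n$
Mean:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variance:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
Coefficient of Variation (CV):
$$\mathrm{CV} = \frac{\sigma}{\mu}$$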
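A minimal sketch of the coefficient of variation, the standard deviation divided by the mean, often reported as a percentage. The function name and the three replicate values are assumptions made for illustration.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Standard deviation (dividing by n) relative to the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


# Hypothetical replicate measurements of one sample.
replicates = [9.0, 10.0, 11.0]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1%}")
```

Because the CV is a ratio it has no units, which makes it convenient for comparing the relative spread of measurements at different concentrations.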
![Page 28: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/28.jpg)
Lower Limit of Detection
The lowest amount of analyte that is statistically distinguishable from background or a negative control.
Two methods to determine lower limit of detection:
1. Take the lowest concentration of the analyte at which the CV is below a chosen threshold, for example 20%.
2. Determine the level of the blank as the 95th percentile of the blank measurements, then add a constant times the standard deviation of the measurements at the lowest concentration.
K. Linnet and M. Kondratovich, Partly Nonparametric Approach for Determining the Limit of Detection, Clinical Chemistry 50 (2004) 732–740.
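Both methods can be sketched in a few lines. All measurements below are hypothetical, and the constant 1.645 (the 95th percentile of the standard normal distribution) is an assumed value for the multiplier; the cited paper should be consulted for the exact constant and procedure.

```python
import statistics


def coefficient_of_variation(values):
    return statistics.pstdev(values) / statistics.fmean(values)


# Hypothetical replicate measurements keyed by theoretical concentration.
replicates = {
    0.5: [0.2, 0.9, 0.5, 0.7, 0.3],
    1.0: [0.7, 1.3, 1.0, 1.2, 0.8],
    2.0: [1.8, 2.2, 2.0, 2.1, 1.9],
    5.0: [4.8, 5.1, 5.0, 5.2, 4.9],
}

# Method 1: lowest concentration whose CV is below the threshold.
cv_threshold = 0.20
lod_by_cv = min(
    concentration
    for concentration, values in replicates.items()
    if coefficient_of_variation(values) < cv_threshold
)

# Method 2: 95th percentile of the blanks plus a constant times the
# standard deviation at the lowest concentration.
blanks = [0.02, 0.05, 0.00, 0.08, 0.03, 0.11, 0.04, 0.06, 0.01, 0.07,
          0.09, 0.02, 0.05, 0.03, 0.10, 0.04, 0.06, 0.12, 0.01, 0.08]
level_of_blank = statistics.quantiles(blanks, n=20)[-1]   # 95th percentile
constant = 1.645                                          # assumed multiplier
lowest_concentration = min(replicates)
sd_lowest = statistics.stdev(replicates[lowest_concentration])
lod_by_blank = level_of_blank + constant * sd_lowest

print(f"method 1: LoD = {lod_by_cv}")
print(f"method 2: level of blank = {level_of_blank:.3f}, LoD = {lod_by_blank:.3f}")
```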
![Page 29: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/29.jpg)
Limit of Detection and Linearity
[Figure: two panels; x-axis: Theoretical Concentration; y-axis: Measured Concentration.]
![Page 30: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/30.jpg)
Precision and Accuracy
[Figure: two panels; x-axis: Theoretical Concentration; y-axis: Measured Concentration.]
![Page 31: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/31.jpg)
Descriptive Statistics - Summary
• Example distributions:
  • Normal distribution
  • Skewed distribution
  • Distribution with long tails
  • Complex distribution with several peaks
• Mean, median, quartiles, percentiles
• Variance, standard deviation, Inter Quartile Range (IQR), error bars
• Box plots, bar graphs, and scatter plots
• Application: Analytical measurements:
  • Accuracy and precision
  • Limit of detection and quantitation
  • Linearity
  • Robustness
![Page 32: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/32.jpg)
Descriptive Statistics – Recommended Reading
http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html
![Page 33: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/33.jpg)
Descriptive Statistics – Recommended Reading
http://greenteapress.com/thinkstats/
![Page 34: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/34.jpg)
Next Lecture: Data types and representations in Molecular Biology
>URO1 uro1.seq Length: 2018 November 9, 2000 11:50 Type: N Check: 3854 ..CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACAACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTTGCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACCCACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTGTGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCAGGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCATCTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGATGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG
@SRR350953.5 MENDEL_0047_FC62MN8AAXX:1:1:1646:938 length=152NTCTTTTTCTTTCCTCTTTTGCCAACTTCAGCTAAATAGGAGCTACACTGATTAGGCAGAAACTTGATTAACAGGGCTTAAGGTAACCTTGTTGTAGGCCGTTTTGTAGCACTCAAAGCAATTGGTACCTCAACTGCAAAAGTCCTTGGCCC+SRR350953.5 MENDEL_0047_FC62MN8AAXX:1:1:1646:938 length=152+50000222C@@@@@22::::8888898989::::::<<<:<<<<<<:<<<<::<<:::::<<<<<:<:<<<IIIIIGFEEGGGGGGGII@IGDGBGGGGGGDDIIGIIEGIGG>[email protected] MENDEL_0047_FC62MN8AAXX:1:1:1724:932 length=152NTGTGATAGGCTTTGTCCATTCTGGAAACTCAATATTACTTGCGAGTCCTCAAAGGTAATTTTTGCTATTGCCAATATTCCTCAGAGGAAAAAAGATACAATACTATGTTTTATCTAAATTAGCATTAGAAAAAAAATCTTTCATTAGGTGT+SRR350953.7 MENDEL_0047_FC62MN8AAXX:1:1:1724:932 length=152#.,')2/@@@@@@@@@@<:<<:778789979888889:::::99999<<::<:::::<<<<<@@@@@::::::IHIGIGGGGGGDGGDGGDDDIHIHIIIII8GGGGGIIHHIIIGIIGIBIGIIIIEIHGGFIHHIIIIIIIGIIFIG
##gff-version 3#!gff-spec-version 1.20##species_http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7425NC_015867.2 RefSeq cDNA_match 66086 66146 . - . ID=aln0;Target=XM_008204328.1 1 61 +; for_remapping=2;gap_count=1;num_ident=8766;num_mismatch=0;pct_coverage=100;pct_coverage_hiqual=100;pct_identity_gap=99.9886;pct_identity_ungap=100;rank=1NC_015867.2 RefSeq cDNA_match 65959 66007 . - . ID=aln0;Target=XM_008204328.1 62 110 +;for_remapping=2;gap_count=1;num_ident=8766;num_mismatch=0;pct_coverage=100;pct_coverage_hiqual=100;pct_identity_gap=99.9886;pct_identity_ungap=100;rank=1NC_015867.2 RefSeq cDNA_match 65799 65825 . - . ID=aln0;Target=XM_008204328.1 111 137 +;for_remapping=2;gap_count=1;num_ident=8766;num_mismatch=0;pct_coverage=100;pct_coverage_hiqual=100;pct_identity_gap=99.9886;pct_identity_ungap=100;rank=1
[Example records shown above, in order: FASTA, FASTQ, GFF3.]
![Page 35: Previous Lecture : Exploring Data](https://reader035.vdocuments.net/reader035/viewer/2022062517/568139c9550346895da177a0/html5/thumbnails/35.jpg)
Next Tutorial: Python Programming
Saturday 9/13 at 3 PM in TRB 120