R. P. Deolankar
Half knowledge is always dangerous
Wet lab
A laboratory allowing for hands-on scientific research and equipped with appropriate plumbing, ventilation, and equipment
High-throughput technology: Technology that handles a high volume of data or material. Large-scale methods to purify, identify, and characterize DNA, RNA, proteins and other molecules. These methods are usually automated, allowing rapid analysis of very large numbers of samples.
Microarray: A tool used to sift through and analyze the information contained within a genome. A microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead.
DNA microarray: A microarray of immobilized single-stranded DNA fragments of known nucleotide sequence that is used especially in the identification and sequencing of DNA samples and in the analysis of gene expression (as in a cell or tissue)
Protein microarray: A piece of glass on which different molecules of protein have been affixed at separate locations in an ordered manner, thus forming a microscopic array.
Mass spectrometry: An instrumental method for identifying the chemical constitution of a substance by means of the separation of gaseous ions according to their differing mass and charge; also called mass spectroscopy
Mass spectrometry: A method used to determine the masses of atoms or molecules in which an electrical charge is placed on the molecule and the resulting ions are separated by their mass-to-charge ratio
Tandem mass spectrometry: Multiple steps of mass spectrometry selection, with some form of fragmentation occurring in between the stages
Immunofluorescence and immunocytochemistry, ELISA, immunoblotting
Dry lab
A laboratory for making computer simulations or for data analysis, especially by computers (as in bioinformatics); also called a dry laboratory
Gene prioritization: The results of experimental or computational analyses in the post-genomic era (e.g., those from microarrays, proteomics, ChIP-chip, genome-wide in silico searches, genetic linkages, etc.) often consist of long lists of candidate genes. Some methods assign a score to each gene and rank the genes accordingly. This process is known as gene prioritization.
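The score-and-rank idea can be sketched in a few lines of Python. The gene names, evidence sources, and weights below are hypothetical placeholders, not values from any published prioritization tool, and the weighted sum is only one simple scoring choice.

```python
# Minimal sketch of gene prioritization: score each candidate gene, then rank.
# Gene names, evidence values, and weights are made-up illustrations.

def prioritize(candidates, weights):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = []
    for gene, evidence in candidates.items():
        # Weighted sum over evidence sources; missing evidence counts as 0.
        score = sum(weights[source] * evidence.get(source, 0.0) for source in weights)
        scored.append((gene, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

candidates = {
    "GENE_A": {"expression": 0.9, "linkage": 0.2},
    "GENE_B": {"expression": 0.4, "linkage": 0.8},
    "GENE_C": {"expression": 0.1},
}
weights = {"expression": 0.5, "linkage": 0.5}

ranking = prioritize(candidates, weights)
for rank, (gene, score) in enumerate(ranking, start=1):
    print(rank, gene, round(score, 3))
```

Real prioritization methods differ mainly in how the score is computed; the ranking step stays the same.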
PhenoGO: A multiorganism database that provides phenotypic context, such as the cell type, disease, tissue, and organ, to existing associations between gene products and Gene Ontology (GO) terms as specified in the Gene Ontology Annotations (GOA).
BioMedLEE: A Natural Language Processing (NLP) system that automatically extracts biological information consisting of bio-molecular substances and phenotypic data.
MeSH (Medical Subject Headings): The National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.
PhenOS (Phenotype Organizer System): A system under development by the Lussier research group with the purpose of bridging the gap between heterogeneous biomedical terminologies.
Inparanoid algorithm: The protein interaction networks of two species are aligned by assigning proteins to sequence homology clusters using the Inparanoid algorithm
POCUS: Prioritization of candidate genes using statistics
Reference: Turner FS, Clutterbuck DR, Semple CA. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol. 2003;4(11):R75.
OMIM (Online Mendelian Inheritance in Man): A catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and provided through NCBI. The database contains information on disease phenotypes and genes, including extensive descriptions, gene names, inheritance patterns, map locations and gene polymorphisms.
TOM (Transcriptomics of OMIM): A web-based integrated approach for identification of candidate disease genes
Reference: Rossi S, Masotti D, Nardini C, Bonora E, Romeo G, Macii E, Benini L, Volinia S. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res. 2006 Jul 1;34
Data mining: Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information
Online Predicted Human Interactions Database (OPHID): Designed to be both a resource for the laboratory scientist to explore known and predicted protein-protein interactions, and to facilitate bioinformatics initiatives exploring protein interaction networks.
Single nucleotide polymorphisms (SNPs): A single nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide - A, T, C, or G - in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual).
Synonymous and nonsynonymous substitutions: Substitutions that result in amino acid replacements are said to be nonsynonymous, while substitutions that do not cause an amino acid replacement (such as a GGG to GGC change - both codons still encode glycine) are said to be synonymous substitutions. Because of the difference in their effects on the physiology of the organism, synonymous and nonsynonymous substitutions can have quite different dynamics. For example, synonymous substitutions usually occur at a much faster rate than do nonsynonymous substitutions. Hence, for coding sequence it is often desirable to separate these two.
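As an illustration of the distinction, the sketch below classifies a single-codon change by translating both codons with the standard genetic code. The function name is a hypothetical helper, and the sketch assumes DNA codons written with the bases T, C, A, G.

```python
# Sketch: classify a codon substitution as synonymous or nonsynonymous.
# Uses the standard genetic code; "*" marks a stop codon.
from itertools import product

BASES = "TCAG"
# Amino acids listed in the conventional TCAG codon order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(codon_before, codon_after):
    """Compare the amino acids encoded before and after the change."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"

print(classify_substitution("GGG", "GGC"))  # both codons encode glycine
print(classify_substitution("GGG", "GAG"))  # glycine replaced by glutamate
```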
Ka/Ks values: In genetics, the Ka/Ks ratio or dN/dS ratio is the ratio of the rate of non-synonymous substitutions (Ka) to the rate of synonymous substitutions (Ks), which can be used as an indication of selection on a protein-coding gene.
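A minimal sketch of how the ratio is typically read: values below 1 are commonly taken to suggest purifying selection, values near 1 neutral evolution, and values above 1 positive selection. The Ka and Ks inputs and the tolerance around 1 are hypothetical choices for illustration, not part of any standard.

```python
# Sketch: compute Ka/Ks and give the conventional reading of the ratio.
# Input rates and the tolerance are illustrative assumptions.

def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks

def interpret(ratio, tolerance=0.05):
    """Map the ratio to the usual qualitative interpretation."""
    if ratio < 1 - tolerance:
        return "purifying selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"

ratio = ka_ks_ratio(ka=0.02, ks=0.10)
print(round(ratio, 3), interpret(ratio))
```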
dbSNP (Database of Single Nucleotide Polymorphisms): A public-domain archive for a broad collection of single nucleotide polymorphisms (SNPs), hosted at the National Center for Biotechnology Information.
OrthoDisease: A comprehensive database of model organism genes that are orthologous to human disease genes
OrthoDisease is constructed primarily using Inparanoid analysis. Inparanoid is a program that automatically detects orthologs (or groups of orthologs) from two species
Field Biology
Biology of organisms living in their natural environments
Applications in Ecology and Evolutionary Biology
Epidemiology
Epidemiology is the study of how often diseases occur in different groups of people and why
Planning and evaluating strategies to prevent illness
Guide to the management of patients in whom disease has already developed
Reference: Epidemiology for the uninitiated by Coggon, Rose and Barker
Population at risk
The population at risk is the group of people, healthy or sick, who would be counted as cases if they had the disease being studied
It defines the denominator for the calculation of rates of incidences and prevalence
It is the number of persons potentially capable of experiencing the event or outcome of interest
Floating numerator
Numerator floating without its denominator
Common error occurring in field investigations
The error occurs due to the number of cases not relating to the “at risk” population
Epidemiological conclusions (on risk) cannot be drawn from purely clinical data (on the number of sick people seen)
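A small sketch of why a floating numerator misleads: the town with more cases can still have the lower rate once the population at risk is taken into account. The town names and counts are hypothetical.

```python
# Sketch: case counts alone versus rates that use the population at risk.
# All figures are invented for illustration.

towns = {
    "Town A": {"cases": 50, "population_at_risk": 10000},
    "Town B": {"cases": 20, "population_at_risk": 1000},
}

def rate_per_1000(cases, population_at_risk):
    """Cases per 1000 persons at risk."""
    return 1000 * cases / population_at_risk

rates = {
    name: rate_per_1000(data["cases"], data["population_at_risk"])
    for name, data in towns.items()
}

for name, data in towns.items():
    print(name, "cases:", data["cases"], "rate per 1000:", rates[name])
```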
Target population
It is the population about which the conclusions are to be drawn
Sometimes measurement can be made on the full target population else study samples are used
Study population and study sample
The group of individuals in a study
In a clinical trial, the participants make up the study population
The study sample is chosen from the study population
Aetiology
The study of the factors that predispose to or precipitate the disease
An external agent, a susceptible host, and an environment that brings the host and agent together form the disease aetiology triad
Surveillance
Watching over a population and recording data likely to have epidemiological significance, usually with the aim of early detection of disease. Essentially an interventionist exercise compared with monitoring, which is passive.
Case
Disease in populations exists as a continuum of severity rather than as an all or none phenomenon
The real question in population studies is not “has the person got the disease?” but “How much of the disease has he or she got?”
Diagnostic continuum is dichotomized into “cases” and “non-cases” on the basis of statistical, clinical, prognostic or operational options
Hence case definition should be precise and unambiguous.
Epidemiological case definitions are narrower and more rigid than clinical ones
Incidence
It is the rate at which new cases occur in a population during a specified period
Incidence = (number of new cases) / [(population at risk) × (time during which cases were ascertained)]
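A worked sketch of the incidence calculation, with the population at risk and the observation time multiplied together in the denominator. The numbers are hypothetical.

```python
# Sketch: incidence rate = new cases / (population at risk * observation time).
# Figures are invented for illustration.

def incidence_rate(new_cases, population_at_risk, years):
    """New cases per person-year of observation."""
    if population_at_risk <= 0 or years <= 0:
        raise ValueError("population at risk and time must be positive")
    return new_cases / (population_at_risk * years)

rate = incidence_rate(new_cases=12, population_at_risk=4000, years=2)
print("per person-year:", rate)
print("per 1000 person-years:", round(rate * 1000, 3))
```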
Prevalence
Point prevalence: The proportion of a population that are cases at a point in time
Period prevalence: The proportion of a population that are cases at any time within a stated period
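The two measures can be contrasted with a short sketch. Each person has at most one illness episode, given as hypothetical start and end days; the denominator is simply the whole listed population, which is a simplifying assumption.

```python
# Sketch: point vs. period prevalence from per-person illness episodes.
# Each value is a (start_day, end_day) episode, or None if never a case.
# All data are hypothetical.

episodes = {
    "p1": (0, 5),
    "p2": (3, 9),
    "p3": None,
    "p4": (8, 12),
    "p5": None,
}

def point_prevalence(episodes, day):
    """Proportion of people who are cases on the given day."""
    cases = sum(
        1 for ep in episodes.values()
        if ep is not None and ep[0] <= day <= ep[1]
    )
    return cases / len(episodes)

def period_prevalence(episodes, start, end):
    """Proportion of people who are cases at any time in [start, end]."""
    cases = sum(
        1 for ep in episodes.values()
        if ep is not None and ep[0] <= end and ep[1] >= start
    )
    return cases / len(episodes)

point = point_prevalence(episodes, day=4)
period = period_prevalence(episodes, start=4, end=10)
print("point prevalence on day 4:", point)
print("period prevalence for days 4 to 10:", period)
```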
Attributable risk and relative risk
Attributable risk is the difference between the disease rate in exposed persons and that in people who are unexposed
Relative risk is the ratio of the disease rate in exposed persons to that in people who are unexposed
Attributable risk = rate of disease in unexposed persons * (relative risk – 1)
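A short numerical sketch of the two measures, which also checks that the difference in rates agrees with the formula above. The cohort counts are hypothetical.

```python
# Sketch: relative risk and attributable risk from two disease rates.
# Cohort counts are invented for illustration.

def disease_rate(cases, persons):
    return cases / persons

rate_exposed = disease_rate(cases=30, persons=1000)
rate_unexposed = disease_rate(cases=10, persons=1000)

# Relative risk: ratio of the rate in exposed to the rate in unexposed.
relative_risk = rate_exposed / rate_unexposed

# Attributable risk: difference between the two rates.
attributable_risk = rate_exposed - rate_unexposed

# Same quantity via: rate in unexposed * (relative risk - 1).
attributable_risk_via_rr = rate_unexposed * (relative_risk - 1)

print("relative risk:", round(relative_risk, 3))
print("attributable risk:", round(attributable_risk, 4))
print("via formula:", round(attributable_risk_via_rr, 4))
```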
Confounding
Causing confusion about causation due to 2 or more variables associated with the disease
Confounding may give rise to spurious associations when in fact there is no causal relation, or at other extreme, it may obscure the effects of a true cause
Bias
Bias is the deviation of inferences from the truth
Selection bias is the biased selection of individuals into the study
Information bias is the biased collection or biased analysis of the data
Motto of the epidemiologist could well be “dirty hands but a clean mind” (manus sordidae, mens pura)
Chance
A measure of how likely it is that some event will occur
Random, unpredictable influences on events
The association between the exposure and disease is considered to be “statistically significant” if the probability of obtaining, by chance alone, a test statistic at least as extreme as the one observed (the p-value) is < 0.05
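One way to put a number on chance is sketched below with a Pearson chi-square test (no continuity correction) on a 2x2 exposure-by-disease table. The choice of test is an assumption, since no particular statistic is named here, and the counts are hypothetical. For one degree of freedom the p-value can be obtained from the complementary error function.

```python
# Sketch: Pearson chi-square test on a 2x2 table, standard library only.
# The test choice and all counts are illustrative assumptions.
import math

def chi_square_2x2(a, b, c, d):
    """a, b = exposed cases / non-cases; c, d = unexposed cases / non-cases."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_square_2x2(a=30, b=70, c=10, d=90)
print("chi-square:", round(chi2, 3))
print("p-value:", p_value)
print("significant at 0.05:", p_value < 0.05)
```

A small p-value only addresses chance; it says nothing about bias or confounding.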
Sensitivity
The proportion of persons with the disease who are correctly identified by defined criteria
The proportion of persons with the disease who are correctly identified by a screening test
The ability of a system to detect epidemics and other changes in disease occurrence
A sensitive test detects a high proportion of the true cases
Specificity
The proportion of persons without a disease who are correctly identified by a test
The number of true negative results divided by the total number of all those without the disease
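Both proportions can be computed from the four cells of a screening-test table, as in this sketch. The counts are hypothetical.

```python
# Sketch: sensitivity and specificity from screening-test counts.
# Counts are invented for illustration.

def sensitivity(true_positives, false_negatives):
    """Proportion of people with the disease whom the test identifies."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of people without the disease whom the test identifies."""
    return true_negatives / (true_negatives + false_positives)

sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=160, false_positives=40)
print("sensitivity:", sens)
print("specificity:", spec)
```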
Randomization
Randomization is used to obtain a similar allocation of individuals to each group; the groups are followed at the same time
Purpose of randomization: To obtain unbiased estimates of differences among treatment responses (means or effects) and to obtain an unbiased estimate of the random error variation in the experiment
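A minimal sketch of simple random allocation into two equal-sized groups. The participant identifiers and the fixed seed are hypothetical; the seed is there only so that the example is reproducible.

```python
# Sketch: simple randomization of participants into two groups.
# Participant IDs and the seed are illustrative assumptions.
import random

def randomize(participants, seed=None):
    """Shuffle the participants and split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = list(range(1, 21))
treatment, control = randomize(participants, seed=42)
print("treatment:", sorted(treatment))
print("control:", sorted(control))
```

Because allocation depends only on the shuffle, neither the investigator nor the participant influences who ends up in which group.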
Replication and Local control
Replication is the repetition of an experiment in order to test the validity of its conclusion
Local control is blocking or grouping to eliminate or to control the various sources of variation (error)
Replication and local control are necessary to achieve a reduction in the random variation among treatment effects in the experiment
Observational (non-experimental) studies
Person-level unit of observation
1. Longitudinal measurements
a. Cohort samples
b. Case control samples
2. Cross-sectional measurements
Aggregate level units of observation (ecological studies)
Reference: Epidemiology Kept Simple: An Introduction to Traditional and Modern Epidemiology; by B. Burt Gerstman
Personal-level vs. aggregate-level
A personal-level study on smoking might collect information on each person’s smoking habits, age and disease status
An aggregate-level study on smoking might collect information on each region’s per capita cigarette consumption, age distribution and disease rate
Longitudinal studies
Longitudinal studies are studies in which the sequence of events in individuals can be delineated over time
In cohort studies the incidence of disease in exposed and non-exposed groups is compared
In case-control studies people with disease (cases) and people without disease (controls) are sampled from the source population and exposure histories of cases and controls are compared
Longitudinal vs. cross-sectional studies
Longitudinal measurements relate exposures and diseases in individuals at various time references
Cross-sectional measurements are not definitively time sequenced in individuals
In cross-sectional studies, data are gathered from samples at one point in time. Since both the outcome and the exposure variables are measured at the same time, these studies are not strong at showing cause-effect relationships.
Experimental studies
In experimental studies, the investigator introduces or removes an exposure in order to observe its influence on a health outcome. Such allocations may be based on a chance mechanism (randomized trials) or on other deliberate mechanisms built into the study’s protocol (non-randomized trials)
Other disease informatics lectures:
Supercourse: Epidemiology, the Internet and Global Health
Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011