R. P. Deolankar

Half knowledge is always dangerous

Wet lab

A laboratory that allows hands-on scientific research and is equipped with appropriate plumbing, ventilation, and equipment

High-throughput technology

Technology that handles a high volume of data or material

Large-scale methods to purify, identify, and characterize DNA, RNA, proteins and other molecules. These methods are usually automated, allowing rapid analysis of very large numbers of samples.

Microarray

A tool used to sift through and analyze the information contained within a genome. A microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead.

DNA microarray

A microarray of immobilized single-stranded DNA fragments of known nucleotide sequence, used especially in the identification and sequencing of DNA samples and in the analysis of gene expression (as in a cell or tissue)

Protein microarray

A protein microarray is a piece of glass on which different protein molecules have been affixed at separate locations in an ordered manner, forming a microscopic array.

Mass spectrometry

An instrumental method for identifying the chemical constitution of a substance by separating gaseous ions according to their differing mass and charge (also called mass spectroscopy)

A method used to determine the masses of atoms or molecules, in which an electrical charge is placed on the molecule and the resulting ions are separated by their mass-to-charge ratio

Tandem mass spectrometry

Multiple steps of mass spectrometry selection, with some form of fragmentation occurring between the stages

Immunofluorescence and immunocytochemistry, ELISA, immunoblotting

Dry lab

A laboratory for computer simulations or for data analysis, especially by computer (as in bioinformatics); also called a dry laboratory

Gene prioritization

The results of experimental or computational analyses in the post-genomic era (e.g., those from microarrays, proteomics, ChIP-chip, genome-wide in silico searches, genetic linkages, etc.) often consist of long lists of candidate genes. Some methods assign a score to each gene and rank the genes accordingly. This process is known as gene prioritization.
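The score-and-rank idea can be sketched in a few lines. This is only an illustration: the gene names and evidence scores are made up, and averaging the scores is an assumed scoring rule, not the method of any particular prioritization tool.

```python
from statistics import mean


def prioritize(evidence):
    """Rank genes by their mean evidence score, highest first.

    `evidence` maps a gene name to a list of scores in [0, 1].
    Averaging is an assumption made for this sketch.
    """
    scored = {gene: mean(scores) for gene, scores in evidence.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate genes with made-up evidence scores
candidates = {
    "GENE_A": [0.9, 0.7, 0.8],
    "GENE_B": [0.2, 0.4, 0.3],
    "GENE_C": [0.6, 0.9, 0.6],
}

ranking = prioritize(candidates)
for rank, (gene, score) in enumerate(ranking, start=1):
    print(rank, gene, round(score, 2))
```

Real tools combine heterogeneous evidence (expression, annotation, interaction data) with more careful weighting; the ranking step itself stays this simple.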

PhenoGO

PhenoGO is a multiorganism database that provides phenotypic context, such as the cell type, disease, tissue and organ, to existing associations between gene products and Gene Ontology (GO) terms as specified in the Gene Ontology Annotations (GOA).

BioMedLEE

An existing natural language processing (NLP) system that automatically extracts biological information consisting of biomolecular substances and phenotypic data.

MeSH

Medical Subject Headings

MeSH is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.

PhenOS

Phenotype Organizer System

PhenOS is a system under development by the Lussier research group with the purpose of bridging the gap between heterogeneous biomedical terminologies.

Inparanoid algorithm

The protein interaction networks of two species are aligned by assigning proteins to sequence homology clusters using the Inparanoid algorithm

POCUS

Prioritization of candidate genes using statistics

Reference: Turner FS, Clutterbuck DR, Semple CA. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol. 2003;4(11):R75.

OMIM

Mendelian Inheritance in Man

The Online Mendelian Inheritance in Man: a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and provided through NCBI. The database contains information on disease phenotypes and genes, including extensive descriptions, gene names, inheritance patterns, map locations and gene polymorphisms.

TOM

Transcriptomics of OMIM

A web-based integrated approach for identification of candidate disease genes

Reference: Rossi S, Masotti D, Nardini C, Bonora E, Romeo G, Macii E, Benini L, Volinia S. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res. 2006 Jul 1;34

Data mining

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information

Online Predicted Human Interactions Database (OPHID)

Designed both as a resource for laboratory scientists exploring known and predicted protein-protein interactions and as a means to facilitate bioinformatics initiatives exploring protein interaction networks.

Single nucleotide polymorphisms (SNPs)

A single nucleotide polymorphism (SNP, pronounced "snip") is a DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual).

Synonymous and nonsynonymous substitutions

Substitutions that result in amino acid replacements are said to be nonsynonymous, while substitutions that do not cause an amino acid replacement (such as a GGG to GGC change, where both codons still encode glycine) are said to be synonymous. Because of the difference in their effects on the physiology of the organism, synonymous and nonsynonymous substitutions can have quite different dynamics. For example, synonymous substitutions usually occur at a much faster rate than nonsynonymous substitutions. Hence, for coding sequence it is often desirable to separate the two.
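The GGG to GGC example can be checked mechanically. The sketch below builds the standard genetic code (DNA alphabet, `*` for stop codons) and classifies a single codon change; the function name `classify_substitution` is an illustrative choice, and other genetic codes (for example mitochondrial) are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


print(classify_substitution("GGG", "GGC"))  # both codons encode glycine
print(classify_substitution("GGG", "AGG"))  # glycine replaced by arginine
```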

Ka/Ks values

In genetics, the Ka/Ks ratio or dN/dS ratio is the ratio of the rate of nonsynonymous substitutions (Ka) to the rate of synonymous substitutions (Ks), which can be used as an indication of selection on a protein-coding gene.
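A minimal sketch of how the ratio is usually read: values well below 1 suggest purifying selection, values near 1 suggest neutral evolution, and values above 1 suggest positive selection. The `tolerance` band around 1 is an arbitrary assumption for illustration; in practice a statistical test decides whether the ratio differs from 1.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def interpret(ratio, tolerance=0.1):
    """Rough reading of a Ka/Ks ratio; `tolerance` is an assumed cut-off."""
    if ratio > 1 + tolerance:
        return "positive (diversifying) selection"
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    return "approximately neutral"


# Hypothetical rates, in substitutions per site
print(interpret(ka_ks_ratio(0.02, 0.10)))
print(interpret(ka_ks_ratio(0.30, 0.10)))
```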

dbSNP

Database of Single Nucleotide Polymorphisms

A public-domain archive for a broad collection of single nucleotide polymorphisms (SNPs), hosted at the National Center for Biotechnology Information.

OrthoDisease

A comprehensive database of model organism genes that are orthologous to human disease genes

OrthoDisease is constructed primarily using Inparanoid analysis. Inparanoid is a program that automatically detects orthologs (or groups of orthologs) from two species

Field Biology

Biology of organisms living in their natural environments

Applications in Ecology and Evolutionary Biology

Epidemiology

Epidemiology is the study of how often diseases occur in different groups of people and why

Planning and evaluating strategies to prevent illness

Guiding the management of patients in whom disease has already developed

Reference: Epidemiology for the uninitiated by Coggon, Rose and Barker

Population at risk

The population at risk is the group of people, healthy or sick, who would be counted as cases if they had the disease being studied

It defines the denominator for the calculation of incidence and prevalence rates

It is the number of persons potentially capable of experiencing the event or outcome of interest

Floating numerator

Numerator floating without its denominator

A common error in field investigations

The error occurs when the number of cases is not related to the “at risk” population

Epidemiological conclusions (on risk) cannot be drawn from purely clinical data (on the number of sick people seen)

Target population

It is the population about which the conclusions are to be drawn

Sometimes measurements can be made on the full target population; otherwise study samples are used

Study population and study sample

The study population is the group of individuals in a study

In a clinical trial, the participants make up the study population

The study sample is chosen from the study population

Aetiology

The study of the factors that predispose to or precipitate the disease

An external agent, a susceptible host, and an environment that brings the host and agent together form the disease aetiology triad

Surveillance

Watching over a population and recording data likely to have epidemiological significance, usually with the aim of early detection of disease. Essentially an interventionist exercise compared with monitoring, which is passive.

Case

Disease in populations exists as a continuum of severity rather than as an all or none phenomenon

The real question in population studies is not “has the person got the disease?” but “How much of the disease has he or she got?”

Diagnostic continuum is dichotomized into “cases” and “non-cases” on the basis of statistical, clinical, prognostic or operational options

Hence case definition should be precise and unambiguous.

Epidemiological case definitions are narrower and more rigid than clinical ones

Incidence

It is the rate at which new cases occur in a population during a specified period

Incidence = (number of new cases) / [(population at risk) * (time during which cases were ascertained)]
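As a worked example of the incidence calculation, the sketch below divides new cases by the product of population at risk and observation time, giving a rate per person-time. The figures are invented for illustration.

```python
def incidence_rate(new_cases, population_at_risk, time_at_risk):
    """New cases per person per unit of time.

    The denominator is person-time: population at risk multiplied by
    the period during which cases were ascertained.
    """
    if population_at_risk <= 0 or time_at_risk <= 0:
        raise ValueError("population and time must be positive")
    return new_cases / (population_at_risk * time_at_risk)


# Hypothetical figures: 30 new cases among 1000 people followed for 2 years
rate = incidence_rate(new_cases=30, population_at_risk=1000, time_at_risk=2)
print(f"{rate * 1000:.1f} new cases per 1000 person-years")
```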

Prevalence

Point prevalence

The proportion of a population that are cases at a point in time

Period prevalence

The proportion of a population that are cases at any time within a stated period
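Both kinds of prevalence are plain proportions; only the counting window for the numerator differs. A small sketch with invented numbers:

```python
def prevalence(cases, population):
    """Proportion of the population counted as cases."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


# Hypothetical survey of 2000 people
point_prevalence = prevalence(cases=50, population=2000)   # cases on the survey day
period_prevalence = prevalence(cases=80, population=2000)  # cases at any time in the year

print(f"point: {point_prevalence:.1%}, period: {period_prevalence:.1%}")
```

Period prevalence is never lower than point prevalence for a point inside the same period, because everyone who is a case at that point is also a case at some time during the period.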

Attributable risk and relative risk

Attributable risk is the disease rate in exposed persons minus that in people who are unexposed

Relative risk is the ratio of the disease rate in exposed persons to that in people who are unexposed

Attributable risk = rate of disease in unexposed persons * (relative risk – 1)
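The two measures, and the identity that links them, can be verified numerically. The rates below are invented for illustration.

```python
def relative_risk(rate_exposed, rate_unexposed):
    """Ratio of the disease rate in exposed to unexposed persons."""
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed, rate_unexposed):
    """Difference between the disease rates in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed


# Hypothetical rates: 30 vs 10 cases per 1000 persons per year
rate_exposed, rate_unexposed = 0.030, 0.010

rr = relative_risk(rate_exposed, rate_unexposed)
ar = attributable_risk(rate_exposed, rate_unexposed)

# Identity from the slide: AR = rate in unexposed * (RR - 1)
ar_from_rr = rate_unexposed * (rr - 1)
print(rr, ar, ar_from_rr)
```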

Confounding

Confusion about causation that arises when two or more variables are associated with the disease

Confounding may give rise to spurious associations when in fact there is no causal relation or, at the other extreme, it may obscure the effects of a true cause
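A spurious association of this kind can be reproduced with a toy data set. In the sketch below (all numbers invented), age is the confounder: exposure has no effect within either age group, yet the crude comparison suggests that exposed people have about twice the risk, because older people are both more often exposed and more often ill.

```python
def rate(cases, people):
    return cases / people


def relative_risk(exposed, unexposed):
    """Each argument is a (cases, people) pair."""
    return rate(*exposed) / rate(*unexposed)


# Hypothetical counts as (cases, people), stratified by the confounder
strata = {
    "older": {"exposed": (16, 80), "unexposed": (4, 20)},
    "younger": {"exposed": (1, 20), "unexposed": (4, 80)},
}

stratum_rr = {
    name: relative_risk(groups["exposed"], groups["unexposed"])
    for name, groups in strata.items()
}


def pooled(group):
    """Add up cases and people across strata, ignoring the confounder."""
    cases = sum(s[group][0] for s in strata.values())
    people = sum(s[group][1] for s in strata.values())
    return cases, people


crude_rr = relative_risk(pooled("exposed"), pooled("unexposed"))

print("stratum-specific RR:", stratum_rr)
print("crude RR:", round(crude_rr, 3))
```

Stratifying by the suspected confounder, as done here, is one simple way to see whether an association survives once that variable is held fixed.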

Bias

Bias is the deviation of inferences from the truth

Selection bias is the biased selection of individuals into the study

Information bias is the biased collection or biased analysis of the data

Motto of the epidemiologist could well be “dirty hands but a clean mind” (manus sordidae, mens pura)

Chance

A measure of how likely it is that some event will occur

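Random, unpredictable influences on events

The association between the exposure and disease is conventionally considered “statistically significant” if the p-value (the probability of obtaining a test statistic at least as extreme as the one observed when there is no true association) is less than 0.05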
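To make the 0.05 convention concrete, here is a sketch of one common test, a two-sided z-test comparing disease proportions in exposed and unexposed groups. The choice of test and the counts are assumptions for illustration; the normal approximation is only reasonable for fairly large groups.

```python
import math


def two_proportion_p_value(cases_a, n_a, cases_b, n_b):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_a - p_b) / standard_error
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 30 of 100 exposed and 15 of 100 unexposed people fell ill
p_value = two_proportion_p_value(30, 100, 15, 100)
print("statistically significant at 0.05:", p_value < 0.05)
```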

Sensitivity

The proportion of persons with the disease who are correctly identified by defined criteria

The proportion of persons with the disease who are correctly identified by a screening test

The ability of a system to detect epidemics and other changes in disease occurrence

A sensitive test detects a high proportion of the true cases

Specificity

The proportion of persons without a disease who are correctly identified by a test

The number of true negative results divided by the total number of all those without the disease
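Sensitivity and specificity both follow from the four cells of a two-by-two table of test result against true disease status. The counts below are invented for illustration.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of people with the disease whom the test identifies."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of people without the disease whom the test identifies."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical screening results:
# 100 people with the disease, of whom 90 test positive;
# 900 people without the disease, of whom 855 test negative.
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=855, false_positives=45)

print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```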

Randomization

Randomization is used to obtain a similar allocation of individuals to each group; the groups are followed over the same period

Purpose of randomization: To obtain unbiased estimates of differences among treatment responses (means or effects) and to obtain an unbiased estimate of the random error variation in the experiment
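A minimal sketch of simple random allocation: shuffle the participants and deal them out in turn so that group sizes differ by at most one. The function name and the use of a fixed seed (for reproducibility of the allocation list) are illustrative choices; real trials often use blocked or stratified schemes instead.

```python
import random


def randomize(participants, n_groups=2, seed=None):
    """Randomly allocate participants to `n_groups` groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants out like cards, one group at a time
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical participant IDs 1..20 split into treatment and control
treatment, control = randomize(range(1, 21), n_groups=2, seed=42)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides the allocation, neither the investigator nor the participant can influence who receives which treatment, which is what makes the resulting comparison unbiased.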

Replication and Local control

Replication is the repetition of an experiment in order to test the validity of its conclusion

Local control is blocking or grouping to eliminate or to control the various sources of variation (error)

Replication and local control are necessary to achieve a reduction in the random variation among treatment effects in the experiment

Observational (non-experimental) studies

Person-level units of observation

1. Longitudinal measurements

a. Cohort samples

b. Case-control samples

2. Cross-sectional measurements

Aggregate-level units of observation (ecological studies)

Reference: Epidemiology Kept Simple: An Introduction to Traditional and Modern Epidemiology; by B. Burt Gerstman

Personal-level vs. aggregate-level

A personal-level study on smoking might collect information on each person’s smoking habits, age and disease status

An aggregate-level study on smoking might collect information on each region’s per capita cigarette consumption, age distribution and disease rate

Longitudinal studies

Longitudinal studies are studies in which the sequence of events in individuals can be delineated over time

In cohort studies the incidence of disease in exposed and non-exposed groups is compared

In case-control studies people with disease (cases) and people without disease (controls) are sampled from the source population and exposure histories of cases and controls are compared

Longitudinal vs. cross-sectional studies

Longitudinal measurements relate exposures and diseases in individuals at various time references

Cross-sectional measurements are not definitively time sequenced in individuals

In cross-sectional studies the data are gathered from samples at one point in time. Since both the outcome and the exposure variables are measured at the same time, these studies are not strong at showing cause-effect relationships.

Experimental studies

In experimental studies, the investigator introduces or removes an exposure in order to observe its influence on a health outcome. Such allocations may be based on chance mechanism (randomized trials) or on other deliberate mechanisms built into the study’s protocol (non-randomized trials)

Other disease informatics lectures:

Supercourse: Epidemiology, the Internet and Global Health

Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011
