TRANSCRIPT
Monday, September 13, 2:31:58 AM
Outline
I. General course overview.
• Aims of the course. How is the course taught? What are the formative and summative assessments for the course? What are the essential requirements to pass the course? Which textbook does the course use?
II. What is biology and bioinformatics?
III. Types and characteristics of biological data.
• Types and numbers of databases and/or bioinformatics tools.
IV. Biological preliminaries.
• Properties and organization of life.
• Structures and fundamental roles of the DNA macromolecules.
• Flow of genetic information.
• The genetic code.
• How do genetic variants arise?
• What is the genome, and what are the three genomic paradoxes?
Slide 1/1
This is an application-oriented course divided into two alternating modules: A) informatics and B) bioinformatics, presented by Dr. Gabor Pauler and Dr. Csaba Fekete respectively.
In these modules you will learn how to use standard web-based bioinformatics tools and databases. The main emphasis is placed on making it as easy as possible for the user.
The course assumes that you have a rudimentary knowledge of biochemistry, cell biology, and molecular biology, so in the current context we can give only an extremely brief summary (reminder) of these topics.
Welcome to the lecture/practical course “Informatics-Bioinformatics”
Slide 1/2
Course overview, content and methodology
Slide 1/3
If you wish to learn more about molecular biology or related disciplines, we suggest you read some of the standard textbooks mentioned in our bibliography.
The weekly laboratory coursework will be closely coupled to the lecture topics. If you are confused about anything, don’t hesitate to ask your instructor.
To introduce practical use of the tools, in-class demonstrations will be performed to give you a general overview of the experiment workflow; however, detailed cookbook-style instructions will not be provided.
To successfully complete the course, you need to be present at at least 85% of the classes.
There will be about 6 exercises (homework) throughout the course that combine theoretical questions and hands-on activities.
Lab cycles run Monday to Saturday (the homework deadline). All exercises are mandatory; you should submit exercises through the course site (https://elearning.ttk.pte.hu/moodle/).
Assessments/Examinations: class participation and homework assignments make up 30% of the final grade. The mid-term exam and the final test each make up 35% of the grade.
If you have any issues, contact me: Dr. Csaba Fekete associate professor (senior lecturer); University of Pecs, Department of General and Environmental Microbiology, 7624 Pecs, Ifjusag str. 6. Office: E330; Tel.: ++ (36)72 503-600; Extension: 4815 or 4810; e-mail: [email protected]
Slide 1/4
Let’s try to avoid the scholastic equivalent of this!
Slide 1/5
Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy.
What is biology?
1 Natural sciences
• 1.1 Physical sciences: 1.1.1 Chemistry, 1.1.2 Physics, 1.1.3 Astronomy, 1.1.4 Earth science, 1.1.5 Environmental science
• 1.2 Life sciences (Biology): Anatomy, Biophysics, Cell biology, Genomics, Astrobiology, Conservation biology, Microbiology, Bioinformatics, Developmental biology, Mycology, Molecular biology, Biotechnology, Physiology, Ecology, Epidemiology, Morphology, Proteomics, Evolution, Systematic Botany, Genetics, Virology
2 Cognitive sciences
3 Formal sciences: 3.1 Computer sciences, 3.2 Mathematics, 3.3 Statistics, 3.4 Systems science
4 Social sciences
5 Applied sciences
Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems.
Slide 2/6
What is Bioinformatics?
Computer + Mouse = Bioinformatics
(Information) (Biology)
Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.
Bioinformatics is the science of managing and analyzing biological data using advanced computing techniques.
The ultimate goal of bioinformatics, as described by one expert, is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
Bioinformatics is the computer-assisted data management discipline that helps us acquire, store, organize, archive, analyze, integrate, and visualize biological data.
A marriage between Biology and Computers!
Slide 2/7
History of Bioinformatics
Bioinformatics as a defined scientific discipline emerged in the mid-1990s, when large amounts of sequence, structural, and biochemical data began to accumulate.
However, the roots of bioinformatics can be traced back to the 1960s, when Margaret Dayhoff established the first database of protein sequences.
Bioinformatics was born when the first complete protein sequence (bovine insulin) was determined by Frederick Sanger.
The first DNA sequences were obtained in the early 1970s.
In 1976, Walter Fiers and his team determined the first complete genome sequence, that of bacteriophage MS2.
IBM 7090
Slide 2/8
Margaret Dayhoff (1925-1983)
Frederick Sanger (1918-2013)
100 proteins
Walter Fiers
Characteristics of biological data
Biological data has three important characteristics: (i) complexity, (ii) heterogeneity, and (iii) highly dynamic data and schema.
• Biological data is complex in the sense that it is very rich in metadata and has hierarchical structures. (Metadata is data that provides information about one or more other pieces of data.)
• Biological data is heterogeneous in the sense that it involves a wide array of data types, including text, images, sequence data, streaming data, temporal data, and incomplete or missing data. (A data stream is a sequence of digitally encoded coherent signals used to transmit or receive information, e.g., medical sensor data.)
• Biological data is highly dynamic, not only in content, but also in schema (i.e., structure).
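The three characteristics above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the `SequenceRecord` class, its field names, and the example values are hypothetical and do not come from any real database format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SequenceRecord:
    """Hypothetical record: the data itself plus metadata describing it."""
    identifier: str
    sequence: str                     # the primary data (here: DNA letters)
    organism: Optional[str] = None    # metadata; None marks a missing value
    annotations: dict = field(default_factory=dict)  # nested (hierarchical) metadata


def missing_fields(rec: SequenceRecord) -> list:
    """Return the names of top-level metadata fields that have no value."""
    return [name for name in ("organism",) if getattr(rec, name) is None]


# Complexity: metadata is nested inside the record.
# Heterogeneity and incompleteness: text, numbers, and missing values coexist.
record = SequenceRecord(
    identifier="example_0001",        # made-up identifier
    sequence="ATGGCCTAA",
    annotations={"source": {"method": "hypothetical", "year": None}},
)

# Dynamic schema: a new kind of annotation is added without redefining the class.
record.annotations["expression"] = {"condition_A": 1.5}
```

A flexible container such as a dictionary is used for the annotations because the set of keys is expected to change over time; a fixed set of fields would have to be redefined whenever the schema changes.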
Biology and Life Sciences have become increasingly “data rich” over the past decade.
Slide 3/9
What units of information do we deal with in bioinformatics?
Biological data can be very diverse and can touch many life science domains. Each domain (subdiscipline) has its own terminology, nomenclature, rules, and data needs. For instance, biological data may consist of the following:
• DNA, RNA, protein sequences: Determined order of nucleotides or amino acids.
• Graphs: Relationships can be captured as graphs, as in the cases of metabolic pathways, signaling pathways, gene regulatory networks, genetic maps, and structured taxonomies.
• High-dimensional data: Used in systems biology, for example, how expression profiles vary as a function of different experimental conditions.
• Geometric information: Because biological function frequently depends on the relative shape and three-dimensional configuration of molecules, molecular structure data are very important.
• Scalar and vector fields: In biology, scalar and vector field properties are associated with chemical concentration, electric charge, hydrophobicity, fluxes across cell membranes, and transport processes.
Slide 3/10
• Patterns: Within the genome are patterns that characterize biologically interesting entities such as genes and regulatory sequences. Patterns are also of interest in the exploration of protein structure data, microarray data, pathway data, proteomics data, and metabolomics data.
• Constraints: Consistency within a database is critical if the data are to be trustworthy, and biological databases are no exception.
• Images: Imagery is an important part of biological research, for example electron and optical micrographs and radiographic and fluorescence images.
• Spatial information: Real biological entities, from cells to ecosystems, are not spatially homogeneous, and a great deal of interesting science can be found in understanding how one spatial region is different from another.
• Models: Computational models must be compared and evaluated.
• Prose: The biological literature itself can be regarded as data. Biological prose (text) is the basis for annotations, which can be regarded as a form of metadata.
• Declarative knowledge: As the complexity of various biological systems is unraveled, machine-readable representations of knowledge such as hypotheses and evidence will be necessary.
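Two of the data types listed above, sequences and graphs, map directly onto basic Python structures. The sketch below is a minimal illustration, assuming a made-up DNA string and a made-up pathway whose node names (`A` to `E`) are placeholders rather than real metabolites.

```python
from collections import deque

# A sequence is an ordered string of residues (here: nucleotides).
dna = "ATGCGTAC"


def base_counts(seq):
    """Count how often each letter occurs in a sequence."""
    counts = {}
    for base in seq:
        counts[base] = counts.get(base, 0) + 1
    return counts


# A graph stored as an adjacency list: each key points to the nodes it leads to.
# Hypothetical pathway: A is converted to B, B to C or D, and D to E.
pathway = {"A": ["B"], "B": ["C", "D"], "C": [], "D": ["E"], "E": []}


def reachable(graph, start):
    """Breadth-first search: all nodes that can be reached from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

For example, `reachable(pathway, "B")` collects every node downstream of `B` in the hypothetical pathway.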
Slide 3/11
Biological databases
The instruments of bioinformatics are computers, databases, and the statistical tools and algorithms that are used for data analysis.
Biological databases are archives of consistent data that are stored in a uniform and efficient manner. These databases contain data from a broad spectrum of molecular biology areas. Primary databases contain information and annotation of DNA and protein sequences, DNA and protein structures and DNA and protein expression profiles.
Secondary or derived databases are so called because they contain the results of analysis on the primary resources including information on sequence patterns or motifs, variants and mutations and evolutionary relationships.
Information from the literature is contained in bibliographic databases, such as MEDLINE, a database of citations, abstracts, and some full-text articles on life sciences and biomedical topics.
Slide 4/12
Many different databases and bioinformatics tools are available on the Internet free of charge.
The latest Molecular Biology Database Collection includes 1230 databases.
The full content of the Database Issue is available online at the Nucleic Acids Research web site.
Slide 4/13
More than 1200 key databases in 14 categories
http://www.oxfordjournals.org/nar/database/a/
Organization of the online database collection
http://www.oxfordjournals.org/nar/database/c/
http://www.oxfordjournals.org/nar/database/cap/
Slide 4/14
Properties and organization of life
Living organisms:
• Are composed of one or more cells.
• Are complex and ordered.
• Respond to their environment.
• Can grow and reproduce.
• Obtain and use energy.
• Maintain internal balance.
• Allow for evolutionary adaptation.
Slide 5/15
Figure: levels of biological organization (cell = smallest unit of life, population, community, ecosystem, biosphere).
Humans appear to have an innate need to name things. In many primitive societies, a person who knows the true name of an object or of another person is believed to have power over that object or person.
Three separate but interrelated disciplines are involved in taxonomy:
• Identification: characterizing organisms
• Classification: arranging organisms into similar groups
• Nomenclature: naming organisms
Biologists often use a taxonomic key to identify organisms according to their characteristics.
Taxonomy
Slide 6/16
Building blocks from which all organisms are assembled
Slide 7/17
• In prokaryotic cells (Bacteria and Archaea) the DNA is not separated from the cytoplasm in a nucleus.
• There are no membrane-enclosed organelles in the cytoplasm.
• Almost all prokaryotic cells have tough external cell walls.
• Eukaryotic cells are subdivided by internal membranes into organelles.
• DNA is found mainly in the nucleus.
• Surrounding the nucleus is the cytoplasm which contains a viscous cytosol and various organelles.
Prokaryotic and eukaryotic cells can be distinguished by their structural organization. Cells in different organisms or within the same organism vary significantly in shape, size, and behavior. However, they all share common characteristics that are essential for life.
Slide 8/18
Two major kinds of cells
Slide 9/19
The nucleus
The nucleus contains most of the genetic material (nucleic acids) in a eukaryotic cell.
The nucleus is separated from the cytoplasm by a double membrane.
Pores in the membrane allow large macromolecules and particles to pass into the cytoplasm.
In the nucleus, the DNA and associated proteins are organized into a fibrous material called chromatin.
When the cell prepares to divide, the chromatin fibers coil up and become visible as separate structures called chromosomes.
Each eukaryotic species has a characteristic number of chromosomes.
Major differences between pro- and eukaryotic transcription and translation
Slide 10/20
Chromosome: an organelle for packaging DNA
Chromosomes are composed of chromatin, a complex of DNA and protein; most are about 40% DNA and 60% protein.
The proteins of chromatin fall into two classes: histones and nonhistone chromosomal proteins. Five distinct histones are known: H1, H2A, H2B, H3, and H4.
The DNA of a chromosome is one very long, double-stranded fiber that extends unbroken through the entire length of the chromosome.
At the first level of compaction, the DNA wraps around histone cores to form nucleosomes.
A higher order of chromatin structure is created when the nucleosomes are wound in the fashion of a solenoid having six nucleosomes per turn.
Coiling continues until the DNA is in a compact mass.
Slide 11/21
Variations in structure of chromosome
Chromosomes can be broken by X-rays and by certain chemicals. The broken ends spontaneously rejoin, but if there are multiple breaks, the ends join at random. This leads to alterations in chromosome structure.
Problems with structural changes: breaking the chromosome often means breaking a gene. Since most genes are necessary for life, many chromosome breaks are lethal or cause serious defects.
Also, chromosomes with structural variations often have trouble going through meiosis, giving embryos with missing or extra large regions of the chromosomes. This condition is aneuploidy, just like the chromosome number variations, and it is often lethal.
The major categories are duplication (an extra copy of a region of a chromosome), deletion (a missing region of a chromosome), inversion (part of the chromosome is inserted backwards), and translocation (two different chromosomes switch pieces).
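The four categories of structural change can be sketched as operations on a string in which each letter stands for one chromosome region. This is a toy model under stated assumptions: the function names, the region labels, and the breakpoint positions are all hypothetical.

```python
def duplication(chrom, start, end):
    """Insert an extra copy of the region chrom[start:end] right after it."""
    return chrom[:end] + chrom[start:end] + chrom[end:]


def deletion(chrom, start, end):
    """Remove the region chrom[start:end]."""
    return chrom[:start] + chrom[end:]


def inversion(chrom, start, end):
    """Reinsert the region chrom[start:end] backwards."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def translocation(chrom_a, chrom_b, break_a, break_b):
    """Two different chromosomes swap the pieces after their breakpoints."""
    new_a = chrom_a[:break_a] + chrom_b[break_b:]
    new_b = chrom_b[:break_b] + chrom_a[break_a:]
    return new_a, new_b


# Each letter is one hypothetical region of a chromosome.
chromosome_1 = "ABCDEFG"
chromosome_2 = "tuvwxyz"
```

With these definitions, `inversion(chromosome_1, 2, 4)` turns the region `CD` into `DC`, and `translocation(chromosome_1, chromosome_2, 4, 3)` exchanges the ends of the two model chromosomes.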
Cri du chat syndrome
Down syndrome
Edwards syndrome
Slide 11/22
Refresher: flow of genetic information
The central dogma of biology
Nucleic acids (DNA, RNA) and proteins are biological macromolecules built as long linear chains of chemical components.
DNA plays a fundamental role in the processes of life in two respects.
• First, it contains the templates (it has the coding capacity) for the synthesis of proteins and other products for all cellular functions.
• Second, it serves as a medium to transmit information from generation to generation.
Replication
Transcription
Translation
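The first two steps of this flow can be sketched in a few lines of Python. This is a simplified illustration, assuming sequences that contain only the letters A, C, G, and T and are written in the 5'→3' direction; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def replicate(strand):
    """Replication: build the complementary strand.

    The new strand is antiparallel to its template, so it is returned
    reversed in order to read 5'->3' as well.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(coding_strand):
    """Transcription: the mRNA carries the same sequence as the coding
    strand, with uracil (U) in place of thymine (T)."""
    return coding_strand.replace("T", "U")


coding = "ATGGCC"
template = replicate(coding)   # the strand that is actually read during transcription
mrna = transcribe(coding)
```

Replicating the new strand again returns the original sequence, which is the property that lets DNA transmit information from generation to generation.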
Slide 12/23
The DNA structure
The deoxyribonucleic acid (DNA) molecule is double-stranded and composed of two strands in an antiparallel and complementary arrangement.
The basic unit, the nucleotide, consists of a molecule of deoxyribose sugar, a phosphate group, and one of four nitrogenous bases, each denoted by one of the letters A, C, G and T. (A nucleoside is the sugar and base without the phosphate group.)
Double-stranded DNA from any cell of any organism should have a 1:1 ratio of pyrimidine and purine bases (Chargaff’s rule). Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing (A-T, G-C).
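Complementary base pairing and the 1:1 purine to pyrimidine ratio can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes non-empty strands written 5'→3' that contain only A, C, G, and T.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_complementary(strand, other):
    """True if two 5'->3' strands pair base by base in antiparallel orientation."""
    if len(strand) != len(other):
        return False
    return all(PAIRS[a] == b for a, b in zip(strand, reversed(other)))


def purine_pyrimidine_ratio(strand):
    """Purine/pyrimidine ratio of the duplex formed by a strand and its partner."""
    partner = "".join(PAIRS[base] for base in reversed(strand))
    duplex = strand + partner
    purines = sum(base in PURINES for base in duplex)
    pyrimidines = sum(base in PYRIMIDINES for base in duplex)
    return purines / pyrimidines
```

Because every purine on one strand pairs with a pyrimidine on the other, the ratio for the duplex is 1 even when a single strand is made only of purines, as in `purine_pyrimidine_ratio("AAGGA")`.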
Slide 13/24
Structural comparison of A, B and Z DNA
Helix type A B Z
There are three natural forms of DNA (A, B and Z). The differences between these forms are related to the conformation of the sugar and the orientation of the base relative to the sugar. The C-form and D-form are unusual subclasses of the B-type.
Slide 13/25
Major features of the genetic code
Genetic information (the code) is stored in DNA.
On theoretical grounds, Sidney Brenner (early 1960s) postulated that the genetic code used by cells is a triplet code, in which each code word consists of three nucleotides (4^1 = 4, 4^2 = 16, 4^3 = 64): one or two nucleotides cannot specify 20 amino acids, but three can. The 20 amino acids are encoded by 61 triplets; the remaining three triplets are stop codons.
The mRNA triplet complementary to the DNA template is the codon.
The tRNA triplet complementary to the codon is the anticodon.
Codons are nonoverlapping and degenerate, and the code has no internal punctuation.
The genetic code is universal (with some exceptions…).
The first two nucleotides of a codon have a higher informational value than the third one.
The code can evolve.
An open reading frame (ORF) is the nucleotide sequence between a start codon and a stop codon.
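The counting argument and the idea of an open reading frame can be made concrete in Python. The sketch below is one possible illustration, not a reference implementation: it builds the standard genetic code table from its usual compact 64-letter form (DNA alphabet, `*` marks a stop), and the helper names `first_orf` and `translate` are hypothetical. It assumes input that contains only A, C, G, and T, and it returns the ORF including its start and stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per amino acid, listed in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4^3 = 64 triplets, each mapped to its amino acid or to a stop signal.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
STOP_CODONS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")


def first_orf(dna):
    """Return the first ATG...stop stretch read in triplets, or None."""
    start = dna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(dna) - 2, 3):
        if CODON_TABLE[dna[i:i + 3]] == "*":
            return dna[start:i + 3]
    return None  # no in-frame stop codon was found


def translate(orf):
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

The table also shows degeneracy directly: 61 sense codons are shared among 20 amino acids, so most amino acids are specified by more than one codon.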
Slide 14/26
What is the genome?
The genome is all the DNA in a cell.
• All the DNA on all the chromosomes.
• Includes genes, intergenic sequences, and repeats.
More specifically, a genome is all the DNA in an organelle. Eukaryotes can have 2-3 genomes:
• Nuclear genome
• Mitochondrial genome
• Plastid genome
If not specified, “genome” usually refers to the nuclear genome.
In eukaryotes, this term is commonly used to refer to one complete haploid set of chromosomes, such as that found in a sperm or egg.
The units of length of nucleic acids in which genome sizes are expressed:
• Kilobase (kb): 10^3 base pairs
• Megabase (Mb): 10^6 base pairs
Slide 15/27
A gene is a unit of heredity in a living organism.
A gene is a segment of DNA that is involved in producing a polypeptide chain; it can include regions preceding and following the coding DNA as well as introns between the exons. It is considered a unit of heredity; genes were formerly called "factors".
Complex genomes have roughly 10 to 30 times more DNA than is required to encode all the RNAs or proteins in the organism.
Contributors to the non-coding DNA include:
• Introns in genes
• Regulatory elements of genes
• Multiple copies of genes, including pseudogenes
• Intergenic sequences
• Interspersed repeats
An intergenic region (IGR) is a stretch of DNA located between genes or clusters of genes that contains few or no genes.
Genes are the basic unit of heredity
What are genes?
Slide 16/28
Distinct components in complex genomes
Slide 17/29
Genome size
The genetic complement of a cell or virus constitutes its genome.
The C-value = the DNA content of the haploid genome.
Slide 17/30
Viral genomes are typically in the range 100–1000 kb.
• Bacteriophage MS2, one of the smallest viruses, has only four genes in a single-stranded RNA molecule of about 4000 nucleotides (4 kb).
Bacterial genomes are larger, typically in the range 1–10 Mb.
• The chromosome of Escherichia coli is a circular DNA molecule of 4600 kb.
Eukaryotic genomes are typically in the range 100–1000 Mb.
• Among eukaryotes, genome size often differs tremendously, even among closely related species.
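Genome sizes quoted in different units are easier to compare after conversion to a common unit. The following Python sketch assumes the usual definitions (1 kb = 10^3 bp, 1 Mb = 10^6 bp); the function names are hypothetical, and the two example sizes are the approximate values given on this slide.

```python
# Number of base pairs per unit.
UNIT_IN_BP = {"bp": 1, "kb": 10**3, "Mb": 10**6}


def to_bp(value, unit):
    """Convert a genome size to base pairs."""
    return value * UNIT_IN_BP[unit]


def convert(value, from_unit, to_unit):
    """Convert a genome size between bp, kb, and Mb."""
    return to_bp(value, from_unit) / UNIT_IN_BP[to_unit]


ms2_bp = to_bp(4, "kb")                # bacteriophage MS2, about 4 kb
e_coli_mb = convert(4600, "kb", "Mb")  # Escherichia coli, 4600 kb
```

Expressed in megabases, the 4600 kb chromosome of Escherichia coli falls inside the 1–10 Mb range quoted for bacterial genomes.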
Slide 17/31
The 3 genomic paradoxes
The C-value = the DNA content of the haploid genome.
C-value paradox: Complexity does not correlate with genome size.
• Homo sapiens: 3.4 × 10^9 bp
• Amoeba dubia: 6.8 × 10^11 bp
• Allium cepa: 1.5 × 10^10 bp
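The size of the paradox becomes clearer when the three C-values are expressed relative to the human genome. The Python sketch below uses only the numbers listed on this slide; the dictionary and function names are hypothetical.

```python
# Haploid genome sizes (C-values) in base pairs, as listed on the slide.
C_VALUES_BP = {
    "Homo sapiens": 3_400_000_000,
    "Amoeba dubia": 680_000_000_000,
    "Allium cepa": 15_000_000_000,
}


def relative_to(reference, sizes):
    """Express every genome size as a multiple of the reference genome."""
    ref_size = sizes[reference]
    return {species: size / ref_size for species, size in sizes.items()}


ratios = relative_to("Homo sapiens", C_VALUES_BP)
largest = max(C_VALUES_BP, key=C_VALUES_BP.get)
```

On these figures the genome of the single-celled Amoeba dubia is 200 times the size of the human genome, which is the mismatch between genome size and complexity that the C-value paradox describes.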
Slide 17/32
K-value paradox: Complexity does not correlate with chromosome number.
• Homo sapiens: 46
• Lysandra atlantica: 250
• Ophioglossum reticulatum: ~1260
Slide 17/33
~21,000 genes, ~25,000 genes, ~60,000 genes
N-value paradox: Complexity does not correlate with gene number.
Slide 17/34
Introductory lectures: http://www.youtube.com/watch?v=40Sum5KfG1Q
Textbooks
Thank you!