
Upload: lenard-perry

Post on 28-Dec-2015


Monday, September 13, 2:31:58 AM

Outline

I. General course overview.
• Aims of the course. How is the course taught? What are the formative and summative assessments for the course? What are the essential requirements to pass the course? Which textbook does the course use?

II. What is biology and bioinformatics?

III. Types and characteristics of biological data.
• Types and numbers of databases and/or bioinformatics tools.

IV. Biological preliminaries.
• Properties and organization of life.
• Structures and fundamental roles of the DNA macromolecules.
• Flow of genetic information.
• The genetic code.
• How do genetic variants arise?
• What is the genome, and what are the three genomic paradoxes?

Slide 1/1


This is an application-oriented course divided into two alternating modules: A) informatics and B) bioinformatics, presented by Dr. Gabor Pauler and Dr. Csaba Fekete respectively.

In these modules you will learn how to use standard web-based bioinformatics tools and databases. The main emphasis is placed on making it as easy as possible for the user.

The course assumes that you have a rudimentary knowledge of biochemistry, cell biology and molecular biology, so here we can give only an extremely brief summary (reminder) of these topics.

Welcome to the lecture/practical course “Informatics-Bioinformatics”

Slide 1/2

Course overview, content and methodology


Course overview, content and methodology

Slide 1/3

If you wish to learn more about molecular biology or related disciplines, we suggest you read some of the standard textbooks mentioned in our bibliography.

The weekly laboratory coursework will be closely coupled to the lecture topics. If you are confused about anything, don’t hesitate to ask your instructor.

To introduce the practical use of the tools, in-class demonstrations will be performed to give you a general overview of the experimental workflow; however, detailed cookbook-style instructions will not be provided.

To complete the course successfully, you need to be present in at least 85% of the classes.


Course overview, content and methodology

There will be about 6 exercises (homework) throughout the course, which combine theoretical questions and hands-on activities.

Lab cycles run from Monday to Saturday (the homework deadline). All exercises are mandatory; you should submit exercises through the course site (https://elearning.ttk.pte.hu/moodle/).

Assessments/Examinations: class participation and homework assignments account for 30% of the final grade. The mid-term exam and the final test each account for 35% of the grade.

If you have any issues, contact me: Dr. Csaba Fekete associate professor (senior lecturer); University of Pecs, Department of General and Environmental Microbiology, 7624 Pecs, Ifjusag str. 6. Office: E330; Tel.: ++ (36)72 503-600; Extension: 4815 or 4810; e-mail: [email protected]

Slide 1/4


Let’s try to avoid the scholastic equivalent of this!

Slide 1/5


Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy.

What is biology?

1 Natural sciences
• 1.1 Physical sciences
1.1.1 Chemistry
1.1.2 Physics
1.1.3 Astronomy
1.1.4 Earth science
1.1.5 Environmental science
• 1.2 Life sciences (Biology)
Anatomy, Biophysics, Cell biology, Genomics, Astrobiology, Conservation biology, Microbiology, Bioinformatics, Developmental biology, Mycology, Molecular biology, Biotechnology, Physiology, Ecology, Epidemiology, Morphology, Proteomics, Evolution, Systematics, Botany, Genetics, Virology

2 Cognitive sciences

3 Formal sciences
3.1 Computer sciences
3.2 Mathematics
3.3 Statistics
3.4 Systems science

4 Social sciences

5 Applied sciences

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems.

Slide 2/6


What is Bioinformatics?

Computer (Information) + Mouse (Biology) = Bioinformatics

Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.

Bioinformatics is the science of managing and analyzing biological data using advanced computing techniques.

The ultimate goal of bioinformatics, as described by an expert, is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

Bioinformatics is the computer-assisted data management discipline that helps us acquire, store, organize, archive, analyze, integrate and visualize biological data.

A marriage between Biology and Computers!

Slide 2/7


History of Bioinformatics

Bioinformatics emerged as a defined scientific discipline in the mid-1990s, when large amounts of sequence, structural, and biochemical data began to accumulate.

However, the roots of bioinformatics can be traced back to the 1960s, when Margaret Dayhoff established the first database of protein sequences.

Bioinformatics was born when the first complete protein sequence (bovine insulin) was determined by Frederick Sanger.

The first DNA sequences were obtained in the early 1970s.

In 1976, Walter Fiers and his team determined the first complete genome sequence, that of bacteriophage MS2.

Images: IBM 7090; Margaret Dayhoff (1925-1983); Frederick Sanger (1918-); 100 proteins; Walter Fiers

Slide 2/8


Characteristics of biological data

Biological data has three important characteristics: (i) complexity, (ii) heterogeneity, and (iii) highly dynamic data and schema.

• Biological data is complex in the sense that it is very rich in metadata and has hierarchical structures. (Metadata is defined as data providing information about one or more other pieces of data.)

• Biological data is heterogeneous in the sense that it involves a wide array of data types, including text, image, sequence data, streaming data, temporal data, and incomplete and missing data. (A data stream is a sequence of digitally encoded coherent signals used to transmit or receive information, e.g., medical sensor data.)

• Biological data is highly dynamic, not only in content, but also in schema (i.e., structure).

Biology and Life Sciences have become increasingly “data rich” over the past decade.

Slide 3/9
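The three characteristics above can be made concrete with a small sketch. The record below is entirely hypothetical (the identifier, field names and values are made up for illustration and do not follow any real database schema); it only shows how metadata, hierarchy and a changing schema might look in code.

```python
# Hypothetical record: all field names and values are illustrative only.
record = {
    "id": "EXAMPLE_0001",  # made-up identifier, not a real accession number
    "data": {"type": "sequence", "alphabet": "DNA", "residues": "ATGGCGTAA"},
    "metadata": {  # data about the data
        "organism": {"genus": "Escherichia", "species": "coli"},
        "source": "illustrative example, not a real database entry",
        "annotations": [
            {"feature": "start codon", "start": 0, "end": 3},
            {"feature": "stop codon", "start": 6, "end": 9},
        ],
    },
}


def walk(node, depth=0):
    """Yield (depth, key) pairs to expose the hierarchical structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield depth, key
            yield from walk(value, depth + 1)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, depth)


def add_field(rec, key, value):
    """The schema is dynamic: a new metadata field can appear at any time."""
    rec["metadata"][key] = value
    return rec


# Print the nested keys, indented by their depth in the hierarchy.
for depth, key in walk(record):
    print("  " * depth + key)
```

A nested dictionary is only one possible representation; real biological databases use their own formats and schemas.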


What units of information do we deal with in bioinformatics?

Biological data can be very diverse and can touch many life science domains. Each domain (subdiscipline) has its own terminology, nomenclature, rules and data needs. For instance, biological data may consist of the following:

• DNA, RNA, protein sequences: The determined order of nucleotides or amino acids.

• Graphs: Relationships can be captured as graphs, as in the cases of metabolic pathways, signaling pathways, gene regulatory networks, genetic maps, and structured taxonomies.

• High-dimensional data: Used in systems biology, for example to describe how expression profiles vary as a function of different experimental conditions.

• Geometric information: Because biological function frequently depends on the relative shape and three-dimensional configuration of molecules, molecular structure data are very important.

• Scalar and vector fields: In biology, scalar and vector field properties are associated with chemical concentration, electric charge, hydrophobicity, fluxes across cell membranes, and transport processes.

Slide 3/10
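As a rough illustration of the first two data types in the list, a sequence can be held as a plain string and a pathway as a directed graph. This is a minimal sketch: the DNA string is a toy example, the pathway contains only the first steps of glycolysis, and the function name `reachable` is chosen here for illustration.

```python
from collections import Counter, deque

# A sequence is an ordered string over a fixed alphabet (toy example).
dna = "ATGGCGTACGCTTAA"
composition = Counter(dna)  # how often each nucleotide occurs

# A pathway as a directed graph (adjacency list): first steps of glycolysis.
pathway = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
    "fructose-1,6-bisphosphate": [],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from `start`, including itself."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(dict(composition))
print(sorted(reachable(pathway, "glucose-6-phosphate")))
```

The same adjacency-list idea extends to signaling pathways or gene regulatory networks, where edges would also carry a type such as activation or repression.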


• Patterns: Within the genome are patterns that characterize biologically interesting entities such as genes and regulatory sequences. Patterns are also interesting in the exploration of protein structure data, microarray data, pathway data, proteomics data, and metabolomics data.

• Constraints: Consistency within a database is critical if the data are to be trustworthy, and biological databases are no exception.

• Images: Imagery is an important part of biological research, for example electron and optical microscopy images and radiographic and fluorescence images.

• Spatial information: Real biological entities, from cells to ecosystems, are not spatially homogeneous, and a great deal of interesting science can be found in understanding how one spatial region is different from another.

• Models: Computational models must be compared and evaluated.

• Prose: The biological literature itself can be regarded as data. Biological prose (text) is the basis for annotations, which can be regarded as a form of metadata.

• Declarative knowledge: As the complexity of various biological systems is unraveled, machine-readable representations of declarative knowledge, such as hypotheses and evidence, will be necessary.


Slide 3/11


Biological databases

The instruments of bioinformatics are computers, databases, and the statistical tools and algorithms that are used for data analysis.

Biological databases are archives of consistent data that are stored in a uniform and efficient manner. These databases contain data from a broad spectrum of molecular biology areas.

Primary databases contain information and annotation of DNA and protein sequences, DNA and protein structures, and DNA and protein expression profiles.

Secondary or derived databases are so called because they contain the results of analysis of the primary resources, including information on sequence patterns or motifs, variants and mutations, and evolutionary relationships.

Information from the literature is contained in bibliographic databases, such as Medline, a database of citations, abstracts and some full-text articles on life sciences and biomedical topics.

Slide 4/12


There are many different databases and bioinformatics tools available over the Net free of charge.

The latest Molecular Biology Database Collection includes 1230 databases.

The full content of the Database Issue is available online at the Nucleic Acids Research web site.

Slide 4/13

Biological databases


More than 1200 key databases in 14 categories

Organization of the online database collection

http://www.oxfordjournals.org/nar/database/a/
http://www.oxfordjournals.org/nar/database/c/
http://www.oxfordjournals.org/nar/database/cap/

Slide 4/14


Properties and organization of life

Living organisms:
• Are composed of one or more cells.
• Are complex and ordered.
• Respond to their environment.
• Can grow and reproduce.
• Obtain and use energy.
• Maintain internal balance.
• Allow for evolutionary adaptation.

Slide 5/15

Figure labels: cell = smallest unit of life, population, community, ecosystem, biosphere


Humans appear to have an innate need to name things. In many primitive societies, a person who knows the true name of an object or of another person is believed to have power over that object or person.

Three separate but interrelated disciplines are involved in taxonomy:
• Identification: characterizing organisms
• Classification: arranging organisms into similar groups
• Nomenclature: naming organisms

Biologists often use a taxonomic key to identify organisms according to their characteristics.

Taxonomy

Slide 6/16


Building blocks from which all organisms are assembled

Slide 7/17


• In prokaryotic cells (Bacteria and Archaea) the DNA is not enclosed in a nucleus, so it is not separated from the cytoplasm.

• There are no membrane-enclosed organelles in the cytoplasm.

• Almost all prokaryotic cells have tough external cell walls.

• Eukaryotic cells are subdivided by internal membranes into organelles.

• DNA is found mainly in the nucleus.

• Surrounding the nucleus is the cytoplasm which contains a viscous cytosol and various organelles.

Prokaryotic and eukaryotic cells can be distinguished by their structural organization. Cells in different organisms or within the same organism vary significantly in shape, size, and behavior. However, they all share common characteristics that are essential for life.

Slide 8/18

Two major kinds of cells


Slide 9/19

The nucleus

The nucleus contains most of the genetic material (nucleic acids) in a eukaryotic cell.

The nucleus is separated from the cytoplasm by a double membrane.

Pores in the membrane allow large macromolecules and particles to pass into the cytoplasm.

In the nucleus, the DNA and associated proteins are organized into a fibrous material called chromatin.

When the cell prepares to divide, the chromatin fibers coil up and become visible as separate structures called chromosomes.

Each eukaryotic species has a characteristic number of chromosomes.


Major differences between pro- and eukaryotic transcription and translation

Slide 10/20


Chromosome: an organelle for packaging DNA

Chromosomes are composed of chromatin, a complex of DNA and protein; most are about 40% DNA and 60% protein.

The proteins of chromatin fall into two classes: histones and nonhistone chromosomal proteins. Five distinct histones are known: H1, H2A, H2B, H3, and H4.

The DNA of a chromosome is one very long, double-stranded fiber that extends unbroken through the entire length of the chromosome.

The first level of compaction is where the DNA wraps around histone cores to form nucleosomes.

A higher order of chromatin structure is created when the nucleosomes are wound in the fashion of a solenoid having six nucleosomes per turn.

Coiling continues until the DNA is in a compact mass.

Slide 11/21


Variations in structure of chromosome

Chromosomes can be broken by X-rays and by certain chemicals. The broken ends spontaneously rejoin, but if there are multiple breaks, the ends join at random. This leads to alterations in chromosome structure.

Problems with structural changes: breaking the chromosome often means breaking a gene. Since most genes are necessary for life, many chromosome breaks are lethal or cause serious defects.

Also, chromosomes with structural variations often have trouble going through meiosis, giving embryos with missing or extra large regions of the chromosomes. This condition is aneuploidy, just like the chromosome number variations, and it is often lethal.

The major categories are duplication (an extra copy of a region of a chromosome), deletion (a missing region of a chromosome), inversion (part of the chromosome is inserted backwards), and translocation (two different chromosomes switch pieces).

Cri du chat syndrome

Down syndrome

Edwards syndrome

Slide 11/22


Refresher: flow of genetic information

The central dogma of biology

Nucleic acids (DNA, RNA) and proteins are biological macromolecules built as long linear chains of chemical components.

DNA plays a fundamental role in the processes of life in two respects.

• First, it contains the templates (it has the coding capacity) for the synthesis of proteins and other products for all cellular functions.

• Second, DNA is essential to life as a medium for transmitting information from generation to generation.

Figure labels: replication, transcription, translation

Slide 12/23
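The flow from DNA to RNA to protein can be sketched in a few lines of code. This is a simplified illustration, not a description of the cellular machinery: it assumes the input is the coding strand written 5' to 3', uses the standard genetic code, and the function names `transcribe` and `translate` are chosen here for the example.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codon table written in RNA letters (U instead of T).
CODON_TABLE = {
    (a + b + c).replace("T", "U"): AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """The mRNA has the same sequence as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")  # toy sequence
print(mrna, translate(mrna))
```

The sketch ignores everything that makes real gene expression complicated, such as introns, regulatory signals and alternative genetic codes.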


The DNA structure

The deoxyribonucleic acid (DNA) molecule is double-stranded and composed of two strands in an antiparallel (5’ to 3’ and 3’ to 5’) and complementary arrangement.

The basic unit, the nucleotide, consists of a molecule of deoxyribose sugar, a phosphate group, and one of four nitrogenous bases, each denoted by one of the letters A, C, G and T.

Double-stranded DNA from any cell of any organism should have a 1:1 ratio of pyrimidine and purine bases (Chargaff’s rule). Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing (A-T, G-C).

Slide 13/24
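Complementary base pairing and the 1:1 purine to pyrimidine ratio can be checked with a short sketch. The sequence is a toy example and the function names are chosen here for illustration; the ratio is computed over both strands of the duplex, since a single strand need not satisfy it.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the antiparallel partner strand, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def purine_pyrimidine_ratio(strand):
    """Purines (A, G) divided by pyrimidines (C, T), counted over both strands."""
    duplex = strand.upper() + reverse_complement(strand)
    purines = sum(duplex.count(base) for base in "AG")
    pyrimidines = sum(duplex.count(base) for base in "CT")
    return purines / pyrimidines


top = "ATGCGGA"  # toy sequence, purine-rich on this strand
print(reverse_complement(top))
print(purine_pyrimidine_ratio(top))
```

Because every purine on one strand faces a pyrimidine on the other, the ratio for the duplex is 1 regardless of the sequence chosen.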


Structural comparison of A, B and Z DNA

Helix type A B Z

There are three natural forms of DNA (A, B and Z). The differences between these forms are related to the conformation of the sugar and the orientation of the base relative to the sugar. The C-form and D-form are unusual subclasses of the B-type.

Slide 13/25


Major features of the genetic code

Genetic information (the code) is stored in DNA.

Based on theoretical grounds, Sidney Brenner (early 1960s) postulated that the genetic code used by cells is a triplet code, in which each code word is a three-nucleotide sequence. One or two nucleotides cannot specify 20 amino acids, but three can ($4^1 = 4$, $4^2 = 16$, $4^3 = 64$). The 20 amino acids are encoded by 61 triplets; the remaining 3 triplets are stop codons.

The sequence complementary to the code is the mRNA codon.

The tRNA sequence complementary to the codon is the anticodon.

Codons are nonoverlapping and degenerate, and the code has no internal punctuation.

The genetic code is universal (with some exceptions…).

The first two nucleotides of a codon have a higher informational value than the third one.

The code can evolve.

An open reading frame (ORF) is the nucleotide sequence between a start codon and a stop codon.

Slide 14/26
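The counting argument for a triplet code and the definition of an ORF can both be sketched briefly. This is a simplified illustration under stated assumptions: it uses the standard stop codons and ATG as the only start codon, scans only the three forward reading frames, and the function names are chosen here for the example.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code, DNA letters
START_CODON = "ATG"


def all_words(n):
    """Every possible word of length n over the four-letter DNA alphabet."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


def find_orfs(dna):
    """Return (start, end) index pairs of ORFs in the three forward frames.

    Each ORF runs from the first in-frame ATG to the next in-frame stop
    codon (stop included); `end` is exclusive, so dna[start:end] is the ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))
                start = None
    return orfs


for n in (1, 2, 3):
    print(n, len(all_words(n)))  # word counts for lengths 1, 2 and 3
print(len(all_words(3)) - len(STOP_CODONS), "sense codons")
print(find_orfs("CCATGAAATGGTAGCC"))
```

Real gene finders also scan the reverse strand and apply a minimum length, because short ORFs occur by chance in any sequence.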


What is the genome?

The genome is all the DNA in a cell.
• All the DNA on all the chromosomes.
• Includes genes, intergenic sequences, repeats.

Specifically, it is all the DNA in an organelle. Eukaryotes can have 2-3 genomes.
• Nuclear genome
• Mitochondrial genome
• Plastid genome

If not specified, “genome” usually refers to the nuclear genome.

In eukaryotes, this term is commonly used to refer to one complete haploid set of chromosomes, such as that found in a sperm or egg.

The units of length of nucleic acids in which genome sizes are expressed:
• Kilobase (kb): $10^3$ base pairs
• Megabase (Mb): $10^6$ base pairs

Slide 15/27


A gene is a unit of heredity in a living organism.

A gene is a segment of DNA that is involved in producing a polypeptide chain; it can include regions preceding and following the coding DNA as well as introns between the exons. It is considered a unit of heredity; genes were formerly called "factors".

Complex genomes have roughly 10 to 30 times more DNA than is required to encode all the RNAs or proteins in the organism.

Contributors to the non-coding DNA include:
• Introns in genes
• Regulatory elements of genes
• Multiple copies of genes, including pseudogenes
• Intergenic sequences
• Interspersed repeats

An intergenic region (IGR) is a stretch of DNA sequence located between clusters of genes that contains few or no genes.

Genes are the basic unit of heredity

What are genes?

Slide 16/28


Distinct components in complex genomes

Slide 17/29


Genome size

The genetic complement of a cell or virus constitutes its genome.

In eukaryotes, this term is commonly used to refer to one complete haploid set of chromosomes, such as that found in a sperm or egg.

The C-value = the DNA content of the haploid genome.

The units of length of nucleic acids in which genome sizes are expressed:
• Kilobase (kb): $10^3$ base pairs
• Megabase (Mb): $10^6$ base pairs

Slide 17/30
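Converting between these units is simple arithmetic, sketched below. The helper names `to_bp` and `convert` are chosen here for illustration, and the example values (a genome of 4600 kb and one of 4 kb) are used only to show the conversion.

```python
# Number of base pairs in one unit.
UNITS_BP = {"bp": 1, "kb": 10**3, "Mb": 10**6}


def to_bp(value, unit):
    """Convert a genome size to base pairs."""
    return value * UNITS_BP[unit]


def convert(value, from_unit, to_unit):
    """Convert a genome size between bp, kb and Mb."""
    return to_bp(value, from_unit) / UNITS_BP[to_unit]


print(convert(4600, "kb", "Mb"))  # 4600 kb expressed in Mb
print(to_bp(4, "kb"))             # 4 kb expressed in bp
```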


Genome Size

Viral genomes are typically in the range 100–1000 kb.

• Bacteriophage MS2, one of the smallest viruses, has only four genes in a single-stranded RNA molecule of about 4000 nucleotides (4 kb).

Bacterial genomes are larger, typically in the range 1–10 Mb.

• The chromosome of Escherichia coli is a circular DNA molecule of 4600 kb.

Eukaryotic genomes are typically in the range 100–1000 Mb.

• Among eukaryotes, genome size often differs tremendously, even among closely related species.

Slide 17/31


The 3 genomic paradoxes

The C-value = the DNA content of the haploid genome.

C-value paradox: Complexity does not correlate with genome size.

Homo sapiens: $3.4 \times 10^9$ bp

Amoeba dubia: $6.8 \times 10^{11}$ bp

Allium cepa: $1.5 \times 10^{10}$ bp

Slide 17/32
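The scale of the C-value paradox is easier to see as ratios. The sketch below uses the three C-values quoted on this slide (3.4 × 10^9 bp for Homo sapiens, 6.8 × 10^11 bp for Amoeba dubia, 1.5 × 10^10 bp for Allium cepa); the function name `fold_vs_human` is chosen here for illustration.

```python
# C-values (haploid genome size in base pairs) as quoted on the slide.
C_VALUES_BP = {
    "Homo sapiens": 3.4e9,
    "Allium cepa": 1.5e10,
    "Amoeba dubia": 6.8e11,
}


def fold_vs_human(species):
    """How many times larger a genome is than the human genome."""
    return C_VALUES_BP[species] / C_VALUES_BP["Homo sapiens"]


# List the species from smallest to largest genome.
for name in sorted(C_VALUES_BP, key=C_VALUES_BP.get):
    size = C_VALUES_BP[name]
    print(f"{name}: {size:.1e} bp, {fold_vs_human(name):.1f}x human")
```

On these figures the amoeba genome is about 200 times the size of the human genome and the onion genome more than 4 times, which is the sense in which genome size does not track organismal complexity.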


The 3 genomic paradoxes

K-value paradox: Complexity does not correlate with chromosome number.

Homo sapiens: 46 chromosomes

Lysandra atlantica: 250 chromosomes

Ophioglossum reticulatum: ~1260 chromosomes

Slide 17/33


Figure labels: ~21,000 genes, ~25,000 genes, ~60,000 genes

The 3 genomic paradoxes

N-value paradox: Complexity does not correlate with gene number.

Slide 17/34


Introductory lectures: http://www.youtube.com/watch?v=40Sum5KfG1Q

Textbooks

Thank you!