Chinese Herbal Medicine Bioinformatics and Programming - D'Trends Introduction to Bioinformatics. Stephen A....

Chinese Herbal Medicine Bioinformatics and Programming. 謝長奇 http://mail.cmu.edu.tw/~cchsieh/ [email protected] 04-23590121#37125


Post on 15-Jul-2020


TRANSCRIPT

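Page 1: Chinese Herbal Medicine Bioinformatics and Programming - D'Trends Introduction to Bioinformatics. Stephen A. Krawetz and David D. Womble, 2003. Humana Press. 基礎生物資訊實務 (Basic Bioinformatics Practice), edited by 李炎, 2002. 藝軒出版社

Chinese Herbal Medicine Bioinformatics and Programming

謝長奇 http://mail.cmu.edu.tw/~cchsieh/

[email protected] 04-23590121#37125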

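Page 2

Course Information

• Teaching objectives
  • Teach students bioinformatics, and guide them in searching for biological information, using bioinformatics software, and building bioinformatics programs and computing platforms.
• Teaching format
  • Hands-on practice in the computer classroom plus classroom lectures
  • Audio-visual computer classroom, 5F, Teaching Building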

Page 3

Course Information

• Grading: attendance 10%, class participation 20%, learning outcomes (quizzes, midterm exam, final exam) 70%
• References
  • Introduction to Bioinformatics. Stephen A. Krawetz and David D. Womble. 2003. Humana Press
  • 基礎生物資訊實務 (Basic Bioinformatics Practice). Edited by 李炎. 2002. 藝軒出版社
  • 生物資訊入門 (Introduction to Bioinformatics). Translated and edited by 陳進和 et al. 2003. 藝軒出版社
  • Beginning Perl for Bioinformatics. James D. Tisdall. 2001. O'Reilly Press
  • Mastering Perl for Bioinformatics. James D. Tisdall. 2003. O'Reilly Press
  • 生物資訊學電腦技術 (Bioinformatics Computer Skills). 林仲彥, 李士傑, 陳淑華, OSB-TW. 2002. 歐萊禮 (O'Reilly)

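Page 4

基礎生物資訊實務 (Basic Bioinformatics Practice), with CD-ROM, 2nd edition

• Starting from molecular biology concepts, the book introduces how to use the free bioinformatics tools available online.
• Given a DNA fragment, how do you find the protein it may encode? What is the likely 3D structure of that protein?
• Does the fragment contain exons (expressed regions) or introns (intervening regions)?
• Does it contain tRNA, mRNA, or rRNA segments?
• Are the amino acid sequences of several fragments evolutionarily related?
• How to draw the phylogenetic tree.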
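The first question above (which protein might a DNA fragment encode?) can be sketched in a few lines. This is only an illustrative Python sketch, not the book's or the course's method (the course itself teaches Perl): it assumes the standard genetic code, reads the coding strand in a single reading frame, and the names `CODON_TABLE` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code in the conventional compact layout:
# codons are enumerated with bases in the order T, C, A, G
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a coding-strand DNA string into one-letter amino acids.

    frame is 0, 1, or 2 (offset of the first codon). Codons containing
    characters other than A/C/G/T are reported as 'X'. Trailing bases
    that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # prints MA*
```

A real analysis would also translate the other two frames and the reverse complement, which is what the online tools described in the book do for you.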

Page 5

Course Content

Session  Topic
1        Introduction
2        NCBI and other biomedical websites
3        Nucleic Acid Sequence Analysis and Information
4        BLAST and Sequence Alignment
5        Phylogenetic Analysis
6        The Evolution from Sequence Information
7        Genome Information Resources
8        Statistical Analysis
9        Midterm exam

Page 6

Course Content

Session  Topic
10       Perl programming introduction
11       Perl programming in bioscience
12       Perl programming in sequence search
13       Perl programming in sequence analysis
14       Perl programming in restriction map and virtual gel
15       Perl programming in GenBank
16       Perl programming in BLAST
17       Perl programming in ClustalW
18       Final exam

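Page 7

Course Relationship Diagram

[Diagram; node labels:]

Chinese herbal medicine resources
Chinese herbal medicine bioinformatics
Biochemistry
Analytical chemistry
Pharmacology
Molecular biology
Biotechnology
Omics
Pharmacological and toxicological model prediction
Phylogenetic systematics of Chinese herbal medicines
Identification of medicinal material resources
Development of new resources and new products
Diversified market management and promotion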

Page 8

Course Learning Path Diagram

[Diagram; node labels:]

Database
DNA, RNA, protein
Analysis tools
Sequence, Evolution, Phylogenetic
Bioinformation
Knowledge
Public domain, In-house design

Page 9

Bioinformatics

introduction

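Page 10

Bioinformatics

• Uses methods from applied mathematics, informatics, statistics, and computer science to study biological problems.
• At present, bioinformatics is essentially a combination of molecular biology and information technology (especially Internet technology).
• The research materials and results of bioinformatics are biological data of all kinds. Its research tool is the computer, and its methods cover searching biological data (collecting and filtering), processing it (editing, organizing, managing, and displaying), and using it (computing, simulating).
• The main current research directions are:
  • sequence alignment, gene identification, genetic recombination, protein structure prediction, gene expression, prediction of protein interactions, and building models of evolution.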

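Page 11

Bioinformatics

• Biological techniques often generate large amounts of noisy data.
• Much like data mining, bioinformatics uses mathematical tools to extract useful biological information from large volumes of data.
• Typical problems that bioinformatics deals with include:
  • reassembling DNA sequences that were broken apart during shotgun DNA sequencing, predicting protein structure from a protein's amino acid sequence, and testing hypotheses about gene regulation using data from mRNA microarrays or mass spectrometry.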
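To make the first problem above more concrete, here is a toy Python sketch of reassembling shotgun reads by repeatedly merging the pair with the longest suffix-prefix overlap. It is an assumption-laden illustration, not how production assemblers work: it assumes error-free reads from one strand, and the function names, the `min_len` threshold, and the sample reads are all invented for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Greedily merge the pair of reads with the largest overlap
    until no pair overlaps by at least min_len bases.

    Returns the list of remaining contigs.
    """
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    # Three hypothetical overlapping reads from one short sequence.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
    # prints ['ATGGCGTACCTGA']
```

The greedy strategy can pick the wrong merge when the sequence contains repeats, which is one reason real assembly is treated as a hard computational problem.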

Page 12

What is bioinformatics?

• http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
• http://www.bioinformatics.vg/what-is-bioinformatics.shtml
• http://www.biology.gatech.edu/graduate-programs/bioinformatics/new/whatis.php
• http://bioinformatics.org/faq/
• http://www.bioinfo-online.net/modules/faq/

Page 13

What is bioinformatics

• The term is often said to have been coined in 1988 by Dr. Hwa A. Lim (HAL), who is commonly known as the "Father of Bioinformatics" (Paulien Hogeweg and Ben Hesper had already used the term in 1970).
• The original definition was: "a collective term for data compilation, organisation, analysis and dissemination"

Page 14

What is bioinformatics

• The use of computers to solve information problems in the life sciences. Mainly, it involves the creation of extensive electronic databases on genomes, protein sequences, etc.
• It also involves techniques such as the three-dimensional modeling of biomolecules and biological systems.

Page 15

Why study bioinformatics

• A wide variety of data sources
• To make the data easily and universally interpretable by scientists
• The post-genomic era

Page 16

Historical progress

• Mendel proved his laws of heredity with varieties of peas and flowers in 1865
• The first protein to be sequenced: insulin
• The first complete sequencing of an enzyme, ribonuclease, in 1960
• To the sequencing of the first complete genome (Haemophilus influenzae), published in 1995
• Moved on to technologies permitting the sequencing, recombination and cloning of DNA

Page 17

The Human Genome Project

• In 1990, the Human Genome Project (HGP) was unveiled by the United States Department of Energy (DoE) and the National Institutes of Health
• Goals: to identify all chemical base pairs and all genes that make up the 23 chromosome pairs found in human DNA
• "To develop the next generation of methods for simulating cellular behaviour and pathways"

Page 18

The Human Genome Project

• Collaboration of 20 groups across the world
• The results would be free and data release would be rapid
• To identify every base pair in the genome: there are 3 x 10^9
• To assign genes and what they code for, or not
• Potentially revolutionise biomedical research

Page 19

The Human Genome Project

• The initial estimates were that the human genome comprised some 100,000+ genes; now we know there are only 30,000-40,000
• Only twice as many as found in a worm or a fly

Page 20

Gene

• Open Reading Frames

The complex splicing mechanisms of higher organisms mean that each protein-coding gene generates between 3 and 6 proteins = 50,000 to 500,000 proteins per individual
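The slide names open reading frames without defining them. A common working definition is a run of codons that starts with a start codon (ATG) and ends at the first in-frame stop codon (TAA, TAG, or TGA). The Python sketch below scans only the three forward reading frames under that definition. The function name `find_orfs`, the `min_codons` threshold, and the sample sequence are assumptions for illustration; they do not come from the slides, and the sketch ignores introns and the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return sorted (start_index, orf_sequence) pairs for the forward strand.

    An ORF here begins at ATG and runs through the first in-frame stop
    codon (stop included in the returned sequence). min_codons is the
    minimum number of codons before the stop, counting the ATG.
    An ATG with no downstream in-frame stop is not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, dna[start:pos + 3]))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))  # prints [(2, 'ATGAAATAG')]
```

In eukaryotic genomic DNA the coding sequence is interrupted by introns, so a simple scan like this is mainly useful on mRNA/cDNA or prokaryotic sequences.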

Page 21

Gene

• The idea that 1 gene = 1 protein is clearly wrong
• The gene structure and the components that regulate its expression must be much more complex than previously thought
• There are still roughly 100,000 genes of microbes, plants and animals whose functions are yet to be revealed

Page 22

Systems biology

• Human
  • ~40,000 genes
  • ~100,000-150,000 splice variants
  • ~500,000-2,000,000 proteins (polypeptides)
  • ~600-1000 metabolic pathways

Page 23

Systems biology

• Molecular interaction complexity

Page 24

Molecular Interaction Maps

http://discover.nci.nih.gov/mim/index.jsp

Page 25

Systems biology

• Evolutionary complexity
  • elimination series

http://www.ucmp.berkeley.edu/exhibit/phylogeny.html

Page 26

The 'omics' revolution

• Genomics: the sequencing and annotation of genomes
• Functional and structural genomics: the comparison and characterisation of genomes of different species
• Proteomics: the description of the complete set of proteins a particular genome codes for

Page 27

Model organisms

• Cheap
• Plentiful
• Short generation times
• Easily manipulated
• Test novel drug candidates
• Illustrating which genes, and therefore which proteins, are responsible for which phenotype/disease
• 85% genetic similarity between the mouse and human genomes

Page 28

Where computer science comes in

• The HGP has brought to light the limitations of traditional lab work
• Bioinformaticists act to bridge the gap between the data stored and its biological significance

Page 29

Bioinformatics Tools

• GenBank - genes, proteins, genomes
• Similarity Search tools: BLAST, VAST
• Alignment: CLUSTAL
• Protein families: Pfam, ProDom
• Protein Structures: PDB, RasMol
• Whole Genomes: UCSC, Entrez Genomes
• Human Mutations: OMIM
• Biochemical Pathways: KEGG
• Integrated tools: Biology Workbench, BCM Search Launcher

Page 30

Entrez Databases contain more than just DNA & protein sequences

Page 31

Bioinformatics Data

• DNA Sequences
• Genes
  • Location, introns, exons, function, etc.
• Gene products
  • RNA, Proteins
• Pathways
  • Signaling, metabolic, genomic, etc.

Page 32

Bioinformatics Data

• Experimental
  • Gene expression, knockouts, etc.
• Literature
  • Diseases, viruses, bacteria
  • Organisms
  • Textbooks
• Expert knowledge
  • Unpublished
  • Insights

Page 33

Kinship (genetic relatedness)

http://www.pbs.org/wgbh/nova/israel/family.html

Page 34

Bioinformatics Tools

• National Center for Biotechnology Information (NCBI), USA: http://www.ncbi.nlm.nih.gov/
• Database dedicated to collecting completely sequenced genes: http://www-nbrf.georgetown.edu/
• Viral genome databases:
  • http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html
  • http://www.cbs.dtu.dk/services/EasyGene/
  • http://www.ebi.ac.uk/genomes/

Page 35

Bioinformatics Tools

• Viroid databases (a viroid is RNA without a protein coat):
  • http://subviral.med.uottawa.ca/cgi-bin/home.cgi?oldURL=1
  • http://www.ncbi.nlm.nih.gov/ICTVdb/ICTVdB/
• Plasmid databases:
  • http://www.genomics.ceh.ac.uk/plasmiddb/
  • http://plasmid.hms.harvard.edu/
  • http://beckmancenter.ahc.umn.edu/cgi-bin/plasmidlookup.pl
  • http://gind-db.ucsf.edu:8000/cgi-bin/Plasmid/main_menu2.cgi
  • http://bccm.belspo.be/newsletter/8-00/bccm02.htm
• Yeast databases:
  • http://db.yeastgenome.org/cgi-bin/seqTools
  • http://mips.gsf.de/genre/proj/yeast/
  • http://www.ncyc.co.uk/search.php
  • http://genome-www.stanford.edu/

Page 36

Bioinformatics Tools

• Organelle databases:
  • http://organelledb.lsi.umich.edu/
  • http://www.ebioinfogen.com/biobase/organelle/
• Human mitochondrial databases (37 genes in total):
  • http://www.mitomap.org/
  • http://evogen.jgi.doe.gov/cgi-bin/Mt_gene_order_browser.cgi
  • http://www.genpat.uu.se/mtDB/
  • http://www.hmdb.uniba.it/hmdb/index.jsp
  • http://www.smgf.org/pages/mtdatabase.jspx
• Gene databases organized by organism:
  • http://restools.sdsc.edu/biotools/biotools10.html

Page 37

Evolution and Classification Tools

• http://www.ucmp.berkeley.edu/index.html
• http://www.species2000.org/
• http://evolution.berkeley.edu/

Page 38

Perl

Page 39

Perl programming for bioinformatics

• Perl language
  • http://www.perl.org
  • http://www.bioperl.org
  • http://www.activestate.com/products/activeperl/
  • http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.8.820-MSWin32-x86-274739.msi (Win, x86)

Page 40

Perl program

• Perl 學習手札 (Perl Learning Notes)
• Perl 學習手札 (Perl Learning Notes), full text

Page 41

What is "Perl"?

• Perl is not officially an acronym, but it is commonly expanded as "Practical Extraction and Report Language"
• Larry Wall began work on Perl in 1987
• Perl was originally named "Pearl", after the Parable of the Pearl from the Gospel of Matthew
• Larry Wall claims that he considered (and rejected) every three- and four-letter word in the dictionary

[Image: engraving by John Everett Millais illustrating the parable of the Pearl of Great Price. Published in Parables of Our Lord, 1864]

Page 42

The end