
ZOOLOGY Molecular Genetics

Large scale analysis of genome: Human Genome Part I


Paper No.: 16 Molecular Genetics, Module: 29 Large scale analysis of genome: Human Genome Part I

Development Team

Paper Coordinator: Prof. Namita Agarwal

Department of Zoology, University of Delhi

Principal Investigator: Prof. Neeta Sehgal, Head, Department of Zoology, University of Delhi

Content Writer: Dr. Nidhi Garg, Deshbandhu College, University of Delhi

Content Reviewer: Dr. Surajit Sarkar, Department of Genetics, South Campus, Delhi University

Co-Principal Investigator: Prof. D.K. Singh

Department of Zoology, University of Delhi


Description of Module

Subject Name: Zoology

Paper Name: Molecular Genetics (Zool 016)

Module Name/Title: Large scale analysis of genome

Module Id: M29: Human Genome: Part I

Keywords: Genome, Gene, Sequencing, Genetic and Physical Maps

Contents

1. Learning Outcomes

2. Introduction

3. Human Genome Project (HGP)

4. History of Human Genome Sequencing

5. Budget of the Human Genome Project

6. Goals of the Human Genome Project

7. Summary


1. Learning Outcomes

After studying this module, you shall be able to:

• Know what a genome is.

• Learn about the history of the Human Genome Project.

• Evaluate the importance of the Human Genome Project.

• Know the important goals of the HGP and how well they were achieved within the planned time frame.

2. Introduction

The genome is defined as the genetic material of an organism, which consists of DNA or, in RNA viruses, of RNA. The term genome was coined by Professor Hans Winkler of the University of Hamburg, Germany, in 1920. DNA is organized in the form of chromosomes. In haploid organisms such as bacteria, archaea and viruses, and in organelles such as mitochondria and chloroplasts, the genome consists of a single circular or linear chromosome. In a sexually reproducing diploid organism, each somatic cell contains two full sets of chromosomes, whereas the gametes contain half that number as a result of meiosis. Some organisms are triploid, tetraploid, pentaploid and so on, and therefore have multiple sets of chromosomes. The term genome thus refers not only to the DNA in the nucleus, known as the "nuclear genome", but also to the DNA stored in mitochondria and chloroplasts, known as the "mitochondrial genome" and the "chloroplast genome".

Sequencing the genome of an organism means determining the order of the nitrogenous bases A, T, G and C in its genetic material. Thus, for a virus it may involve determining the sequence of only a single chromosome, whereas for a bacterium it may involve sequencing both the chromosome and the plasmids that together make up its genome. For sexually reproducing organisms, genome sequencing means determining the sequences of one complete set of autosomes and one of each type of sex chromosome. For example, the human genome consists of 22 autosomes and the X and Y sex chromosomes (a somatic cell carries 22 pairs of autosomes plus 2 sex chromosomes), so a complete genome sequence comprises 24 separate chromosome sequences. It is also important to determine the sequence of the mitochondrial or chloroplast DNA to have complete information about the genome of a eukaryotic organism.
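The counting rule in the paragraph above can be made explicit with a few lines of Python. This is a minimal sketch with illustrative function names (they are not from the source); it contrasts one sequence per chromosome type, which is what a complete genome sequence contains, with the two-copy count of a diploid somatic cell, and it leaves the mitochondrial genome out of the count.

```python
# Counting chromosome sequences for the human example in the text.
# One genome sequence is needed per *type* of chromosome, whereas a diploid
# somatic cell carries two copies of each autosome plus two sex chromosomes.

def reference_sequence_count(autosome_types: int, sex_chromosome_types: list[str]) -> int:
    """Sequences in a complete genome sequence: one per autosome and per sex-chromosome type."""
    return autosome_types + len(sex_chromosome_types)

def somatic_chromosome_count(autosome_types: int) -> int:
    """Chromosomes in a diploid somatic cell: autosome pairs plus one pair of sex chromosomes."""
    return 2 * autosome_types + 2

HUMAN_AUTOSOMES = 22
HUMAN_SEX_CHROMOSOMES = ["X", "Y"]

print(reference_sequence_count(HUMAN_AUTOSOMES, HUMAN_SEX_CHROMOSOMES))  # 24
print(somatic_chromosome_count(HUMAN_AUTOSOMES))                         # 46
```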

Genome projects are undertaken to sequence the genome of an organism. They are scientific research projects initiated by research groups worldwide with the aim of sequencing the complete genome, annotating the protein-coding genes and decoding the essential features that distinguish a genome from, or relate it to, other genomes. Both the length of the genome and the total number of genes differ extensively from one species to another.

The decision by research agencies to sequence a genome depends on the importance of the organism. It may be a model organism, it may have commercial importance (for example, a crop plant, livestock, yeast or enzyme-producing bacteria), or it may be significant for human health. Emphasis is also given to sequencing the genomes of species that will help in determining molecular evolution or phylogeny. A genome sequence gives the order of every nitrogenous base, whereas a genome map is less detailed but identifies landmarks that help in navigating around the genome. Historically, the common approach to sequencing eukaryotic genomes was to map the genome first, which gives information about the landmarks within it, instead of sequencing each chromosome in one go. Mapping the chromosome allows sequencing to be done bit by bit, since the approximate location of a particular DNA fragment on the chromosome is already known. Currently, owing to improvements in DNA sequencing technology, it is possible to sequence an entire genome more quickly and in one go using methods such as the shotgun approach. Genome sequencing has also become more affordable because of the steady reduction in the cost per base pair.
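To make the shotgun idea concrete, here is a toy Python sketch of greedy overlap assembly: short overlapping reads are repeatedly merged at their longest suffix-prefix overlap until a single contig remains. This is an illustrative assumption, not the assembler used by the HGP or by Celera Genomics; real assemblers must also cope with sequencing errors, repeated sequences and both DNA strands, all of which this sketch ignores. The reads and the sequence they come from are made up.

```python
# Toy greedy overlap assembly: an illustration of the shotgun idea only.
# Assumes error-free reads from one strand and no repeated sequence.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # next place where b's first bases occur in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):          # the rest of a must run into the start of b
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap; return the contigs."""
    reads = list(dict.fromkeys(reads))                                       # drop duplicate reads
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:                    # nothing overlaps: stop with several contigs
            break
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Hypothetical reads sampled from the made-up sequence TTAGGCATCCGAAT, given out of order.
reads = ["CCGAAT", "TTAGGCA", "GCATCCG"]
print(greedy_assemble(reads))   # ['TTAGGCATCCGAAT']
```

The greedy choice keeps the sketch short; because the reads are shuffled, it also shows that the assembly does not depend on knowing in advance where each fragment came from, which is the point of the shotgun approach.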

3. Human Genome Project

The HGP was a collaborative project between several countries that aimed to determine the sequence of the 3 billion base pairs making up human DNA. It also involved identifying and mapping all the genes in the human genome. The HGP was proposed and largely funded by the US government and remains, to date, the world's largest collaborative biological project. Although planning of the project started in 1984, the work began in 1990, and completion of the genome was announced in 2003. In 1998, Craig Venter founded Celera Genomics, a privately funded company that undertook a sequencing project in parallel with the HGP. The HGP sequencing was carried out at the twenty institutes listed below.

The International Human Genome Sequencing Consortium included the following institutes:

1. The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.

2. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.

3. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.

4. United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.

5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, Tex., U.S.

6. RIKEN Genomic Sciences Center, Yokohama, Japan

7. Genoscope and CNRS UMR-8030, Evry, France

8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.

9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany

10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China

11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.

12. Stanford Genome Technology Center, Stanford, Calif., U.S.

13. Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif., U.S.

14. University of Washington Genome Center, Seattle, Wash., U.S.

15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan

16. University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.

17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, Okla., U.S.

18. Max Planck Institute for Molecular Genetics, Berlin, Germany


19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y., U.S.

20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany.

These international institutions played a vital role in the rapid and effective completion of the HGP. In the United States, where the project originated, the major contributors were:

1. The U.S. Department of Energy (DOE): it was the center for discussions of the HGP as early as 1984.

2. The National Institutes of Health (NIH): it first participated in the project in 1988 by creating the Office of Human Genome Research, which was upgraded in 1990 to the National Center for Human Genome Research and renamed the National Human Genome Research Institute (NHGRI) in 1997.

Funding for the HGP came not only from the US government, through the NIH and DOE, but also from the UK-based charity the Wellcome Trust and from several other organizations around the world. UNESCO played a significant role in involving developing nations in the HGP.

4. History of Human Genome Sequencing

The HGP grew out of two important insights that emerged in the early 1980s. The first was that sequencing complete genomes would accelerate biomedical research, because it would allow researchers to address problems in a comprehensive and unbiased fashion. The second was that such work required building infrastructure through a communal effort, something that had not been attempted in biomedical research before. Important projects that helped crystallize these insights were:

1. Between 1977 and 1982, the complete genomes of the bacterial viruses φX174 and λ (lambda), the animal virus SV40 and the human mitochondrion were sequenced. These projects demonstrated the feasibility of assembling small sequences into complete genomes, and the data they generated showed the value of having the complete set of genes and other functional elements available for further research and analysis.

2. In 1980, Botstein and colleagues proposed a program to construct a human genetic map, which would make it feasible to locate genes for diseases of unknown function solely on the basis of their inheritance patterns.

3. In the mid-1980s, Olson and Sulston launched programs to construct physical maps of cloned DNA covering the yeast and worm genomes. This made it possible to isolate genes and regions on the basis of their chromosomal position.

The history of the HGP dates back to May 1985, when Robert Sinsheimer organized a workshop to discuss sequencing the human genome; the NIH, however, was not interested in his proposal. In March 1986, Charles DeLisi and David Smith of the DOE's Office of Health and Environmental Research (OHER) organized the Santa Fe Workshop. Two months later, a workshop was organized by Dr. James Watson at the Cold Spring Harbor Laboratory. Charles DeLisi, then Director of OHER, sent a memo outlining a broad plan for the HGP to Alvin Trivelpiece, the Assistant Secretary for Energy Research. Dr. Trivelpiece then sought and obtained approval for the project from Deputy Secretary William Flynn Martin. The Santa Fe workshop succeeded in motivating federal support for the HGP, which ultimately led to the approval of funds that allowed OHER to start the HGP in 1986. A total of $4 million was initially allocated to launch the project.

Funding for the genome project was proposed by President Reagan in his 1987 budget submission to Congress and was ultimately approved by both Houses. Senator Peter Domenici, a friend of DeLisi who chaired both the Senate Committee on Energy and Natural Resources and the Budget Committee, played a vital role in obtaining Congressional approval for the project. A line-item budget of $3 billion was approved by the Reagan Administration, and the project was expected to take 15 years beginning in 1990. In 1990, the DOE and NIH signed a memorandum of understanding (MoU) to coordinate their plans and initiate the genome project. In the same year, James Watson headed the NIH-funded genome program, while David Galas was initially made Director of the Office of Biological and Environmental Research in the U.S. Department of Energy's Office of Science. In 1993, Francis Collins succeeded James Watson as Director of the NIH National Center for Human Genome Research, which was later renamed the National Human Genome Research Institute, while Aristides Patrinos succeeded Galas.

In 1998, the American scientist Craig Venter founded a privately funded firm, Celera Genomics. In the early 1990s he had been a research scientist at the NIH and was associated with the HGP from its beginning. Celera was founded with capital of $300 million and aimed to sequence the genome faster and at a cost far below $3 billion. Celera Genomics used whole-genome shotgun sequencing, a technique that had been used to sequence bacterial genomes of about six million base pairs but had never been applied to a genome of three billion base pairs. Celera Genomics promised to publish its findings by releasing new data annually, in line with the 1996 "Bermuda Statement", whereas the publicly funded HGP released its new data daily. Unlike the HGP, however, Celera Genomics did not permit free redistribution or scientific use of its data. For this reason, the publicly funded HGP was compelled to release its first draft of the human genome before Celera Genomics. In March 2000, U.S. President Bill Clinton announced that the human genome sequence could not be patented and that researchers would have free access to it. The announcement caused Celera's share price on the Nasdaq stock exchange to fall sharply, and the biotechnology sector as a whole lost approximately $50 billion in stock market value within two days.

As a result of international cooperation and advances in genome sequencing and bioinformatics, a 'working draft' of the genome was finished in 2000, a year ahead of the planned timeline. The announcement was made jointly on June 26, 2000, by U.S. President Bill Clinton and British Prime Minister Tony Blair. A rough draft of the genome was released on July 7, 2000, by the UCSC Genome Bioinformatics Group at the University of California, Santa Cruz. On the first day of free and open access, about 500 GB of information was downloaded by the scientific community from the UCSC genome server. The papers describing the methods and sequence analysis of the draft human genome were published in February 2001: the HGP researchers published their work in the journal Nature, while the scientists at Celera Genomics published theirs in Science. The drafts published by both groups covered about 83% of the genome, including about 90% of the euchromatic regions, but contained about 150,000 gaps. At that time, the order and orientation of many DNA segments had not yet been established. Owing to further advances in sequencing techniques, the completion of the genome was announced on April 14, 2003, two years ahead of schedule, and the complete human genome sequence was published in 2003. In May 2006, the sequence of the last chromosome was published in Nature, marking the completion of the project.

The timeline of the Human Genome Project is shown in Figure 1.

For more information, watch the video at https://www.youtube.com/watch?v=slRyGLmt3qc

Figure 1: Timeline of the Human Genome Project, 1984-2001

Source: Lander et al. 2001. Initial sequencing and analysis of the human genome. International Human Genome Sequencing Consortium. Nature, Vol. 409, pp. 860-921.

For more information on the key events of the Human Genome Project and ongoing research, visit http://www.genome.gov/10001763

5. Budget of the Human Genome Project

The budget set for carrying out the Human Genome Project was $3 billion. This amount was to be

spent in three stages over a 15-year period initially (1990-2005) but due to accelerated progress the

funding was calculated from 1990 to 2003 (Table 1). The funding was to be spent for

Page 8: No. : 1 6 Molecular Genetics Module : 29 Large scale

ZOOLOGY Molecular Genetics

Large scale analysis of genome: Human Genome Part I

8

1. Conducting studies of human diseases.

2. Sequencing of model organisms.

3. For developing latest technologies to be used for biological and medical research.

4. Development of computational methods to analyze genomes.

5. Ethical, legal, and social issues (ELSI) related to genome sequencing.

6. Sequencing the human genome.

Table 1: The Funding of the U.S. Human Genome Project from 1988 to 2003 ($ millions).

FY DOE NIH* U.S. Total

1988 10.7 17.2 27.9

1989 18.5 28.2 46.7

1990 27.2 59.5 86.7

1991 47.4 87.4 134.8

1992 59.4 104.8 164.2

1993 63.0 106.1 169.1

1994 63.3 127.0 190.3

1995 68.7 153.8 222.5

1996 73.9 169.3 243.2

1997 77.9 188.9 266.8

1998 85.5 218.3 303.8

1999 89.9 225.7 315.6

2000 88.9 271.7 360.6

2001 86.4 308.4 394.8

2002 90.1 346.7 434.3

2003 64.2 372.8 437

Note: Funds for construction have not been included, as they make up a minor portion of the budget.

Source: http://web.ornl.gov/sci/techresources/Human_Genome/project/budget.shtml
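Because Table 1 is a simple grid of numbers, its internal consistency can be checked mechanically. The Python sketch below re-enters the rows as printed (the dictionary layout and function names are illustrative choices, not part of the source), checks that DOE + NIH equals the listed U.S. total for each year, and sums the listed totals. As transcribed here, only the FY2002 row fails the check (90.1 + 346.7 = 436.8, against a listed 434.3), so that row should be verified against the original source. The listed totals add up to roughly $3.8 billion for FY1988-2003, a span that includes the two years before the project's formal start in 1990.

```python
# Table 1 as transcribed: fiscal year -> (DOE, NIH, listed U.S. total), $ millions.
FUNDING = {
    1988: (10.7, 17.2, 27.9),
    1989: (18.5, 28.2, 46.7),
    1990: (27.2, 59.5, 86.7),
    1991: (47.4, 87.4, 134.8),
    1992: (59.4, 104.8, 164.2),
    1993: (63.0, 106.1, 169.1),
    1994: (63.3, 127.0, 190.3),
    1995: (68.7, 153.8, 222.5),
    1996: (73.9, 169.3, 243.2),
    1997: (77.9, 188.9, 266.8),
    1998: (85.5, 218.3, 303.8),
    1999: (89.9, 225.7, 315.6),
    2000: (88.9, 271.7, 360.6),
    2001: (86.4, 308.4, 394.8),
    2002: (90.1, 346.7, 434.3),
    2003: (64.2, 372.8, 437.0),
}

def mismatched_rows(table: dict, tol: float = 0.05) -> list[int]:
    """Fiscal years whose DOE + NIH figures differ from the listed total by more than tol."""
    return [fy for fy, (doe, nih, total) in table.items() if abs(doe + nih - total) > tol]

def cumulative_total(table: dict) -> float:
    """Sum of the listed U.S. total column."""
    return sum(total for _doe, _nih, total in table.values())

print(mismatched_rows(FUNDING))             # [2002]
print(round(cumulative_total(FUNDING), 1))  # 3798.3
```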

The funding agencies allotted 3% to 5% of their budgets to studying the ethical, legal, and social issues related to the project.

6. Goals of the Human Genome Project

The goals for the three five-year plans were set jointly by the NIH and the DOE, the two main agencies that received funding for the Human Genome Project (Table 2). The HGP was a collaborative worldwide research effort whose primary goal was to analyze the structure of human DNA and to determine the precise location of genes. In parallel, the project planned to sequence the genomes of certain model organisms to obtain comparative information important for understanding how the human genome functions. The information generated by the HGP would advance biomedical science. Knowledge of genes would also be of enormous value in medicine, helping to understand and treat genetic diseases as well as multifactorial diseases in which genetic predisposition plays an important role.

The Human Genome Project was initially planned to span 15 years, from 1990 to 2005, divided into three five-year plans. The first five-year plan (1990-1995) was revised in 1993 because genome sequencing progressed faster than expected. The second five-year plan defined goals for 1993 to 1998. The third plan was developed through several workshops conducted by the DOE and NIH.

First five-year (1990-1995) Goals of the Human Genome Project:

1. Mapping and Sequencing the Human Genome:

a) Genetic Mapping: To complete a human genetic map with markers spaced 2 to 5 centimorgans (cM) apart, and to identify each marker by a sequence-tagged site (STS).

b) Physical Mapping: To assemble STS maps of all human chromosomes with markers spaced at 100,000-bp intervals. To generate overlapping sets of cloned DNA with continuity over lengths of 2 Mb for large parts of the human genome.

c) DNA Sequencing: To improve existing DNA sequencing methods and develop new sequencing techniques that would lower the cost of large-scale DNA sequencing to $0.50 per base pair. To sequence 10 Mb of human DNA in large, uninterrupted stretches.

2. Gene Identification: To develop efficient methods for identifying genes and for placing known genes on physical maps.

3. Mapping and Sequencing the Genomes of Model Organisms: To generate a genetic map of the mouse genome based on DNA markers. To begin physical mapping of one or two chromosomes. To sequence a total of about 20 Mb of DNA from various model organisms, focusing on stretches about 1 Mb long, as part of developing and validating new or improved DNA sequencing technology.

4. Data Collection and Distribution: To develop effective software and databases to support large-scale mapping and sequencing projects. To create database tools that provide easy access to up-to-date physical, genetic and chromosome mapping data, as well as to sequence information that can be readily compared with other data sets. To develop algorithms and analytical tools for interpreting genomic data.

5. Ethical, Legal, and Social Considerations: To develop programs aimed at understanding the ethical, legal, and social implications of HGP data, to identify and define the major issues raised by these data, and to develop initial policy options for addressing them.

6. Research Training: To support the research training of pre- and postdoctoral fellows beginning in fiscal year 1990, increasing the number of trainees until a total of 600 per year is reached by 1995, and to examine the need for other types of research training in 1991.

7. Technology Development: To support automated instrumentation and innovative, high-risk technology development, and to improve existing technology to meet the needs of the HGP.

8. Technology Transfer: To improve working relationships with industry, and to encourage and facilitate the transfer of technologies and medically important information to the medical community.
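The spacing targets in the first five-year goals translate into approximate marker counts and costs. The Python sketch below assumes a genome of about 3 billion base pairs, as stated earlier, and a total human genetic map length of roughly 3,000 cM; the latter is an assumption, chosen because Table 2 equates a 1-cM map with about 3,000 markers. Under those assumptions, 2- to 5-cM spacing needs about 600 to 1,500 markers and 100-kb STS spacing needs about 30,000 STSs, which agrees with the goals listed in Table 2. The last line gives the cost of reading the genome once at the $0.50 per base pair target, ignoring the repeated coverage real sequencing needs. Function names are hypothetical.

```python
# Rough counts implied by the first five-year mapping and sequencing goals.
GENOME_BP = 3_000_000_000   # about 3 billion base pairs, as stated in the text
GENETIC_MAP_CM = 3_000      # ASSUMPTION: approximate total length of the human genetic map, in cM

def markers_needed(total_length: float, spacing: float) -> int:
    """Number of evenly spaced markers needed to cover total_length at the given spacing."""
    return round(total_length / spacing)

def sequencing_cost_billions(bases: int, dollars_per_base: float) -> float:
    """Cost in $ billions of a single pass over `bases` at a flat per-base price."""
    return bases * dollars_per_base / 1e9

print(markers_needed(GENETIC_MAP_CM, 5))          # 600 markers at 5-cM spacing
print(markers_needed(GENETIC_MAP_CM, 2))          # 1500 markers at 2-cM spacing
print(markers_needed(GENOME_BP, 100_000))         # 30000 STSs at 100-kb spacing
print(sequencing_cost_billions(GENOME_BP, 0.50))  # 1.5 ($ billion at $0.50 per base pair)
```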


Although the first five-year plan was to run until September 1995, unexpectedly rapid advances in genome research led to its goals being updated in 1993. Detailed human genetic maps had been generated, along with better physical maps of both humans and model organisms. DNA sequencing and bioinformatics had improved, and the major ethical, legal, and social issues (ELSI) associated with the increasing availability of genetic information had been identified. The genome project had begun to demonstrate its deep impact on biomedical research: the availability of comprehensive genetic maps allowed scientists to find the genes associated with Menkes syndrome, Huntington's disease, myotonic dystrophy, fragile X syndrome and other disorders.

The second five-year plan, covering 1993 to 1998, was published in the journal Science and co-authored by Francis Collins and David Galas. It extended the research goals of the first plan and added specific new goals for developing technology for gene identification and mapping. The main goal was to sequence the complete human DNA. Programs for distributing genome materials to the scientific community were also envisioned. Although the value of sequencing the whole genome was still being debated, researchers realized that smaller-scale techniques could not provide complete information about genes and their biological functions.

Second five-year (1993-1998) Goals of the Human Genome Project:

1. Genetic Mapping: To complete the 2- to 5-cM map by 1995, to develop techniques for rapid genotyping, and to develop easy-to-use markers and new mapping techniques.

2. Physical Mapping: To complete an STS map of the human genome at a resolution of 100 kb.

3. DNA Sequencing: To develop DNA sequencing methods and capacity to sequence DNA in megabase stretches at a rate of 50 Mb per year. To develop high-throughput sequencing technology that integrates all steps, from template preparation to data interpretation.

4. Gene Identification: To develop efficient techniques to identify genes and to place known genes on physical maps or sequenced DNA.

5. Technology Development: To significantly increase the support for developing innovative technology and improving the present technology used for DNA sequencing.

6. Model Organisms: To complete an STS map of the mouse genome at a resolution of 300 kb. To sequence selected segments of mouse DNA alongside the corresponding human DNA. To complete the sequencing of E. coli and S. cerevisiae by 1998 or earlier. To bring the sequencing of the Caenorhabditis elegans and Drosophila melanogaster genomes to near completion by 1998.

7. Informatics: To continue the creation, development, and operation of databases and tools for easy access to data, including effective tools and standards for data exchange and links among databases. To consolidate, distribute, and continue developing effective software for large-scale genome projects. To continue developing software for comparing and interpreting genome data.

8. Ethical, Legal, and Social Implications (ELSI): To continue identifying issues and developing policy options to address them. To develop and disseminate policy options regarding genetic testing services with potential widespread use. To foster greater acceptance of human genetic variation. To enhance and expand public and professional education that is sensitive to sociocultural and psychological issues.


9. Training: To continue training scientists in interdisciplinary sciences related to genome research.

10. Technology Transfer: To encourage and enhance the transfer of technology both into and out of centers of genome research.

11. Outreach: To cooperate with those who are willing to establish centers for distributing genome data. To share information and materials within 6 months of their development by submitting the data to public databases or repositories, or both.

Third five year (1998-2003) Goals of the Human Genome Project:

1. Human DNA Sequence: To complete the sequencing of the human genome by 2003. By the end of 2001, to complete one-third of the human DNA sequence and to achieve at least 90% coverage of the genome in a working draft based on mapped clones. To make the complete sequence freely available.

2. Sequencing Technology: To continue increasing throughput and reducing the cost of sequencing. To support research leading to novel technologies that can significantly improve DNA sequencing.

3. Human Genome Sequence Variation: To promote the development of technologies for rapid, large-scale identification of single nucleotide polymorphisms (SNPs) and other DNA sequence variants. To generate a SNP map containing at least 100,000 markers. To create public resources of DNA samples and cell lines.

4. Functional Genomics Technology: To generate full-length cDNA clones and sequences representing the genes of humans and model organisms. To support research on techniques for studying the functions of non-protein-coding sequences and for comprehensive analysis of gene expression.

5. Genome-wide Mutagenesis and Protein Analysis: To improve methods for genome-wide mutagenesis and to develop technology for comprehensive protein analyses.

6. Comparative Genomics: To complete the genome sequence of C. elegans by 1998 and of Drosophila by 2002. To develop physical and genetic maps of Mus musculus (mouse) and complete its genome sequence by 2008. To identify additional valuable model organisms and study their genomes.

7. Ethical, Legal, and Social Issues: To examine the concerns associated with the completion of the human DNA sequence and the study of human genetic variation. To examine issues arising from the integration of genetic technologies and information into health care and public health activities. To study how racial, ethnic, and socioeconomic factors affect the use, understanding, and interpretation of genetic information, the use of genetic services, and policy development.

8. Bioinformatics and Computational Biology: To improve the content and usefulness of existing databases. To promote the development of better tools for data generation, capture and annotation, for comprehensive functional studies, and for representing and analyzing sequence similarity and variation. To develop software that is robust, exportable and widely shared.

9. Training and Manpower: To encourage the training of researchers skilled in genomics and help them establish academic careers. To increase the number of scholars knowledgeable in both genetic and genomic research and in ELSI.


The goals of the HGP and their completion dates are listed in Table 2.

Table 2: The goals of the Human Genome Project and their dates of completion.

Area | Goal | Achieved | Date

Genetic map | 2- to 5-cM resolution map (600-1,500 markers) | 1-cM resolution map (3,000 markers) | September 1994

Physical map | 30,000 STSs | 52,000 STSs | October 1998

DNA sequence | 95% of gene-containing part of human sequence finished to 99.99% accuracy | 98% of gene-containing part of human sequence finished to 99.99% accuracy | April 2003

Capacity and cost of finished sequence | Sequence 500 Mb/year at < $0.25 per finished base | Sequence > 1,400 Mb/year at < $0.09 per finished base | November 2002

Human sequence variation | 100,000 mapped human SNPs | 3.7 million mapped human SNPs | February 2003

Gene identification | Full-length human cDNAs | 15,000 full-length human cDNAs | March 2003

Model organisms | Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster | Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat | April 2003

Functional analysis | Develop genomic-scale technologies | High-throughput oligonucleotide synthesis | 1994

Functional analysis | Develop genomic-scale technologies | DNA microarrays | 1996

Functional analysis | Develop genomic-scale technologies | Eukaryotic, whole-genome knockouts (yeast) | 1999

Functional analysis | Develop genomic-scale technologies | Scale-up of two-hybrid system for protein-protein interaction | 2002

Source: Collins, F. S., Morgan, M., and Patrinos, A. 2003. The Human Genome Project: Lessons from Large-Scale Biology. Science, Vol. 300, no. 5617, pp. 286-290.
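The capacity, cost and SNP figures in Table 2 can be turned into rough, back-of-the-envelope numbers. The Python sketch below is an illustration, not part of the source table, and its function names are hypothetical. Because 1 Mb is 10^6 bases, multiplying throughput in Mb/year by cost in $ per finished base gives $ millions per year (an upper bound, since the table gives costs as "<"). It also estimates how long one finished pass over a genome of roughly 3,000 Mb would take at each throughput, including the 50 Mb/year goal of the second five-year plan, and the average spacing implied by the SNP counts if the markers were spread evenly. All of this ignores the redundant coverage that real sequencing projects require.

```python
# Rough arithmetic on Table 2 (and the 1993-1998 capacity goal); illustrative only.
GENOME_MB = 3_000                  # ~3 billion base pairs, in megabases
GENOME_BP = GENOME_MB * 1_000_000

def years_per_genome(mb_per_year: float) -> float:
    """Years needed for a single finished pass over the genome at a given throughput."""
    return GENOME_MB / mb_per_year

def annual_cost_millions(mb_per_year: float, dollars_per_base: float) -> float:
    """Upper-bound yearly spend: Mb/year x $/base = $ millions/year (1 Mb = 10**6 bases)."""
    return mb_per_year * dollars_per_base

def mean_spacing_bp(marker_count: int) -> float:
    """Average distance between markers if spread evenly over the genome."""
    return GENOME_BP / marker_count

print(years_per_genome(50))                      # 60.0 years at the 1993-1998 goal
print(years_per_genome(500))                     # 6.0 years at the Table 2 goal
print(round(years_per_genome(1_400), 1))         # 2.1 years at the capacity achieved
print(annual_cost_millions(500, 0.25))           # 125.0 ($ millions/year, goal)
print(round(annual_cost_millions(1_400, 0.09)))  # 126 ($ millions/year, achieved)
print(mean_spacing_bp(100_000))                  # 30000.0 bp for the SNP goal
print(round(mean_spacing_bp(3_700_000)))         # 811 bp for 3.7 million SNPs
```

The comparison suggests why throughput mattered: at the second plan's 50 Mb/year a single pass would have taken decades, whereas the capacity reached by 2002 did it in about two years for a similar annual spend to the Table 2 goal.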

7. Summary

The term genome was coined by Professor Hans Winkler and is defined as the genetic material of an organism.

Sequencing of an organism's genome is the determination of the order of the nitrogenous bases A, T, G and C in its genetic material.


Genome projects are scientific research projects undertaken by research groups to determine the complete genome sequence of organisms, annotate their protein-coding genes and gain information about other important features of their genomes.

The Human Genome Project (HGP) was a collaborative effort started in 1990. Its goal was to sequence all of the roughly three billion base pairs in the human genome.

The Human Genome Project aimed to map completely, and to understand the structure and function of, all human genes. This was to be followed by the identification of genetic variants that increase the risk of common diseases such as cancer and diabetes, and by the development of appropriate treatments.

The human genome was sequenced at twenty universities and research centers located in the US, the UK, France, Japan, Germany, and China.

The HGP was started in the United States and funded mainly by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH).

A budget of $3 billion was set for carrying out the HGP, to be spent in three stages over a 15-year period from 1990 to 2005, but because of accelerated progress the project was completed in 2003.

The goals of the HGP were to generate a genetic map with 2- to 5-cM resolution (600-1,500 markers), to generate a physical map with 30,000 STSs, to sequence the gene-containing (euchromatic) part of the genome to 99.99% accuracy, to increase sequencing capacity while reducing sequencing cost, to map single nucleotide polymorphisms (SNPs), to identify full-length cDNAs, and to sequence the genomes of model organisms such as E. coli, S. cerevisiae, C. elegans and D. melanogaster for use in comparative analysis.