TRANSCRIPT
ZOOLOGY Molecular Genetics
Large scale analysis of genome: Human Genome Part I
Paper No. : 16 Molecular Genetics Module : 29 Large scale analysis of genome: Human Genome Part I
Development Team
Paper Coordinator: Prof. Namita Agarwal
Department of Zoology, University of Delhi
Principal Investigator: Prof. Neeta Sehgal, Head, Department of Zoology, University of Delhi
Content Writer: Dr. Nidhi Garg, Deshbandhu College, University of Delhi
Content Reviewer: Dr. Surajit Sarkar, Department of Genetics, South Campus, Delhi University
Co-Principal Investigator: Prof. D.K. Singh
Department of Zoology, University of Delhi
Description of Module
Subject Name Zoology
Paper Name Molecular Genetics Zool 016
Module Name/Title Large scale analysis of genome
Module Id M29: Human Genome: Part I
Keywords Genome, Gene, Sequencing, Genetic and Physical Maps
Contents
1. Learning Outcomes
2. Introduction
3. Human Genome Project (HGP)
4. History of Human Genome Sequencing
5. Budget of the Human Genome Project
6. Goals of the Human Genome Project
7. Summary
1. Learning Outcomes
After studying this module, you shall be able to:
• Know what a genome is.
• Learn about the history of the Human Genome Project.
• Evaluate the importance of the Human Genome Project.
• Know the important goals of the HGP and how well they were achieved within the time frame.
2. Introduction
The genome is defined as the genetic material of an organism, which consists of DNA (or RNA in
RNA viruses). The term genome was coined by Professor Hans Winkler of the University of
Hamburg, Germany in 1920. The DNA is organized in the form of chromosomes. In haploid
organisms such as bacteria, archaea, viruses and in organelles like mitochondria and chloroplasts, the
genome typically consists of a single circular or linear chromosome. In a sexually reproducing diploid
organism, each somatic cell carries two full sets of chromosomes. The gametes of
a diploid organism contain half the number of chromosomes due to meiosis. Some organisms are
triploid, tetraploid, pentaploid, etc., and therefore have multiple sets of chromosomes. The term
genome thus refers not only to the DNA present in the nucleus, known as the "nuclear genome", but
also to the DNA stored in mitochondria and chloroplasts, known as the "mitochondrial
genome" and the "chloroplast genome".
Sequencing the genome of an organism refers to the determination of the order of nitrogenous bases
A, T, G and C in its genetic material. Thus, for a virus it may involve knowing the base sequence
of only a single chromosome, whereas for a bacterium it may involve sequencing both the
chromosome and the plasmids which together comprise its genome. For sexually reproducing
organisms, genome sequencing means determining the sequences of a complete set of autosomes and
one of each type of sex chromosome. For example, the human genome consists of 22 autosomes
and 2 sex chromosomes (X and Y), therefore a complete genome sequence will comprise 24
separate chromosome sequences. It is also important to determine the sequence of the mitochondrial
or chloroplast DNA to have complete information about the genome of eukaryotic organisms.
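Once determined, a DNA sequence is simply an ordered string of the letters A, T, G and C, and simple properties can be computed from it directly. The sketch below is only an illustration: the 20-base sequence is a made-up example, and the function names `base_composition` and `gc_fraction` are hypothetical helpers, not part of any sequencing software.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Count how often each DNA base (A, T, G, C) occurs in a sequence."""
    counts = Counter(sequence.upper())
    unexpected = set(counts) - set("ATGC")
    if unexpected:
        # This sketch handles DNA only (no RNA bases or ambiguity codes).
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    return {base: counts.get(base, 0) for base in "ATGC"}


def gc_fraction(sequence: str) -> float:
    """Fraction of bases that are G or C."""
    composition = base_composition(sequence)
    total = sum(composition.values())
    return (composition["G"] + composition["C"]) / total if total else 0.0


if __name__ == "__main__":
    toy_sequence = "ATGGCGTACCTGAAGTTCGA"  # made-up example, not real genomic data
    print(base_composition(toy_sequence))
    print(gc_fraction(toy_sequence))
```

The same counting applies whether the string holds 20 bases or the roughly three billion bases of the human genome; only the size of the input changes.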
To sequence the genome of any organism, genome projects are undertaken. Genome projects are
scientific research projects initiated by research groups around the world with the aim of sequencing the
complete genome, annotating the protein-coding genes and decoding the essential features of a
genome which either distinguish it from or relate it to another genome. Both the length of the genome and
the total number of genes differ extensively from one species to another.
The decision by research agencies to sequence a genome depends upon the importance of that
organism. It might be a model organism, or it may have commercial importance (for example, a crop plant,
livestock, yeast or enzyme-producing bacteria) or significant importance to human health. Emphasis is
also given to sequencing the genome of a species that will help in determining molecular evolution or
phylogeny. The genome sequence provides information regarding the order of every nitrogenous base,
whereas a genome map is less detailed than a genome sequence but identifies the landmarks and helps
in navigating around the genome. Historically, for sequencing eukaryotic genomes the common
approach was to first map the genome, which gives information regarding the landmarks within the
genome, instead of sequencing the chromosome in one go. Mapping the chromosome allows
sequencing to be done bit by bit, as one already knows approximately where a particular DNA fragment
is located on the chromosome. Currently, due to improvements in DNA sequencing technology,
it is possible to sequence the entire genome more quickly and in one go using methods such as the
shotgun approach. Sequencing of genomes has also become more affordable due to a steady reduction in the
cost per base pair.
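The idea behind the shotgun approach, reading many random overlapping fragments and joining them wherever their ends match, can be sketched in a few lines. This is a toy illustration under strong assumptions: the fragments are made up, error-free and all from the same strand, and the greedy merging shown here is a simplification, not the assembly software actually used by the HGP or Celera Genomics. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three made-up fragments, given in shuffled order.
    fragments = ["TGAAGTTCGA", "ATGGCGTAC", "GTACCTGAA"]
    print(greedy_assemble(fragments))
```

With these three fragments the overlaps are unambiguous, so the merged result is a single sequence. Real genomes contain long repeats and sequencing errors, which is one reason a map of landmarks was historically valuable for placing fragments.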
3. Human Genome Project (HGP)
The HGP was a collaborative project between several countries that aimed to determine the sequence of the 3
billion base pairs that make up human DNA. It also involved identifying and mapping all the
genes in the human genome. The HGP was both proposed and funded by the US
government and remains to date the world's largest collaborative biological project. Although planning of the
project started in 1984, the work began in 1990 and the complete genome was announced in 2003.
In 1998, Craig Venter founded Celera Genomics, a privately funded company that took up the sequencing
in parallel with the HGP. The HGP sequencing was carried out in the twenty institutes
listed below.
The International Human Genome Sequencing Consortium included the following institutes:
1. The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
2. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton,
Cambridgeshire, U.K.
3. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
4. United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.
5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and
Human Genetics, Houston, Tex., U.S.
6. RIKEN Genomic Sciences Center, Yokohama, Japan
7. Genoscope and CNRS UMR-8030, Evry, France
8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of
Sciences, Beijing, China
11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.
12. Stanford Genome Technology Center, Stanford, Calif., U.S.
13. Stanford Human Genome Center and Department of Genetics, Stanford University School of
Medicine, Stanford, Calif., U.S.
14. University of Washington Genome Center, Seattle, Wash., U.S.
15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
16. University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.
17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and
Biochemistry, University of Oklahoma, Norman, Okla., U.S.
18. Max Planck Institute for Molecular Genetics, Berlin, Germany
19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor,
N.Y., U.S.
20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany.
These international institutions played a vital role in the quick and effective completion of the HGP. In
the United States, where the project was founded, the major contributors were:
1. The U.S. Department of Energy (DOE): It was the center for discussion of the HGP as early
as 1984.
2. The National Institutes of Health (NIH): It first participated in the project in 1988 by creating the
Office for Human Genome Research, which was upgraded in 1990 to the National Center for
Human Genome Research and was renamed in 1997 the National Human Genome
Research Institute (NHGRI).
The funding for the HGP came not only from the US government through the NIH and DOE but also
from a UK-based charity, the Wellcome Trust, and several other organizations
around the world. UNESCO played a significant role in involving developing nations in the
HGP.
4. History of Human Genome Sequencing
The HGP arose from two important insights that emerged in the early 1980s. The first was that
sequencing complete genomes would accelerate biomedical research, as it would allow
researchers to tackle problems in a comprehensive and unbiased fashion. The second was that
building the necessary infrastructure would require a communal effort, something that had not been attempted in
biomedical research before. Important projects that played a vital role in crystallizing these insights
were:
1. Between 1977 and 1982, the complete genomes of the bacterial viruses ΦX174 and λ (lambda), the animal
virus SV40 and the human mitochondrion were sequenced. These sequencing projects
demonstrated the practicability of assembling small sequence fragments into complete genomes. The data
generated showed the value of having the complete set of genes and other functional elements for further
research and analysis.
2. In 1980, Botstein and colleagues launched a program to generate a human genetic map,
which made it feasible to locate disease genes of unknown function on the basis of
their inheritance patterns alone.
3. In the mid-1980s, Olson and Sulston launched programs that created physical maps of clones
covering the yeast and worm genomes. This allowed the isolation of
genes and regions on the basis of their chromosomal position.
The history of the HGP dates back to May 1985, when Robert Sinsheimer organized a
workshop to discuss the sequencing of the human genome, but the NIH was not interested in his
proposal. In March 1986, Charles DeLisi and David Smith from the DOE's Office of Health and
Environmental Research (OHER) organized the Santa Fe Workshop. Two months later a workshop was
organized by Dr. James Watson at the Cold Spring Harbor Laboratory. A memo containing a broad
plan of the HGP was sent by Charles DeLisi, the then Director of OHER, to Alvin Trivelpiece who was
the Assistant Secretary for Energy Research. Dr. Alvin Trivelpiece then sought and obtained
approval for the project from Deputy Secretary William Flynn Martin. The Santa Fe workshop had
indeed succeeded in motivating the federal agency to support the HGP, which ultimately led to
the approval of funds that allowed the OHER to start the HGP in 1986. A total of $4 million was
initially allocated to initiate the project.
The budget for the genome project was proposed by President Reagan in his 1987 budget to Congress
and was ultimately approved by both Houses. Senator Pete Domenici, a friend of DeLisi,
played a vital role in getting Congressional approval for the project as chair of both the Senate
Committee on Energy and Natural Resources and the Budget Committee. A line-item budget of $3
billion was approved by the Reagan Administration, and the project was expected to take 15 years
beginning from 1990. In 1990, the DOE and NIH signed a Memorandum of Understanding (MoU) to coordinate
their plans and initiate the genome project. In 1990, James Watson headed the NIH-funded genome program while
David Galas was Director of the Office of Biological and Environmental Research
in the U.S. Department of Energy's Office of Science. In 1993, Francis Collins succeeded James
Watson while Aristides Patrinos succeeded Galas. Collins became Director of the NIH
National Center for Human Genome Research, which was later renamed the
National Human Genome Research Institute.
In 1998, the American researcher Craig Venter founded a privately funded firm known as Celera Genomics. In
the early 1990s he was a research scientist at the NIH and was associated with the HGP from its beginning.
Celera was founded with a capital of $300 million and aimed to sequence the genome faster
and at a cost much lower than $3 billion. Celera Genomics employed whole-genome
shotgun sequencing, a technique that had been used to sequence bacterial genomes of up to six million
base pairs but had never been used for a genome containing three billion base pairs.
Celera Genomics promised to publish its findings in accordance with the 1996 "Bermuda Statement"
by releasing new data annually, whereas the publicly funded HGP released its
new data daily. Unlike the HGP, however, Celera Genomics permitted neither free redistribution nor scientific use of its
data. Partly for this reason, the publicly funded HGP released the first draft of the human genome
before Celera Genomics did. In March 2000, the President of the United States, Bill Clinton, announced that
the human genome sequence could not be patented and should be freely available to all researchers. This
announcement had a negative impact on Celera's shares on the Nasdaq stock
exchange, and their price fell sharply. The biotechnology sector as a whole lost
approximately $50 billion in market value within two days of the announcement.
As a result of international cooperation and developments in genome sequencing and bioinformatics, a
'working draft' of the genome was finished in 2000, a year ahead of the planned timeline. The
announcement was made jointly on June 26, 2000 by U.S. President Bill Clinton and British
Prime Minister Tony Blair. A rough draft of the genome was completed and released on July 7, 2000
by the UCSC Genome Bioinformatics Group at the University of California, Santa Cruz. On the first day of free
and open access, about 500 GB of information was downloaded by the scientific community from the
UCSC genome server. The research papers describing the methods and
sequence analysis of the rough draft of the human genome were published in February 2001. The
researchers of the HGP published their work in the journal Nature while the scientists at Celera Genomics
published their work in Science. The drafts published by both groups covered about 83% of the
genome, which included 90% of the euchromatic regions, and contained about 150,000 gaps. At that time, the order and
orientation of several DNA segments were not yet established. Due to advances in sequencing
techniques, the completion of the genome was announced on April 14, 2003, two years ahead of
the timeline. The complete draft of the human genome was published in 2003. In May 2006, the
sequence of the last chromosome was published in Nature, which marked the completion of the project.
The timeline of the Human Genome Project is shown below (Figure 1).
For more information, watch the video at https://www.youtube.com/watch?v=slRyGLmt3qc
Figure 1: Timeline of the Human Genome Project 1984-2001
Source: Lander et al. 2001. Initial sequencing and analysis of the human genome. International Human
Genome Sequencing Consortium. Nature, Vol. 409, pp. 860-921.
For more information regarding the key events of the Human Genome Project and the ongoing
research, visit http://www.genome.gov/10001763
5. Budget of the Human Genome Project
The budget set for carrying out the Human Genome Project was $3 billion. This amount was initially to be
spent in three stages over a 15-year period (1990-2005), but due to accelerated progress the
funding was calculated from 1990 to 2003 (Table 1). The funding was to be spent on:
1. Conducting studies of human diseases.
2. Sequencing model organisms.
3. Developing new technologies for biological and medical research.
4. Developing computational methods to analyze genomes.
5. Studying ethical, legal, and social issues (ELSI) related to genome sequencing.
6. Sequencing the human genome.
Table 1: Funding of the U.S. Human Genome Project from 1988 to 2003 ($ millions).
FY      DOE     NIH*    U.S. Total
1988    10.7    17.2    27.9
1989    18.5    28.2    46.7
1990    27.2    59.5    86.7
1991    47.4    87.4    134.8
1992    59.4    104.8   164.2
1993    63.0    106.1   169.1
1994    63.3    127.0   190.3
1995    68.7    153.8   222.5
1996    73.9    169.3   243.2
1997    77.9    188.9   266.8
1998    85.5    218.3   303.8
1999    89.9    225.7   315.6
2000    88.9    271.7   360.6
2001    86.4    308.4   394.8
2002    90.1    346.7   434.3
2003    64.2    372.8   437.0
Note: Funds involved in construction have not been included, as they comprise a minor portion of the
budget.
Source: http://web.ornl.gov/sci/techresources/Human_Genome/project/budget.shtml
The funding agencies allotted 3% to 5% of their budgets for studying ethical, legal, and social issues
related to the project.
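The yearly figures in Table 1 can be tallied with a short script. The sketch below copies the values exactly as they are listed in the table; the names `FUNDING`, `tally` and `mismatched_rows` are hypothetical helpers chosen for this illustration.

```python
# Table 1 figures in $ millions: fiscal year -> (DOE, NIH, listed U.S. total)
FUNDING = {
    1988: (10.7, 17.2, 27.9),
    1989: (18.5, 28.2, 46.7),
    1990: (27.2, 59.5, 86.7),
    1991: (47.4, 87.4, 134.8),
    1992: (59.4, 104.8, 164.2),
    1993: (63.0, 106.1, 169.1),
    1994: (63.3, 127.0, 190.3),
    1995: (68.7, 153.8, 222.5),
    1996: (73.9, 169.3, 243.2),
    1997: (77.9, 188.9, 266.8),
    1998: (85.5, 218.3, 303.8),
    1999: (89.9, 225.7, 315.6),
    2000: (88.9, 271.7, 360.6),
    2001: (86.4, 308.4, 394.8),
    2002: (90.1, 346.7, 434.3),
    2003: (64.2, 372.8, 437.0),
}


def tally(funding: dict, first_year: int, last_year: int) -> float:
    """Sum the listed U.S. totals over an inclusive range of fiscal years."""
    return round(
        sum(total for fy, (_, _, total) in funding.items() if first_year <= fy <= last_year),
        1,
    )


def mismatched_rows(funding: dict, tolerance: float = 0.05) -> list:
    """Fiscal years in which DOE + NIH differs from the listed total."""
    return [
        fy for fy, (doe, nih, total) in funding.items()
        if abs(doe + nih - total) > tolerance
    ]


if __name__ == "__main__":
    print(tally(FUNDING, 1988, 2003))   # all years listed in Table 1
    print(tally(FUNDING, 1990, 2003))   # years of the project proper
    print(mismatched_rows(FUNDING))
```

Summing the listed totals gives about $3,798 million for 1988-2003 and about $3,724 million for 1990-2003, in nominal dollars. The consistency check flags fiscal year 2002, where DOE and NIH add up to 436.8 against a listed total of 434.3; the values are reproduced here as they appear in the table.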
6. Goals of the Human Genome Project
The goals for the three five-year plans were set jointly by the NIH and the DOE, as they were the two
main organizations which received funding for the Human Genome Project (Table 2). The HGP was a
collaborative worldwide research effort whose primary goal was to analyze the structure of human
DNA and to determine the precise position of genes. In parallel, they also planned to sequence the genomes
of certain model organisms to obtain comparative information important for understanding
the functioning of the human genome. The information generated by the HGP will aid in the
advancement of biomedical science. Moreover, the knowledge of genes will provide enormous
utility in medicine, helping in understanding and treating several genetic diseases and multi-factorial
diseases where genetic predisposition plays an important role.
The Human Genome Project was initially planned for a span of 15 years from 1990 to 2005. This time
period was divided into three five-year plans. The first 5-year plan, covering 1990-1995, was revised in
1993 because of accelerated progress in genome sequencing. The second 5-year plan defined
goals from 1993 to 1998. The third plan (1998-2003) was developed through several workshops
conducted by the DOE and NIH.
First five-year (1990-1995) Goals of the Human Genome Project:
1. Mapping and Sequencing the Human Genome:
a) Genetic Mapping: To complete the human genetic map with markers spaced 2 to 5
centimorgans (cM) apart. To identify every marker by a sequence tagged site (STS).
b) Physical Mapping: To assemble STS maps of all human chromosomes with markers spaced
at 100,000-bp intervals. To generate overlapping sets of cloned DNA with continuity over
lengths of 2 Mb for large parts of the human genome.
c) DNA Sequencing: To improve the existing DNA sequencing methods and to develop newer
sequencing techniques that will lower the cost of large-scale sequencing of DNA
to $0.50 per base pair. To sequence 10 Mb of human DNA in large uninterrupted stretches.
2. Gene Identification: To develop efficient methods for identifying genes and for
placing known genes on physical maps.
3. Mapping and Sequencing the Genomes of Model Organisms: To generate a genetic map of
the mouse genome on the basis of DNA markers. To start physical mapping on one or two
chromosomes. To sequence approximately 20 Mb of DNA from different model organisms, with a
focus on stretches that are 1 Mb long. This would be done during the development and validation
of new and improved DNA sequencing technology.
4. Data Collection and Distribution: To develop software and databases effective enough to support
the large-scale mapping and sequencing projects. To create database tools that provide easy
access to up-to-date physical, genetic and chromosome mapping data. The databases must also
allow access to sequencing data in a form that can be easily compared with
other data sets. To develop algorithms and analytical tools for interpreting genomic data.
5. Ethical, Legal, and Social Considerations: To develop programs that aim to understand the
ethical, legal, and social implications of HGP data. This also involved identifying and
defining the major issues related to HGP data and developing initial policy options for
addressing them.
6. Research Training: The HGP also aimed to support the research training of both pre- and
postdoctoral fellows from fiscal year 1990. The project would expand this support until a total of
600 trainees per year was reached by 1995. To examine the need for other types of research
training in 1991.
7. Technology Development: To support automated instrumentation and innovative, high-risk
technology development. To improve the existing technology to meet the requirements of
the HGP.
8. Technology Transfer: To improve the working relationships with industry. To boost as well as
assist the transfer of technologies and medically important information to the medical fraternity.
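The mapping targets in the first five-year plan can be related to genome size with simple arithmetic. The sketch below assumes a genome of about 3 billion base pairs, the figure used in this module, and uses the marker counts that Table 2 lists for the genetic map (600 to 1,500 markers at 5 to 2 cM spacing). The helper names are hypothetical.

```python
GENOME_BP = 3_000_000_000  # about 3 billion base pairs (figure used in this module)


def markers_needed(total_length: float, spacing: float) -> int:
    """Number of evenly spaced markers needed to cover total_length."""
    return round(total_length / spacing)


def implied_genetic_length_cm(markers: int, spacing_cm: float) -> float:
    """Genetic length (in cM) implied by a marker count and an average spacing."""
    return markers * spacing_cm


if __name__ == "__main__":
    # Physical map: one STS every 100,000 bp
    print(markers_needed(GENOME_BP, 100_000))

    # Genetic map: both ends of the 2- to 5-cM range imply the same total length
    coarse = implied_genetic_length_cm(600, 5)
    fine = implied_genetic_length_cm(1500, 2)
    print(coarse, fine)

    # Average physical size of 1 cM under these assumptions
    print(GENOME_BP / coarse)
```

One STS every 100,000 bp works out to 30,000 markers, which matches the physical map goal in Table 2. Both ends of the genetic map range imply a total genetic length of about 3,000 cM, so under these assumptions 1 cM corresponds on average to roughly 1 Mb of DNA.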
Although the first 5-year plan ran until September 1995, unexpected advances in genome
research led to its goals being updated in 1993. Detailed human genetic maps were generated
along with better physical maps of the genomes of both humans and model organisms. There was improvement in
DNA sequencing and bioinformatics. Alongside, the major ethical, legal, and
social issues (ELSI) associated with increased availability of genetic information were identified. The genome project
had begun to demonstrate its deep impact on biomedical research. The availability of comprehensive
genetic maps allowed scientists to find genes associated with Menkes syndrome, Huntington's
disease, myotonic dystrophy, fragile X syndrome, etc.
The second 5-year plan covered 1993 to 1998 and was published in the journal Science, coauthored
by Francis Collins and David Galas. The new 5-year plan extended the research goals of the first 5-year
plan and added specific new goals in order to develop technology for identifying genes and
mapping. The main goal was to get the complete human DNA sequenced. Development of programs
for the distribution of genome materials to the scientific community was also envisioned. Although there was
an ongoing debate regarding the value of sequencing the whole genome, researchers realized that
smaller-scale techniques could not provide complete information regarding the genes and
their biological functions.
Second five-year (1993-1998) Goals of the Human Genome Project:
1. Genetic Mapping: To generate a full 2- to 5-cM map by 1995. To develop techniques for fast
genotyping. To find easy-to-use markers along with new techniques for mapping.
2. Physical Mapping: To complete an STS map of the human genome having a resolution of 100 kb.
3. DNA Sequencing: To develop DNA sequencing methods and build up capacity for sequencing
DNA at the megabase scale, at a rate of 50 Mb per year. To develop high-throughput sequencing technology,
with a focus on systems integration of all steps from template preparation to data
interpretation.
4. Gene Identification: To develop efficient techniques to identify genes and to place known genes
on physical maps or sequenced DNA.
5. Technology Development: To significantly increase the support for developing innovative
technology and improving the present technology used for DNA sequencing.
6. Model Organisms: To complete an STS map of the mouse genome at a resolution of 300 kb. To
sequence selected segments of mouse DNA alongside the corresponding human DNA. To
complete the sequencing of E. coli and S. cerevisiae by 1998 or earlier. To bring the genome sequences
of Caenorhabditis elegans and Drosophila melanogaster near completion by 1998.
7. Informatics: To continue the creation, development, and operation of databases and tools for
easy access to data. This should include effective tools and standards to facilitate data exchange
and links among databases. To consolidate, distribute, and continue the development of effective
software for large-scale genome projects. To carry on the development of software required
for comparing and interpreting genome data.
8. Ethical, Legal, and Social Implications (ELSI): To continue the identification of issues and the
development of policy options for addressing them. To develop and distribute policy options
concerning genetic testing services likely to be widely used. To foster greater acceptance of
genetic variation in humans. To enhance and expand public and professional education that
is sensitive to socio-cultural and psychological issues.
9. Training: To carry on the training of scientists in interdisciplinary sciences which are related to
genome research.
10. Technology Transfer: To encourage and enhance the transfer of technology both into and out of
centers of genome research.
11. Outreach: To cooperate with those willing to establish centers for the dissemination of
genome data. To share information and materials within 6 months of their development, through
submission of the data to public databases or repositories, or to both.
Third five-year (1998-2003) Goals of the Human Genome Project:
1. Human DNA Sequence: To complete the sequencing of the human genome by 2003. To finish
sequencing one-third of the human DNA and to achieve a minimum of 90% coverage of the
genome in a working draft based on mapped clones by the end of 2001. To make the complete
sequence available free of cost.
2. Sequencing Technology: To continue increasing throughput and
decreasing sequencing cost. To support research which leads to the development of novel
technologies that can significantly improve sequencing.
3. Human Genome Sequence Variation: To promote the development of technologies for rapid,
large-scale identification of SNPs and other DNA sequence variants. To generate an SNP map
containing a minimum of 100,000 markers. To create public resources of DNA samples and
cell lines.
4. Functional Genomics Technology: To generate complete cDNA clones and sequences
representing the genes of humans and model organisms. To support research to develop techniques for
studying the functions of non-protein-coding sequences and for the comprehensive study of gene expression.
5. Genome-wide Mutagenesis and Protein Analysis: To improve methods for genome-wide mutagenesis.
To develop technology for conducting comprehensive protein analyses.
6. Comparative Genomics: Completion of genome sequencing of C. elegans by 1998, and
Drosophila by 2002. Development of physical and genetic maps for Mus musculus, and
completion of its genome sequence by 2008. Identification of additional valuable model
organisms and study of their genomes.
7. Ethical, Legal, and Social Issues: To examine various concerns that are associated with the
generation of the human DNA sequence and the study of genetic variation. To examine issues that have
arisen due to the incorporation of genetic technologies and information into health care and public
health activities. To conduct research on the effect of racial, ethnic, and socioeconomic factors on the
use, understanding, and interpretation of genetic information, genetic services and policy
development.
8. Bioinformatics and Computational Biology: To further develop both the content and the
usefulness of the existing databases. To promote the development of improved methods for
data generation, capture and annotation, for comprehensive functional studies, and for the representation and
analysis of sequence similarity and variation. To develop software that is robust, exportable
and extensively shared.
9. Training and Manpower: To encourage the training of researchers who are skilled in the field of
genomics and to help establish their academic careers. To promote a rise in the number of scholars
having knowledge of both genetic and genomic sciences and of ELSI.
The goals of the HGP and their respective completion dates are listed in the table below (Table 2).
Table 2: The goals of the Human Genome Project and their date of completion.
Area | Goal | Achieved | Date
Genetic Map | 2- to 5-cM resolution map (600 – 1,500 markers) | 1-cM resolution map (3,000 markers) | September 1994
Physical Map | 30,000 STSs | 52,000 STSs | October 1998
DNA Sequence | 95% of gene-containing part of human sequence finished to 99.99% accuracy | 98% of gene-containing part of human sequence finished to 99.99% accuracy | April 2003
Capacity and Cost of Finished Sequence | Sequence 500 Mb/year at <$0.25 per finished base | Sequence >1,400 Mb/year at <$0.09 per finished base | November 2002
Human Sequence Variation | 100,000 mapped human SNPs | 3.7 million mapped human SNPs | February 2003
Gene Identification | Full-length human cDNAs | 15,000 full-length human cDNAs | March 2003
Model Organisms | Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster | Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat | April 2003
Functional Analysis | Develop genomic-scale technologies | High-throughput oligonucleotide synthesis | 1994
 | | DNA microarrays | 1996
 | | Eukaryotic, whole-genome knockouts (yeast) | 1999
 | | Scale-up of two-hybrid system for protein-protein interaction | 2002
Source: Collins, F. S., Morgan, M., and Patrinos, A. 2003. The Human Genome Project: Lessons from Large-
Scale Biology. Science, Vol. 300 no. 5617 pp. 286-290.
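The capacity and cost figures quoted in the plans and in Table 2 can be put in perspective with a rough calculation. The sketch below assumes a 3,000 Mb genome read in a single pass; in practice each base is sequenced several times, so these are lower bounds rather than actual project figures. The function names are hypothetical.

```python
GENOME_MB = 3_000  # 3 billion base pairs expressed in megabases


def years_to_sequence(rate_mb_per_year: float, genome_mb: float = GENOME_MB) -> float:
    """Years needed to read the genome once at a given yearly capacity."""
    return genome_mb / rate_mb_per_year


def cost_in_millions(dollars_per_base: float, genome_mb: float = GENOME_MB) -> float:
    """Cost of one pass in $ millions (1 Mb = 1,000,000 bases)."""
    return dollars_per_base * genome_mb


if __name__ == "__main__":
    rates = [
        ("Second-plan goal, 50 Mb/year", 50),
        ("Table 2 goal, 500 Mb/year", 500),
        ("Table 2 achieved, 1,400 Mb/year", 1400),
    ]
    for label, rate in rates:
        print(f"{label}: {years_to_sequence(rate):.1f} years")

    prices = [
        ("First-plan target, $0.50 per base", 0.50),
        ("Table 2 goal, $0.25 per base", 0.25),
        ("Table 2 achieved, $0.09 per base", 0.09),
    ]
    for label, price in prices:
        print(f"{label}: ${cost_in_millions(price):,.0f} million")
```

At 50 Mb per year a single pass would take 60 years, whereas at the capacity of more than 1,400 Mb per year reported for 2002 it would take a little over 2 years. Likewise, the cost of one pass falls from about $1,500 million at $0.50 per base to about $270 million at $0.09 per base.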
7. Summary
• The term genome was coined by Professor Hans Winkler and is defined as the genetic material of
an organism.
• Sequencing of an organism's genome is the determination of the order of nitrogenous bases A, T,
G and C in its genetic material.
• Genome projects are scientific research projects taken up by research groups to determine the complete
genome sequence of organisms, to annotate protein-coding genes and to gain information about
the other important features of a genome.
• The Human Genome Project (HGP) was a collaborative effort which was started in 1990. Its goal
was to sequence and identify all three billion base pairs in the human genome.
• The Human Genome Project aimed to completely map as well as understand the structure and
function of all human genes. This was to be followed by the identification of genetic variants
which increase the risk of common diseases such as cancer and diabetes, and by the development of
appropriate treatments for them.
• The human genome was sequenced in twenty universities and research centers that were located
in the US, the UK, France, Japan, Germany, and China.
• The HGP was started in the United States and funded mainly by the U.S. Department of Energy
(DOE) and the National Institutes of Health (NIH).
• A budget of $3 billion was set for carrying out the HGP, which was planned to be spent in 3 stages
over a 15-year period from 1990 to 2005, but due to accelerated progress the project was completed in
2003.
• The goals of the HGP were to generate a genetic map with a resolution of 2 to 5 cM (600 –
1,500 markers), to generate a physical map with 30,000 STSs, to sequence 95% of the gene-containing
part of the human genome to 99.99% accuracy, to increase sequencing capacity while reducing sequencing
cost, to map single nucleotide polymorphisms, to identify full-length cDNAs, and to sequence
the genomes of model organisms such as E. coli, S. cerevisiae, C. elegans and D. melanogaster, which
could subsequently be used in comparative analysis.