chromosomal evolution


Upload: william-e-fields

Post on 10-Apr-2015


Page 1: Chromosomal Evolution

THE CHANGE EQUATION (THE FORMULA SYSTEM)

1. User (Autonomous Agent) request(s)/application selections [Morale/Cohesion 3 part format-right-side]
2. Feasibility study [Goals/Objectives 4 part format-right-side]
3. Investigation [Goals/Objectives 3 part format-left-side]
4. Analysis [Norms/Standards 5 part format-left-side]
5. Systems design [Goals/Objectives 4 part format-right-side]
6. Programming [Morale/Cohesion 5 part format-left-side]
7. Systems testing [Power/Authority 3 part format-right-side]
8. Documentation [Norms/Standards 3 part format-left-side]
9. Conversion and implementation [Goals/Objectives 3 part format-right-side]
10. Maintenance [Goals/Objectives 4 part format-left-side]
11. Evaluation [Norms/Standards 3 part format-left-side]

1. Project initiation (Hardware/Software) Power/Authority
2. Project development (The Project) Norms/Standards
3. Project implementation (The User Climate/Autonomous Agent Conditions of Configuration) Goals/Objectives
4. Post project evaluation (The Systems Analysts/Autonomous Agent Activities) Morale/Cohesion

1. Input subsystems [3 part Norms/Standards]
2. Computer subsystems [3 part Norms/Standards]
3. Output subsystems [3 part Norms/Standards]

1. Method Phase-One [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
2. Method Phase-Two [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
3. Method Phase-Three [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
4. Method Phase-Four [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
5. Method Phase-Five [5 part Goals/Objectives (The Dictionary of Occupational Titles)]


Taxonomy Table

Kingdom

This is the largest unit of classification. Initially it was thought that there were only two kingdoms, plants and animals. Eventually the microscope and other tools helped clarify the existence of other organisms. The five-kingdom system long used in textbooks recognizes 5 kingdoms: Animalia, the largest, with over 1 million named species (e.g., fish, humans); Plantae, 350,000 species (e.g., trees, grass); Fungi, 100,000 species (e.g., mushrooms, lichen); Protista, 100,000 species (e.g., green, golden, brown, and red algae, flagellates); and Monera, 10,000 species (bacteria, including blue-green algae, or cyanobacteria).

Phylum/Division

The next most specific unit of classification. This further divides each kingdom into a number of phyla (roughly 35 in the animal kingdom) based on very distinct and defining characteristics. For example, within the Animal Kingdom, a major phylum is the chordates, animals with notochords. This includes fish, mammals (including humans), etc. Flowering plants are placed in the division Anthophyta of the Plant Kingdom.

Class

This further classifies the organism, separating organisms into groups that are very similar in certain basic features. For example, the class Mammalia includes all animals that nurse their young with milk, such as humans, cows, and dolphins. Another class is Reptilia, which includes cold-blooded, scaly animals.

Introduction

One of the most interesting fields in the study of biology is taxonomy. Although there are other fields, such as ecology and embryology, taxonomy is easy to comprehend, restricted to a small set of structural information, and good to know as a reference. Taxonomy, also called systematics, is the study of the classification of all living organisms. The current method of taxonomy was started by Carolus Linnaeus; it arranges organisms into groups within groups within groups, until each organism is defined within its own species. This orderly classification helps scientists in a number of ways. It keeps them in sync with other scientists, because everyone uses the same universal system. It also helps scientists identify evolutionary links between species.

How it works

Originally, when Linnaeus founded taxonomy, organisms were grouped solely on visible physical characteristics. Now they are separated on any unique and defining features: mainly external physical features, and secondarily other features such as feeding habits and, increasingly, genetic (DNA) evidence.

Each organism is named using binomial nomenclature, in which an organism's name has two words. The first word is the genus and the second is the species. For example, humans are scientifically called Homo sapiens: genus Homo, species sapiens. The words that make up the names of taxonomic groups are based on Greek or Latin. This gives scientists a universal language throughout the world; otherwise an English scientist mentioning a "cat" to a Chinese scientist might be misunderstood because of language differences.
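The two-word naming convention can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual typographic rule (genus capitalized, species epithet all lowercase; the italics used in print cannot be shown in plain text). The helper names `binomial_name` and `abbreviate` are hypothetical, not part of any taxonomy database.

```python
def binomial_name(genus: str, species: str) -> str:
    """Format a scientific name: genus capitalized, species epithet all lowercase."""
    genus = genus.strip()
    species = species.strip()
    if not genus or not species:
        raise ValueError("both genus and species are required")
    return f"{genus[0].upper()}{genus[1:].lower()} {species.lower()}"


def abbreviate(name: str) -> str:
    """Shorten the genus to its initial, as is common after a name's first use."""
    genus, species = name.split()
    return f"{genus[0]}. {species}"


print(binomial_name("homo", "SAPIENS"))  # Homo sapiens
print(abbreviate("Homo sapiens"))        # H. sapiens
```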

There are international commissions that review and record an updated listing of the classifications. Some names are based on characteristics of the organism described in Latin, while others have no descriptive meaning at all and simply honor the person who discovered the organism.

Order

Organisms of the same order are more similar than those of the same class. Many evolutionary connections can be drawn from looking at the order; only a few features separate the organisms, marking a branching in the evolutionary chain. For example, within the class Mammalia, carnivores are grouped into the order Carnivora, while insect-eaters were traditionally grouped into the order Insectivora.

Family

Even more specific, the animals within a family share a very close similarity with each other. Most will probably have the same behavior patterns, feeding habits, and general functions. An example is the cat family (Felidae), whose members all have whiskers and sharp claws, and which includes animals such as lions and domestic cats.

Genus

The genus makes up the first word of an organism's binomial name. Organisms within the same genus may look very similar to each other. Organisms of the same genus can sometimes interbreed, although the offspring are often unhealthy or sterile; for example, a horse and a donkey (both genus Equus) can produce a mule, which is usually sterile.

Species

The most specific unit of classification is the species. A species includes all of its member organisms along with their apparent ancestors and descendants. Members of a species closely resemble their parents and can breed freely with other members of the same species, producing fertile offspring.

The Origins of Taxonomy

Classification has existed ever since people paid attention to organisms. One primitive system was based on "harmful" and "non-harmful" organisms. Then Aristotle formed the first useful system of classification, in the 300s BC. He divided living things into plants and animals, grouped animals by whether or not they had red blood, and subdivided organisms further by physical characteristics such as size and other features. This system is crude by today's standards, yet it lasted roughly 2,000 years.

Eventually, as communication improved and science advanced, modern classification started to develop. Its best-known founder was the Swedish naturalist Carolus Linnaeus, in the 1700s. He developed a system in which organisms are classified by their unique characteristics, and he established binomial nomenclature as the standard for naming. Linnaeus agreed with other scientists that his work was somewhat crude, but its purpose and general concepts continued to be applied. Over time, as evolutionary studies advanced, the classification system became more sophisticated, showing different groups and links. Classifications continue to change and grow.


The draft sequence of the human genome has been integrated into many existing resources to facilitate biological discovery. The map below represents the interconnections between different types of public biological data available at NCBI.


 

Cellular Chemistry

Introduction

Hold on to your seat! This document attempts to cover the essentials of several chemistry courses (general chemistry, organic chemistry, and biochemistry), but only what a beginning biology student needs to know to survive cell biology, anatomy, physiology, microbiology, and related biology courses. This document assumes you know NO chemistry. If that is the case, it is normal to feel a bit overwhelmed as you study this material, but have courage: many students have studied and survived this material, and succeeded in their biology studies. You can too!

All organisms are made of cells; cells are made of organelles and other subcellular components, which are in turn made of molecules: orderly arrangements of atoms of various elements.

Atoms are so small that just 12 grams of carbon, such as a small piece of charcoal, contains the amazing quantity of about 602,200,000,000,000,000,000,000 (6.022 × 10^23) atoms!
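The counting behind this figure is a one-line calculation: divide the sample mass by carbon's molar mass to get moles, then multiply by Avogadro's number. A minimal Python sketch, assuming pure carbon-12 (molar mass exactly 12 g/mol) and the SI value of Avogadro's number; real charcoal is neither pure carbon nor pure carbon-12, so treat the result as an idealization. The function name is illustrative.

```python
AVOGADRO = 6.02214076e23     # atoms per mole (exact SI value)
CARBON_12_MOLAR_MASS = 12.0  # grams per mole, assuming pure carbon-12


def carbon_atoms(grams: float) -> float:
    """Number of carbon-12 atoms in a sample of the given mass in grams."""
    moles = grams / CARBON_12_MOLAR_MASS
    return moles * AVOGADRO


print(f"{carbon_atoms(12):.3e}")  # 6.022e+23
print(f"{carbon_atoms(1):.3e}")   # 5.018e+22
```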

So imagine how small a single atom is! Many elements exist, such as sodium, oxygen, copper, gold, and carbon. Although atoms of different elements differ in their physical properties, all atoms share a similar structure: they are made of just three kinds of subatomic particles.

The Atomic Structure

An atom has a central region, the nucleus, composed of protons and neutrons, surrounded by orbiting electrons.


Protons (symbolized as p) have a mass of about 1 atomic mass unit (AMU) and an electrical charge of +1. Neutrons (symbolized as n) have a mass of about 1 AMU and no electrical charge (they are neutral). Electrons (e-) orbit the nucleus at various distances, or shell levels. These minute, fast-moving particles have a mass of almost zero (about 0.0005 AMU, roughly 1/1836 of a proton's mass), and an electrical charge of -1. Normally the number of electrons equals the number of protons in the nucleus; in this way, the atom remains electrically neutral.

Calculating the Structure of an Atom or Molecule

The mass of an atom or molecule, as well as its net electrical charge, can be determined if its composition of subatomic particles is known. The reverse is also true. Useful formulas for such calculations include the following, where p, n, and e stand for proton, neutron, and electron, and # means 'the number of':

Net Atomic Mass = #p + #n
Net Atomic Charge = #p - #e

[Sample atom with mass of 7 and net charge of 0]

Example: an atom with 5 neutrons, 3 protons, and 7 orbiting electrons would have a net atomic mass of 8 (=5+3) and a net atomic electrical charge of -4 (=3-7).

Example: How many protons and neutrons are there in an atom with a mass of 23 and 12 orbiting electrons if you know that the atom has a net charge of +3? Solution: Since the charge is +3, there are 3 more protons than electrons (12), so there must be 15 protons. The number of neutrons is 23 - 15 = 8.
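The two formulas, and the reverse calculation used in the second example, translate directly into code. A minimal Python sketch; the function names are hypothetical, and, like the formulas above, the mass ignores the electrons' tiny contribution.

```python
def atom_mass_and_charge(protons: int, neutrons: int, electrons: int) -> tuple[int, int]:
    """Net atomic mass (#p + #n, in AMU) and net charge (#p - #e)."""
    mass = protons + neutrons
    charge = protons - electrons
    return mass, charge


def solve_protons_neutrons(mass: int, electrons: int, charge: int) -> tuple[int, int]:
    """Reverse the formulas: #p = charge + #e, then #n = mass - #p."""
    protons = charge + electrons
    neutrons = mass - protons
    if protons < 1 or neutrons < 0:
        raise ValueError("no atom matches these values")
    return protons, neutrons


print(atom_mass_and_charge(protons=3, neutrons=5, electrons=7))  # (8, -4)
print(solve_protons_neutrons(mass=23, electrons=12, charge=3))   # (15, 8)
```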

The Hydrogen Atom


[H atom and H+ ion]

Hydrogen atoms are the simplest of all atoms, having a nucleus with a single proton and a single orbiting electron. The mass of the H atom is about 1.008 AMU; the proton accounts for nearly all of it, with the electron contributing only about 0.0005 AMU. If the electron is lost from the H atom, a lone proton, p, remains and is positively charged. The resulting particle is a hydrogen ion, electrically charged because the lone proton is not balanced by any electron's negative charge. The hydrogen ion is symbolized as H+. Hydrogen ions are very important biologically because they are small and electrically charged, and can wreak havoc on protein structure and cell function; this is particularly critical when H+ ions interact with enzyme proteins, which are essential for cell metabolism.

pH

The pH scale is used to measure the concentration of H+ ions in a solution (blood, cytoplasm, etc.). The scale runs from 0 to 14: 7 is neutral, values below 7 are acidic, and values above 7 are alkaline, or basic.

[pH scale from 0 to 14: acidic below 7, neutral at 7, basic above 7]

Acids, that is molecules that release H+ ions, lower pH; a low pH implies a high concentration of H+ ions. Bases, that is molecules that capture H+ ions, raise pH; a high pH implies a low concentration of H+ ions. Pure water has a neutral pH. Blood has a pH of about 7.4 (normally 7.35 to 7.45). Vinegar has a pH of around 2.5 to 3. Stomach acid has a pH of about 2. Strong acids such as sulfuric acid can reach a pH of about 1 or lower. Lye (sodium hydroxide, found in drain cleaners) creates an extremely alkaline (basic) pH when added to a solution, resulting in a pH of about 12 to 14. Cell cytoplasm typically has a pH of about 7.2, close to neutral.

The pH scale is logarithmic, based on powers of 10: a pH 6 solution has ten times the H+ concentration (acidity) of a pH 7 solution, and a pH 5 solution has ten times that of a pH 6 solution. So a pH 5 solution has one hundred times the acidity of a pH 7 solution. Note that low pH means high levels of H+, and high pH means low levels of H+ (most beginning students confuse this, so make a mental note of the reverse nature of the pH scale).
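Because each pH unit is a factor of 10, comparing two solutions means raising 10 to the difference in pH. The sketch below assumes the standard definition pH = -log10[H+], with [H+] in moles per liter; the function names are illustrative only.

```python
import math


def ph_from_h(h_concentration: float) -> float:
    """pH from the H+ concentration in moles per liter: pH = -log10([H+])."""
    return -math.log10(h_concentration)


def times_more_acidic(ph_low: float, ph_high: float) -> float:
    """How many times more H+ the lower-pH solution holds than the higher-pH one."""
    return 10 ** (ph_high - ph_low)


print(round(ph_from_h(1e-7), 6))  # 7.0, neutral water
print(times_more_acidic(6, 7))    # 10
print(times_more_acidic(5, 7))    # 100
```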

Empty Space

There is a lot of empty space between an atomic nucleus and its orbiting electrons, and between the electrons themselves. Rough estimates suggest that if all the empty space were removed from all the atoms of all the people on Earth, the entire human population could be condensed into a container about the size of a thimble! A single human being such as yourself would be squeezed into a speck only a few micrometers across, about the size of a single cell. In fact, protons and neutrons are themselves made of even smaller particles, called quarks.
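A rough back-of-the-envelope check of these claims: squeeze a mass down to the density of atomic nuclei (about 2.3 × 10^17 kg per cubic meter) and see what volume is left. Every input here is an assumption chosen for illustration (about 8 billion people, an average body mass of 62 kg, a 70 kg individual), so the outputs are order-of-magnitude estimates, not measurements. The function names are hypothetical.

```python
NUCLEAR_DENSITY = 2.3e17  # kg per cubic meter, approximate density of atomic nuclei

POPULATION = 8e9          # assumption: roughly 8 billion people
AVERAGE_MASS_KG = 62      # assumption: average body mass


def compressed_volume_cm3(mass_kg: float) -> float:
    """Volume in cm^3 of a mass squeezed to nuclear density (empty space removed)."""
    volume_m3 = mass_kg / NUCLEAR_DENSITY
    return volume_m3 * 1e6  # 1 m^3 = 1,000,000 cm^3


def cube_side_micrometers(mass_kg: float) -> float:
    """Edge length in micrometers of a cube holding that compressed mass."""
    volume_m3 = mass_kg / NUCLEAR_DENSITY
    return volume_m3 ** (1 / 3) * 1e6  # 1 m = 1,000,000 micrometers


print(f"{compressed_volume_cm3(POPULATION * AVERAGE_MASS_KG):.1f} cm^3")  # roughly 2 cm^3
print(f"{cube_side_micrometers(70):.1f} micrometers")                     # roughly 7 micrometers
```

With these assumptions the whole population fits in a couple of cubic centimeters (thimble-sized), while one person becomes a speck a few micrometers across, far larger than an atom but comparable to a single cell.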


Quarks

Quarks are what actually make up protons and neutrons. There are six varieties, or "flavors," of quarks: up, down, strange, charm, top, and bottom. A proton is made of two up quarks and one down quark, and a neutron of one up quark and two down quarks. The names have nothing to do with the quarks' actual properties; physicists simply chose playful labels, making scientific naming of subatomic particles a bit more fun for everyone!

Gravitons

What holds all these subatomic particles together? Inside the nucleus, protons and neutrons are held together by the strong nuclear force, which arises from particles called gluons binding quarks together. Gravity, the attraction of all matter for all other matter (the reason your feet stay attracted to the ground and you do not fly off into outer space, and the reason the moon orbits the earth), is a far weaker force. Gravitons are theoretical particles proposed to carry gravity, but they have never been detected.

Molecules

Molecules are combinations of atoms, held together as a "team" by various forces called molecular bonds. Like a chain gang, if one atom in a molecule moves in one direction, the others are obliged to follow; though separate atoms, together they form a molecule. The molecule illustrated above in 3D is acetic acid (common vinegar acid)- the red spheres represent oxygen atoms, black carbon atoms, and white hydrogen atoms.

Bond types that hold atoms or molecules together, or in close proximity, include (from strongest to weakest): covalent, ionic, hydrogen, and van der Waals forces.


Covalent Molecular Bonds

This type of bonding occurs when two atoms share orbiting electrons, somewhat like two children standing inside two hula-hoops (each hoop being an orbiting electron) and spinning the hoops around themselves. Neither child can leave the spinning pair of hoops (electrons) that keeps them close to each other. Covalent bonds are strong, and each covalent bond, that is each pair of shared electrons between atoms, is symbolized by a straight line drawn between the atoms. Sometimes two pairs of electrons (4 e-) are shared between two atoms; this is a double covalent bond, symbolized by a double line (=). Three pairs of electrons can also be shared, resulting in a triple covalent bond, symbolized by...you guessed it, 3 lines drawn between the atoms.


For example, consider a molecule composed of 2 atoms of hydrogen and 1 atom of oxygen. A water molecule can be written as H2O, or drawn as H-O-H.

Consider a molecule similar to water: hydrogen sulfide (the stinky "rotten egg" gas), composed of 2 atoms of hydrogen and 1 atom of sulfur. A molecule of hydrogen sulfide gas can be written as H2S, or drawn as H-S-H. All of the structures below represent hydrogen sulfide.

 

 

 H-S-H



Hydrogen Molecular Bonding

Hydrogen bonds are attractions between hydrogen atoms and strongly electronegative atoms, chiefly O, N, and F (S and Cl can form weaker hydrogen bonds). These atoms can be thought of as electron 'thieves,' taking the majority of the shared electrons' orbit time in the covalent bonds they are part of; as a consequence, the electron 'thief' atom takes on a partial negative charge density.

Hydrogen atoms, in stark contrast, are very weak at holding their electron near the hydrogen nucleus (a single proton). When hydrogen is covalently bonded to an electron 'thief,' its electron spends most of its time near that partner atom, so the hydrogen atom takes on a partial positive charge density. The result is that a partial negative charge density on O, N, or F attracts a nearby partial positive charge density on H. Voila! A hydrogen bond.

It is hydrogen bonds that give water molecules such strong attractions to each other, which is why water must be heated to a relatively high temperature (100 °C at sea level) before its molecules escape as steam.

Van der Waals Forces


These are weak attractions between atoms or molecules that are very close together (for example, between carbon-rich groups). Alone, each force is weak, but when many of them add up they become strong, much like lining up several batteries in series to produce a higher voltage (such as in a flashlight). Van der Waals forces are significant in a cell's DNA, where the flat nucleotide bases are stacked on top of one another along the coiled molecule. In this way van der Waals forces help hold DNA together in its helical coil arrangement.

Ionic Molecular Bonding

This occurs when there are electrical attractions between electrically charged atoms or molecules, that is, between ions. Ions are atoms or molecules in which the number of protons does not equal the number of orbiting electrons. This creates an electrical imbalance, so that the particle has either a net positive charge (cation) or a net negative charge (anion).

Ionic bonding, also known as a salt bond, occurs when a cation (positively charged atom or molecule) is electrically attracted to an anion (negatively charged atom or molecule): Anion(-) :::::: (+)Cation. Table salt, sodium chloride or Na+Cl-, is a common example of a compound held together by ionic bonds. Often the anion has stolen an electron from the cation, creating the charged ions. During electrolysis, the cathode is negatively charged and so attracts cations, while the anode is positively charged and so attracts anions. Don't confuse a cathode with a cation: they have opposite electrical charges and so attract each other. Likewise with an anode and anions.

Salts are combinations of cations and anions, such as ordinary table salt, Na+Cl-, but the term salt can be applied to any combination of cation and anion, including complex and large molecules, such as tetracycline hydrochloride (tetracycline H+Cl-), where the tetracycline is ionized to form a cation but is kept stable in solution by combining with a chloride anion (Cl-).


Important Atoms, Ions, and Small Molecules studied in biology include: (memorize this list!)

H: Hydrogen atom
H+: Hydrogen ion (pH is a measure of H+ in a solution)
C: Carbon atom (present in almost all cell molecules)
O: Oxygen atom
Na: Sodium atom
Na+: Sodium ion (vital for cell membrane excitability)
P: Phosphorus atom (don't confuse this with potassium!)
K: Potassium atom
K+: Potassium ion (vital for cell membrane excitability)
Cl: Chlorine atom
Cl-: Chloride ion
S: Sulfur atom (present in many proteins)
N: Nitrogen atom (critical for amino acids and proteins)
Ca++: Calcium ion (bone, cell excitability, and hormone regulation)
Mn++: Manganese ion (stabilizes cell enzymes)
Mg++: Magnesium ion (stabilizes cell enzymes)
CO2: Carbon dioxide gas
O2: Oxygen gas
HCO3-: Bicarbonate anion
Zn++: Zinc ion
Zn: Zinc metal

"Wizardry"

Symbolic representations of atoms and bonds are commonly seen, or used, when observing or drawing chemical structures. When you understand the secrets used by the wizards who draw molecules, you too will easily decipher molecular representations! So here are a few rules to commit to memory:

1. Remember that carbon atoms almost always form 4 covalent bonds, so each carbon atom in a molecule should have 4 bonds associated with it. Look at the 3D molecule of methane below: can you see the carbon atom (black) and the hydrogen atoms (white)? The carbon atom has formed 4 covalent bonds, one with each hydrogen atom (note that hydrogen atoms form only 1 covalent bond with whatever they bond with).

2. If you see a molecular drawing where a carbon has fewer than 4 covalent bonds, the remaining "unseen" bonds are always to hydrogen atoms; these are not usually drawn, so that wizards can draw molecules faster. Look at the wizardry representations of a common organic molecule, benzene.


3. When you see a straight line extending off a carbohydrate molecule (sugar, starch) into space, with no atoms at the end of the line, it is a wizard's trick (those sneaky wizards): wizards know that at the end of that line there is always an oxygen and then a hydrogen atom, this pair otherwise known as a hydroxyl group (-O-H, or -OH).

4. When you see molecular bonds drawn with angular bends in them, there is always a carbon atom at the bend or angle, even though the wizards do not draw it and so it looks like nothing is there; but now you know better!
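Rule 2 is simple arithmetic: a carbon needs 4 bonds, so whatever the drawing does not show must be bonds to hydrogen. A minimal Python sketch, assuming a neutral carbon and describing each carbon only by the orders of the bonds drawn to it (1 single, 2 double, 3 triple); `implicit_hydrogens` is a hypothetical helper, not a function from a chemistry library.

```python
CARBON_BONDS = 4  # a neutral carbon forms 4 covalent bonds (rule 1)


def implicit_hydrogens(drawn_bond_orders: list[int]) -> int:
    """Hydrogens a line drawing leaves out on one carbon (rule 2)."""
    drawn = sum(drawn_bond_orders)
    if drawn > CARBON_BONDS:
        raise ValueError("a neutral carbon cannot have more than 4 bonds")
    return CARBON_BONDS - drawn


# Benzene's hexagon: each corner carbon (rule 4) shows one single and one double ring bond.
benzene_corners = [[1, 2]] * 6
print(sum(implicit_hydrogens(c) for c in benzene_corners))  # 6, matching C6H6

# A bend in a chain drawn with two single bonds is a carbon carrying 2 hydrogens.
print(implicit_hydrogens([1, 1]))  # 2
```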


Common Biological Molecules

The most common biological molecules include:

Carbohydrates: Generally have an atomic ratio of 1C:2H:1O, that is 1 carbon for every oxygen and twice as many hydrogen atoms as either carbon or oxygen atoms.

o Sugars- glucose, fructose, sucrose, and so on. Important for energy and for building nucleic acids (the material of genes).

o Starches- animal starch (glycogen) and plant starch; the closely related polysaccharide cellulose builds plant cell walls. Starches are simply many sugar units bonded together, with various branching patterns.


N-containing Molecules

o Amino acids- the building blocks of proteins. The amino acid shown below is leucine, one of 3 amino acids known as the branched-chain amino acids, natural anabolic nutrients that help build muscle mass and other tissues.


o Peptides- small proteins; sometimes the term peptide is used in place of protein. A short peptide is illustrated below in 3D (some hydrogen atoms are hidden from view).

o Proteins- enzymes, muscle protein, collagen skin protein, and so on.

o Urea, ammonia- waste products of amino acid and protein metabolism.

Lipids: substances that are not readily soluble (mixable) in water. The molecule benzene is illustrated below; it is a ring of 6 carbon atoms (black) with 6 attached hydrogen atoms (white). Like lipids, benzene does not mix with water. It is a common solvent used in organic chemistry and biochemistry for synthesizing other molecules, and has been used in industry for cleaning. It is symbolized below as both a 3D model and a line drawing. Can you use your knowledge of wizardry (see above) to spot the carbon and hydrogen atoms in the line drawing? (Line drawings are common because they can be drawn quickly.)

Compare the above representations of benzene. Benzene: C6H6

o Gasoline, oils, grease

o Fatty acids- dietary fatty acids are found in food oils and are a calorie source; fatty acids are also important components of cell membranes. A fatty acid is illustrated below.

Prostaglandins- small lipids, actually modified fatty acids, that also act as chemical messengers. Prostaglandin E (PGE) is illustrated below using line drawing notation (can you spot the 20 carbon atoms using the chemistry wizardry rules?)

o Triglycerides- common fat calorie storage molecules, made of 3 fatty acids linked to a glycerol backbone.

o Steroid hormones (estrogen, testosterone)- complex cyclic lipids used as chemical messengers that travel in the blood to target cells.

o Cholesterol (used to make steroid hormones)

Organic Acids- abundant during cell metabolism of sugars and fats. Acetic acid is illustrated below; in cells, its acetyl group (carried as acetyl-CoA) is formed during aerobic metabolism of carbohydrates and lipids.

Carbohydrates

These biological molecules include the sugars and starches. They contain a great deal of O, H, and C, generally in the ratio (CH2O)n, that is 1C:2H:1O. Carbohydrates are important biologically as nutrients, structural components, and antigens. Incidentally, the little n subscript is like an algebraic variable: it refers to an unspecified number of multiples of the unit in parentheses, in this case a molecule containing some multiple of C, H, and O in the ratio 1:2:1. Sugars combine to form disaccharides (two sugar molecules linked together, such as glucose + fructose forming sucrose, cane sugar), polysaccharides (simple chains of sugars), and starch (chains of sugars with complex branching patterns). The most common biological sugar is glucose, a six-carbon sugar. Most naturally occurring sugars are what chemists call D sugars, as in D-glucose, D-galactose, and D-fructose; their mirror images are L sugars. The D and L labels describe the three-dimensional arrangement (configuration) of the molecule. Historically the letters came from dextrorotatory (rotating polarized light to the right) and levorotatory (rotating it to the left), but the label does not reliably predict the direction of rotation: D-fructose, for example, rotates light to the left.
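The 1:2:1 ratio and the joining of two sugars can be checked by simple atom counting. This Python sketch assumes that linking two sugars releases one water molecule (a dehydration reaction); the helper names are hypothetical. Note that the joined disaccharide no longer fits the ratio exactly, and neither does deoxyribose (C5H10O4), which is why the ratio is best read as a general rule.

```python
from collections import Counter

WATER = Counter({"H": 2, "O": 1})


def fits_ch2o_ratio(formula: Counter) -> bool:
    """True if C:H:O is exactly 1:2:1, i.e. the formula is (CH2O)n."""
    c, h, o = formula["C"], formula["H"], formula["O"]
    return c > 0 and h == 2 * c and o == c


def link_sugars(a: Counter, b: Counter) -> Counter:
    """Join two sugars, removing one H2O (dehydration)."""
    joined = a + b
    joined.subtract(WATER)
    return joined


glucose = Counter({"C": 6, "H": 12, "O": 6})
fructose = Counter({"C": 6, "H": 12, "O": 6})
deoxyribose = Counter({"C": 5, "H": 10, "O": 4})

sucrose = link_sugars(glucose, fructose)
print(dict(sucrose))                 # {'C': 12, 'H': 22, 'O': 11}
print(fits_ch2o_ratio(glucose))      # True
print(fits_ch2o_ratio(sucrose))      # False
print(fits_ch2o_ratio(deoxyribose))  # False
```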


Ribose sugar is illustrated below (3D on the left and line drawing on the right). It is the sugar used in RNA, including messenger RNA, transfer RNA, and the RNA of ribosomes. By removing only one oxygen atom from ribose, a cell can form deoxyribose, the sugar used to build deoxyribonucleic acid (DNA).

Starches are long chains of sugar molecules with complex branching patterns of bonding between the sugar molecules. Two principal polysaccharides encountered in cells are glycogen and cellulose. Chitin is another polysaccharide that also contains nitrogen; chitin is very strong structurally, and forms the dense protective shell of crabs, insects, and other animals, as well as the cell walls of certain microbes such as many fungi. Glycogen is animal starch, stored in animal cells; it serves as a reserve nutrient source, because sugar molecules can be cleaved off and used for fuel (plants store sugar the same way in plant starch). Cellulose is a structural polysaccharide of plants that provides strength and structural integrity to plant cell walls.

Lipids

These are substances that are not soluble in water. Lipids include dietary fats (cholesterol, fatty acids in margarine and other foods) as well as oils, grease, gasoline, steroid hormones, prostaglandin hormones, and many other biological molecules. Structurally, lipids are composed of lots of carbon and hydrogen atoms. Attached to the lipid at various points may be other atoms such as oxygen, or a side group such as a hydroxyl group (OH), but the great majority of a lipid's composition is C and H.


Amino acids and Proteins

Proteins are very important molecules, functioning both as structural components of cells and as enzymes that catalyze (speed up) chemical reactions in cells. Proteins are made of building blocks called amino acids; cells use about 20 different amino acids (22, counting two rare ones) to build proteins.


Amino acids (and hence proteins) have what chemists call a left-handed (L) configuration, so that naturally occurring amino acids are named L-arginine, L-alanine, and so on (glycine, the simplest amino acid, is the one exception, having no handedness). NutraSweet (aspartame) artificial sweetener is a synthetic substance made of just two amino acids, aspartic acid and phenylalanine, bonded together (as a methyl ester). So why does it add almost no calories? Gram for gram it provides about as many calories as protein, but it is roughly 200 times sweeter than sugar, so only tiny amounts are needed to sweeten food. All amino acids (abbreviated as AA) have a generic structure with one end of the AA having an amino (-NH2) group and the other end having an organic acid (-COOH) group, sometimes called the carboxyl group; hence the name amino acid.

Amino acids combine to form small chains called peptides, or even longer AA chains called polypeptides or proteins. Sometimes the terms peptide, polypeptide, and protein are used interchangeably, because scientists disagree about exactly where a peptide ends and a polypeptide or protein begins.

The bond that forms between amino acids to form peptides, polypeptides, and proteins is called the peptide bond, and is formed between amino and carboxyl groups. During peptide bond formation, water is removed, so the reaction is that of a dehydration synthesis reaction. The reverse of bond formation is bond breaking, by addition of water, in what is called a hydrolysis degradation reaction.
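Each peptide bond releases one water molecule, so a chain of n amino acids has n - 1 peptide bonds and weighs n - 1 waters less than its free amino acids. A minimal Python sketch, assuming a single linear (unbranched, non-circular) chain and approximate molar masses for glycine and alanine; the function names are illustrative.

```python
WATER_MASS = 18.015  # g/mol


def peptide_bonds(n_amino_acids: int) -> int:
    """Peptide bonds (and water molecules released) in one linear chain."""
    if n_amino_acids < 1:
        raise ValueError("need at least one amino acid")
    return n_amino_acids - 1


def peptide_mass(amino_acid_masses: list[float]) -> float:
    """Chain mass: free amino acid masses minus one water per peptide bond."""
    return sum(amino_acid_masses) - peptide_bonds(len(amino_acid_masses)) * WATER_MASS


GLYCINE = 75.07  # g/mol, approximate
ALANINE = 89.09  # g/mol, approximate

print(peptide_bonds(1000))                         # 999
print(round(peptide_mass([GLYCINE, ALANINE]), 1))  # 146.1, the dipeptide glycylalanine
```

Hydrolysis runs the same bookkeeping in reverse: adding one water per bond broken restores the free amino acids.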


Muscle, skin, and connective tissue proteins, as well as intracellular proteins, are all formed by joining amino acids together with dehydration reactions. During starvation, hydrolysis of proteins yields free amino acids that are used in metabolism to help provide energy.


 

Cellular Genetics

DNA, RNA, Transcription, Translation, mRNA, tRNA, codon, anticodon

Genetic Encoding

Cell information is genetically encoded in the form of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) in viruses, but always as DNA in cells (at least on our planet; who knows what occurs in other galaxies?).

Structurally, DNA is not that complicated a molecule. It has a simple backbone of alternating units of the sugar deoxyribose and phosphate (PO4). Attached to each sugar is a genetic code "letter," a nucleotide base. Two such strands of DNA are usually bonded together to form what is called a "double helix" of DNA: the double-stranded molecule is twisted into a helix, like a ladder twisted along its long axis.

Biochemically, the nucleotide bases of DNA fall into two classes, purines and pyrimidines. It is the nucleotide bases, or rather their sequence, that constitutes the actual genetic code for all of a cell's proteins. There are four nucleotides in DNA: adenine, thymine, guanine, and cytosine, abbreviated A, T, G, and C. Adenine and guanine are purines; thymine and cytosine are pyrimidines. As you will learn soon, a codon, that is, a sequence of three bases, codes for a single amino acid. { How many bases along a length of DNA would be needed to code for an enzyme protein composed of 1000 amino acids? Answer: 3000, or 3003 if you also count the stop codon that ends the gene. }
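The arithmetic in the question above can be written as a tiny Python helper. The name `bases_needed` is hypothetical, and the sketch counts coding bases only; real genes also carry regulatory sequences and, in eukaryotes, introns, which are ignored here.

```python
CODON_LENGTH = 3  # bases per codon


def bases_needed(num_amino_acids: int, include_stop: bool = False) -> int:
    """Number of coding bases for a protein of the given length.

    Each amino acid is specified by one three-base codon; optionally add
    one more codon for the stop signal that ends the gene.
    """
    codons = num_amino_acids + (1 if include_stop else 0)
    return codons * CODON_LENGTH


print(bases_needed(1000))                     # 3000
print(bases_needed(1000, include_stop=True))  # 3003
```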

RNA

RNA is very similar to DNA. In RNA the base uracil, U, is substituted for thymine, T. So there is no thymine in RNA. Also, RNA uses the sugar ribose, not deoxyribose.
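Because the only difference in the base alphabet is U in place of T, rewriting a DNA sequence in RNA letters is a simple substitution. A minimal Python sketch (the function name is hypothetical); note that a string cannot show the other difference, ribose versus deoxyribose.

```python
def dna_to_rna_letters(dna: str) -> str:
    """Rewrite a DNA sequence in RNA letters: U replaces T, other bases unchanged."""
    return dna.upper().replace("T", "U")


print(dna_to_rna_letters("ATGGAATTC"))  # AUGGAAUUC
```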

Genes

Along a molecule of DNA are various sequences of nucleotides coding for various proteins. Each nucleotide sequence coding for a protein is called a gene. Typically there will be hundreds or thousands of genes along a length of DNA, interspersed with special nucleotide sequences that act as start and stop signals for gene reading by the cell. Written as they appear on the template strand, a start signal is TAC (transcribed into the mRNA start codon AUG), and stop signals include ACT and ATT (transcribed into the mRNA stop codons UGA and UAA).
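Start and stop signals are what a program looks for when it scans DNA for possible genes (open reading frames). The sketch below is a simplified, hypothetical Python example: it reads the coding strand, where the template-strand signals TAC, ACT and ATT appear as their complements ATG, TGA and TAA (TAG is the third stop codon), and it checks only the three forward reading frames, ignoring the opposite strand and nested start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(coding_strand: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for ATG...stop reading frames.

    Scans the three forward reading frames of a coding-strand sequence.
    `end` is exclusive and includes the stop codon; `min_codons` is the
    minimum number of codons from ATG up to (not including) the stop.
    """
    seq = coding_strand.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


seq = "CCATGAAATTTTAAGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # 2 14 ATGAAATTTTAA
```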


DNA Arrangements

Viruses may have single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), single-stranded RNA (ssRNA), or double-stranded RNA (dsRNA) as their genome; all cells, however, have double-stranded DNA (two strands of DNA twisted on each other in a helical pattern). Hence the term double helix for the three-dimensional structure of a dsDNA molecule.

Two opposite strands of DNA are able to associate because certain bases have a binding affinity for each other. This is known as complementary base pairing: A pairs with T, and G with C; these are the base-pairing rules of DNA. Because of these rules, the sequence of one strand fully determines the sequence of its partner. Even so, any particular gene is read from only one of the two strands, and different genes along the same molecule may be read from different strands.
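The base-pairing rules translate directly into code. This hypothetical Python sketch builds the partner strand letter by letter, and also shows the reverse complement, which is how the partner strand reads in its own 5' -> 3' direction, since the two strands run antiparallel.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}  # DNA base-pairing rules


def complement(strand: str) -> str:
    """Base-by-base partner strand, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written in its own 5' -> 3' direction."""
    return complement(strand)[::-1]


print(complement("ATGGAATTCTCGCTC"))          # TACCTTAAGAGCGAG
print(reverse_complement("ATGGAATTCTCGCTC"))  # GAGCGAGAATTCCAT
```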

When dsDNA is genetically decoded, the helix is unzipped, a gene is "read" off the appropriate ssDNA by RNA polymerase enzyme (that creates mRNA), and then the helix is zippered shut.

dsDNA is usually circular in bacteria, but linear in the nuclei of eukaryotic cells. Circular dsDNA is like taking two lengths of string, twisting them on each other, and then joining the ends into a closed loop. Linear dsDNA is like taking two lengths of string and twisting them on each other, leaving the ends free.

A chromosome is a dsDNA molecule coiled around special histone proteins. In a stained cell, chromosomes are visible with a light microscope, normally only while the cell is dividing. When a cell is not dividing, the dsDNA is less tightly coiled around the histone proteins, and the complex is then called chromatin. Chromatin is barely visible in a cell, and is the normal state of the genetic material in a non-dividing cell.

Gene Decoding

Decoding of a gene to create a gene product involves transcription and translation. Transcription is the process where RNA polymerase enzyme unzips a region of dsDNA and reads a gene sequence, creating a copy of the gene sequence in the form of RNA. This RNA is called messenger RNA, or mRNA. The mRNA then is carried to a ribosome where it is "read" (decoded).
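Transcription can be modeled as base pairing against the template strand, with U placed opposite A. A minimal Python sketch, assuming the template is written 3' -> 5' so that the mRNA comes out 5' -> 3'; the function name is hypothetical, and real transcription also involves promoters, RNA polymerase binding, and (in eukaryotes) RNA processing, none of which is modeled.

```python
# Template-strand base -> base placed in the growing mRNA (U pairs with A).
TRANSCRIBE = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build mRNA (5' -> 3') from a template strand written 3' -> 5'."""
    return "".join(TRANSCRIBE[base] for base in template_3_to_5.upper())


print(transcribe("TACCTTAAGAGCGAG"))  # AUGGAAUUCUCGCUC
```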

Translation is the process of building an amino acid chain, that is, a polypeptide (protein), by ribosomal decoding of the mRNA. The ribosome reads the mRNA and bonds together the amino acids it codes for. Just as triplets of nucleotide bases along DNA, called codons, encode one amino acid of a gene product, triplets of mRNA bases are also codons. Ribosomes read the mRNA codons one at a time to determine which amino acid should become part of the gene product. As a codon is read, the complementary anticodon of a special amino acid carrier molecule called transfer RNA (tRNA) base pairs with it, bringing along the specific amino acid that the codon calls for. As each mRNA codon calls its amino acid into place through its complementary tRNA carrier, the genetically encoded amino acid sequence of the gene product is assembled. The ribosome enzymatically bonds the amino acids together, and a polypeptide, or protein, is built. The translation process is complete, and the gene has been decoded.
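The codon-anticodon match is the same complementary pairing used for DNA, now between two RNA molecules. A hypothetical Python helper, assuming strict Watson-Crick pairing; real tRNAs tolerate some mismatches ("wobble") at the third codon position, which this sketch ignores.

```python
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}  # RNA base-pairing rules


def anticodon(codon: str) -> str:
    """Anticodon (written 3' -> 5') that base pairs with an mRNA codon (5' -> 3')."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    return "".join(RNA_PAIRS[base] for base in codon.upper())


print(anticodon("UCG"))  # AGC, read 3' -> 5' on the tRNA
```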

Amino Acids Table

By consulting a table of the codons for each amino acid, you can decipher a sequence of DNA or mRNA nucleotides to determine the resulting gene product, that is, a protein (structural or enzymatic). For example, the mRNA codon UCG, transcribed from the template DNA sequence AGC, codes for the amino acid serine.

mRNA codon table (first base of the codon in the left column, second base across the top; * stands for any of the four bases)

First  U                     C            A                   G
U      UUU, UUC = Phe        UC* = Ser    UAU, UAC = Tyr      UGU, UGC = Cys
       UUA, UUG = Leu                     UAA, UAG = stop     UGA = stop
                                                              UGG = Trp
C      CU* = Leu             CC* = Pro    CAU, CAC = His      CG* = Arg
                                          CAA, CAG = Gln
A      AUU, AUC, AUA = Ile   AC* = Thr    AAU, AAC = Asn      AGU, AGC = Ser
       AUG = Met (start)                  AAA, AAG = Lys      AGA, AGG = Arg
G      GU* = Val             GC* = Ala    GAU, GAC = Asp      GG* = Gly
                                          GAA, GAG = Glu

Symbol   Amino acid
Ala      Alanine
Asp      Aspartic acid
Asn      Asparagine
Cys      Cysteine
Glu      Glutamic acid
Phe      Phenylalanine
Gly      Glycine
His      Histidine
Ile      Isoleucine
Lys      Lysine
Leu      Leucine
Met      Methionine
Pro      Proline
Gln      Glutamine
Arg      Arginine
Ser      Serine
Thr      Threonine
Val      Valine
Trp      Tryptophan
Tyr      Tyrosine
Glx      Glutamic acid or glutamine
stop     End of chain (terminator codon)
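Reading a sequence against the codon table is easy to automate. The Python sketch below encodes the standard genetic code as a 64-letter string in the same U, C, A, G order used by the table above, with one-letter amino acid codes (F for Phe, S for Ser, and so on) and "*" marking stop codons. It is an illustration under simple assumptions: translation starts at the first base rather than searching for AUG, and any trailing partial codon is ignored. The names `CODON_TABLE` and `translate` are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon; one-letter output."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3].upper()]
        if amino_acid == "*":
            break  # stop codon: the chain is complete
        protein.append(amino_acid)
    return "".join(protein)


print(CODON_TABLE["UCG"])            # S (serine)
print(translate("AUGGAAUUCUCGCUC"))  # MEFSL
```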


 

Cellular Replication

Mitosis, meiosis, budding, binary fission, conjugation


Bacterial Cell Replication

Binary fission is the normal method of replication among bacteria; in this method of cell replication, the bacterial cells simply increase their cell mass slightly, replicate their cellular genome (DNA) and several other cell components, and then each cell divides equally into two cells.

[ rod-shaped bacterial cell undergoing binary fission ]

Binary fission as a method of cell replication is very efficient, with division possible as often as every 10 to 20 minutes in the fastest-growing species! Consider the number of cells formed from 1 cell that divides every 10 minutes: in just a matter of hours, millions of cells may form from a single cell!
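The "millions in a matter of hours" claim follows from repeated doubling: after g generations one cell has become 2^g cells. A small Python sketch, assuming every cell divides exactly on schedule with unlimited nutrients, which real cultures sustain only briefly; the function name and the 10-minute default are illustrative.

```python
def cells_after(minutes: int, doubling_minutes: int = 10, start_cells: int = 1) -> int:
    """Cell count after repeated binary fission, if every cell divides on schedule."""
    generations = minutes // doubling_minutes
    return start_cells * 2 ** generations


for hours in (1, 2, 4):
    print(hours, cells_after(hours * 60))  # 1 64 / 2 4096 / 4 16777216
```

With a 10-minute division time, four hours is 24 doublings, about 16.8 million cells.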

Conjugation is another means of bacterial "replication," although the cells do not actually multiply as they do in binary fission. Conjugation matters instead because it spreads genes among bacteria. In conjugation, two bacterial cells meet and form a bridge, and one cell (the donor) transfers a copy of some of its DNA, often a plasmid, to the other (the recipient). This allows genes to be shared among bacteria, even among different genera.

Eukaryotic Cell Replication


Budding is a simple method of cell replication used principally by yeasts (single celled fungi). Following DNA replication (genome replication), unequal splitting of a cell occurs to form two cells. Part of the cell literally pinches off, taking with it genetic material as well as some cytoplasmic material.

Mitosis is the common form of cell replication for tissue growth and regeneration among all multi-cellular organisms. The image panel below shows various phases of mitosis occurring among plant cells of an onion root tip. Each phase of cell division will be discussed individually.

During cell division, the cell's genetic and cytoplasmic material is replicated, followed by a highly organized splitting of the cell's contents. The two cells formed by mitosis, called daughter cells (lower right image in the six-panel image seen above), are genetically identical, and each has about half the cell mass of the original cell; each daughter cell soon grows to the normal size for its cell type.

The process of mitosis is divided for human convenience into discrete stages or phases (also divisible into early, middle, and late phases) known as interphase, prophase, metaphase, anaphase, telophase, and finally daughter cells.

These six phases of mitosis can be seen in the photo below, if you read the photos as you would two lines in a book (left to right, then down to the second row and again left to right).


[ phases of mitosis in animal cells and plant cells ]

Interphase. During interphase, cells are busy with their normal activities: cell metabolism is occurring, and the cell is doing whatever its normal function is (this depends on the cell's genetic programming). Interphase is actually not one of the phases of mitosis itself.

Prophase. During interphase, the DNA is replicated in preparation for prophase. A new set of genes (DNA) will be needed for the new cell that will be formed. As prophase occurs, the DNA coils tightly and becomes visible as chromosomes. The chromosomes are randomly arranged in the cell. The nuclear membrane disappears.

Metaphase. During metaphase a cell aligns its chromosomes in the middle region of the cell. Centrioles at each pole of the cell send out spindle fibers that grasp each chromosome. The cell is preparing to separate the chromosomes.


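Anaphase. During anaphase the chromosomes are separated. Each replicated chromosome consists of two identical sister chromatids; spindle fibers shorten and pull the sister chromatids apart, so that one complete set of chromosomes moves to each end of the cell. Because DNA replication is semiconservative, each chromatid contains one original DNA strand and one newly synthesized strand.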

Telophase. During telophase, separation of chromosomes is complete. The cell begins to break apart into two cells. The chromosomes begin to uncoil. Nuclear membranes begin to reform around the chromosomes.

Daughter Cells. When mitosis is complete, the cell divides into two new cells, each resembling the original interphase resting cell, but smaller. Two genetically identical cells now exist as a result of mitosis; each contains a full set of chromosomes, and the DNA of each chromosome is made of one original and one newly synthesized strand. Each cell has about half the biomass of the original cell. Soon each cell will take in nutrients and grow to the size that is normal for its cell type.

Allium. Seen below are phases of mitosis in tissue sections of onion (Allium) root tip. Root tips are excellent tissue for studying mitosis, since they grow rapidly and thus have many cells in stages of replication. Test your knowledge: can you spot the cells undergoing mitosis? Can you name the phase for such cells?

The cell in the very center is in the phase of mitosis known as anaphase. Notice the chromosomes splitting, half moving to the right and half moving to the left. The spindle fibers are faintly visible. The cells to either side of the anaphase cell are in interphase.

This is a very low magnification photograph of onion root tip cells. Can you spot the cell undergoing metaphase in the center of this tissue section of about 50 cells? Also, the cell along the bottom, 4th from the left, is in metaphase.

About 8 cells are seen here. In the lower left is a cell in anaphase. In the middle and somewhat towards the top is a cell in metaphase (aligned chromosomes). The other cells are in interphase and prophase.

The cell in the very center is in the phase of mitosis known as prophase. The chromosomes are coiled and are randomly arranged in the cell center. Just above the prophase cell is a cell that is just ending telophase, with daughter cells forming. The cell in the upper left is undergoing anaphase (first row, first cell on left). Move just one cell to the right and down one cell and you will see a cell in late telophase, with a cell plate having formed down the middle and with two nuclei of the soon-to-be daughter cells reforming.

Meiosis is a mode of cell division that, in animals, occurs only in the gonads (testes and ovaries), in order to produce germ cells (sperm and egg cells, not 'germs' such as bacteria). Meiosis is a reduction division: it produces daughter cells with half the number of chromosomes (and hence half the DNA and genes) found in regular body cells. Following meiosis, sperm and egg cells may combine during fertilization to form a fertilized egg called a zygote, which again has the full complement of genetic material (1/2 + 1/2 = 1). Under the microscope, the stages of meiosis can look very similar to those of mitosis, so phases of meiosis will not be shown here.

Tumors

Uncontrolled replication of cells leads to cell overgrowths, that is, tumors. Tumors can be classified as benign or malignant. Benign tumors are excessive cell growths that stay in place and do not spread; they are usually not life-threatening, although they can cause harm by pressing on nearby organs. Malignant tumors, that is, cancers, are growths in which the cells replicate without normal growth controls, invade surrounding tissue, and can spread to other parts of the body; left untreated, they can kill the organism.

Naming Conventions for Tumors

Here are the naming conventions used for the more common tumors:

Carcinomas are cancers of epithelial tissues (cells lining the surfaces of an organism).

Sarcomas are cancers of connective tissues. Leukemias are cancers of white blood cells.

Lymphomas are tumors of the lymph nodes. Osteomas are tumors of bone. Osteosarcomas are sarcomas of bone tissue. Neuromas are benign tumors of nerve tissue. Leiomyomas are benign tumors of smooth muscle tissue. Rhabdomyomas are benign tumors of voluntary (skeletal) muscle. Chondromas are benign tumors of cartilage. Chondrosarcomas are malignant tumors of cartilage. Adenomas are benign tumors of glandular tissue. Adenocarcinomas are malignant tumors of glandular tissue.

Look at the photo below; it is from a biopsy of a cancer. Three cells show visible stages of mitosis (dark, coiled chromosomes), suggesting that the tissue is cancerous. (Every tissue has a certain percentage of its cells undergoing mitosis, called the mitotic index; when the mitotic index is high, as with the tissue below, a cancer or tumor of some sort is suspected.)
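The mitotic index is a simple ratio: cells seen in a stage of mitosis divided by all cells counted, usually given as a percentage. A hypothetical Python helper; the counts in the example are made up for illustration, and no cut-off for a "high" index is implied, since normal values differ between tissues.

```python
def mitotic_index(cells_in_mitosis: int, total_cells: int) -> float:
    """Percentage of counted cells that are in a visible stage of mitosis."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * cells_in_mitosis / total_cells


# Hypothetical field of view: 3 dividing cells out of 50 counted.
print(mitotic_index(3, 50))  # 6.0
```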

Carcinogens

Agents that can trigger cells to become tumorous include: environmental carcinogens in food, water, or air; cancer-causing genes called oncogenes that are transmitted by certain viruses; and inherent oncogenes, triggered by repeated trauma to a cell.


 

Cellular Arrangements and Tissues

There are four tissue types: nervous tissue, muscle tissue, connective tissue, and epithelial tissue. All multi-celled animal life forms are composed of various combinations of these four tissues.

BASIC TISSUE TYPES Example Photo

Nervous Tissue is specialized for creating and conducting electrical signals, and includes neurons (nerve cells) as part of its tissue. Neurons are the cells adapted for receiving and eliciting electrical signals. Signals are sent to other neurons, glands, and muscle cells. The photo on the right shows a classic nerve cell ("neuron") appearance- pointed edges giving it a quality somewhat like a "ninja star" or thorn.

Muscle Tissue is specialized for cellular contraction, and hence movement of the organism or parts of the organism. The photo on the right shows several muscle cells of the heart. Muscle cells tend to be elongated and red in appearance. Heart muscle has the characteristic cellular branchings such as are seen in this tissue section.

Connective Tissue is specialized to connect parts of an organism. Types of connective tissue include loose connective tissue (like fascia, the filmy material you see when you pull the skin off a chicken), tendons, ligaments, and so on. The photo on the right shows a section of bone tissue, just one of the many types of connective tissue (tendons, ligaments, bone, cartilage, fat, and blood).

Epithelial Tissue lines body surfaces, both internal and external, and is adapted for protection, secretion, and absorption. Epithelium is named, that is, classified, according to the shape of its outer cell layer, whether the tissue is one cell thick ("simple") or layered ("stratified"), whether the outer cells have cilia, and whether some of the cells are goblet-shaped, mucus-secreting cells. The photo on the right is a 3D scanning electron microscope photo showing several relatively flat epithelial cells covering a tissue surface.


Remember-- you are only to learn to differentiate the four basic tissue types! You are NOT expected to learn each of the specific tissue subtypes of the four basic tissues. So don't panic when you view all the different tissue subtypes.

NOTE: For more experience studying tissues, visit the histology lab center, where you can learn more about cell arrangements and tissue types. Many digital images are available there for your viewing.

Tissue Development

Development of tissues occurs from primitive embryonic cell layers called germ layers. There are 3 germ layers that form in the embryonic cell mass:

GERM LAYER DEVELOPS INTO...

ectoderm (outer shell of cells) skin, brain, eye, nerves

mesoderm (middle cell layer) muscle, bone, vessels, connective tissues

endoderm (inner cell layer) gut, liver, pancreas

Fertilization and Zygote Formation

When a sperm cell fertilizes an egg cell, a fertilized egg or zygote is formed.

The zygote then divides into 2 cells, then 4 cells, then 8 cells, then 16 cells, then 32 cells, then 64 cells, then 128 cells. Note that growth is at a geometric rate. The number of cells in a developing embryo increases at a fantastic rate as a single fertilized cell matures into an embryo and then a fetus (nymphs or larvae in the case of insects, worms, and so on).
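Each round of division doubles the cell count, so after n rounds there are 2^n cells. A short Python sketch of this geometric series, assuming the idealization that every cell divides in every round; the function name is hypothetical.

```python
def cleavage_counts(divisions: int) -> list[int]:
    """Cell numbers after each round of division, starting from one zygote."""
    return [2 ** n for n in range(1, divisions + 1)]


print(cleavage_counts(7))  # [2, 4, 8, 16, 32, 64, 128]
```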

Morula Formation

As the fertilized egg keeps dividing and the cell mass increases geometrically, the embryo begins to look like a mulberry (well, sort of, if you use some imagination), so that is what it is called, except that "mulberry" is rendered in Latin, the traditional language of scientific names, as the word morula.

Blastula Formation

Soon the mulberry (morula) hollows out to form a cavity, sort of like a blown-up balloon, and it is then called a blastula.

Gastrula Formation

One end of the blastula invaginates, sort of like pushing your finger into the blown-up balloon. Now the embryo is said to be a gastrula; gastrulation has occurred. Note that there are now two cavities: the cavity in the balloon filled with air (call this cavity #1) and the cavity formed by gastrulation (call this cavity #2). Cavity #1 will become the thoracic and abdominal cavities, and cavity #2 will become the gastrointestinal tract (did you notice the "gast-" prefix in both gastrulation and gastrointestinal tract?).

Ectoderm

The outer layer of cells, that is the outer skin of the balloon, is what is called the ectoderm germ layer of cells, and as the embryo continues to grow and differentiate into a fetus the ectoderm cells will form tissues and organs such as skin, nervous tissue, brain, and the eye.

Mesoderm

The middle layer of cells, that is the inner skin of the balloon, is what is called the mesoderm germ layer of cells, and as the embryo continues to grow and differentiate into a fetus the mesoderm cells will form tissues such as muscle, blood vessels, cartilage, bone, ligaments, and other connective tissues.

Endoderm

The layer of cells lining the gastrulation cavity (cavity #2), that is the skin of the balloon surrounding your finger that you poked into the balloon, is what is called the endoderm germ layer of cells, and as the embryo continues to grow and differentiate into a fetus the endoderm cells will form epithelium lining the entire gut.

Tissue Components

Tissues are made of matrix and cells. Matrix is the non-cellular material between tissue cells, secreted by cells; matrix consists of both organic components (such as collagen and elastic proteins to give tissues strength and elasticity) and inorganic components (such as water and minerals).

Useful Suffixes. Cells of tissues are named according to their tissue type, but many cells share common suffixes that reveal clues about their function. "-cytes" are mature cells that perform common tissue functions. "-blasts" are immature tissue cells that give rise to other mature tissue cells. "-clasts" are tissue-destroying cells.


Homo sapiens Map View, chromosome 1 (genes on sequence map). Total genes on chromosome: 955; region displayed: 0-220467 Kbp; genes labeled: 20; total genes in region: 955.

Symbol        Orientation  Cyto. location  Full name
DKFZP564C186  +            1pter-1p12      DKFZP564C186 protein
SRM           -            1p36-p22        spermidine synthase
PLA2G2A       -            1p35            phospholipase A2, group IIA (platelets, synovial fluid)
PRO2047       +            1               PRO2047 protein
FLJ10468      +            1               hypothetical protein FLJ10468
FAAH          -            1p35-p34        fatty acid amide hydrolase
C8B           -            1p32            complement component 8, beta polypeptide
GADD45A       -            1p31.2-p31.1    growth arrest and DNA-damage-inducible, alpha
PRKCL2        +            1pter-1q31.1    protein kinase C-like 2
LOC51189      +            1pter-1q31.1    ATPase inhibitor precursor
FLJ10330      +            1               hypothetical protein FLJ10330
HPRP3P        +            1q21.1          U4/U6-associated RNA splicing factor
ARNT          +            1q21            aryl hydrocarbon receptor nuclear translocator
JTB           -            1q21            jumping translocation breakpoint
PEA15         +            1q21.1          phosphoprotein enriched in astrocytes 15
F5            -            1q23            coagulation factor V (proaccelerin, labile factor)
FLJ10083      -            1               hypothetical protein FLJ10083
ST16          +            1               suppression of tumorigenicity 16 (melanoma differentiation)
ESRRG         -            1q41            estrogen-related receptor gamma
CHS1          -            1q42.1-q42.2    Chediak-Higashi syndrome 1


[ Archaeoglobus fulgidus, complete genome, region 49546..99545: 62 protein-coding genes; legend: coding region on direct strand, coding region on complementary strand, overlapping region ]

Genetic States


Disease Histogram of Chromosome


Genetic Manipulation


Codons Found In DNA

                Second position
First   T              C              A              G              Third
T       TTT Phe [F]    TCT Ser [S]    TAT Tyr [Y]    TGT Cys [C]    T
        TTC Phe [F]    TCC Ser [S]    TAC Tyr [Y]    TGC Cys [C]    C
        TTA Leu [L]    TCA Ser [S]    TAA Ter [end]  TGA Ter [end]  A
        TTG Leu [L]    TCG Ser [S]    TAG Ter [end]  TGG Trp [W]    G
C       CTT Leu [L]    CCT Pro [P]    CAT His [H]    CGT Arg [R]    T
        CTC Leu [L]    CCC Pro [P]    CAC His [H]    CGC Arg [R]    C
        CTA Leu [L]    CCA Pro [P]    CAA Gln [Q]    CGA Arg [R]    A
        CTG Leu [L]    CCG Pro [P]    CAG Gln [Q]    CGG Arg [R]    G
A       ATT Ile [I]    ACT Thr [T]    AAT Asn [N]    AGT Ser [S]    T
        ATC Ile [I]    ACC Thr [T]    AAC Asn [N]    AGC Ser [S]    C
        ATA Ile [I]    ACA Thr [T]    AAA Lys [K]    AGA Arg [R]    A
        ATG Met [M]    ACG Thr [T]    AAG Lys [K]    AGG Arg [R]    G
G       GTT Val [V]    GCT Ala [A]    GAT Asp [D]    GGT Gly [G]    T
        GTC Val [V]    GCC Ala [A]    GAC Asp [D]    GGC Gly [G]    C
        GTA Val [V]    GCA Ala [A]    GAA Glu [E]    GGA Gly [G]    A
        GTG Val [V]    GCG Ala [A]    GAG Glu [E]    GGG Gly [G]    G

Codons Found In Messenger RNA

                Second position
First   U                C                A                G                Third
U       UUU Phe          UCU Ser          UAU Tyr          UGU Cys          U
        UUC Phe          UCC Ser          UAC Tyr          UGC Cys          C
        UUA Leu          UCA Ser          UAA Stop         UGA Stop         A
        UUG Leu          UCG Ser          UAG Stop         UGG Trp          G
C       CUU Leu          CCU Pro          CAU His          CGU Arg          U
        CUC Leu          CCC Pro          CAC His          CGC Arg          C
        CUA Leu          CCA Pro          CAA Gln          CGA Arg          A
        CUG Leu          CCG Pro          CAG Gln          CGG Arg          G
A       AUU Ile          ACU Thr          AAU Asn          AGU Ser          U
        AUC Ile          ACC Thr          AAC Asn          AGC Ser          C
        AUA Ile          ACA Thr          AAA Lys          AGA Arg          A
        AUG Met (start)  ACG Thr          AAG Lys          AGG Arg          G
G       GUU Val          GCU Ala          GAU Asp          GGU Gly          U
        GUC Val          GCC Ala          GAC Asp          GGC Gly          C
        GUA Val          GCA Ala          GAA Glu          GGA Gly          A
        GUG Val          GCG Ala          GAG Glu          GGG Gly          G

An explanation of the Genetic Code: DNA is a two-stranded molecule. Each strand is a polynucleotide composed of A (adenosine), T (thymidine), C (cytidine), and G (guanosine) residues polymerized by "dehydration" synthesis in linear chains with specific sequences. Each strand has polarity, such that the 5'-hydroxyl (or 5'-phospho) group of the first nucleotide begins the strand and the 3'-hydroxyl group of the final nucleotide ends the strand; accordingly, we say that this strand runs 5' to 3' ("five prime to three prime"). It is also essential to know that the two strands of DNA run antiparallel, such that one strand runs 5' -> 3' while the other runs 3' -> 5'. At each nucleotide residue along the double-stranded DNA molecule, the nucleotides are complementary. That is, A forms two hydrogen bonds with T; C forms three hydrogen bonds with G. In most cases the two-stranded, antiparallel, complementary DNA molecule folds to form a helical structure which resembles a spiral staircase. This is the reason why DNA has been referred to as the "Double Helix".
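The pairing rules also fix how many hydrogen bonds hold the two strands together: two per A-T pair and three per G-C pair. A hypothetical Python helper that totals them for a stretch of double-stranded DNA, given either strand; it counts pairing hydrogen bonds only and ignores base stacking, which also contributes to helix stability.

```python
# Hydrogen bonds per base pair: A-T pairs form 2, G-C pairs form 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding a strand to its complementary partner."""
    return sum(H_BONDS[base] for base in strand.upper())


print(hydrogen_bonds("ATGGAATTCTCGCTC"))  # 37 (8 A-T pairs, 7 G-C pairs)
```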

One strand of DNA holds the information that codes for various genes; this strand is often called the template strand or antisense strand (containing anticodons). The other, and complementary, strand is called the coding strand or sense strand (containing codons). Since mRNA is made from the template strand, it has the same information as the coding strand. The table above refers to triplet nucleotide codons along the sequence of the coding or sense strand of DNA as it runs 5' -> 3'; the code for the mRNA would be identical but for the fact that RNA contains U (uridine) rather than T.

An example of two complementary strands of DNA would be:

          (5' -> 3') ATGGAATTCTCGCTC      (Coding, sense strand)
          (3' <- 5') TACCTTAAGAGCGAG      (Template, antisense strand)

          (5' -> 3') AUGGAAUUCUCGCUC      (mRNA made from Template strand)

Since amino acid residues of proteins are specified as triplet codons, the protein sequence made from the above example would be Met-Glu-Phe-Ser-Leu... (MEFSL...).

Practically, codons are "decoded" by transfer RNAs (tRNAs), which interact with a ribosome-bound messenger RNA (mRNA) containing the coding sequence. Of the 64 possible codons, 61 specify amino acids. Each tRNA has an anticodon loop (used to recognize codons in the mRNA) and carries a bound aminoacyl residue; the appropriate "charged" tRNA binds to the next codon in the mRNA, and the ribosome catalyzes the transfer of the amino acid from the tRNA to the growing (nascent) protein/polypeptide chain. Because of "wobble" pairing at the third codon position, one tRNA can often read more than one codon, so cells need fewer than 61 distinct tRNAs. The remaining 3 codons are used for "punctuation": they are recognized not by tRNAs but by protein release factors, and they signal the termination (the end) of the growing polypeptide chain.
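The codon counts in this paragraph can be checked directly. The Python sketch below builds the standard codon table from a 64-letter string (one-letter amino acid codes, "*" for stop, codons ordered U, C, A, G at each position) and counts sense and stop codons; the string encoding is an assumption of this example, not a standard API.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# product() yields UUU, UUC, UUA, UUG, UCU, ... in the same order as AMINO_ACIDS.
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))

sense = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
stops = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), len(sense), stops)  # 64 61 ['UAA', 'UAG', 'UGA']
```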


Lastly, the Genetic Code in the table above has also been called "The Universal Genetic Code". It is known as "universal" because it is used by all known organisms as a code for DNA, mRNA, and tRNA. The universality of the genetic code encompasses animals (including humans), plants, fungi, archaea, bacteria, and viruses. However, all rules have their exceptions, and such is the case with the Genetic Code; small variations in the code exist in mitochondria and certain microbes. Nonetheless, it should be emphasized that these variants represent only a small fraction of known cases, and that the Genetic Code applies very broadly, including to the nuclear genes of the great majority of organisms.

Codon Tables

Third position (columns) for each combination of first and second positions (rows); "." marks a stop codon.

First & second   A     C     G     U
AA               Lys   Asn   Lys   Asn
AC               Thr   Thr   Thr   Thr
AG               Arg   Ser   Arg   Ser
AU               Ile   Ile   MET   Ile
CA               Gln   His   Gln   His
CC               Pro   Pro   Pro   Pro
CG               Arg   Arg   Arg   Arg
CU               Leu   Leu   Leu   Leu
GA               Glu   Asp   Glu   Asp
GC               Ala   Ala   Ala   Ala
GG               Gly   Gly   Gly   Gly
GU               Val   Val   Val   Val
UA               .     Tyr   .     Tyr
UC               Ser   Ser   Ser   Ser
UG               .     Cys   Trp   Cys
UU               Leu   Phe   Leu   Phe

Another way to look at this is:

Name             3-letter  1-letter  mRNA codons for each amino acid
Alanine          Ala       A (1)     GCA, GCC, GCG, GCU
Cysteine         Cys       C (3)     UGC, UGU
Aspartic Acid    Asp       D (4)     GAC, GAU
Glutamic Acid    Glu       E (5)     GAA, GAG
Phenylalanine    Phe       F (6)     UUC, UUU
Glycine          Gly       G (7)     GGA, GGC, GGG, GGU
Histidine        His       H (8)     CAC, CAU
Isoleucine       Ile       I (9)     AUA, AUC, AUU
Lysine           Lys       K (11)    AAA, AAG
Leucine          Leu       L (12)    UUA, UUG, CUA, CUC, CUG, CUU
Methionine       Met       M (13)    AUG
Asparagine       Asn       N (14)    AAC, AAU
Proline          Pro       P (16)    CCA, CCC, CCG, CCU
Glutamine        Gln       Q (17)    CAA, CAG
Arginine         Arg       R (18)    CGA, CGC, CGG, CGU, AGA, AGG
Serine           Ser       S (19)    UCA, UCC, UCG, UCU, AGC, AGU
Threonine        Thr       T (20)    ACA, ACC, ACG, ACU
Valine           Val       V (22)    GUA, GUC, GUG, GUU
Tryptophan       Trp       W (23)    UGG
Tyrosine         Tyr       Y (25)    UAC, UAU
Stop codons                .         UAA, UAG, UGA

The numbers in parentheses are the alphabet positions of the one-letter codes; the letters B (2), J (10), O (15), U (21), X (24), and Z (26) are not used for any of the 20 standard amino acids.

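As an example of the many different mRNA (and hence DNA) sequences that can code for a single peptide, consider spelling my first name, MARK, with amino acids (without a termination codon).

Methionine (M) has only one codon, alanine (A) has four, arginine (R) has six, and lysine (K) has two, so there are 1 × 4 × 6 × 2 = 48 possible sequences that code for 'MARK'. The listing below shows the 16 of them that use the arginine codons AGA and AGG. Other four-letter words would have different numbers of possibilities, depending on how many codons can code for each amino acid; some amino acids have as many as 6 codons that will be translated into them.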

M A R K = Met-Ala-Arg-Lys

AUG-GCU-AGA-AAG   AUG-GCU-AGG-AAG   AUG-GCU-AGA-AAA   AUG-GCU-AGG-AAA
AUG-GCG-AGA-AAG   AUG-GCG-AGG-AAG   AUG-GCG-AGA-AAA   AUG-GCG-AGG-AAA
AUG-GCC-AGA-AAG   AUG-GCC-AGG-AAG   AUG-GCC-AGA-AAA   AUG-GCC-AGG-AAA
AUG-GCA-AGA-AAG   AUG-GCA-AGG-AAG   AUG-GCA-AGA-AAA   AUG-GCA-AGG-AAA
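Counting every codon for each amino acid, including all six arginine codons rather than only AGA and AGG, gives the total number of mRNA sequences for a peptide: multiply the number of codons available for each residue. A hypothetical Python sketch using the standard genetic code written as a 64-letter string (one-letter codes, "*" for stop); the function name `encodings` is illustrative.

```python
from collections import Counter
from math import prod

# Standard genetic code, one letter per codon in U, C, A, G order ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Number of codons for each one-letter amino acid code.
CODONS_PER_AA = Counter(AMINO_ACIDS)


def encodings(peptide: str) -> int:
    """How many different mRNA sequences encode the peptide (no stop codon)."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


print(encodings("MARK"))  # 48 = 1 (Met) x 4 (Ala) x 6 (Arg) x 2 (Lys)
```

Adding a stop codon to the end would multiply the total by 3, one factor for each of UAA, UAG, and UGA.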

Clusters of Orthologous Groups


Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in 34 complete genomes, representing 26 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Proteins from two eukaryotic genomes were also assigned to COGs.

Code  Name                                  Proteins  In COGs
A     Archaeoglobus fulgidus                    2420     1849
O     Halobacterium sp. NRC-1                   2058     1404
M     Methanococcus jannaschii                  1786     1320
T     Methanobacterium thermoautotrophicum      1873     1375
P     Thermoplasma acidophilum                  1479     1176
K     Pyrococcus horikoshii                     2080     1365
      Pyrococcus abyssi                         1767     1443
Z     Aeropyrum pernix                          2722     1169
Y     Saccharomyces cerevisiae                  5954     2175
Q     Aquifex aeolicus                          1560     1317
V     Thermotoga maritima                       1858     1507
D     Deinococcus radiodurans                   3194     2176
R     Mycobacterium tuberculosis                3924     2468
B     Bacillus subtilis                         4118     2803
      Bacillus halodurans                       4066     2728
C     Synechocystis                             3168     2113
E     Escherichia coli                          4286     3327
      Buchnera sp. APS                           575      559
F     Pseudomonas aeruginosa                    5567     4191
G     Vibrio cholerae                           3834     2745
H     Haemophilus influenzae                    1695     1504
S     Xylella fastidiosa                        2766     1491
N     Neisseria meningitidis                    2081     1455
U     Helicobacter pylori                       1578     1081
      Helicobacter pylori J99                   1492     1062
J     Campylobacter jejuni                      1634     1289
X     Rickettsia prowazekii                      836      674
I     Chlamydia trachomatis                      895      631
      Chlamydia pneumoniae                      1053      647
L     Treponema pallidum                        1036      707
      Borrelia burgdorferi                      1637      694
W     Ureaplasma urealyticum                     613      401
      Mycoplasma pneumoniae                      689      423
      Mycoplasma genitalium                      471      376
      Total                                   76,765   51,645
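As a consistency check, the per-genome counts can be summed and turned into coverage fractions. The Python sketch below transcribes the table row by row (lineage codes omitted) and assumes "coverage" means proteins in COGs divided by total proteins; it reproduces the printed totals and shows that about two thirds of all proteins fall into COGs.

```python
# (genome, proteins, proteins in COGs), transcribed from the table above
GENOMES = [
    ("Archaeoglobus fulgidus", 2420, 1849),
    ("Halobacterium sp. NRC-1", 2058, 1404),
    ("Methanococcus jannaschii", 1786, 1320),
    ("Methanobacterium thermoautotrophicum", 1873, 1375),
    ("Thermoplasma acidophilum", 1479, 1176),
    ("Pyrococcus horikoshii", 2080, 1365),
    ("Pyrococcus abyssi", 1767, 1443),
    ("Aeropyrum pernix", 2722, 1169),
    ("Saccharomyces cerevisiae", 5954, 2175),
    ("Aquifex aeolicus", 1560, 1317),
    ("Thermotoga maritima", 1858, 1507),
    ("Deinococcus radiodurans", 3194, 2176),
    ("Mycobacterium tuberculosis", 3924, 2468),
    ("Bacillus subtilis", 4118, 2803),
    ("Bacillus halodurans", 4066, 2728),
    ("Synechocystis", 3168, 2113),
    ("Escherichia coli", 4286, 3327),
    ("Buchnera sp. APS", 575, 559),
    ("Pseudomonas aeruginosa", 5567, 4191),
    ("Vibrio cholerae", 3834, 2745),
    ("Haemophilus influenzae", 1695, 1504),
    ("Xylella fastidiosa", 2766, 1491),
    ("Neisseria meningitidis", 2081, 1455),
    ("Helicobacter pylori", 1578, 1081),
    ("Helicobacter pylori J99", 1492, 1062),
    ("Campylobacter jejuni", 1634, 1289),
    ("Rickettsia prowazekii", 836, 674),
    ("Chlamydia trachomatis", 895, 631),
    ("Chlamydia pneumoniae", 1053, 647),
    ("Treponema pallidum", 1036, 707),
    ("Borrelia burgdorferi", 1637, 694),
    ("Ureaplasma urealyticum", 613, 401),
    ("Mycoplasma pneumoniae", 689, 423),
    ("Mycoplasma genitalium", 471, 376),
]

total_proteins = sum(proteins for _, proteins, _ in GENOMES)
total_in_cogs = sum(in_cogs for _, _, in_cogs in GENOMES)

def coverage(row):
    """Fraction of one genome's proteins that were assigned to COGs."""
    _, proteins, in_cogs = row
    return in_cogs / proteins

lowest = min(GENOMES, key=coverage)
highest = max(GENOMES, key=coverage)

print(total_proteins, total_in_cogs)            # 76765 51645
print(f"{total_in_cogs / total_proteins:.1%}")  # 67.3%
print(lowest[0], f"{coverage(lowest):.1%}")     # Saccharomyces cerevisiae 36.5%
print(highest[0], f"{coverage(highest):.1%}")   # Buchnera sp. APS 97.2%
```

Under this reading, the only eukaryote in the table has the lowest coverage, while the small Buchnera genome is almost entirely covered.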

Protein-coding genes distribution map

Gene Classification based on COG functional categories


Birgid Schlindwein's

Hypermedia Glossary Of Genetic Terms

Chromosome The term was proposed by Waldeyer (1888) for the individual threads within a cell nucleus (Gk. chroma, colour; soma, body). The self-replicating genetic structures of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes. In most prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome. Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins.

Related Terms:

Nucleus The term introduced by Brown (1833) for the more or less spherical structure which occurs in cells and stains deeply with basic dyes. The cellular organelle in eukaryotes that contains the genetic material.

Nucleotide A subunit of DNA or RNA consisting of a nitrogenous base (a purine, either adenine or guanine, or a pyrimidine, which is thymine or cytosine in DNA and uracil or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Depending on the sugar, the nucleotides are called deoxyribonucleotides or ribonucleotides. Thousands of nucleotides are linked to form a DNA or RNA molecule. See also base pair.

Gene The term coined by Johannsen (1909) for the fundamental physical and functional unit of heredity. The word gene was derived from De Vries' term pangen, itself a derivative of the word pangenesis which Darwin (1868) had coined. A gene is an ordered sequence of nucleotides located in a particular position (locus) on a particular chromosome that encodes a specific functional product (the gene product, i.e. a protein or RNA molecule). It includes regions involved in regulation of expression and regions that code for a specific functional product. See gene expression, allele.

Prokaryote Cell or organism lacking a membrane-bound, structurally discrete nucleus and other subcellular compartments. Bacteria are prokaryotes. Compare eukaryote. See chromosomes.

Eukaryote Cell or organism with membrane-bound, structurally discrete nucleus and other well-developed subcellular compartments. Eukaryotes include all organisms except viruses, bacteria, and blue-green algae. Compare prokaryote. See chromosomes.

Protein A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body's cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, and antibodies.

Related Terms:

Genetic code

The sequence of nucleotides, coded in triplets (codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
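The codon-by-codon reading described above can be sketched in Python. This minimal version assumes the standard genetic code (the 64-codon table used by most organisms), starts reading at the first base rather than searching for a start codon, and stops at the first stop codon; `translate` and `CODON_TABLE` are illustrative names.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): one amino-acid letter per codon,
# with codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Read an mRNA three bases at a time from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":   # a stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCUAGAAAGUAA"))  # MARK
```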

Related Terms:


Codon The term proposed by Crick (1963) for the sequence of nucleotides in DNA or RNA that is responsible for determining that a specific amino acid shall be inserted into a polypeptide chain. There is more than one codon for most amino acids. It has now been established that the codon is a triplet of nitrogenous bases in DNA or RNA that specifies a single amino acid. See genetic code.
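The remark that most amino acids have more than one codon can be checked by counting codons per amino acid in the standard genetic code. In the Python sketch below, the 64-letter string lists the amino acid for each codon in UUU, UUC, UUA, UUG, ... order (an encoding assumed here for compactness); `codons_for` is an illustrative name.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), one letter per codon in
# UUU, UUC, UUA, UUG, UCU, ... order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("UCAG", repeat=3)]

def codons_for(residue):
    """All codons that specify a one-letter amino acid (or '*' for stop)."""
    return [codon for codon, aa in zip(CODONS, AMINO_ACIDS) if aa == residue]

counts = Counter(AMINO_ACIDS)            # codons per amino acid
stops = counts.pop("*")                  # remove the stop codons from the tally
single = sorted(aa for aa, n in counts.items() if n == 1)
multiple = sum(1 for n in counts.values() if n > 1)

print(codons_for("R"))          # ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG']
print(stops, single, multiple)  # 3 ['M', 'W'] 18
```

Only methionine and tryptophan have a single codon; the other 18 amino acids have two to six.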

Messenger RNA (mRNA)

RNA that serves as a template for protein synthesis or for synthesis of cDNA. See genetic code.

Amino acid Any of a class of 20 molecules that are combined to form proteins in living things. The sequence of amino acids in a protein, and hence protein function, is determined by the genetic code. Amino acids contain a basic amino (NH2) group, an acidic carboxyl (COOH) group, and a side chain (R, of various kinds) attached to an alpha carbon atom.

Thus the general formula is: NH2-CH(R)-COOH


Deoxyribonucleic acid (DNA)

The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides. The four nucleotides in DNA contain the bases: adenine (A), guanine (G), cytosine (C), and thymine (T). In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner.
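Because A pairs only with T and G only with C, one strand fixes the other. A minimal Python sketch, assuming both strands are written in the conventional 5'-to-3' direction (the two strands run antiparallel, so the partner is read in reverse); `partner_strand` is an illustrative name.

```python
# Base pairs in double-stranded DNA
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner_strand(strand):
    """Return the complementary strand, also written 5'->3' (the reverse complement)."""
    return "".join(PAIR[base] for base in reversed(strand))

print(partner_strand("ATGGCTAGAAAG"))  # CTTTCTAGCCAT
```

Applying the function twice returns the original strand, which reflects the point that each strand can be deduced from its partner.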

DNA sequence The relative order of base pairs, whether in a fragment of DNA, a gene, a chromosome, or an entire genome. See base sequence.


Related Terms:

Yeast artificial chromosome (YAC)

A vector used to clone DNA fragments (up to 400 kb); it is constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. The inserts can be much larger than those accepted by other vectors such as plasmids or cosmids. (Cf. cloning vector).

Related Terms:

Sequence tagged site (STS)

Short (200 to 500 base pairs) sequence of genomic DNA that has a single occurrence in the human genome and whose location and base sequence are known. Detectable by polymerase chain reaction, STSs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories and serve as landmarks on the developing physical map of the human genome.

Expressed sequence tag (EST) An STS derived from cDNA.

Related Terms:

Single nucleotide polymorphism (SNP)

A sequence polymorphism in which the variants differ at a single base pair.

Example of a single nucleotide substitution: rice cultivars with 18% or less amylose had the sequence AGTTATA at the putative leader intron 5' splice site, while all cultivars with a higher proportion of amylose had AGGTATA.
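Detecting a single nucleotide substitution like the one above amounts to comparing two aligned sequences position by position. A minimal Python sketch, assuming the sequences are already aligned and of equal length (real SNP detection must also handle alignment, gaps and sequencing error); positions are reported 1-based, and the function name is hypothetical.

```python
def single_base_differences(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) for every mismatch; positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]

# The two splice-site variants quoted above
print(single_base_differences("AGTTATA", "AGGTATA"))  # [(3, 'T', 'G')]
```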

Genes

The units of hereditary information that occupy a fixed position (locus) on a chromosome. Genes achieve their effects by directing the synthesis of proteins.

Genes are composed of deoxyribonucleic acid (DNA), except in some viruses, which have genes consisting of a closely related compound called ribonucleic acid (RNA). A DNA molecule is composed of two chains of nucleotides that wind about each other to resemble a twisted ladder. The sides of the ladder are made up of sugars and phosphates; the rungs are formed by bonded pairs of nitrogenous bases. These bases are adenine (A), guanine (G), cytosine (C), and thymine (T). An A on one chain bonds to a T on the other (thus forming an A-T ladder rung); similarly, a C on one chain bonds to a G on the other. If the bonds between the bases are broken, the two chains unwind, and free nucleotides within the cell attach themselves to the exposed bases of the now-separated chains. The free nucleotides line up along each chain according to the base-pairing rule--A bonds to T, C bonds to G. This process results in the creation of two identical DNA molecules from one original and is the method by which hereditary information is passed from one generation of cells to the next.

The sequence of bases along a strand of DNA carries the genetic information. When the product of a particular gene is needed, the two strands of the DNA molecule separate in the region that contains that gene. A strand of RNA with bases complementary to those of the gene is created from the free nucleotides in the cell. (RNA has the base uracil [U] instead of thymine, so A and U form base pairs during RNA synthesis.) This single chain of RNA, called messenger RNA (mRNA), then passes to the organelles called ribosomes, where protein synthesis takes place. A second type of RNA, transfer RNA (tRNA), matches up the nucleotides on mRNA with specific amino acids. Each set of three nucleotides codes for one amino acid. The series of amino acids built according to the sequence of nucleotides forms a polypeptide chain; all proteins are made from one or more linked polypeptide chains.
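The complementary copying step can be sketched in Python. This assumes the DNA template strand is written 5' to 3', so the mRNA, also written 5' to 3', is built from the template read in reverse, with U in place of T; `transcribe` is an illustrative name.

```python
# Base pairing used when RNA is made on a DNA template (U replaces T in RNA)
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand):
    """Return the mRNA (5'->3') complementary to a DNA template strand given 5'->3'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_strand))

print(transcribe("CTTTCTAGCCAT"))  # AUGGCUAGAAAG
```

The resulting message, AUG-GCU-AGA-AAG, is the first encoding of Met-Ala-Arg-Lys in the MARK codon table earlier in this document.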


Experiments indicate that one gene is responsible for the assembly of one polypeptide chain. This is known as the one-gene-one-polypeptide hypothesis.

Other experiments have shown that many of the genes within a cell are inactive much or even all of the time. Thus, at any given time, a gene can apparently be switched on or off. The process by which genes are activated and deactivated in bacteria has been determined. It involves three kinds of genetic elements: structural genes, operators, and regulator genes. Structural genes code for the synthesis of specific polypeptides. An operator (originally called an operator gene) is a stretch of DNA that controls whether the DNA message of one or more adjacent structural genes is transcribed into mRNA. Thus, structural genes are linked to an operator in a functional unit called an operon. Ultimately, the activity of the operon is controlled by a regulator gene, which produces a protein called a repressor. The repressor binds to the operator and prevents transcription of the structural genes, and hence synthesis of the proteins called for by the operon. The presence or absence of certain repressor molecules determines whether the operon is off or on. As mentioned, this model applies to bacteria. Gene regulation in higher organisms is less clearly understood.

Mutations occur when the number or order of bases in a gene is disrupted. Nucleotides can be deleted, doubled, rearranged, or replaced, with each alteration having a particular effect. Most mutations have little or no effect; when one does alter an organism, the change is frequently lethal. A beneficial mutation tends to rise in frequency within a population and may eventually become the norm.
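The different effects of replacing and deleting a nucleotide can be illustrated by translating a short mRNA before and after each change. The Python sketch below assumes the standard genetic code and a made-up 15-base message; a substitution may be silent or change one amino acid, whereas a single-base deletion shifts the reading frame and alters every downstream codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): one letter per codon,
# codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate complete codons from the first base; leftover bases are ignored."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

original = "AUGGCUAGAAAGGGC"                    # AUG GCU AGA AAG GGC (hypothetical)
silent = original[:11] + "A" + original[12:]   # AAG -> AAA, still lysine
missense = original[:6] + "G" + original[7:]   # AGA -> GGA, arginine -> glycine
frameshift = original[:3] + original[4:]       # delete the first base of GCU

for label, seq in [("original", original), ("silent", silent),
                   ("missense", missense), ("frameshift", frameshift)]:
    print(f"{label:10s} {translate(seq)}")
# original   MARKG
# silent     MARKG
# missense   MAGKG
# frameshift MLER
```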

The Cell

In biology, the basic unit of which all living things are composed. As the smallest units retaining the fundamental properties of life, cells are the "atoms" of the living world. A single cell is often a complete organism in itself, such as a bacterium or yeast. Other cells, by differentiating in order to acquire specialized functions and cooperating with other specialized cells, become the building blocks of large multicellular organisms as complex as the human being. Although they are much larger than atoms, these building blocks are still very small. The smallest known cells are a group of tiny bacteria called mycoplasmas; some of these single-celled organisms are spheres about 0.3 micrometre in diameter, with a total mass of 10^-14 gram, equal to that of 8,000,000,000 hydrogen atoms. Human cells typically have a mass 400,000 times larger, but even they are only about 20 micrometres across. It would require a sheet of about 10,000 human cells to cover the head of a pin, and each human being is composed of more than 75,000,000,000,000 cells.

This article discusses the cell both as an individual unit and as a contributing part of a larger organism. As an individual unit the cell is capable of digesting its own nutrients, providing its own energy, and replicating itself in order to produce succeeding generations. It can be viewed as an enclosed vessel composed of even smaller units that serve as its skin, skeleton, brain, and digestive tract. Within this vessel innumerable chemical reactions take place simultaneously, all of them controlled so that they contribute to the life and procreation of the cell. In a multicellular organism cells specialize to perform different functions. In order to do this each cell keeps in constant communication with its neighbours. As it receives nutrients from and expels wastes into its surroundings, it adheres to and cooperates with other cells. Cooperative assemblies of similar cells form tissues, and a cooperation between tissues in turn forms organs, the functional units of an organism.

Figure 1: The initial proposal of the structure of DNA by James Watson and ...

Special emphasis is given in this article to animal cells, with some discussion of the energy-synthesizing processes and extracellular components peculiar to plants. For detailed discussion of the biochemistry of plant cells, see photosynthesis. For full-length treatment of the genetic events in the cell nucleus, see heredity.

Contents of this article:

Introduction
   The nature and function of cells
      The cell as a self-replicating collection of catalysts
         The structure of biologic catalysts
         Coupled chemical reactions
            Photosynthesis: the beginning of the food chain
            ATP: fueling chemical reactions
      The cell as a replicator of information
         DNA: the genetic material
         RNA: replicated from DNA
      The cell as an organized unit
         Intracellular communication
         Intercellular communication
   The plasma membrane
      Chemical composition and structure of the membrane
         Membrane lipids
         Membrane proteins
         Membrane fluidity
      Transport across the membrane
         Permeation
         Membrane channels
         Facilitated diffusion
            The glucose transporter
            The anion transporter
         Secondary active transport
            Counter-transport
            Co-transport
         Primary active transport
            The sodium pump
            Calcium pumps
            Hydrogen ion pumps
         Transport of particles
            Endocytosis
            Exocytosis
   Internal membranes
      General functions and characteristics
      Cellular organelles and their membranes
         The vacuole
         The lysosome
         Microbodies
         The endoplasmic reticulum
            The smooth endoplasmic reticulum
            The rough endoplasmic reticulum
         The Golgi apparatus
         Secretory vesicles
      Sorting of products by chemical receptors
   The nucleus
      Structural organization of the nucleus
         DNA packaging
            Nucleosomes: the subunits of chromatin
            Organization of chromatin fibre
         The nuclear envelope
      Genetic organization of the nucleus
         The structure of DNA
         Rearrangement and modification of DNA
      Genetic expression through RNA
         RNA synthesis
         Processing of mRNA
      Regulation of genetic expression
         Regulation of RNA synthesis
         Regulation of RNA after synthesis
   The mitochondrion and the chloroplast
      Mitochondrial and chloroplastic structure
      Metabolic functions
         The mitochondrion
            Formation of the electron donors NADH and FADH2
            The electron-transport chain
            The chemiosmotic theory
         The chloroplast
            Trapping of light
            Fixation of carbon dioxide
      Evolutionary origins
         The mitochondrion and chloroplast as independent entities
         The endosymbiont hypothesis
   The cytoskeleton
      Actin filaments
      Microtubules
      Intermediate filaments
      Structural relation of the filaments
   The cell matrix and cell-to-cell communication
      The extracellular matrix
         Matrix polysaccharides
         Matrix proteins
         Cell-matrix interactions
      Intercellular recognition and cell adhesion
         Tissue and species recognition
         Cell junctions
            Adhering junctions
            Tight junctions
            Gap junctions
      Cell-to-cell communication via chemical signaling
         Types of chemical signaling
         Signal receptors
         Cellular response
      The plant cell wall
         Mechanical properties of wall layers
         Components of the cell wall
            Cellulose
            Matrix polysaccharides
            Proteins
            Plastics
         Intercellular communication
            Plasmodesmata
            Oligosaccharides with regulatory functions
   Cell division and growth
      Duplication of the genetic material
      Cell division
         Mitosis and cytokinesis
         Meiosis
      The cell division cycle
         Controlled proliferation
         Failure of proliferation control
   Cell differentiation
      The differentiated state
      The process of differentiation
         Embryonic differentiation
         Adult differentiation
      Errors in differentiation
   The evolution of cells
      The development of genetic information
      The development of metabolism
   The history of cell theory
      Formulation of the theory
         Early observations
         The problem of the origin of cells
         The protoplasm concept
      Contribution of other sciences
   Bibliography
      General works
      Nature and function of cells
      Special studies in cell morphology
      Special studies in cell biology
      Evolution

Summary

In biology, the basic unit of which all living things are composed. The cell is the smallest structural unit of living matter that is capable of functioning independently. All cells are similar in composition, form, and function. A single cell can be a complete organism in itself, as in bacteria and protozoans. Groups of specialized cells are organized into tissues and organs in multicellular organisms such as the higher plants and animals.

Cells were first observed in the 17th century, shortly after the discovery of the microscope. Their significance, however, was not understood until the early 19th century, when improvements in microscopy permitted closer observation.

Cells are made up of macromolecules (giant molecules) and various smaller molecules. The chief macromolecules are nucleic acids (DNA [deoxyribonucleic acid] and RNA [ribonucleic acid]), proteins, and polysaccharides. DNA carries the genetic information that transmits the essential character of the organism from generation to generation. RNA translates the genetic information into proteins, which carry out vital cell functions. Proteins, for example, recognize and transport specific molecules into and out of the cell and catalyze nearly all chemical reactions within the cell. Polysaccharides function as structural molecules in the rigid cell walls of bacterial and plant cells and as storage molecules in the glycogen granules of vertebrate muscle cells.

Important among the smaller molecular components of cells are lipids, ATP (adenosine triphosphate), cyclic AMP (adenosine monophosphate), porphyrins, and water. Lipids are fatty substances that are a major component of cell membranes. ATP is the energy currency of the cell; this energy-rich molecule is formed when the cell needs to store energy and is broken down when the cell requires energy. Cyclic AMP functions as a regulator of cell activities; porphyrins are pigments essential for oxidation and photosynthesis. About 70 to 80 percent of a cell is water, which is vital to the chemistry of life.

There are two distinct types of cells: procaryotic cells, found only in blue-green algae and in bacteria, and eucaryotic cells, composing all other life forms. A eucaryotic cell consists of an outer membrane, cytoplasm that contains various membrane-bound structures (organelles), and a membrane-bound nucleus that encloses the gene-bearing chromosomes. Procaryotic cells have a cell membrane and cytoplasm, but they have no nucleus (their genetic material is organized into a single chromosome) and they lack membrane-bound cytoplasmic organelles. The molecular composition and activities of the two types of cells, however, are very similar.

A cell is bounded by a semipermeable membrane (called the plasma membrane) that enables it to exchange certain materials with its surroundings. The plasma membrane is made up of a double layer of lipids studded with proteins. Some of the proteins extend completely through the lipid layer, others only partially penetrate it, and still others are thought to be completely embedded within the lipid layer. In plants the membrane is enclosed in a rigid cellulose cell wall.

The space between cells is filled with the extracellular matrix, a gel of polysaccharides swollen with water molecules in which are suspended protein fibres that hold cells together to form tissues.

Within the cytoplasm of both procaryotic and eucaryotic cells are ribosomes, small bodies that are the sites of protein synthesis. In addition, eucaryotic cells have a variety of separate membrane-bound cytoplasmic organelles with special functions. These organelles include the endoplasmic reticulum, Golgi apparatus, lysosomes, mitochondria, and plastids. The endoplasmic reticulum is a network of channels that functions in the movement of materials within the cell. Associated with these channels is the Golgi apparatus, which is composed of sacs that bud off from the endoplasmic reticulum. These sacs transport cell products from the endoplasmic reticulum to their appropriate locations either inside or outside the cell. Lysosomes are sacs filled with digestive enzymes; they are capable of digesting worn-out cell parts or extracellular debris, such as dead cells or foreign microorganisms that have been engulfed by the cell. Mitochondria serve as the power plants of the cell; it is within these organelles that ATP is synthesized. Plastids are found in the cells of most plants but are absent from animal cells. Of immense importance are the plastids known as chloroplasts; they contain the machinery for photosynthesis, the process by which the energy of sunlight is captured to produce carbohydrates.

The nucleus is the control centre of eucaryotic cells. Within this membrane-bound structure lie the chromosomes, which carry the hereditary material. The DNA of the chromosomes directs protein synthesis in the cell; the DNA instructions are carried from the nucleus to the cytoplasm by messenger RNA (mRNA). Procaryotic cells have no membrane-enclosed nucleus. They do, however, have nuclear matter consisting of a single chromosome.

A eucaryotic cell divides, or reproduces, to form two genetically identical daughter cells in a process called mitosis. Prior to mitosis, the chromosomes replicate, so that there will be a complete set of hereditary instructions for each daughter cell. During mitosis, the doubled chromosomes are separated, with one copy of each going to each daughter cell. Among sexually reproducing eucaryotes, another type of cell division occurs in the formation of sex cells called gametes (i.e., eggs and sperm). This process is known as meiosis. It produces four gametes, each of which contains half the number of chromosomes of the parent cell. When a male gamete and a female gamete unite, they form a new individual in which the full number of chromosomes is restored.

Procaryotic cells reproduce in various ways, the most common being binary fission. This process involves replication of the cell's lone chromosome and the subsequent splitting of the parent cell into two daughters. It thus resembles mitosis in eucaryotes, but it lacks the special apparatus involved in true mitotic division.

The two main types of cell death are necrotic cell death, or coagulative necrosis, and apoptosis, or programmed cell death. Necrosis occurs in a variety of contexts produced by disease, injury, or accident and is cell death imposed by external factors. A cell undergoing necrosis typically swells in size before its lysosomes rupture and the cell's internal contents spill out into extracellular space.

In response to specific intracellular and extracellular signals, cells can also undergo programmed cell death. Apoptosis is a normal cellular process that plays an important role in growth and development. This type of cell death is marked by the shrinking of the cytoplasm and nucleus, degradation of the chromosomes, and the final splitting of the nucleus into a number of membrane-bound fragments.


Approximate Chemical Composition of a Typical Mammalian Cell

Component                                                               Weight percent of total cell
Water                                                                   70
Inorganic ions (sodium, potassium, magnesium, calcium, chloride, etc.)  1
Miscellaneous small metabolites                                         3
Proteins                                                                18
RNA                                                                     1.1
DNA                                                                     0.25
Phospholipids and other lipids                                          5
Polysaccharides                                                         2
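The percentages in the table are rounded, so they need not sum to exactly 100. A short Python sketch, assuming the table's values and defining dry mass as everything other than water, shows the total and the protein share of the dry mass.

```python
# Approximate composition of a typical mammalian cell (weight percent),
# copied from the table above.
COMPOSITION = {
    "Water": 70,
    "Inorganic ions": 1,
    "Miscellaneous small metabolites": 3,
    "Proteins": 18,
    "RNA": 1.1,
    "DNA": 0.25,
    "Phospholipids and other lipids": 5,
    "Polysaccharides": 2,
}

total = sum(COMPOSITION.values())
dry_mass = total - COMPOSITION["Water"]            # everything except water
protein_share = COMPOSITION["Proteins"] / dry_mass

print(f"{total:.2f}")          # 100.35
print(f"{protein_share:.1%}")  # 59.3%
```

Under this definition, proteins make up roughly 60 percent of the cell's dry mass.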


Biological Development

The progressive changes in size, shape, and function during the life of an organism by which its genetic potentials (genotype) are translated into functioning mature systems (phenotype). Most modern philosophical outlooks would consider that development of some kind or other characterizes all things, in both the physical and biological worlds. Such points of view go back to the very earliest days of philosophy.

Among the pre-Socratic philosophers of Greek Ionia, half a millennium before Christ, some, like Heracleitus, believed that all natural things are constantly changing. In contrast, others, of whom Democritus is perhaps the prime example, suggested that the world is made up by the changing combinations of atoms, which themselves remain unaltered, not subject to change or development. The early period of post-Renaissance European science may be regarded as dominated by this latter atomistic view, which reached its fullest development in the period between Newton's laws of physics and Dalton's atomic theory of chemistry in the early 19th century. This outlook was never easily reconciled with the observations of biologists, and in the last hundred years a series of discoveries in the physical sciences have combined to swing opinion back toward the Heracleitan emphasis on the importance of process and development. The atom, which seemed so unalterable to Dalton, has proved to be divisible after all, and to maintain its identity only by processes of interaction between a number of component subatomic particles, which themselves must in certain aspects be regarded as processes rather than matter. Albert Einstein's theory of relativity showed that time and space are united in a continuum, which implies that all things are involved in time; that is to say, in development.

The philosophers who charted the transition from the nondevelopmental view, for which time was an accidental and inessential element, were Henri Bergson and, in particular, Alfred North Whitehead. Karl Marx and Friedrich Engels, with their insistence on the difference between dialectical and mechanical materialism, may be regarded as other important innovators of this trend, although the generality of their philosophy was somewhat compromised by the political context in which it was placed and the rigidity with which their later followers have interpreted it.

Philosophies of the Heracleitan type, which emphasize process and development, provide much more appropriate frameworks for biology than do philosophies of the atomistic kind. Living organisms confront biologists with changes of various kinds, all of which could be regarded as in some sense developmental; however, biologists have found it convenient to distinguish the changes and to use the word development for only one of them. Biological development can be defined as the series of progressive, nonrepetitive changes that occur during the life history of an organism. The kernel of this definition is to contrast development with, on the one hand, the essentially repetitive chemical changes involved in the maintenance of the body, which constitute "metabolism," and on the other hand, with the longer-term changes, which, while nonrepetitive, involve the sequence of several or many life histories, and which constitute evolution.

As with most formal definitions, these distinctions cannot always be applied strictly to the real world. In the viruses, for instance, and even in bacteria, it is difficult to make a distinction between metabolism and development, since the metabolic activity of a virus particle consists of little more than the development of new virus particles. In certain other cases, the distinction between development and evolution becomes blurred: the concept of an individual organism with a definite life history may be very difficult to apply in plants that reproduce by vegetative division, the breaking off of a part that can grow into another complete plant. The possibilities for debate that arise in these special cases, however, do not in any way invalidate the general usefulness of the distinctions as conventionally made in biology.

Contents of this article:

Introduction
   The scope of development
      Types of development
         Quantitative and qualitative development
         Progressive and regressive development
         Single-phase and multiphase development
         Structural and functional development
         Normal and abnormal development
      General systems of development
         Development of single-celled organisms
         Open and closed systems of development
         Blastogenesis versus embryogenesis
      Constituent processes of development
         Growth
         Morphogenesis
            Morphogenesis by differential growth
            Morphogenetic fields
            Morphogenesis by the self-assembly of units
         Differentiation
   Control and integration of development
      Phenomenological aspects
      Analytical aspects
   Development and evolution
      Effect on life histories
         Length and timing of the reproductive phase
         Recapitulation of ancestral stages
         Adaptability and the canalization of development
      Genetic assimilation
   Bibliography

The Human Body

The physical substance of the human organism, composed of living cells and extracellular materials and organized into tissues, organs, and systems.

Human anatomy and physiology are treated in many different articles. For detailed coverage of the body's biochemical constituents, see Proteins; Carbohydrates; Lipids; Nucleic Acids; Vitamins; and Hormones. For information on the structure and function of the cells that constitute the body, see Cells. For detailed discussions of specific tissues, organs, and systems, see Blood; Circulation and Circulatory Systems: The human cardiovascular system; Digestion and Digestive Systems; Endocrine Systems: The human endocrine system; Excretion and Excretory Systems: The human excretory system; Integumentary Systems: The human skin; Muscles and Muscle Systems; Nerves and Nervous Systems; Reproduction and Reproductive Systems: The human reproductive system; Respiration and Respiratory Systems: Human respiration; Sensory Reception: Human sensory reception; Supportive and Connective Tissues: The human skeletal system. For a description of how the body develops, from conception through old age, see Growth and Development, Biological: Human growth and development.

Many entries describe the body's major structures. For example, see abdominal cavity; adrenal gland; aorta; bone; brain; ear; eye; heart; kidney; large intestine; lung; nose; ovary; pancreas; pituitary gland; small intestine; spinal cord; spleen; stomach; testis; thymus; thyroid gland; tooth; uterus; vertebral column.

Human beings are, of course, animals--more particularly, members of the order Mammalia in the subphylum Vertebrata of the phylum Chordata. Like all chordates, the human animal has a bilaterally symmetrical body that is characterized at some point during its development by a dorsal supporting rod (the notochord), gill slits in the region of the pharynx, and a hollow dorsal nerve cord. Of these features, the first two are present only during the embryonic stage in the human; the notochord is replaced by the vertebral column, and the pharyngeal gill slits are lost completely. The dorsal nerve cord is the spinal cord in human beings; it remains throughout life.

Characteristic of the vertebrate form, the human body has an internal skeleton that includes a backbone of vertebrae. Typical of mammalian structure, the human body shows such characteristics as hair, mammary glands, and highly developed sense organs.

Beyond these similarities, however, lie some profound differences. Among the mammals, only human beings have a predominantly two-legged (bipedal) posture, a fact that has greatly modified the general mammalian body plan. (Even the kangaroo, which hops on two legs when moving rapidly, walks on four legs and uses its tail as a "third leg" when standing.) Moreover, the human brain, particularly that part called the neocortex, is far and away the most highly developed in the animal kingdom. As intelligent as are many other mammals--such as chimpanzees and dolphins--none have achieved the intellectual status of the human species.

Contents of this article:

Introduction
Chemical composition of the body.
Organization of the body.
Basic form and development.
Effects of aging.
Change incident to environmental factors.

Summary

Chemical composition of the body.

Chemically, the human body consists mainly of water and of organic compounds--i.e., lipids, proteins, carbohydrates, and nucleic acids. Water is found in the extracellular fluids of the body (the blood plasma, the lymph, and the interstitial fluid) and within the cells themselves. It serves as a solvent without which the chemistry of life could not take place. The human body is about 60 percent water by weight.

Lipids--chiefly fats, phospholipids, and steroids--are major structural components of the human body. Fats provide an energy reserve for the body, and fat pads also serve as insulation and shock absorbers. Phospholipids and the steroid compound cholesterol are major components of the membrane that surrounds each cell.

Proteins also serve as a major structural component of the body. Like lipids, proteins are an important constituent of the cell membrane. In addition, such extracellular materials as hair and nails are composed of protein. So also is collagen, the fibrous, elastic material that makes up much of the body's skin, bones, tendons, and ligaments. Proteins also perform numerous functional roles in the body. Particularly important are those cellular proteins called enzymes, which catalyze the chemical reactions necessary for life.

Carbohydrates are present in the human body largely as fuels, either as simple sugars circulating through the bloodstream or as glycogen, a storage compound found in the liver and the muscles. Small amounts of carbohydrates also occur in cell membranes, but, in contrast to plants and many invertebrate animals, humans have little structural carbohydrate in their bodies.

Nucleic acids make up the genetic materials of the body. Deoxyribonucleic acid (DNA) carries the body's hereditary master code, the instructions according to which each cell operates. It is DNA, passed from parents to offspring, that dictates the inherited characteristics of each human being. Ribonucleic acid (RNA), of which there are several types, helps carry out the instructions encoded in the DNA.

Along with water and organic compounds, the body's constituents include various inorganic minerals. Chief among these are calcium, phosphorus, sodium, magnesium, and iron. Calcium and phosphorus, combined as calcium-phosphate crystals, form a large part of the body's bones. Calcium is also present as ions in the blood and interstitial fluid, as is sodium. Ions of phosphorus, potassium, and magnesium, on the other hand, are abundant within the intracellular fluid. All of these ions play vital roles in the body's metabolic processes. Iron is present mainly as part of hemoglobin, the oxygen-carrying pigment of the red blood cells. Other mineral constituents of the body, found in minute but necessary concentrations, include cobalt, copper, iodine, manganese, and zinc.

Organization of the body.

The cell is the basic living unit of the human body--indeed, of all organisms. The human body consists of more than 75 trillion cells, each capable of growth, metabolism, response to stimuli, and, with some exceptions, reproduction.

Although there are some 200 different types of cells in the body, these can be grouped into four basic classes. These four basic cell types, together with their extracellular materials, form the fundamental tissues of the human body: (1) epithelial tissues, which cover the body's surface and line the internal organs, body cavities, and passageways; (2) muscle tissues, which are capable of contraction and form the body's musculature; (3) nerve tissues, which conduct electrical impulses and make up the nervous system; and (4) connective tissues, which are composed of widely spaced cells and large amounts of intercellular matrix and which bind together various body structures. (Bone and blood are considered specialized connective tissues, in which the intercellular matrix is, respectively, hard and liquid.)

The next level of organization in the body is that of the organ. An organ is a group of tissues that constitutes a distinct structural and functional unit. Thus, the heart is an organ composed of all four tissues, whose function is to pump blood throughout the body. Of course, the heart does not function in isolation; it is part of a system composed of blood and blood vessels as well. The highest level of body organization, then, is that of the organ system.

The body includes nine major organ systems, each composed of various organs and tissues that work together as a functional unit. The chief constituents and prime functions of each system are summarized below.

(1) The integumentary system, composed of the skin and associated structures, protects the body from invasion by harmful microorganisms and chemicals; it also prevents water loss from the body.
(2) The musculoskeletal system (also referred to separately as the muscle system and the skeletal system), composed of the skeletal muscles and bones (with about 206 of the latter in adults), moves the body and protectively houses its internal organs.
(3) The respiratory system, composed of the breathing passages, lungs, and muscles of respiration, obtains from the air the oxygen necessary for cellular metabolism; it also returns to the air the carbon dioxide that forms as a waste product of such metabolism.
(4) The circulatory system, composed of the heart, blood, and blood vessels, circulates a transport fluid throughout the body, providing the cells with a steady supply of oxygen and nutrients and carrying away such waste products as carbon dioxide and toxic nitrogen compounds.
(5) The digestive system, composed of the mouth, esophagus, stomach, and intestines, breaks down food into usable substances (nutrients), which are then absorbed into the blood or lymph; this system also eliminates the unusable or excess portion of the food as fecal matter.
(6) The excretory system, composed of the kidneys, ureters, urinary bladder, and urethra, removes toxic nitrogen compounds and other wastes from the blood.
(7) The nervous system, composed of the sensory organs, brain, spinal cord, and nerves, transmits, integrates, and analyzes sensory information and carries impulses to effect the appropriate muscular or glandular responses.
(8) The endocrine system, composed of the hormone-secreting glands and tissues, provides a chemical communications network for coordinating various body processes.
(9) The reproductive system, composed of the male or female sex organs, enables reproduction and thereby ensures the continuation of the species.

Cellular Articles in other Topics:       cytoskeleton

           cytoskeleton           from cytoskeleton

      division

         aging process               Tissue cell loss and replacement              from aging

         blastema formation               animal development              from animal development               Cell reproduction              from reproduction

         cellular components               Cytology              from morphology

         cleavage               Early development              from animal development

         cloning               clone              from clone

         epidermal differentiation               The epidermis              from skin

         fetus growth rate               Types and rates of human growth              from human development

         plant growth determination               Origin of the primary organs              from plant development               The contribution of cells and tissues              from plant development

         regeneration and cell renewal               Repair and regeneration              from human disease

         sexual reproduction specialization               Sex cells              from sex               Hormones              from sex

         structural unit of life               Life on Earth              from life               The earliest living systems              from life

         vitamin deficiencies               Vitamins              from nutritional disease

      physiology            Historical background           from physiology

         aging process               human aging              from human aging               aging              from aging               Internal environment: consequences of metabolism              from aging

            cellular metabolism                  Endocrine system                 from human aging

         fluid regulation               Regulation of water and salt balance              from excretion

         genetic behaviour               genetics              from genetics

         hormones               Hormone chemistry.              from hormone

         interaction with drugs               General principles              from drug

         metabolism               metabolism              from metabolism               Coarse control              from metabolism

            circulatory system                  Main features of circulatory systems                 from circulation

            human body                  Organization of the body.                 from human body

            metabolic disease                  metabolic disease                 from metabolic disease                  Disorders of porphyrin metabolism                 from metabolic disease

         pathology               Characteristics of cell and tissue changes              from animal disease

            cancer                  cancer                 from cancer                  ref. [cancer] passim to                  ref. [cancer20]

            cell death                  The "point of no return"                 from death

            cryosurgical tissue destruction                  cryosurgery                 from cryosurgery

            growth inhibition                  Abnormal growth of cells                 from human disease

            infection                  virus                 from virus

            radiation damage                  Radiation injury                 from human disease                  Major types of radiation injury                 from radiation

      scientific study

         cytology               cytology              from cytology

         genetic continuity and organization               Genetics              from zoology

         morphology of cells               The study of structure              from biology

         observations by

            Braun                  Braun, Alexander Carl Heinrich                 from Braun, Alexander Carl Heinrich

            Claude                  Claude, Albert                 from Claude, Albert

            Goodsir                  Goodsir, John                 from Goodsir, John

            Mohl                  Mohl, Hugo von                 from Mohl, Hugo von

            Müller                  Müller, Johannes Peter                 from Müller, Johannes Peter

            Palade                  Palade, George E.                 from Palade, George E.

         tissue culture examination               tissue culture              from tissue culture

      structure and function

         bacteria ingestion in phagocytosis               phagocytosis              from phagocytosis

         difference between animal and plant cells               ref. [animal]

         fertilization               fertilization              from fertilization

         human respiration               Peripheral chemoreceptors              from respiration, human

         lipid structural components               lipid              from lipid

         nucleic acid formation               nucleic acid              from nucleic acid

         spatial patterns localization               Structural and functional development              from biological development

Information Processing

Query languages

The uses of databases are manifold. They provide a means of retrieving records or parts of records and performing various calculations before displaying the results. The interface by which such manipulations are specified is called the query language. Whereas early query languages were so complex that only specially trained individuals could interact with electronic databases, recent interfaces are more user-friendly, allowing casual users to access database information.

The main types of popular query modes are the "menu," the "fill-in-the-blank" technique, and the structured query. Particularly suited for novices, the menu requires a person to choose from several alternatives displayed on the video terminal screen. The fill-in-the-blank technique is one in which the user is prompted to enter key words as search statements. The structured query approach is effective with relational databases. It has a formal, powerful syntax that is in fact a programming language, and it is able to accommodate logical operators. One implementation of this approach, the Structured Query Language (SQL), has the form

SELECT Fa, Fb, ..., Fn
FROM Da, Db, ..., Dn
WHERE Fa = 'abc' AND Fb = 'def';
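The schematic form above can be tried against a small in-memory table. The sketch below uses Python's standard sqlite3 module. The table name `books`, its columns, and its rows are hypothetical, chosen only for illustration. The `?` placeholders let the driver bind the search values rather than pasting them into the SQL text.

```python
import sqlite3

# In-memory database with a hypothetical "books" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("Alpha", "Lee", 1970), ("Beta", "Kim", 1968), ("Gamma", "Kim", 1971)],
)

# select <fields> from <table> where <condition> and <condition>;
# the "?" placeholders are filled in, in order, from the parameter tuple.
rows = conn.execute(
    "SELECT title, year FROM books WHERE author = ? AND year > ?",
    ("Kim", 1969),
).fetchall()
print(rows)  # [('Gamma', 1971)]
```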

Structured query languages support database searching and other operations by using commands such as "find," "delete," "print," "sum," and so forth. The sentencelike structure of an SQL query resembles natural language except that its syntax is limited and fixed. Instead of using an SQL statement, it is possible to represent queries in tabular form. The technique, referred to as query-by-example (or QBE), displays an empty tabular form and expects the searcher to enter the search specifications into appropriate columns. The program then constructs an SQL-type query from the table and executes it.
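Query-by-example as described here, where the program turns a filled-in table into an SQL-type query, can be sketched in a few lines. This is a minimal illustration and not any particular product's QBE implementation. The `qbe_to_sql` helper, the form layout (a mapping from column name to typed value), and the rule that filled cells are combined with AND are all assumptions.

```python
import sqlite3

def qbe_to_sql(table, form):
    """Build a parameterized SELECT from a filled-in query-by-example form.

    `form` maps column names to whatever the user typed in that column;
    blank cells (None or "") are skipped and filled cells are ANDed together.
    """
    if not all(name.isidentifier() for name in [table, *form]):  # crude guard on identifiers
        raise ValueError("table and column names must be plain identifiers")
    conditions, params = [], []
    for column, value in form.items():
        if value in (None, ""):
            continue
        conditions.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("Alpha", "Lee", 1970), ("Beta", "Kim", 1968), ("Gamma", "Kim", 1971)],
)

# The user leaves "title" blank and fills in "author" and "year".
form = {"title": "", "author": "Kim", "year": 1971}
sql, params = qbe_to_sql("books", form)
rows = conn.execute(sql, params).fetchall()
print(sql)   # SELECT * FROM books WHERE author = ? AND year = ?
print(rows)  # [('Gamma', 'Kim', 1971)]
```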

Figure 1: Structure of an information system.

Figure 2: Document imaging.

Figure 3: A parsing graph.

Figure 4: A semantic network representation.

Figure 5: The architecture of a networked information system.

The most flexible query language is of course natural language. The use of natural-language sentences in a constrained form to search databases is allowed by some commercial database management software. These programs parse the syntax of the query; recognize its action words and their synonyms; identify the names of files, records, and fields; and perform the logical operations required. Experimental systems that accept such natural-language queries in spoken voice have been developed; however, the ability to employ unrestricted natural language to query unstructured information will require further advances in machine understanding of natural language, particularly in techniques of representing the semantic and pragmatic context of ideas. The prospect of an intelligent conversation between humans and a large store of digitally encoded knowledge is not imminent.
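The parsing steps listed above can be illustrated with a toy translator for one constrained sentence pattern. This is a sketch only. It assumes hypothetical vocabularies (`ACTIONS`, `TABLES`, `FIELDS`) with a few synonyms each, and it understands only clauses of the form "<field> is <single-word value>" joined by "and" or "or". Commercial systems use real grammars and schema dictionaries.

```python
import re

# Hypothetical vocabularies; a real product would derive these from its schema
# and a synonym dictionary.
ACTIONS = {"find": "SELECT", "show": "SELECT", "list": "SELECT",
           "count": "COUNT", "delete": "DELETE", "remove": "DELETE"}
TABLES = {"book": "books", "books": "books"}
FIELDS = {"author": "author", "writer": "author", "title": "title", "year": "year"}

def parse_query(sentence):
    """Translate a constrained sentence such as
    'show books where the writer is Kim or the year is 1970' into SQL text."""
    tokens = re.findall(r"\w+", sentence)
    lower = [t.lower() for t in tokens]
    # Recognize the action word (or a synonym) and the file/table name.
    action = next((ACTIONS[w] for w in lower if w in ACTIONS), None)
    table = next((TABLES[w] for w in lower if w in TABLES), None)
    if action is None or table is None:
        raise ValueError("no recognizable action word or table name")

    # Identify field names and values, plus the logical operators between them.
    conditions, params, last_end = [], [], 0
    for i, word in enumerate(lower):
        if word in FIELDS and i + 2 < len(tokens) and lower[i + 1] == "is":
            if conditions:
                conditions.append("OR" if "or" in lower[last_end:i] else "AND")
            conditions.append(f"{FIELDS[word]} = ?")
            params.append(tokens[i + 2])  # keep the value's original case
            last_end = i + 3

    head = {"SELECT": "SELECT *", "COUNT": "SELECT COUNT(*)", "DELETE": "DELETE"}[action]
    sql = f"{head} FROM {table}"
    if conditions:
        sql += " WHERE " + " ".join(conditions)
    return sql, params

print(parse_query("Show me books where the writer is Kim or the year is 1970"))
# ('SELECT * FROM books WHERE author = ? OR year = ?', ['Kim', '1970'])
```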

Information searching and retrieval

State-of-the-art approaches to retrieving information employ two generic techniques: (1) matching words in the query against the database index (key-word searching) and (2) traversing the database with the aid of hypertext or hypermedia links.
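The second technique, traversing a database along hypertext links, amounts to a graph walk. The sketch below uses breadth-first search over a small, hypothetical set of linked pages (`PAGES`). It collects pages whose text mentions a term and does not stray beyond a chosen number of link hops. Breadth-first order is an assumption made here for illustration; the text does not prescribe a traversal strategy.

```python
from collections import deque

# Hypothetical hypertext: page id -> (page text, ids of pages it links to).
PAGES = {
    "home": ("Welcome to the manual", ["engines", "crew"]),
    "engines": ("Main engine thrust and fuel flow", ["fuel"]),
    "crew": ("Crew procedures and checklists", ["home"]),
    "fuel": ("Fuel cell and tank pressure data", []),
}

def traverse(start, term, max_depth=3):
    """Breadth-first walk along links from `start`; return the pages whose text
    contains `term`, in the order they are reached. `seen` prevents looping
    around cycles such as crew -> home."""
    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        text, links = PAGES[page]
        if term.lower() in text.lower():
            found.append(page)
        if depth < max_depth:
            for nxt in links:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return found

print(traverse("home", "fuel"))               # ['engines', 'fuel']
print(traverse("home", "fuel", max_depth=1))  # ['engines']
```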

Key-word searches can be made either more general or narrower in scope by means of logical operators (e.g., disjunction and conjunction). Because of the semantic ambiguities involved in free-text indexing, however, the precision of the key-word retrieval technique--that is, the percentage of retrieved documents that are actually relevant--is far from ideal, and various modifications have been introduced to improve it. In one such enhancement, the search output is sorted by degree of relevance, based on a statistical match between the key words in the query and those in the document; in another, the program automatically generates a new query using one or more documents that the user has judged relevant. Key-word searching has been the dominant approach to text retrieval since the early 1960s; hypertext has so far been largely confined to personal or corporate information-retrieval applications.
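Boolean key-word searching, relevance ranking, and the precision measure can be shown with a minimal sketch over a tiny hypothetical collection (`DOCS`). The sketch makes several simplifying assumptions. The index is a plain inverted index without stemming. The relevance score is a simple count of query-term occurrences, standing in for the more refined statistical weights real systems use. Relevance feedback is not shown. Recall is included alongside precision to make the difference between the two measures concrete.

```python
import re
from collections import defaultdict

# Hypothetical four-document collection.
DOCS = {
    1: "database query languages and relational databases",
    2: "hypertext links between documents",
    3: "query expansion improves text retrieval",
    4: "relational algebra for query optimization",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Inverted index: each term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def boolean_search(index, terms, op="AND"):
    """Conjunction (AND) narrows the result set; disjunction (OR) broadens it."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings) if op == "AND" else set.union(*postings)

def ranked_search(docs, terms):
    """Order matching documents by a crude statistical score: the number of
    query-term occurrences in each document (ties broken by document id)."""
    scores = {}
    for doc_id, text in docs.items():
        words = tokenize(text)
        total = sum(words.count(term) for term in terms)
        if total:
            scores[doc_id] = total
    return sorted(scores, key=lambda d: (-scores[d], d))

def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Share of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

index = build_index(DOCS)
hits = boolean_search(index, ["query", "relational"], op="OR")
print(sorted(hits))                                  # [1, 3, 4]
print(ranked_search(DOCS, ["query", "relational"]))  # [1, 4, 3]
# If only documents 1 and 4 are relevant: precision 2/3, recall 1.0.
print(precision(hits, {1, 4}), recall(hits, {1, 4}))
```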

The exponential growth in the use of computer networks in the 1990s presages significant changes in systems and techniques of information retrieval. In a wide-area information service, a number of which began operating on the Internet computer network at the beginning of the 1990s, a user's personal computer or terminal (called a client) can simultaneously search a number of databases maintained on heterogeneous computers (called servers). The servers are located at different geographic sites, and their databases contain different data types and often use incompatible data formats. The simultaneous, distributed search is possible because clients and servers agree on a standard document-addressing scheme and adopt a common communications protocol that accommodates all the data types and formats used by the servers. Communication with other wide-area services that use different protocols is accomplished by routing through so-called gateways capable of protocol translation. The architecture of a typical networked information system is illustrated in Figure 5. Several representative clients are shown: a "dumb" terminal (i.e., one with no internal processor), a personal computer (PC), and Macintosh (trademark; Mac) and NeXT (trademark) machines. They have access to data on the servers that share a common protocol, as well as to data provided by services that require protocol conversion via the gateways. Network news is one such wide-area service; it contains hundreds of newsgroups on a variety of subjects, in which users can read and post messages.
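The client/server/gateway arrangement described above can be sketched in-process, with threads standing in for network calls. All class names and both "protocols" are hypothetical. The common protocol here is assumed to be a request dict `{"query": ...}` answered by a list of records. The legacy service instead takes a plain keyword and returns tab-separated lines, and the gateway translates between the two.

```python
from concurrent.futures import ThreadPoolExecutor

class Server:
    """A server that already speaks the (hypothetical) common protocol."""
    def __init__(self, name, titles):
        self.name, self.titles = name, titles

    def handle(self, request):
        q = request["query"].lower()
        return [{"source": self.name, "title": t} for t in self.titles if q in t.lower()]

class LegacyService:
    """A service with its own incompatible protocol: keyword in, tab-separated lines out."""
    def __init__(self, lines):
        self.lines = lines

    def lookup(self, keyword):
        return "\n".join(line for line in self.lines if keyword.lower() in line.lower())

class Gateway:
    """Translates common-protocol requests into the legacy protocol and back."""
    def __init__(self, name, legacy):
        self.name, self.legacy = name, legacy

    def handle(self, request):
        raw = self.legacy.lookup(request["query"])
        return [{"source": self.name, "title": line.split("\t")[0]}
                for line in raw.splitlines() if line]

def client_search(query, endpoints):
    """Send one request to every endpoint simultaneously and merge the replies."""
    request = {"query": query}
    with ThreadPoolExecutor() as pool:
        replies = list(pool.map(lambda ep: ep.handle(request), endpoints))
    return [record for reply in replies for record in reply]

endpoints = [
    Server("alpha", ["Relational databases", "Query languages"]),
    Gateway("news", LegacyService(["Database news\t1993", "Weather\t1994"])),
]
print(client_search("database", endpoints))
# [{'source': 'alpha', 'title': 'Relational databases'}, {'source': 'news', 'title': 'Database news'}]
```

Because the client only ever speaks the common protocol, adding another incompatible service means writing one more gateway rather than changing every client.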

Evolving information-retrieval techniques, exemplified by an experimental interface to the NASA space shuttle reference manual, combine natural language, hyperlinks, and key-word searching. Other techniques, seeking higher levels of retrieval precision and effectiveness, are studied by researchers involved with artificial intelligence and neural networks. The next major milestone may be a computer program that traverses the seamless information universe of wide-area electronic networks and continuously filters its contents through profiles of organizational and personal interest: the information robot of the 21st century.
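Filtering a continuous stream of content through an interest profile, the core idea in the closing sentence, can be illustrated with a weighted term profile and a threshold. This is a deliberately simple sketch. The profile format (term to weight, with negative weights for unwanted topics) and the threshold rule are assumptions, not a description of any existing system.

```python
import re

def score(text, profile):
    """Sum the weights of profile terms that occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(weight for term, weight in profile.items() if term in words)

def filter_stream(stream, profile, threshold):
    """Yield items from a (possibly endless) stream that match the profile well enough."""
    for text in stream:
        if score(text, profile) >= threshold:
            yield text

# Hypothetical personal-interest profile: positive weights attract, negative repel.
profile = {"retrieval": 2.0, "network": 1.0, "weather": -1.0}
incoming = ["New retrieval network launched", "Weather network update", "Sports results"]
print(list(filter_stream(incoming, profile, threshold=2.0)))
# ['New retrieval network launched']
```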

Contents of this article:

Introduction
General considerations
   Basic concepts
   Information as a resource and commodity
Elements of information processing
   Acquisition and recording of information in analog form
   Acquisition and recording of information in digital form
      Recording media
      Recording techniques
   Inventory of recorded information
      Primary and secondary literature
      Databases
   Organization and retrieval of information
      Description and content analysis of analog-form records
      Description and content analysis of digital-form information
         Machine indexing
         Semantic content analysis
         Image analysis
         Speech analysis
      Storage structures for digital-form information
      Query languages
      Information searching and retrieval
   Information display
      Video
      Print
         Printers
         Microfilm and microfiche
      Voice
   Dissemination of information
Information systems
   Impact of information technology
   Analysis and design of information systems
   Categories of information systems
      Management-oriented information systems
      Administration-oriented information systems
      Service-oriented information systems
         Computer-integrated manufacturing
         Transaction-processing systems
         Expert systems
      Public information utilities
   Impact of computer-based information systems on society
      Effects on the economy
      Effects on governance and management
      Effects on the individual
Bibliography
   Concepts of information and information systems
   Information processing
   Organizational information systems
   Public information utilities
   Impact of information systems
   Bibliographic sources

Information only adds value to your organization if people can find the content they need, when they need it. Your users need the tools to search, navigate and view mission-critical information—whether it’s stored in a structured database down the hall, on a Web server across the street, or in a word processing document saved on a file server half-way around the world. They need an intuitive solution that can keep up with the increasing amount of information they create and use every day. They need the power of Verity K2 Enterprise.

Connects the Right Users with the Right Content at the Right Time

The most accurate, scalable infrastructure available to power corporate portals, Verity K2 Enterprise gives your users the tools they need to turn information overload into competitive advantage. K2 Enterprise delivers rapid, relevant information retrieval with Verity’s advanced search, while its Intelligent Classification features let you organize information the way you organize your business. This lets your users navigate directly to the information they need through K2 Enterprise’s advanced user interfaces.

Behind your users’ browsers, K2 Enterprise’s open design ensures rapid integration with your existing e-business environment, while its scalable architecture gives your portal unlimited growth and reliable fault-tolerance. Regardless of how many documents are being searched or how many users are searching them, K2 Enterprise scales linearly with zero performance degradation. And its global support extends your portal to 24 languages and provides the flexibility to distribute content administration to the local offices that created and know it best.

Advanced Search

If your users can’t find information, they can’t act on it. That’s why the advanced Verity search, navigation, and viewing technologies that K2 Enterprise incorporates are so important to the success of your business. Using the robust Verity Query Language, you can implement these transparently to put the power of sophisticated queries behind simple, one-word searches. Novice users can get accurate results without using complex query syntax or understanding your corporate taxonomy. Features like smart correction of user errors, stemming expansion, query-by-example and automatic summarization guide your users to the information they need—even if they misspell search terms or don’t know where to start looking.
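The brochure does not say how Verity implements error correction or stemming expansion, so the following is only a generic sketch of the two ideas. It uses Levenshtein edit distance to correct a misspelled term against a known vocabulary. It also uses naive suffix stripping, a stand-in for a real stemmer such as Porter's, to expand a term to its variants. The vocabulary, the suffix list, and the distance cutoff are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def correct(term, vocabulary, max_distance=2):
    """Replace an unknown term with the closest vocabulary word, if it is close enough."""
    if term in vocabulary:
        return term
    best = min(vocabulary, key=lambda w: (edit_distance(term, w), w))
    return best if edit_distance(term, best) <= max_distance else term

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, far cruder than a real stemmer

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand(term, vocabulary):
    """Stemming expansion: every vocabulary word that shares the term's stem."""
    root = stem(term)
    return sorted(w for w in vocabulary if stem(w) == root)

vocab = {"search", "searched", "searching", "searches", "portal", "portals"}
print(correct("serch", vocab))     # search
print(expand("searching", vocab))  # ['search', 'searched', 'searches', 'searching']
```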

Intelligent Classification

Portals powered by Verity K2 Enterprise can do more than search and retrieve specific information for your users. They can automatically organize your information assets to make them easier for your users to browse. Unlike automatic classification methods that rely solely on statistical clustering algorithms to group documents, Intelligent Classification combines machine efficiency with human intellect. Subject-matter experts can refine the rules created by computers to apply business logic that can only be understood by the human mind.
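The division of labor described here can be illustrated generically: the machine proposes classification rules, and a person then edits them. This is not Verity's Intelligent Classification, whose internals the brochure does not describe. It is a sketch under stated assumptions. Rules are sets of include and exclude words. The machine seeds each category's include words with its most frequent terms from labelled examples, and an expert later adds an exclusion. The category names and example texts are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def propose_rules(examples, top_n=3):
    """Machine step: each category's most frequent non-stopword terms in its
    labelled examples become that category's trigger words."""
    rules = {}
    for category, texts in examples.items():
        counts = Counter(w for t in texts for w in tokenize(t) if w not in STOPWORDS)
        rules[category] = {"include": {w for w, _ in counts.most_common(top_n)},
                           "exclude": set()}
    return rules

def classify(text, rules):
    """Every category with at least one include-term and no exclude-term present."""
    words = set(tokenize(text))
    return sorted(c for c, r in rules.items()
                  if words & r["include"] and not words & r["exclude"])

examples = {
    "finance": ["quarterly revenue report", "revenue forecast and budget"],
    "legal": ["contract review", "contract dispute and litigation"],
}
rules = propose_rules(examples)
doc = "Annual report on the contract dispute"
print(classify(doc, rules))  # ['finance', 'legal']

# Human step: an expert encodes business logic the statistics missed,
# e.g. "anything about a contract belongs to legal, not finance".
rules["finance"]["exclude"].add("contract")
print(classify(doc, rules))  # ['legal']
```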

Advanced User Interfaces

Effective portal solutions make information as easy to find and retrieve for novice users as it is for experts familiar with advanced search techniques and corporate taxonomies. Verity K2 Enterprise provides your users with advanced user interfaces that make both unstructured and structured information assets readily accessible. For example, you can create directories based on your corporate terminology through which users can navigate and restrict searches to find unstructured content. Or you can utilize K2 Enterprise’s parametric search to let users sort, filter and drill through structured information.
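Parametric search over structured records reduces to filtering on attribute values and ranges, sorting, and counting the remaining values so the user can drill down further. The sketch below is generic and is not the K2 API. The product records and the `parametric_search`/`facet_counts` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical structured product records.
PRODUCTS = [
    {"name": "Laptop A", "brand": "Acme", "price": 999, "ram_gb": 16},
    {"name": "Laptop B", "brand": "Bolt", "price": 749, "ram_gb": 8},
    {"name": "Laptop C", "brand": "Acme", "price": 1299, "ram_gb": 32},
    {"name": "Laptop D", "brand": "Bolt", "price": 1099, "ram_gb": 16},
]

def parametric_search(records, equals=None, ranges=None, sort_by=None, descending=False):
    """Keep records whose attributes equal the given values and fall inside
    the given inclusive (low, high) ranges, then optionally sort them."""
    equals, ranges = equals or {}, ranges or {}
    hits = [
        r for r in records
        if all(r.get(k) == v for k, v in equals.items())
        and all(k in r and lo <= r[k] <= hi for k, (lo, hi) in ranges.items())
    ]
    if sort_by is not None:
        hits.sort(key=lambda r: r[sort_by], reverse=descending)
    return hits

def facet_counts(records, attribute):
    """How many records carry each value: the menu for the next drill-down."""
    return dict(Counter(r[attribute] for r in records if attribute in r))

step1 = parametric_search(PRODUCTS, ranges={"ram_gb": (16, 64)}, sort_by="price")
print([r["name"] for r in step1])    # ['Laptop A', 'Laptop D', 'Laptop C']
print(facet_counts(step1, "brand"))  # {'Acme': 2, 'Bolt': 1}
step2 = parametric_search(step1, equals={"brand": "Acme"}, sort_by="price", descending=True)
print([r["name"] for r in step2])    # ['Laptop C', 'Laptop A']
```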

Rapid E-business Integration

Verity K2 Enterprise is designed for rapid integration into existing e-business environments. Its straightforward integration leverages your current investments, minimizing implementation costs and ensuring project success. The key is Verity K2 Architecture, which supports technologies such as COM and Java, and includes a flexible API that provides access to all of its advanced features. K2 Enterprise also supports the widest range of information and repositories of any portal solution on the market. These include HTML, XML, multibyte data, Web and file systems, and ODBC-compliant databases.

Unlimited Growth

Verity K2 Enterprise’s distributed architecture powers your portal with unlimited growth potential. By brokering searches, you can increase both the amount of information being searched and the number of users submitting queries—without any degradation in performance.
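A common way to implement search brokering is to split the collection into partitions, send each query to every partition in parallel, and merge the partial result lists into one global ranking. The sketch below shows that pattern generically; it is not Verity's broker. The `IndexPartition` class and its term-frequency score are hypothetical. The merge is correct because any document in the global top k must also be in the top k of the partition that holds it.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class IndexPartition:
    """Hypothetical index server holding one slice of the document collection."""
    def __init__(self, docs):
        self.docs = docs  # doc_id -> text

    def search(self, term, k):
        """Local top k as (score, doc_id) pairs, best first; score = term frequency."""
        term = term.lower()
        scored = [(text.lower().split().count(term), doc_id)
                  for doc_id, text in self.docs.items()]
        return heapq.nlargest(k, [hit for hit in scored if hit[0] > 0])

def broker(term, partitions, k=3):
    """Send the query to every partition in parallel and merge into a global top k."""
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda p: p.search(term, k), partitions))
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))

parts = [
    IndexPartition({"a1": "portal search search", "a2": "portal"}),
    IndexPartition({"b1": "search", "b2": "search search search"}),
]
print(broker("search", parts, k=3))  # [(3, 'b2'), (2, 'a1'), (1, 'b1')]
```

Adding capacity in this design means adding partitions. Each partition searches a smaller slice, and the broker's merge step handles only k results per partition.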

Fault Tolerant Operation

Verity K2 Enterprise’s brokered search ensures your site will always be up and running by routing queries to the servers best suited to the task. This distributes load evenly, so response time never suffers because one server is sitting idle while others are overloaded, and it isolates hardware failures to deliver uninterrupted service to your users enterprise-wide, 24 hours a day, seven days a week.
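Routing each query to the least-loaded server and falling through to another server when one fails is a standard load-balancing and failover pattern. The sketch below illustrates that pattern under assumptions of its own; the brochure does not describe Verity's routing logic. The `Replica` class, its in-flight query counter, and the use of `ConnectionError` to signal a failed server are all hypothetical.

```python
class Replica:
    """Hypothetical search server with an in-flight query counter and a health flag."""
    def __init__(self, name, active=0):
        self.name, self.active, self.healthy = name, active, True

    def search(self, query):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} answered {query!r}"

def route(query, replicas):
    """Try replicas from least to most loaded, skipping any that fail."""
    for replica in sorted(replicas, key=lambda r: r.active):
        replica.active += 1          # count the query as in flight on this replica
        try:
            return replica.search(query)
        except ConnectionError:
            continue                 # isolate the failure; try the next replica
        finally:
            replica.active -= 1      # the query is no longer in flight here
    raise RuntimeError("no replica available")

a, b, c = Replica("a", active=5), Replica("b", active=1), Replica("c", active=2)
print(route("portal", [a, b, c]))  # b answered 'portal'
b.healthy = False
print(route("portal", [a, b, c]))  # c answered 'portal'
```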

Global Support

Verity K2 Enterprise is the only portal infrastructure that supports true enterprise-wide and global-scale solutions. Features include multiple language capabilities and built-in flexibility that allows administration to be distributed across different geographic locations.

Multiple Language Support—All of K2 Enterprise’s components support multi-byte character sets, which allows you to index, classify, search and view information in 24 Asian and European languages. By partnering with leading vendors like IBM, Inxight and Basis Technologies, Verity provides best-of-breed language locales to guarantee that K2 Enterprise always delivers the most advanced stemming, tokenization and concept extraction available.
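Tokenization is harder for Chinese, Japanese, and similar scripts because they are written without spaces between words. One widely used, dictionary-free approach is to index overlapping two-character sequences (bigrams). The sketch below shows that technique; it is a swapped-in illustration, not the Verity or partner locale implementation, which the brochure does not describe. The Unicode ranges it checks cover only the main ideograph, kana, and Hangul blocks.

```python
def is_cjk(ch):
    """True for characters in the main CJK ideograph, kana and Hangul blocks."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= code <= 0xD7AF)  # Hangul syllables

def tokenize(text):
    """Words in space-delimited scripts become lowercased tokens; runs of CJK
    characters, written without spaces, become overlapping two-character tokens."""
    tokens, word, run = [], [], []

    def flush_word():
        if word:
            tokens.append("".join(word).lower())
            word.clear()

    def flush_run():
        if len(run) == 1:
            tokens.append(run[0])
        else:  # an empty run yields no bigrams
            tokens.extend(run[i] + run[i + 1] for i in range(len(run) - 1))
        run.clear()

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run.append(ch)
        elif ch.isalnum():
            flush_run()
            word.append(ch)
        else:  # space or punctuation ends both kinds of token
            flush_word()
            flush_run()
    flush_word()
    flush_run()
    return tokens

print(tokenize("K2 検索エンジン"))
# ['k2', '検索', '索エ', 'エン', 'ンジ', 'ジン']
```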

Flexible, Distributed Administration—By allowing you to distribute administration functions across geographic locations, K2 Enterprise puts administration of content in the hands of the groups that created and understand it best. Content can be administered on local servers, yet remain searchable enterprise-wide. Queries are transparently brokered to each local server, returning relevant results from across your enterprise with the performance of a single search engine.

The key to success in e-commerce is turning browsers into buyers—faster and more efficiently than your competitors can. Verity® K2 Catalog gives your e-commerce portal the power to do just that. By intuitively matching the right products to the right people, Verity Catalog dramatically increases sales and creates loyal customers who keep coming back for more.

The most effective, scalable infrastructure available to power e-commerce portals, Verity K2 Catalog ensures that your customers find exactly what they’re looking for—and more. Besides providing advanced Verity search that makes finding products on your site quick and easy, Verity K2 Catalog’s Intelligent Merchandising lets you influence purchasing decisions by suggesting related products, up-selling and promoting specific merchandise—adding profitable site stickiness to your online store. Adaptive personalization features take this a step further by tailoring the online shopping experience based on customer browsing patterns.

Behind the shelves of your e-store, Verity K2 Catalog’s open design ensures rapid integration with your existing e-business environment. And its scalable architecture gives your e-commerce solution the capability to accommodate unlimited growth of both your catalog and customers with zero performance degradation. This means your customers can fill their shopping carts fuller and faster with Verity K2 Catalog—24 hours a day, seven days a week.

Intelligent Merchandising

E-commerce portals powered by Verity Catalog can do more than retrieve specific products for customers. They can influence purchasing decisions through sophisticated online merchandising techniques that increase sales and generate more revenue. Verity Catalog’s Intelligent Merchandising leverages Verity’s Intelligent Classification technology to create online aisles through which you can guide your customers directly to the products you want to sell them. Or you can employ it to build business rules that promote overstocked products, recommend items that complement the ones your customers are looking for, or suggest substitutes for out-of-stock merchandise.
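The three business rules named here are: promote overstocked products, recommend complements, and substitute for out-of-stock items. They can be expressed as a small post-processing step applied to search results. The sketch below is generic and is not how Verity Catalog encodes its rules. The catalogue, SKUs, and complement table are hypothetical.

```python
# Hypothetical catalogue: SKU -> attributes an administrator maintains.
CATALOG = {
    "tent-2p": {"category": "tents", "stock": 0, "overstocked": False},
    "tent-4p": {"category": "tents", "stock": 12, "overstocked": True},
    "stove": {"category": "cooking", "stock": 30, "overstocked": False},
    "lantern": {"category": "lighting", "stock": 5, "overstocked": False},
}
# Hypothetical rule table: category -> complementary SKUs.
COMPLEMENTS = {"tents": ["stove", "lantern"]}

def substitute(sku):
    """First in-stock SKU from the same category, or None."""
    category = CATALOG[sku]["category"]
    return next((s for s, a in CATALOG.items()
                 if a["category"] == category and a["stock"] > 0), None)

def merchandise(hits):
    """Apply three business rules to the SKUs a search returned.
    Returns (items to display, complementary recommendations)."""
    shown = []
    for sku in hits:
        # Rule 1: swap out-of-stock items for an in-stock substitute.
        candidate = sku if CATALOG[sku]["stock"] > 0 else substitute(sku)
        if candidate and candidate not in shown:
            shown.append(candidate)
    # Rule 2: promote overstocked items; the stable sort keeps the rest in order.
    shown.sort(key=lambda s: not CATALOG[s]["overstocked"])
    # Rule 3: recommend in-stock complements for the categories on display.
    recommended = []
    for sku in shown:
        for extra in COMPLEMENTS.get(CATALOG[sku]["category"], []):
            if extra not in shown and extra not in recommended and CATALOG[extra]["stock"] > 0:
                recommended.append(extra)
    return shown, recommended

print(merchandise(["tent-2p"]))             # (['tent-4p'], ['stove', 'lantern'])
print(merchandise(["lantern", "tent-4p"]))  # (['tent-4p', 'lantern'], ['stove'])
```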

Profitable Site Stickiness

Site stickiness isn’t just about keeping customers on your site longer. It’s about keeping them longer because they’re spending more money. Verity K2 Catalog profitably increases your site’s "stickiness" with intuitive, accurate search. This is one of the key advantages of portals powered by Verity—because if your customers can’t find what they’re looking for with a few clicks of their mouse, you’ll lose them to a site where they can.

Rapid E-Business Integration

Verity K2 Catalog is designed to fit within existing e-business environments. Its rapid integration leverages your current investments by minimizing implementation costs and decreasing time-to-market. In addition, only Verity K2 Catalog gives administrators the control and flexibility necessary to deliver the organized, relevant information customers need to make quick, informed purchasing decisions without costly administrative overhead or expensive content repurposing.

Adaptive Personalization

Instead of relying on static user profiles, Verity personalizes the online shopping experience by dynamically adapting to each search based on past queries and customer preferences. Specific products can be promoted based on previous purchasing history to provide the right match between products and customers—whether they’re shopping for themselves or someone else.

Unlimited Growth

Verity K2 Catalog’s scalability, fault-tolerance and wide range of supported data are the foundation of a solid e-commerce portal. This means your customers can rely on you to sell them the products they want, when they want them—no matter how many people are shopping at your site. And you can grow your e-commerce business one customer—or one million customers—at a time.

Scalability—Expand your catalog and handle more queries as your customer base grows, without any degradation in performance.

Fault-Tolerant Operation: Verity K2 Catalog's brokered search is designed to keep your site up and running by routing queries to the servers best suited to the task. This distributes load evenly, so response time never suffers because one server sits idle while others are overloaded, and it isolates hardware failures to deliver uninterrupted service to your customers 24 hours a day, seven days a week.

Structured and Unstructured Information—Verity K2 Catalog supports the widest range of both structured and unstructured information and repositories of any portal solution on the market: HTML, XML, multibyte data, Web and file systems and ODBC databases.

Multiple Language Support—Optional Verity Locales give K2 Catalog the power to sell your products in 24 Asian and European languages by recognizing, filtering, indexing and searching selected international character sets.
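As an illustration of the brokered routing described under Fault-Tolerant Operation, the sketch below sends each query to the least-loaded healthy server and skips servers marked as failed. It is a minimal sketch under stated assumptions: the `Server` and `Broker` classes, the in-flight query count used as the load measure, and the `healthy` flag are all hypothetical, not Verity's actual broker interface.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical search server as seen by the broker."""
    name: str
    in_flight: int = 0    # queries currently being processed
    healthy: bool = True  # set to False when a hardware failure is detected


@dataclass
class Broker:
    """A hypothetical query broker; not Verity's actual broker interface."""
    servers: list = field(default_factory=list)

    def route(self, query: str) -> str:
        """Pick the least-loaded healthy server for `query` and return its name."""
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # Failed servers are skipped entirely, so one hardware fault does not
        # interrupt service; among the rest, the idlest server gets the query.
        target = min(candidates, key=lambda s: s.in_flight)
        target.in_flight += 1
        return target.name

    def finish(self, name: str) -> None:
        """Record that one query on the named server has completed."""
        for s in self.servers:
            if s.name == name and s.in_flight > 0:
                s.in_flight -= 1


broker = Broker([Server("a"), Server("b", in_flight=3), Server("c", healthy=False)])
print(broker.route("red shoes"))  # prints: a
```

Counting in-flight queries is only one possible load signal; a real broker might also weigh server capacity or recent response times.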

SIM Web site content management

SIM is ideally suited to web site content management, especially for web sites that need:

Management of structured documents,

Large data volumes (up to millions of documents),

Web-based workflow and release control, including the ability to preview changes and additions in place in the web site,

Tightly integrated searching and table of contents support,

Media asset management, where multimedia objects are cataloged with Dublin Core metadata and managed as a collected resource for the site,

Dynamic presentation of documents, allowing customization based on user needs,

Hypertext link creation and multimedia object embedding implemented independently of any word-processing package, greatly reducing integration costs for new editing packages,

Hypertext link management that tracks all links, allowing change impact analysis and easy "what points at me?" checking, and

A choice of editing packages and approaches, including MS Word, XML editors, SGML editors, HTML fill-in form support, and direct XML editing through a fill-in form (for administrators!)

Public reference sites

To see the output of this web management system, visit the Textile Clothing and Footwear Australia site at TCFOZ

This site is maintained by non-technical content editors, who create content using Xmetal.

Another web site running SIM Web site content management is Standards Australia, who wrote extensions to the SIM system in the ACE programming language to meet their particular needs. Standards Australia use MS Word as their editing package, with the SIM RTF->XML translator converting those documents to XML so they can be managed in that format.

Key Characteristics

Web Server: SIM Web server – multithreaded server – ACE used for application logic.

Platforms: Windows NT, Solaris

Code Base: ACE (the SIM scripting language, an object-oriented, Java-like language with SGML/XML support).

User Interface: All user interfaces are provided with a standard web browser. Editing package is configurable.

Database Used: SIM Content Management Server – text retrieval database with SGML/XML native support. ODBC support is included in ACE, so content from other sources can be integrated.

Authentication Mechanism: The Web content management system currently maintains its own internal user database, but is being extended to support LDAP lookup for user authentication.

StyleSheet mechanism: The ACE language is ideally suited to XML->HTML conversion processing, as it is integrated with SGML and XML parsers (such as EXPAT and sgmlp). The Web content management system does not currently support XSLT for stylesheeting, but the SIM group does have an XSLT engine in beta test, and it will be added as a supplementary mechanism in the future. One of the advantages of using ACE for stylesheeting is that it has powerful text manipulation features as well as XML/SGML support.

Workflow Support: Simple workflow support and release control are included. Documents can exist in a number of states, including draft, pending review, released, suspended and deleted. Documents in any status other than released can be previewed on the web; released documents are visible to other users. For complex workflow support, the SIM DMS (Document Management System) is available. This application supports complex workflows that follow the Workflow Management Coalition standards, with a web user interface. The SIM DMS product is separately licensed and is currently in beta release.
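The release-control states listed above can be modelled as a small state machine. This is a sketch only: the source names the five states and the preview/visibility rule, but the allowed transitions in `TRANSITIONS` are an assumption made for illustration, as are the `Status` and `Document` names.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending review"
    RELEASED = "released"
    SUSPENDED = "suspended"
    DELETED = "deleted"


# Hypothetical transition table: the text lists the states, not the rules.
TRANSITIONS = {
    Status.DRAFT: {Status.PENDING_REVIEW, Status.DELETED},
    Status.PENDING_REVIEW: {Status.DRAFT, Status.RELEASED, Status.DELETED},
    Status.RELEASED: {Status.SUSPENDED, Status.DELETED},
    Status.SUSPENDED: {Status.RELEASED, Status.DRAFT, Status.DELETED},
    Status.DELETED: set(),
}


class Document:
    def __init__(self, name: str):
        self.name = name
        self.status = Status.DRAFT

    def move_to(self, new_status: Status) -> None:
        """Change status, rejecting transitions not in the table."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status

    def is_public(self) -> bool:
        # Only released documents are visible to other users.
        return self.status is Status.RELEASED

    def is_preview_only(self) -> bool:
        # Documents in any other status can only be previewed in place.
        return self.status is not Status.RELEASED


doc = Document("catalogue.xml")
doc.move_to(Status.PENDING_REVIEW)
print(doc.status.value, doc.is_public())  # prints: pending review False
```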

SIM Documentation Management Solution

One of the keys to successful electronic delivery of technical documentation is the ability to re-use content, that is, to deliver content in a number of different ways from a single source. This allows the same document and document components to be used over and over again. Re-use guarantees consistency: every user sees the same, correct version of a document. Re-use means efficiency: a document is written only once. Re-use allows for refinement: a document can be developed over time. It also allows, for example, different customized views of the same source documentation to be delivered to different classes of users; similarly, it allows the same source documentation to be delivered in multiple formats.

A Documentation Management Scenario

Consider, for example, a company that is producing a set of technical documents to be delivered to a number of different clients. Internet-based delivery of the documents is one of the requirements; as a consequence, changes to any document will be immediately provided to customers via the web. Although the content to be delivered to each client may be substantially the same, there will typically be some differences. These differences may result from variations in the products the technical documents are describing. Also, the clients may wish to add annotations to their documentation to reflect, for example, field knowledge obtained in using the manuals to repair various problems. In these cases, the annotations may represent valuable intellectual property of each client, and clients will require that access to them be restricted to their own personnel. Thus the document repository to be delivered to the clients will generally consist of a core of common content, with additional content that is private to specific clients.

Documentation Components

Managing database content is more than just storing the raw text of documents and their accompanying figures. Documents can have internal structure, and there can be an external structure relating separate documents. For example, documents are often interlinked in a number of ways, and these links are essential parts of the document content. When searching for documents, users often scan indexes to browse the terms contained in the document repository; these terms constitute the vocabulary of the document collection. Sophisticated users may also need to know the frequency of each of these terms in the document collection when conducting searches, in order to produce more effective queries. Documents can also have associated metadata that provides information about the document, such as author, status or security level. Metadata, too, can be used to drive more productive searches.

Customized Delivery and Effectivity

It is essential that the electronic publishing system deliver the correct document content, links and vocabulary to each class of users accessing the system. The need to provide an accurate snapshot of the database contents (i.e. text, figures, links, vocabulary and term frequencies) for each particular class of users is referred to as effectivity. Efficient provision of effectivity requires very sophisticated text database support.
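One way to picture effectivity is to tag every component with the user classes allowed to see it and to compute each class's snapshot (visible text, links between visible documents, and term frequencies) on demand. The sketch below assumes a hypothetical `Component` record in which an empty `audience` marks core content shared by all clients; it is not SIM's data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A hypothetical document component with an audience restriction."""
    doc_id: str
    text: str
    audience: frozenset     # empty frozenset means core content for everyone
    links: tuple = ()       # ids of documents this component links to


def snapshot(components, user_class: str):
    """Return the components, links and term frequencies visible to one user class."""
    visible = [c for c in components if not c.audience or user_class in c.audience]
    visible_ids = {c.doc_id for c in visible}
    # Keep only links whose targets are themselves visible to this class,
    # so no user follows a link into another client's private notes.
    links = {(c.doc_id, t) for c in visible for t in c.links if t in visible_ids}
    # Vocabulary counts cover visible text only, so term frequencies are per class.
    vocabulary = Counter(word.lower() for c in visible for word in c.text.split())
    return visible, links, vocabulary
```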

Automatic Tables of Content

Another requirement for technical documentation is the ability to dynamically produce tables of contents (TOCs) for each document from the XML document structure and content. Technical documents are often long, so when viewing a fragment of a document, it is important to understand where that fragment sits in the context of the whole document. This can be achieved by displaying the TOC alongside the fragment. Since the documents change over time, these TOCs must be generated dynamically when the document is viewed.
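A dynamic TOC of the kind described above can be generated by walking the XML structure each time a fragment is viewed. The sketch below assumes a hypothetical markup in which nested `<section>` elements carry `id` and `title` attributes; it marks the entry for the fragment being viewed with `>` and can optionally stop at a given depth.

```python
import xml.etree.ElementTree as ET


def build_toc(root, current_id=None, max_depth=None):
    """Return indented TOC lines for the nested <section> elements under `root`.

    The entry whose id equals `current_id` is prefixed with '>' so a reader
    viewing that fragment can see where it sits in the whole document.
    """
    lines = []

    def walk(elem, depth):
        if max_depth is not None and depth >= max_depth:
            return
        for sec in elem.findall("section"):
            is_current = current_id is not None and sec.get("id") == current_id
            marker = ">" if is_current else " "
            lines.append(f"{marker} {'  ' * depth}{sec.get('title')}")
            walk(sec, depth + 1)

    walk(root, 0)
    return lines


doc = ET.fromstring("""
<manual title="Pump Maintenance">
  <section id="s1" title="1 Overview"/>
  <section id="s2" title="2 Repair">
    <section id="s2.1" title="2.1 Seals"/>
    <section id="s2.2" title="2.2 Bearings"/>
  </section>
</manual>
""".strip())
print("\n".join(build_toc(doc, current_id="s2.1")))
```

Because the TOC is rebuilt from the current XML on every view, edits to section titles or structure appear without a separate regeneration step.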

Dynamic Update

Technical documentation can involve very large document collections, which must be updated dynamically. This means that the delivery system must provide a scalable solution, one that is able to update and deliver content efficiently for fast-growing document collections.

Key Points

In summary, some of the key requirements for a technical document delivery system include:

The ability to repurpose content; for example, support multiple delivery formats from a single source,

Manage all components of documentation, including content, images, internal structure, links, vocabulary and metadata,

Support effectivity, namely deliver a database snapshot appropriate to each class of users,

Provide dynamic tables of contents (TOCs) from the XML document structure and content,

Update and deliver documents quickly and efficiently,

Provide powerful navigation, searching and viewing, and

Provide scalable solutions.

SIM Legislation Management Solution

The Nature of Legislation

The law is both complex and comprehensive. Not surprisingly, legislation databases are examples of large, highly structured text collections. For example, a single Act of Parliament might be broken into many tens or hundreds of numbered sections, which in turn are broken into numbered subsections, paragraphs or sub-paragraphs. In large Acts, these sections are grouped into chapters, parts, divisions and/or subdivisions, each with a label, and usually a section or title. A formal system of reference (or citation) allows each component of the database to be identified clearly and unambiguously.

Amendments to Legislation

An important characteristic of legislation is that it changes over time. Sections or even larger units can be added, removed or altered. New law may be handed down to become legislation, creating a new principal Act where no Act previously existed. Existing legislation may undergo a complete restructuring, creating a new Act or Acts, replacing those previously in place. In between such creation and replacement, amending Acts can specify alterations to the principal Acts, perhaps changing the wording of one or two sections, or replacing complete sections, or even removing or inserting whole parts or chapters.

Legislation's Temporal Nature

Although only the principal Acts and the amending Acts have legal force, lawyers and legal researchers need access to the law as it existed during the time period relevant to their particular problem. From time to time, authorized Government bodies issue consolidations of particular Acts. A consolidation represents current law, presenting the principal Act as modified by the relevant amending Acts; that is, with all additions, deletions, and changes to wording applied, and with all new components inserted. However, lawyers are often interested in the state of the law at times other than those for which officially released consolidations are available. Ideally, they would like to access consolidations of the law at arbitrary points in time.
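Point-in-time consolidation can be sketched by recording, for each provision, the period during which each version was in force, then selecting the versions in force on the requested date. The `Version` record, the half-open `[start, end)` interval convention and the numeric section labels are assumptions for illustration; EnAct's actual representation is not described here, and a full system would apply the same selection to hyperlinks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One version of a provision, in force from `start` up to (not including) `end`."""
    section: str                  # numeric label such as "2" or "2.1" (assumed)
    text: str
    start: date
    end: Optional[date] = None    # None means still in force


def section_key(label: str):
    # Sort "2" < "2.1" < "10" numerically rather than as strings.
    return [int(part) for part in label.split(".")]


def consolidate(versions, as_at: date) -> dict:
    """Return {section: text} for every provision in force on `as_at`."""
    in_force = {
        v.section: v.text
        for v in versions
        if v.start <= as_at and (v.end is None or as_at < v.end)
    }
    return {s: in_force[s] for s in sorted(in_force, key=section_key)}


history = [
    Version("1", "Short title", date(1995, 1, 1)),
    Version("2", "Fee is $10", date(1995, 1, 1), date(2001, 7, 1)),
    Version("2", "Fee is $25", date(2001, 7, 1)),                 # substituted by an amending Act
    Version("3", "Transitional provision", date(1995, 1, 1), date(1998, 1, 1)),  # repealed
    Version("4", "Annual report required", date(2003, 3, 1)),     # inserted later
]
```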

Representing Structure with XML

The use of XML solves the problem of how to represent the structured text inherent in legislation. XML defines an abstract grammar for representation and exchange of text with tags interspersed throughout the text. A DTD (Document Type Definition) is a particular XML grammar describing which document components are valid and what sub-components they can contain. Acts from a given jurisdiction can be stored in XML in a format satisfying a particular DTD (which would state that every Act must contain sections and each section must contain text, or two or more subsections, and so on). One would then describe how to display a particular Act that satisfied the DTD by describing the presentation in terms of the DTD. A number of different presentation schemes can be described for a single DTD so that one might specify a presentation which only displays the table-of-contents to a specified depth, as well as a presentation for the whole Act. This is one of the advantages most often cited for using XML: the ability to reuse the same information for multiple purposes.
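The example rule in the paragraph above (every Act contains sections; each section contains text or two or more subsections) could be written as the DTD declarations shown in the code comment. Python's standard-library XML parsers do not validate against a DTD, so this sketch hand-codes the same check with `xml.etree.ElementTree`. The element names `act`, `section`, `text` and `subsection` and the `num` attribute are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative DTD for the rule in the text (not validated by this code):
#   <!ELEMENT act (section+)>
#   <!ELEMENT section (text | (subsection, subsection+))>


def check_act(act):
    """Return a list of violations of the example rule; an empty list means valid."""
    errors = []
    sections = act.findall("section")
    if not sections:
        errors.append("act has no sections")
    for sec in sections:
        label = sec.get("num", "?")
        has_text = sec.find("text") is not None
        enough_subsections = len(sec.findall("subsection")) >= 2
        # A DTD choice is exclusive: exactly one of the two alternatives must hold.
        if has_text == enough_subsections:
            errors.append(f"section {label}: needs either text or two or more subsections")
    return errors
```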

Long-term Availability

For information like legislation that continually changes over time, XML provides a safe format for the archiving of documents. Utilities such as word processors often use proprietary formats and may be unable to read legacy documents, even those authored by a previous version of the same word processor. These problems do not exist if XML is used, because only the content and structure of documents are represented by XML; the presentation of documents is treated separately.

An End-to-End Solution

Because the structure and content of the legislation are available to the application in a form separate from presentation information, it is possible to develop powerful end-to-end solutions that are not easily achievable if proprietary data representation standards are used. Using the Structured Information Manager, a legislation drafting and access system called EnAct was developed for the State Government of Tasmania in Australia. EnAct solves the second problem listed above for legislation databases: the ability to search legislation databases at an arbitrary point in time and view the correct consolidation of an Act at that point in time. Note that accessing legislation databases involves more than viewing text. Legislation databases consist of a large number of interrelated documents linked together by hyperlinks. Viewing a consolidation of legislation at a particular point in time involves retrieving the correct text as well as the correct hyperlinks at that point in time.

SIM, XML, and Legislation: an Ideal Partnership

The EnAct system exemplifies the direction in which legislation databases will develop in the future, namely providing access to the correct state of the law at any point in time. EnAct is able to achieve this goal because the legislation is maintained in XML, allowing access to the structure and content of data, and because the SIM document management system, used for the development of EnAct, efficiently performs the operations on XML content required to achieve automatic consolidations.

SIM Intelligence Applications

In intelligence applications, it is normal to build and maintain an information repository fed from a number of sources and then conduct searches in order to locate relevant information. Such information repositories are in use in both military and commercial applications. Where the information is highly structured, conventional database management systems are used to maintain these data warehouses. Where the information consists of text and metadata, systems with advanced text database capabilities are required.

Large Scale, Dynamic Applications

In these applications, the information repository can range from a few gigabytes in size to hundreds of gigabytes or more. The repository may be static or, more typically, continually growing. For example, in the case of a news feed, more than one gigabyte of new data can arrive every day. Other application areas may need to handle even greater dataflow. Some applications also need to migrate non-current data for archiving. For all large-scale, high-load intelligence applications, high-performance hardware/software architectures, such as multiprocessor Unix workstations, have to be deployed.

Building Information Repositories

The most important task when building an intelligence application is building and maintaining the information repository. When a new document is inserted into the repository, every word in the document must be extracted and indexed. This is a very expensive operation as a document may contain several thousand words. And, as noted, the amount of information to process can be very large indeed. SIM has been optimized for just such high volume environments, handling the update process as efficiently as is possible.
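The indexing step described above can be illustrated with a minimal in-memory inverted index that records, for every word, the documents and word positions where it occurs. This is a sketch of the general technique, not SIM's index format; the `InvertedIndex` class and its methods are hypothetical.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """A minimal in-memory inverted index: word -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id: str, text: str) -> None:
        # Every word is extracted and its position recorded, which is why the
        # cost of inserting a document grows with its length.
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[word].setdefault(doc_id, []).append(pos)

    def docs_with(self, word: str) -> set:
        """Documents containing `word` (case-insensitive)."""
        return set(self.postings.get(word.lower(), {}))
```

Storing positions, not just document ids, is what later makes "within n words" and "same sentence" style queries possible.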

Another problem is that new documents may be arriving at the same time as the database is in use for searching. Although many existing text database systems support fast batch loading of data as an overnight operation when the database is off-line, they do not allow updates of the repository during the day when the database is in use. However, for any organization that requires up-to-date access to the most recent data, or access to its intelligence 24 hours a day, seven days a week, this is not acceptable. SIM has been specifically designed to support concurrent updates and queries, thereby providing 24 hour access to up-to-date information.

Searching Information Repositories

The reason for building an information repository is to provide access to the data it contains. Since the document collection can be very large, advanced search techniques are needed to locate desired information. SIM has been developed to support just such sophisticated searching. Queries can use Boolean logic, word position information (such as "same sentence", "same paragraph", "within n words"), document structure, and ranked relevance queries (where the documents are returned in order of relevance to the query) to locate target data. Each query type can be combined with the others as required. For example, to achieve high accuracy when querying a collection, a searcher could use a Boolean query to identify a subset of the collection and then rank that subset against a set of ranking terms. Fuzzy matching is also important: for example, it is common to have several alternative spellings (or misspellings) of a word. SIM provides support for fuzzy matching by computing a distance measure between two terms, so that the presence of alternate spellings need not frustrate the user's task.
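Two of the query features above, fuzzy matching and "within n words" proximity, can be sketched in a few lines. The text does not say which distance measure SIM computes, so Levenshtein edit distance is used here as one common choice; the function names and the `max_distance` threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_terms(query_term, vocabulary, max_distance=2):
    """Vocabulary terms within `max_distance` edits of the query term, sorted."""
    return sorted(t for t in vocabulary if edit_distance(query_term, t) <= max_distance)


def within_n_words(positions_a, positions_b, n):
    """True if some occurrence of term A lies within n words of some occurrence of B."""
    return any(abs(pa - pb) <= n for pa in positions_a for pb in positions_b)
```

In practice the fuzzy step would run against the index vocabulary, and the matching terms would then be searched as if the user had typed them all.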

Repository Management

To maintain large, high-performance information repositories, the quality of a system's database administration capabilities is of the utmost importance. For very large repositories, it can be desirable to split the data collection over multiple databases. SIM has the ability to do just that, while retaining the ability to search each database in parallel. With critical information collections, it is necessary to be able to back up repositories efficiently and robustly, and to be able to monitor and refine database performance. SIM provides administration utilities that are of the very highest quality and reliability, and that deliver the finest level of control.
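Searching a collection split across several databases can be sketched as a scatter-gather: query every part concurrently, then merge the partial result lists into one ranking. The shard layout (a dict of document id to text) and the term-count scoring are toy assumptions for illustration, not SIM's storage or ranking.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Score each document in one shard by how often `term` occurs (toy scoring)."""
    term = term.lower()
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score:
            hits.append((score, doc_id))
    return hits


def parallel_search(shards, term, top_k=10):
    """Query every shard concurrently and merge the partial results into one ranking."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), shards))
    merged = [hit for part in partials for hit in part]
    # Highest score first; ties are broken by document id for a stable order.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged[:top_k]
```

Merging only works if scores from different shards are comparable; with real relevance ranking, collection-wide statistics would need to be shared between shards.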

A Proven Track Record

SIM provides an advanced, extremely rich, extremely reliable set of capabilities that support high-performance, secure intelligence applications. SIM has been successfully adopted by the Departments of Defense of both Australia and the U.S.A. for managing and searching large repositories of information.

SIM Knowledge Base Solution

Whether in the form of a human service or embodied as a physical product, knowledge is ultimately every corporation's stock in trade. The pooled knowledge of an enterprise is its fundamental capital, its true wealth. Knowledge management is about leveraging corporate knowledge: identifying it wherever it may be found, storing it for re-use, and delivering it to where it is needed. SIM's advanced content management enables organizations to do just that. By focusing on content, SIM transforms opaque, non-functional documents into richly structured information sources. SIM's support of sophisticated content, structure, time and metadata querying opens up the organizational knowledge base. And SIM's high-performance database management and web delivery enable it to deliver the right information to the right people at the right time.

A Simple Model

Consider an organization that builds two databases over time and matches the documents from one against the contents of the other. One database represents knowledge, the other needs. This simple model of a knowledge base is applicable to many practical situations.

Sample Applications

For example, the human resources department of an organization might have one database of tasks needing to be performed, and another describing the qualifications and expertise of current staff. The department wishes to assign the most appropriate employee to each task. To make the assignments, the tasks must be matched against the expertise database. To determine where an employee may best be deployed, the complementary action of matching the employee's expertise against the database of tasks can be performed. Similar requirements exist in an employment agency, in real-estate management, and in other information gathering and analysis applications.

A Detailed Example

Another example that fits this model is the administration of grant applications by a research body that is responsible for determining which grant applications should receive funding. A panel of experts has overall responsibility for recommending applications for funding. In order to do so, it is necessary to assess each application; accordingly, the panel must assign each application to an appropriate expert assessor. There are many types of interaction with such a system. Grant applications are submitted by applicants or by their organizations. Assessors, possibly from all over the world, must submit their reports and update their personal details. Members of the panel require full access to information about applications and assessors. A team of administrators may need access for general system maintenance or to generate reports.

Matching Information against Needs

There will typically be tens of thousands of assessors and applications covering a very wide range of research areas. In line with our simple model, a database of the submitted grant applications and a database of the assessors who may be approached to review applications must be built. A difficult task facing the panel of experts is choosing the appropriate assessors to review an application. SIM can use advanced relevance matching to help with this problem. In this approach, the text of an application is matched against the expertise of the potential assessors. Assessors who describe their expertise in terms similar to those used in an application are likely to be appropriate reviewers. With a single relevance query, an application can be matched against the complete database of assessors, and a ranked list of the closest matching assessors returned. The stronger the correlation between assessor and application, the higher the assessor is ranked. The panel of experts can then examine this ranked list and allocate assessors appropriately.
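The assessor-matching step can be illustrated with a standard relevance technique: represent the application and each assessor's expertise statement as TF-IDF weighted term vectors and rank assessors by cosine similarity. The text does not specify SIM's relevance formula, so this is a stand-in; the smoothed IDF (`log(n/df) + 1`), the function names and the sample data are assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphabetic words; a real system would also stem and drop stopwords."""
    return re.findall(r"[a-z]+", text.lower())


def rank_assessors(application, assessors):
    """Rank assessors by cosine similarity of TF-IDF vectors to the application text."""
    n = len(assessors)
    counts = {name: Counter(tokens(text)) for name, text in assessors.items()}
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF: rarer expertise terms weigh more; +1 keeps weights positive.
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}

    def vector(counter):
        # Terms no assessor uses cannot contribute to a match, so they are dropped.
        return {t: tf * idf[t] for t, tf in counter.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    query = vector(Counter(tokens(application)))
    ranked = [(cosine(query, vector(c)), name) for name, c in counts.items()]
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


assessors = {
    "Dr Lee": "marine biology coral reef ecology",
    "Dr Patel": "quantum computing error correction",
    "Dr Smith": "coral reef bleaching and ocean temperature",
}
for score, name in rank_assessors("Effects of ocean temperature on coral reef bleaching", assessors):
    print(f"{score:.2f}  {name}")
```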

Knowledge Base Requirements

To develop such a knowledge base, there are three important requirements:

A sophisticated content management system with advanced information retrieval and relevance matching capabilities,

Web-based access to accommodate users who will be geographically dispersed throughout the world, and

High performance, including the ability to handle large volumes of data and to cope with heavy, peak interactive loads.

SIM and Knowledge Base

SIM technology has been successfully deployed to build knowledge databases. Our experience has been that the provision of web-based access has meant that the application is readily available to users, and the use of relevance matching between databases has led to significantly improved decision making within the organization.

SIM, the Structured Information Manager, delivers the enabling technology for the key components of knowledge management: storing knowledge, ensuring that it can be located, and delivering it to where it's needed, when it's needed.

SIM Metadata Repository Management Solution

SIM MetaSite is a comprehensive solution for the collection, validation, classification and searching of metadata. SIM MetaSite forms a metadata repository which describes a distributed collection of resources, and provides a powerful browsing and searching interface to that repository.

Both simple and advanced Web search interfaces are provided with the SIM MetaSite product, to satisfy the needs of the general public and of specialized users. Also, two lower-level system interfaces to the repository are provided (one using HTTP and one using Z39.50) to allow integration of the SIM MetaSite product into existing environments.

Metadata Repository

SIM MetaSite stores and manages a database of resource metadata. The resource metadata is stored in a standard XML format (RDF). The Metadata repository is managed dynamically, and can be updated while users are querying the system. A web interface is provided for management of batch operations, and interactive updates, deletes and insertions.

The Metadata repository is searchable and browsable on all Dublin Core fields. The Metadata repository can also support non-Dublin Core fields in a dynamic manner, allowing the system to change or evolve as standards change, without programmer intervention.

SIM MetaSite can handle metadata databases of very large size (>20 Gbytes). The Metadata repository includes fields for tracking of popularity/usage of metadata records. This information can be used to improve the visibility of metadata resources that are visited most often.

In addition to searching on metadata, the Metadata repository is capable of containing the full text of resources, where searching of full text in combination with metadata fields is required.

MetaSite User Interface

SIM MetaSite is supplied with a user interface which allows metadata and full text searching along with thesaurus browsing, all in an easy-to-use HTML interface.

The interface can operate with or without frames, and with or without JavaScript.

The provided user interface is highly configurable, which allows the MetaSite interface to evolve along with changing or clarified user requirements. The interface is stateless, yet allows user customization of the operation of the interface (for example in selection of the thesaurus to be used).

Creation and Loading of Metadata

The MetaSite crawler collects resources from the web in order to build the metadata repository. Its operation can be controlled in many ways:

by regular expression for included URLs,

by regular expression for excluded URLs,

by MIME type of data to be collected,

by number of steps that can be followed (depth) from the starting configuration file URL, and

by number of steps that can be followed off-site from a valid on-site URL.

The crawler is multi-threaded, and can crawl multiple sites simultaneously. Delays between requests can be configured in order to reduce load on harvested sites. The crawler conforms to the ROBOTS.TXT standard for inclusion and exclusion. The crawler understands the RDF standard, and can follow links expressed in the RDF standard. Because the crawler is open and configurable, new data types and document types can be supported. Where data is located locally, the crawler will not duplicate such data, but will record references to the local data, thus allowing the repository to be populated without data duplication.
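The crawler controls listed above can be expressed as a small policy object, with robots.txt handling delegated to Python's standard `urllib.robotparser`. The `CrawlPolicy` class, its parameter names, and the rule that the off-site step count resets on returning to the home site are hypothetical; MetaSite's actual configuration format is not described here.

```python
import re
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class CrawlPolicy:
    """Hypothetical crawl rules mirroring the controls listed above."""

    def __init__(self, start_url, include, exclude, mime_types, max_depth, max_offsite):
        self.home = urlparse(start_url).netloc     # the on-site host
        self.include = [re.compile(p) for p in include]
        self.exclude = [re.compile(p) for p in exclude]
        self.mime_types = set(mime_types)
        self.max_depth = max_depth                 # steps from the start URL
        self.max_offsite = max_offsite             # steps away from the home site

    def allowed_url(self, url, depth, offsite_steps):
        """Apply the depth, off-site, include and exclude rules to one URL."""
        if depth > self.max_depth or offsite_steps > self.max_offsite:
            return False
        if self.include and not any(p.search(url) for p in self.include):
            return False
        return not any(p.search(url) for p in self.exclude)

    def allowed_mime(self, content_type):
        """Accept only configured MIME types, ignoring parameters such as charset."""
        return content_type.split(";")[0].strip().lower() in self.mime_types

    def next_offsite(self, url, offsite_steps):
        """Off-site count for a link: 0 on the home site, otherwise one more step."""
        return 0 if urlparse(url).netloc == self.home else offsite_steps + 1


# robots.txt rules are parsed here from literal lines instead of being fetched.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
```

A full crawler would consult `robots.can_fetch(...)` and the policy before each request and sleep for the configured delay between requests to the same site.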

The MetaSite crawler also includes a configurable validation program. This program is usually run after the data has been collected, in a batch mode. The validation program checks the RDF data for validity, and can perform operations such as setting default values for metadata fields according to configurable rules – i.e. automatic generation of metadata, checking keyword entries against a central thesaurus, detecting duplicate data, and translating from META tags in HTML documents into RDF expressed in XML format.

Thesaurus Support

MetaSite allows the management of multiple thesaurus databases. These are used for validation of metadata records, and for browsing and searching within the user interface. Where multiple thesauri are loaded, users can dynamically choose which thesaurus to use, depending on their preferences. Thesaurus access is tightly integrated into the user interface; it helps significantly in targeting user queries and gives the user a sense of the overall content of the metadata repository. When the user browses through the thesaurus, the number of records in the metadata repository that correspond to each thesaurus category is displayed. When the user conducts a search, the search results for each thesaurus category are shown.

Searching accuracy is also enhanced by using the Thesaurus to expand the user's query to include synonyms for the user's query terms. The query terms that were included are displayed to the user, and the user can choose to disable this functionality if they wish.

Thesauri are also used for validation of records during the loading process, for example for checking that restricted vocabularies are adhered to. Thesauri can also be used to map large vocabularies down to restricted vocabularies using the "alternate" field in the thesaurus. Thesaurus entries are stored in a standard XML format, making it easy to export and import new thesauri. The thesaurus records can be maintained dynamically online through an administration web interface. The thesaurus databases themselves are fully accessible via Z39.50, and the schema used for that access is configurable for the particular site requirements.
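Query expansion and vocabulary mapping can be sketched with a thesaurus held as a plain dictionary. The entry layout (`synonyms` and `alternate` lists keyed by preferred term), the sample terms and the function names are assumptions for illustration; MetaSite itself stores its thesaurus entries in XML.

```python
# Hypothetical thesaurus: preferred term -> synonyms and "alternate" (non-preferred) forms.
THESAURUS = {
    "automobile": {"synonyms": ["car", "motor vehicle"], "alternate": ["auto", "motorcar"]},
    "dwelling": {"synonyms": ["house", "residence"], "alternate": ["abode"]},
}


def expand_query(terms, thesaurus=THESAURUS, enabled=True):
    """Return (expanded terms, added terms); the added list is what the user is shown."""
    if not enabled:                      # the user has switched expansion off
        return list(terms), []
    expanded, added = list(terms), []
    for term in terms:
        for syn in thesaurus.get(term, {}).get("synonyms", []):
            if syn not in expanded:
                expanded.append(syn)
                added.append(syn)
    return expanded, added


def to_preferred(keyword, thesaurus=THESAURUS):
    """Map a keyword to its preferred term via the "alternate" field, else None."""
    if keyword in thesaurus:
        return keyword
    for preferred, entry in thesaurus.items():
        if keyword in entry.get("alternate", []):
            return preferred
    return None
```

During loading, a keyword for which `to_preferred` returns `None` would be a candidate for a validation warning, since it falls outside the restricted vocabulary.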

Open and InterOperable

A low-level HTTP interface is provided to SIM MetaSite, which allows embedding of the MetaSite functionality into other web interfaces. The low-level API allows access to most of the searching and presentation functionality of SIM MetaSite. SIM MetaSite is also fully accessible via Z39.50, and the schemas used for that access can be configured for the particular site requirements; indeed, multiple Z39.50 schemas can be used simultaneously for MetaSite access.
