ArrayVision Genomics Software - Rapid and Automated Analysis of Genome Arrays


Upload: bob-roony

Post on 10-Oct-2014



Contents

Introduction
  Analyzing Arrays
  MicroArrays
  Macro Arrays
  Genomics Imaging Systems and ArrayVision Software
Typical High Throughput Genetic Analyses
  Library Screening
  Gene Expression and Functional Genomics
Summary of the ArrayVision System
References


ArrayVision

Amersham Biosciences


Introduction

Array-based genetic analyses start with cDNAs or oligonucleotides immobilized on a substrate. The array elements are hybridized with a single labeled sequence (Fig. 1), or with a labeled complex mixture derived from the messenger RNA of a tissue or cell line.

Figure 1: Array containing 2304 rectangles, each populated with 25 discrete elements from a YAC library, giving a total of 57,600 array elements. The entire membrane (24 x 24 cm) was hybridized with a simple mRNA labeled with 32P, and was exposed on a storage phosphor imaging plate. The intense dark spots arranged in pairs are used for positioning. The few probes which hybridized to the labeled sequence are seen as fainter spots. Most of the membrane is almost clear, indicating a low background. Specimen courtesy of Genome Systems, Inc.

Typical arrays vary from about 500 to 60,000 or more elements, with each element representing a discrete hybridization assay. The rapid, simultaneous analysis of large numbers of hybridization assays gives array-based genetic analysis its high throughput characteristics.


Analyzing Arrays

Array quantification involves a number of steps.

• The array of hybridization assays (array elements) is imaged.
• A template consisting of a regularly spaced matrix of circles or squares (template elements) is placed over the array.
• The template elements are aligned with the array elements.
• Data are reported.
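The template described in the second step can be pictured as a regular grid of element centres. The following is a minimal sketch under stated assumptions: the names `TemplateElement`, `make_template`, `pitch`, and `origin` are hypothetical, units are pixels, and the default radius is an arbitrary choice. It is not a description of how ArrayVision stores its templates.

```python
from dataclasses import dataclass


@dataclass
class TemplateElement:
    row: int
    col: int
    x: float       # centre position, pixels
    y: float
    radius: float  # circle radius, pixels


def make_template(rows, cols, pitch, origin=(0.0, 0.0), radius=None):
    """Build a regularly spaced matrix of circular template elements."""
    if radius is None:
        radius = pitch / 4.0  # assumed default; leaves a gap between neighbours
    x0, y0 = origin
    return [
        TemplateElement(r, c, x0 + c * pitch, y0 + r * pitch, radius)
        for r in range(rows)
        for c in range(cols)
    ]


# A 3 x 4 template with 10-pixel spacing, offset 5 pixels from the image corner.
template = make_template(rows=3, cols=4, pitch=10.0, origin=(5.0, 5.0))
```

Each element keeps its row and column index, so a value read at that element can always be reported against the correct array position, whatever happens to its pixel coordinates during alignment.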

The alignment step needs some explanation. Ideally, all of the template elements would align with the array elements. Unfortunately, most arrays exhibit some geometric error, so template elements are not perfectly aligned with their targets. This results in two types of error.

• Hybridization values can be assigned to incorrect positions within the array.
• Hybridization values will be in error wherever a template element does not fit precisely over an array element.

To avoid these errors, we must align the template. We could move each discrete template element to its proper position by eye. This type of manual definition is so tedious as to be impractical for any but the smallest arrays. It is also risky: after staring at thousands of dots for a few minutes, a human observer is prone to errors.

The alternative is to align templates automatically. There are various procedures for this, including simple thresholding (finding array elements on the basis of intensity, e.g. Nguyen et al., 1995; Pietu et al., 1996), and more complex spot finding algorithms (as in ArrayVision). The success of an array analysis system depends, in large part, on how well it performs automated alignment. If alignment is inaccurate, a great deal of editing is required, and this is not much better than manual definition. To be really useful, an array analysis package must align a template with minimal user editing, and across a variety of specimen formats (isotopic, luminescent/fluorescent, macro and micro arrays).
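Simple thresholding, the first procedure mentioned above, can be sketched as follows. This is an illustrative version only: the function name `find_spots`, the 4-connected grouping, and the toy image are assumptions, and the cited studies may have implemented thresholding differently.

```python
def find_spots(image, threshold):
    """Return intensity-weighted centroids (x, y) of 4-connected regions above threshold."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    spots = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] <= threshold:
                continue
            # Flood-fill one connected region, accumulating weighted sums.
            stack = [(x, y)]
            seen[y][x] = True
            total = sum_x = sum_y = 0.0
            while stack:
                px, py = stack.pop()
                value = image[py][px]
                total += value
                sum_x += value * px
                sum_y += value * py
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            spots.append((sum_x / total, sum_y / total))
    return spots


# Toy image: one 2 x 2 spot and one single-pixel spot on a blank background.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 0],
]
spots = find_spots(image, threshold=5)
```

A limitation is visible in the sketch itself: an element whose signal does not exceed the threshold is never found, so thresholding alone cannot place a template element over an unlabeled array element.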

ArrayVision uses a fuzzy logic algorithm (patent pending) to place each discrete template element over the best fit location. It does this by evaluating the image around each template element. It uses signal intensity to determine if there is an array element that is a likely fit to that template element. Then comes the fuzzy part. If array elements exhibit strong and distinct signals, the software is quite ready to move template elements to new positions over those array elements. If array elements are weakly labeled or unlabeled, the software tends to leave template elements in their original (predefined) locations within the template. That is, the software uses confidence weighting to align template elements to array elements. This allows each and every array element to be read, including those that fail to exhibit label above background. Unlabeled array elements will be read at the original position specified by the template, adjusted to fit within the context of more clearly defined elements.
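The idea of confidence weighting can be illustrated with a toy example. This sketch is not the patent-pending ArrayVision algorithm, whose details are not given here. It assumes one simple scheme: find the background-subtracted signal centroid in a window around the template element, derive a confidence between 0 and 1 from the peak signal, and move the element only that fraction of the way toward the centroid. The names `align_element`, `half_window`, `background`, and `saturation` are hypothetical.

```python
def align_element(image, cx, cy, half_window, background, saturation):
    """Shift one template centre toward the local signal centroid, scaled by confidence.

    Returns (new_x, new_y, confidence) with confidence in the range 0..1.
    """
    height, width = len(image), len(image[0])
    total = sum_x = sum_y = peak = 0.0
    for y in range(max(0, cy - half_window), min(height, cy + half_window + 1)):
        for x in range(max(0, cx - half_window), min(width, cx + half_window + 1)):
            signal = max(0.0, image[y][x] - background)
            total += signal
            sum_x += signal * x
            sum_y += signal * y
            peak = max(peak, signal)
    if total == 0.0:
        # Nothing above background: keep the predefined template position.
        return float(cx), float(cy), 0.0
    confidence = min(1.0, peak / (saturation - background))
    target_x, target_y = sum_x / total, sum_y / total
    return (cx + confidence * (target_x - cx),
            cy + confidence * (target_y - cy),
            confidence)


def blank_image(value=10, size=5):
    return [[value] * size for _ in range(size)]


strong = blank_image()
strong[2][3] = 110   # strongly labeled element one pixel right of the template centre
weak = blank_image()
weak[2][3] = 20      # weakly labeled element at the same place
empty = blank_image()

moved_strong = align_element(strong, 2, 2, 1, background=10, saturation=110)
moved_weak = align_element(weak, 2, 2, 1, background=10, saturation=110)
moved_empty = align_element(empty, 2, 2, 1, background=10, saturation=110)
```

In this toy case the strongly labeled element pulls the template element fully onto itself, the weak one moves it only slightly, and the unlabeled one leaves it at its predefined position. The sketch omits the final step described above, in which unlabeled elements are adjusted to fit the context of their more clearly defined neighbours.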

Although the algorithmic exercise is a bit complex for the computer, it is simplicity itself for the user. Click the mouse and the template aligns with the array. With most arrays, very little or no editing is required. Click again and data from thousands of array elements are reported, quickly and accurately (Fig. 2).
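Agreement between manually and automatically aligned readings, of the kind summarized in Fig. 2, is commonly expressed as the squared Pearson correlation (R²). A minimal sketch follows; the function name `r_squared` and the two short lists of intensities are made up for illustration and are not data from the figure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical intensities for the same five elements under each alignment method.
manual = [1200, 5400, 15000, 31000, 44000]
automated = [1180, 5500, 14900, 31200, 43800]
agreement = r_squared(manual, automated)
```

A value close to 1 indicates that the two methods rank and scale the elements almost identically across the intensity range.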


Figure 2: Comparison of manual and automated alignment and detection. A 33P-labeled expression array was imaged (phosphor imager) and displayed within ArrayVision. A template was generated to fit roughly over the array. In the manual condition, the operator moved individual template elements to match the array. In the automated condition, the alignment was performed entirely by the computer. Manual and automated alignment are in excellent agreement over the entire range of hybridization intensities.

[Plot: Hybridization Intensity: Manual vs. Auto Alignment, 33P Array. X axis: Intensity, Manual Alignment (0 to 45,000). Y axis: Intensity, Auto Alignment (0 to 45,000). R² = 0.9993.]

MicroArrays

Microfabricated arrays have been in use for some time (e.g. Eggers et al., 1994; Fodor et al., 1991; Lamture et al., 1994; Maskos and Southern, 1992; Mason, Rampal and Coassin, 1994; Pearson and Tonucci, 1995; Pease et al., 1994; Saiki et al., 1989; Southern, Maskos and Elder, 1992). The use of microfabricated arrays is growing, while the recent availability of commercial instruments for creating and detecting (e.g. Molecular Dynamics Avalanche) ad hoc microarrays underlies a rapidly expanding use of nonfabricated specimens.

Microarrays have advantages in achieving higher signal to noise with rare mRNAs. The proportion of total mRNA represented by a particular mRNA species is not necessarily related to its functional importance. Species which are present in low copy numbers may be of interest, but are difficult to detect (e.g. Wan et al., 1996). Therefore, a goal of miniaturization is to concentrate mRNA molecules into a smaller area, where they can be detected more easily (Fig. 3).


Figure 3: Two array formats. The isotopic macroarray uses elements 1 mm in diameter. It is deposited on a membrane and detected using a phosphor imager. The Cy3-labeled microarray uses 250 µm diameter elements, deposited on a microscope slide (microarray specimen courtesy of M. Erlander).

Because nylon (Anchordoguy et al., 1996) or glass (Wittrup, Westerman and Desai, 1994) substrates are, along with nonspecific hybridization, a major source of nonspecific background, using smaller and more highly concentrated assay sites can yield higher sensitivity. For example, consider distributing 4,000,000 molecules of a fluorescein-labeled mRNA over a target area of 1 x 1 mm. This would achieve a concentration of 4 molecules/µm². This very low concentration would probably not be visible above the substrate background fluorescence. In contrast, distributing the same 4,000,000 molecules over a target area of 20 x 20 µm would achieve a concentration of 40,000 molecules/µm², a concentration which has been used in evaluating the performance of microfabricated devices (Chee et al., 1996). Scanning the small target with a tightly focused laser, or viewing it at high magnification under a fluorescence microscope, would allow very small amounts of signal to be detected. Problems remain with hybridization to such small targets, but maturation of this technology will lead to routine imaging of many thousands of microscopic targets within small areas (e.g. DeRisi et al., 1996; de Saizieu et al., 1998; Khrapko et al., 1991; Schena et al., 1995; Shalon, Smith and Brown, 1996).
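The surface concentrations in this example follow from dividing the number of molecules by the target area. A small worked sketch (the function name `molecules_per_um2` is hypothetical) makes the arithmetic explicit. Note that with the inputs as stated, a 20 x 20 µm target works out to 10,000 molecules/µm²; a 10 x 10 µm target would give the 40,000 molecules/µm² quoted above.

```python
def molecules_per_um2(molecules, width_um, height_um):
    """Surface concentration in molecules per square micrometre."""
    return molecules / (width_um * height_um)


macro_target = molecules_per_um2(4_000_000, 1000, 1000)  # 1 x 1 mm target
micro_target = molecules_per_um2(4_000_000, 20, 20)      # 20 x 20 µm target
smaller_target = molecules_per_um2(4_000_000, 10, 10)    # 10 x 10 µm target

print(macro_target, micro_target, smaller_target)
```

Whichever target size is intended, the point of the comparison holds: shrinking the target area by several orders of magnitude raises the concentration of label by the same factor.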

In addition to increasing the concentration of mRNA molecules per element by miniaturization, microarrays benefit from the use of fluorescent labels.

a) Fluorescent labels with higher extinction coefficients and higher quantum yields have better signal relative to background (Oi, Glazer and Stryer, 1982).
b) We can attach more label molecules to the probe molecule.
c) Longer wavelengths of excitation lead to less autofluorescence (Tsien and Waggoner, 1989).
d) Time-resolved fluorescence can be used to minimize nonspecific background (e.g. Hennink et al., 1996; Jovin and Arndt-Jovin, 1989; Seveus et al., 1992).


It has been suggested that an array of 1 µm² elements occupying 4 cm² (about 4 million sites) would be sufficient to query the 100,000 gene content of the entire human genome. Of course, the technology to create, hybridize to, and detect such high density arrays remains to be developed.

In the push to ever higher array densities, microfabricated devices (which are readily miniaturized) will define the state of the art. However, microfabrication is not very suitable for low-volume research applications, in that advanced manufacturing procedures (photolithography, light-directed combinatorial synthesis, etc.) are required. Therefore, microfabrication is only economical for large numbers of devices, targeted at specific sequences. Research laboratories or small companies cannot easily create novel microfabricated devices for their own genes of interest.

As an alternative to the relatively inflexible microfabricated arrays, non-fabricated arraying technologies are developing rapidly. Tools for creating fluorescent microarrays (typically with probes 100-250 µm in diameter), improved substrates, and high resolution detectors are now available. These innovations combine with image analysis to yield a rapidly evolving capability for creating “ad hoc” microarrays.

Macro Arrays

The macro format (Figs. 1, 4) was introduced some years ago (e.g. Gress et al., 1992; Guo et al., 1994; Khrapko et al., 1991; Saiki et al., 1989; Zhao et al., 1995) and is in fairly widespread use. Typically, arrays are laid down on membranes as spots of about 1 mm in diameter. These large spots are easily produced with robots, and are well suited to isotopic labeling because the spread of ionizing radiation from an energetic label molecule (e.g. 32P) precludes the use of small, closely-spaced elements. Detection is most commonly performed using storage phosphor imagers.

Figure 4: A 33P-labeled Clontech Atlas Array, imaged with a Molecular Dynamics phosphor imager and analyzed with ArrayVision. This expression array contains 1,176 discrete probes organized as 6 blocks of 196 probes. Each block represents a different class of cellular function. Most probes exhibit some degree of hybridization to the complex test sample.

Although most macro array studies have used isotopic label, laboratories are also working with fluorescent labels on membranes or glass. These fluorescent arrays can be made with the same types of technologies used in creating isotopic arrays (e.g. non-contact dispensers, pin tools). Detection can use scanning fluorescence imagers (such as the MD FluorImager), or CCD-based low light imaging systems. Scanning laser systems are easy to use and can be quite cost-effective. CCD-based systems have the advantage that they are not limited to fixed laser lines. Rather, they can use any wavelengths produced by interference filters.

Genomics Imaging Systems and ArrayVision Software

ArrayVision software is used for rapid and automated analyses of images generated by any macro or micro imaging device. Contact us for details regarding complete systems (Fig. 5).


Figure 5: ArrayVision accepts data from almost any detection system. [Diagram: a macro array imaging system, imaging plate reader, scanning microscope, scanning laser, and camera all supply images to ArrayVision software.]

Typical High Throughput Genetic Analyses

Library Screening

The defining feature of library screening is that the probe is simple, containing only one or a few complements. The use of large arrays provides the best likelihood that hybridization will occur, even with allelic variation in the target and limited sequence lengths in the array elements. A typical screening image contains many thousands of unlabeled elements, and a few points where elements have hybridized. Analysis is less concerned with quantification of hybridization intensity than with localizing labeled array elements to their proper locations in the array.

Accurate localization requires alignment of the template to the entire array, even though most of that array is blank. ArrayVision uses specific “anchor” spots on the array to provide spatial points of reference (see Fig. 1). Using these anchors, the software performs accurate alignment of the template and localization of array elements.
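One general way to use anchor spots as spatial references is to fit a geometric transform that maps the anchor positions expected by the template onto the anchor positions found in the image, and then apply that transform to every template element. The sketch below assumes an affine model fitted by least squares; how ArrayVision actually uses its anchors is not described here, and the names `fit_affine` and `apply_affine` and the anchor coordinates are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve the 3 x 3 linear system m * s = v by Cramer's rule."""
    d = det3(m)
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]
        solution.append(det3(mk) / d)
    return solution


def fit_affine(template_pts, image_pts):
    """Least-squares affine map from template anchors to image anchors.

    Needs at least three anchors that do not lie on one line.
    Returns [[a, b, c], [d, e, f]] with x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in template_pts]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in range(2):
        rhs = [sum(r[i] * p[axis] for r, p in zip(rows, image_pts)) for i in range(3)]
        coeffs.append(solve3(normal, rhs))
    return coeffs


def apply_affine(coeffs, point):
    """Map one template position into image coordinates."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in coeffs)


# Four anchors: the imaged array is scaled by 1.02 and shifted by (5, -3).
template_anchors = [(0, 0), (100, 0), (0, 100), (100, 100)]
image_anchors = [(5, -3), (107, -3), (5, 99), (107, 99)]
transform = fit_affine(template_anchors, image_anchors)
centre = apply_affine(transform, (50, 50))
```

Because the transform is estimated from the anchors alone, it can position template elements over blank regions of a screening array, where there is no signal to align to.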

Even though hybridization intensity is not the main objective of a screen, ArrayVision does generate quantitative data for every element (Fig. 6). Therefore, objective statistical methods can be applied to identify hits. For example, we select those elements whose hybridization intensity is more than four standard deviation units away from the mean. The software reports these elements as numerical data, and as a graphical display of hit locations.
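The four-standard-deviation rule described above can be written in a few lines. This is a minimal sketch with synthetic data; the function name `find_hits` is hypothetical, and the mean and standard deviation are computed over all elements, hits included, as a simple reading of the rule.

```python
import statistics


def find_hits(intensities, n_sd=4.0):
    """Indices of elements more than n_sd standard deviations away from the mean."""
    mean = statistics.mean(intensities)
    sd = statistics.stdev(intensities)
    return [i for i, value in enumerate(intensities) if abs(value - mean) > n_sd * sd]


# Synthetic screen: 199 near-background elements and one strongly labeled element.
intensities = [2000 + (i % 7) * 50 for i in range(199)] + [30000]
hits = find_hits(intensities)
print(hits)
```

Since each index corresponds to a known position in the template, the list of hits translates directly into array locations.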


Figure 6: Frequency histogram of 57,576 elements, with a mean of 2,124 counts and a standard deviation of 1,172. The X axis shows intensity (counts/mm²) in bins labeled from 2,500 to 47,500, plus a final "More" bin. The Y axis (1 to 100,000) is plotted logarithmically, so that we can see small numbers of data points lying in the higher intensity bins. For example, there are fewer than twenty elements lying above 27,500 counts. It is likely (probability better than 99.999%) that these probes form their own, unique distribution - the distribution of hits.

Gene Expression and Functional Genomics

In expression analysis, libraries of cDNAs or oligonucleotides are hybridized with total genomic mRNA. When everything is working correctly, the expression level of a gene is reflected in the number of mRNA copies that it contributes to the mixture, and is proportional to the signal detected at complementary elements in the high-density array.

The key difference between library screening and gene expression is that expression analysis does not look at just a few hits on a clear background. Rather, there is hybridization to almost every array element (Fig. 7).

Figure 7: A small portion of an expression array, showing that there is hybridization at almost every element.

Expression levels can be analyzed within a single sample, or across multiple samples. Typically, expression is compared across tissues or cell lines (Fig. 8). This is done by using replicate arrays, exposed to different mRNA conditions.

Figure 8: A macro array containing 1,536 discrete probes, hybridized to a complex mixture of mRNAs. At left, we see the image without a template. The image at right shows the aligned template over the array. Expression data are taken from the entire array. ArrayVision placed and aligned the template and read data, all automatically.

A problem in studying gene expression is the sheer volume of the data. Expression arrays can contain tens of thousands of targets replicated across multiple conditions. It is not unusual for the computer to be juggling matrices containing >100,000 numbers. Therefore, it is important that the image analysis software be designed to handle such large data sets. ArrayVision handles large data matrices as efficiently as possible, and can export the results of analyses directly to your own data structures.

A good analysis system should go beyond just reporting the data. It should also give you procedures for making objective comparisons of gene expression across conditions. This issue of comparing expression is non-trivial. Each array specimen will differ from the others in the absolute intensity of signal, so we cannot simply compare signal strength across specimens. Rather, irrelevant inter-specimen variation must be minimized to allow valid comparisons.

The most common method for minimizing irrelevant variation is normalization within arrays, and subsequent comparison of normalized values across arrays. Because we are dealing with ratios during normalization, background is subtracted prior to the normalization process. There are many methods for normalization, including selection of specific reference elements (e.g. housekeeping genes), dividing by the mean of all elements (e.g. Pietu et al., 1996), or using some other parameter (such as the median) that seeks to define an internal reference for the array.
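The normalization options listed above share one pattern: subtract background, then divide every element by an internal reference value. The following sketch shows that pattern with the three references named in the text. The function name `normalize`, its parameters, and the single scalar background are assumptions made for illustration; real analyses often use local background estimates.

```python
import statistics


def normalize(raw, background, method="mean", reference_indices=None):
    """Background-subtract, then divide by an internal reference for the array."""
    corrected = [value - background for value in raw]
    if method == "mean":
        reference = statistics.mean(corrected)
    elif method == "median":
        reference = statistics.median(corrected)
    elif method == "reference":
        # e.g. indices of housekeeping-gene elements chosen by the user
        reference = statistics.mean([corrected[i] for i in reference_indices])
    else:
        raise ValueError(f"unknown normalization method: {method}")
    return [value / reference for value in corrected]


# Two hypothetical arrays of the same sample; the second is half as intense overall.
array_a = [110, 210, 410]
array_b = [60, 110, 210]
norm_a = normalize(array_a, background=10, method="median")
norm_b = normalize(array_b, background=10, method="median")
```

After normalization the two arrays give the same values, even though their raw intensities differ by a factor of about two. Subtracting background first matters here: dividing the raw values would not remove the difference.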

ArrayVision provides various forms of normalization, with flexible definition of background. It allows the use of methods appropriate to both additive and proportional error variance (proportional is typical of hybridization arrays). ArrayVision tries to allow flexible data analysis. It also shows alterations in expression across specimens, in easily understood form. ArrayVision includes elemental display™ functions, which create easily understood graphics that summarize the results (e.g. up or down regulation) of complex expression studies.
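The distinction between additive and proportional error affects how two conditions should be compared. A common convention, assumed here rather than taken from ArrayVision, is to compare by difference when error is additive and by log ratio when error is proportional, since a log ratio treats the same fold change equally at any intensity. The function name `compare` and the values are hypothetical.

```python
import math


def compare(control, treated, error_model="proportional"):
    """Change in expression between two normalized values for one element."""
    if error_model == "proportional":
        return math.log2(treated / control)  # log2 fold change
    if error_model == "additive":
        return treated - control             # plain difference
    raise ValueError(f"unknown error model: {error_model}")


# The same two-fold increase at a low and at a high expression level.
low_ratio = compare(100, 200)
high_ratio = compare(1000, 2000)
low_diff = compare(100, 200, error_model="additive")
high_diff = compare(1000, 2000, error_model="additive")
```

The log ratios are identical for the two elements, whereas the differences vary tenfold, which is why ratio-based comparisons suit data whose error grows with signal.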

At present, ArrayVision provides accurate "raw" data (hybridization intensities, hybridization intensities corrected for background, ratios, differences). It provides statistical tests which describe how a particular array element relates to the distribution of all elements. Export these data, and perform further analyses in your own informatics software. We are developing more sophisticated statistical procedures for high level analyses of expression data (Statistical Informatics). Statistical informatics includes more tests for determining whether a given element is different from others in a single array. It also provides quality metrics for each array element, and allows us to state (with confidence estimates) whether elements (including low expressors) alter their expression across arrays. Contact us for details.


Summary of the ArrayVision System

• Large arrays are analyzed quickly and automatically.
• Macro or micro arrays can be analyzed, using fluorescent, luminescent, or isotopic labels.
• Automatic array propagation, alignment, and background correction.
• Provides objective statistical methods for detection of alterations in expression.
• Comparative expression analyses across two or more arrays.
• Elemental displays are easy-to-understand graphics, which display hybridization parameters as color-coded dots.


References

Anchordoguy, T.L., Crawford, D.L., Hardewig, I. and Hand, S.C. Heterogeneity of DNA binding to membranes used in quantitative dot blots, BioTechniques 20:754-756 (1996).

Chee, M.S., Yang, R.Y., Hubbell, E., Berno, A., Huang, X.C., Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S. and Fodor, S.P.A. Accessing genetic information with high-density oligonucleotide arrays, Science 274:610-614 (1996).

DeRisi, J., Penland, L., Brown, P.O., Bittner, M.L., Meltzer, P.S., Ray, M., Chen, Y., Yan, A.S. and Trent, J.M. Use of a cDNA microarray to analyse gene expression patterns in human cancer, Nature Genetics 14:457-460 (1996).

de Saizieu, A., Certa, U., Warrington, J., Gray, C., Keck, W. and Mous, J. Bacterial transcript imaging by hybridization of total RNA to oligonucleotide arrays, Nature Biotechnology 16:45-48 (1998).

Eggers, M., Hogan, M., Reich, R.K., Lamture, J., Ehrlich, D., Hollis, M., et al. A microchip for quantitative detection of molecules utilizing luminescent and radioisotope reporter groups, BioTechniques 17:516-525 (1994).

Fodor, S.P.A., Read, L.J., Pirrung, M.C., Stryer, L., Lu, A.M. and Solas, D. Light-directed, spatially addressable parallel chemical synthesis, Science 251:767-773 (1991).

Gress, T.M., Hoheisel, J.D., Lennon, G.G., Zehetner, G. and Lehrach, H. Hybridization fingerprinting of high-density cDNA library arrays with cDNA pools derived from whole tissues, Mammalian Genome 3:609-619 (1992).

Guo, Z., Guilfoyle, R.A., Thiel, A.J., Wang, R. and Smith, L.M. Direct fluorescence analysis of genetic polymorphisms by hybridization with oligonucleotide arrays on glass supports, Nucleic Acids Research 22:5456-5465 (1994).

Hennink, E.J., de Haas, R., Verwoerd, N.P. and Tanke, J.J. Evaluation of a time-resolved fluorescence microscope using a phosphorescent Pt-porphine model system, Cytometry 24:312-320 (1996).

Jovin, T.M. and Arndt-Jovin, D.J. Luminescence digital imaging microscopy, Annual Review of Biophysical Chemistry 18:271-308 (1989).

Khrapko, K.R., Lysov, Y.P., Khorlin, A.A., Ivanov, I.B., Yershov, G.M., Vasilenko, S.K., Florentiev, V.L. and Mirzabekhov, A.D. A method for DNA sequencing by hybridization with oligonucleotide matrix, DNA Sequence - Journal of DNA Sequencing and Mapping 1:375-388 (1991).

Lamture, J.B., Beattie, K.L., Burke, B.E., Eggers, M.D., Ehrlich, D.J., Fowler, R., Hollis, M.A., Kosicki, B.B., Reich, R.K., Smith, S.R., Varma, R.S. and Hogan, M.E. Direct detection of nucleic acid hybridization on the surface of a charge coupled device, Nucleic Acids Research 22:2121-2125 (1994).

Maskos, U. and Southern, E.M. Parallel analysis of oligodeoxyribonucleotide (oligonucleotide) interactions. I. Analysis of factors influencing oligonucleotide duplex formation, Nucleic Acids Research 20:1675-1678 (1992).

Mason, R.S., Rampal, J.B. and Coassin, P.J. Biopolymer synthesis on polypropylene supports. I. Oligonucleotides, Analytical Biochemistry 217:306-310 (1994).

Nguyen, C., Rocha, D., Granjeaud, S., Baldit, M., Bernard, K., Naquet, P. and Jordan, B.R. Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones, Genomics 29:207-216 (1995).

Oi, V.T., Glazer, A.N. and Stryer, L. Fluorescent phycobiliprotein conjugates for analyses of cells and molecules, Journal of Cell Biology 93:981-986 (1982).

Pearson, D.H. and Tonucci, R.J. Nanochannel glass replica membranes, Science 270:68-69 (1995).

Pease, A.C., Solas, D., Sullivan, E.J., Cronin, M.T., Holmes, C.P. and Fodor, S.P.A. Light-generated oligonucleotide arrays for rapid DNA sequence analysis, Proceedings of the National Academy of Sciences USA 91:5022-5026 (1994).

Pietu, G., Alibert, O., Guichard, V., Lamy, B., Bois, F., Leroy, E., Mariage-Smason, R., Houlgatte, R., Soulare, P. and Auffray, C. Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array, Genome Research 6:492-503 (1996).


Saiki, R.K., Walsh, P.S., Levenson, C.H. and Erlich, H.A. Genetic analysis of amplified DNA with immobilized sequence-specific oligonucleotide probes, Proceedings of the National Academy of Sciences USA 86:6230-6234 (1989).

Schena, M., Shalon, D., Davis, R.W. and Brown, P.O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470 (1995).

Seveus, L., Väisälä, M., Syrjänen, S., Sandberg, M., Kuusisto, A., Harjo, R., Salo, J., Hemmilä, J., Kojola, H. and Soini, E.J. Time-resolved fluorescence imaging of europium chelate label in immunohistochemistry and in situ hybridization, Cytometry 13:329-338 (1992).

Shalon, D., Smith, S.J. and Brown, P.O. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization, Genome Research 6:639-645 (1996).

Southern, E.M., Maskos, U. and Elder, J.K. Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: Evaluation using experimental models, Genomics 13:1008-1017 (1992).

Tsien, R.Y. and Waggoner, A. Fluorophores for confocal microscopy: Photophysics and photochemistry, In Pawley, G.P. (ed.) The Handbook of Biological Confocal Microscopy, IMR Press, pp 153-161 (1989).

Wan, J.S., Sharp, S.J., Poirier, G.M.-C., Wagaman, P.C., Chambers, J., Pyati, J., Hom, Y.-L., Galindo, J.E., Huvar, A., Peterson, P.A., Jackson, M.R. and Erlander, M.G. Cloning differentially expressed mRNAs, Nature Biotechnology 14:1685-1691 (1996).

Wittrup, K.D., Westerman, R.J. and Desai, R. Fluorescence array detector for large-field quantitative fluorescence cytometry, Cytometry 16:206-213 (1994).

Zhao, N., Hashida, H., Takahashi, N., Misumi, Y. and Sakaki, Y. High-density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression, Gene 156:207-213 (1995).