
Page 1: Subtracted Approaches to Gene Expression Analysis in ...kth.diva-portal.org/smash/get/diva2:9521/FULLTEXT01.pdf · Subtracted Approaches to Gene Expression Analysis in Atherosclerosis

Subtracted Approaches to Gene Expression Analysis in Atherosclerosis

Stina Boräng

Royal Institute of Technology
Department of Biotechnology

Stockholm 2004


© Stina Boräng

Department of Biotechnology
Royal Institute of Technology
Alba Nova University Center
SE-106 91 Stockholm
Sweden

Printed 2003 at
Universitetsservice US AB
Box 700 14
SE-100 44 Stockholm
Sweden
ISBN 91-7283-653-9


Stina Boräng 2003. Subtracted Approaches to Gene Expression Analysis in Atherosclerosis. Department of Biotechnology, Albanova University Center, Royal Institute of Technology, Stockholm, Sweden. ISBN 91-7283-653-9

Abstract

Gene expression analysis has evolved as an extensive tool for elucidation of various biological and molecular events occurring in different organisms. A variety of techniques and software tools have been developed to enable easier and more rapid means of exploring the genetic information. A more effective approach than exploring the whole content of genes expressed under certain conditions is to study fingerprint assays or to use subtracted cDNA libraries to identify only differentially expressed genes.

The objective for the work in this thesis has been to explore differentially expressed genes in atherosclerosis. This was done by applying and modifying a protocol for the subtractive approach RDA (Representational Difference Analysis) in different model systems.

Initially, the molecular effects of an anti-atherosclerotic drug candidate were elucidated. In addition, two alternative approaches to identify differentially expressed genes obtained after iterative rounds of RDA subtraction cycles were evaluated. This revealed that in most cases, the shotgun approach in which the obtained gene fragments are cloned without any prior selection has clear advantages compared to the more commonly used selection strategy, whereby distinct bands are excised after gel electrophoresis.

A key process in the atherosclerotic plaque initiation is the phenotypic change of macrophages into foam cells, which can be triggered in a model system by using macrophages exposed to oxidised LDL. To investigate the genes expressed in this process, the RDA technique was combined with microarray analysis, which allows for selectivity and sensitivity through RDA, as well as rapid high-throughput analysis using microarrays. The combination of these techniques enables significant differences in gene expression to be detected, even for weakly expressed genes, and the results to be reliably validated in a high-throughput manner.

Finally, the focal nature of atherosclerotic lesions was investigated by gene expression profiling of in vivo aortic tissues from ApoE-/- and LDLR-/- mice. The study was based on a comparison between localisations that are likely, and others that are unlikely, to develop atherosclerotic plaques, and the RDA technique was employed to explore differential gene expression.

© Stina Boräng, 2003

Keywords: Representational Difference Analysis, atherosclerosis, gene expression profiling


I am not afraid of storms, for I am learning how to sail my ship

Louisa May Alcott


LIST OF PUBLICATIONS

This thesis is based on the following manuscripts, which in the text will be referred to by their roman numerals:

I. Boräng S., Andersson T., Thelin A., Larsson M., Odeberg J. and Lundeberg J. Monitoring of the subtraction process in solid-phase representational difference analysis: characterization of a candidate drug. Gene. 2001 Jun 27;271(2):183-92.

II. Andersson T., Boräng S., Unneberg P., Wirta V., Thelin A., Lundeberg J. and Odeberg J. Shotgun sequencing and microarray analysis of RDA transcripts. Gene. 2003 May 22;310:39-47.

III. Andersson T., Boräng S., Larsson M., Wirta V., Wennborg A., Lundeberg J. and Odeberg J. Novel candidate genes for atherosclerosis are identified by representational difference analysis-based transcript profiling of cholesterol-loaded macrophages. Pathobiology. 2001;69(6):304-14.

IV. Boräng S., Andersson T., Thelin A., Odeberg J. and Lundeberg J. Vascular gene expression in atherosclerotic plaque-prone regions analyzed by representational difference analysis. Pathobiology, in press.


Table of contents

Popular science summary

INTRODUCTION

1 Genome discovery
    1.1 The Human Genome Project
2 Global analysis of gene expression
    2.1 Expressed sequence tags
    2.2 Serial analysis of gene expression
    2.3 DNA microarrays
        2.3.1 Spotted arrays (cDNA)
            2.3.1.1 Array fabrication
            2.3.1.2 Target preparation and hybridisation
            2.3.1.3 Data analysis
3 Selective analysis of differential gene expression
    3.1 Differential display and RNA arbitrarily primed PCR
    3.2 Suppression subtractive hybridisation
    3.3 Representational difference analysis
4 Tools for gene expression sequence tag analysis
    4.1 Preprocessing of sequences
    4.2 Assembly
    4.3 Annotation
5 Tools for microarray analysis
    5.1 Image analysis
    5.2 Normalization
    5.3 Selection of differentially expressed genes

PRESENT INVESTIGATION

6 Pathogenesis of atherosclerosis
7 Differential gene expression in atherosclerosis
    7.1 Treatment with a therapeutic drug candidate (Paper I)
    7.2 Foam cell formation in atherosclerotic lesions (Papers II and III)
    7.3 Focal localisation of atherosclerotic plaques (Paper IV)
8 Signature Tag RDA
9 Concluding remarks
Acknowledgements
References

Original Papers I-IV

Popular science summary

This thesis deals with questions in the field of biotechnology. Biotechnology makes it possible to use microorganisms, plant cells, animal cells or human cells to manufacture products or develop processes that are useful to people. For example, biotechnology can be used to produce chemicals, foods, pharmaceuticals and vaccines. All living organisms are made up of cells. Most are multicellular, but there are also bacteria, fungi and algae that consist of only a single cell. The word cell actually means room, which can be likened to the cell having a cell wall that encloses a number of different organelles. The organelles have different functions in the cell, just like the furniture in a room, where some pieces are meant to be comfortable, others good-looking, and so on. The most important organelle is the cell nucleus, which acts as a control centre for all the processes taking place inside the cell. The nucleus contains the organism's hereditary material, i.e. its genes, which are made of DNA. There are also organisms that have no cell nucleus, such as bacteria, in which all the DNA instead lies free inside the cell. In 1943 Oswald Avery realised that a cell's DNA is what carries all the genetic information, and ten years later Francis Crick and James Watson discovered the structure of DNA, for which they also received the Nobel Prize. A DNA molecule is built from four different components called nucleotides. These are usually abbreviated A, C, G and T, and their relative order (the sequence) encodes the function of the gene. The order of the nucleotides determines which protein will be formed from that particular gene (see Figure 1). Large parts of human DNA, however, contain sequences that do not code for any protein at all. The total amount of sequence, whether or not it codes for a protein, is collectively called the organism's genome. The order of the nucleotides can be determined by various methods; this is known as DNA sequencing. Sequencing the entire genome of an organism contributes to increased knowledge of the genetic background to a wide range of properties and functions of that organism. The genomes of a number of organisms have been sequenced, and one of the best-known research projects in this area is the international collaboration called "the Human Genome Project". This enormous effort led to the sequencing of the complete human genome, and the final result became available in April 2003.

[Figure 1 labels: DNA, mRNA, cell nucleus, protein, tRNA, amino acids, ribosome]

Figure 1. Schematic illustration of how DNA is transcribed into mRNA, which at the ribosomes directs the assembly of amino acids into proteins.

The work underlying this thesis is largely based on a molecule called mRNA. It is formed as an intermediate in the synthesis of proteins from DNA (see Figure 1). By studying mRNA it is in many respects easier to identify a gene than by detecting and following the finished protein that the gene gives rise to. mRNA is, however, a somewhat unstable molecule, and it is therefore often converted into a complementary DNA molecule (cDNA), since DNA is more stable and easier to work with than RNA. Not all genes in a cell are active all the time; they wait for a signal to be activated. Thus only the genes that are active under the prevailing circumstances are converted into mRNA and further into proteins. Today there are many different methods for finding out which genes are active (gene expression), and the thesis describes a number of them. Cells can be compared in order to study which genes are active at different sites within the same organism, or between individuals that have been treated differently. There are also a number of methods that focus solely on finding differences between cells. One example of such a method is abbreviated RDA, and this method has been used in all of the papers on which this thesis is based.

All four papers are about finding differences in gene expression in model systems involving genes affected by atherosclerosis. In the first paper, cells treated with a new drug candidate against atherosclerosis are compared with untreated cells in an attempt to find out exactly how the drug candidate regulates the cellular mechanisms, i.e. which genes are activated or inactivated. In papers two and three, cells that have progressed to different stages of the atherosclerotic process are compared. This can provide clues as to which genes influence this development and thereby possibly an understanding of how the progression of atherosclerosis could be prevented. This common and serious disease tends to arise mainly where blood vessels bend or branch. To try to find out why, the final study was carried out, in which cells from the bends and branch points of blood vessels were compared with those from straight vessels in order to study differences in gene expression.


Introduction


1 Genome discovery

Francis Collins, the director of the National Human Genome Research Institute (NHGRI), perhaps best described the essential properties of the complex human genome in 2001:

“It’s a history book – a narrative of the journey of our species through time. It’s a shop manual, with an incredibly detailed blueprint for building every human cell. And it’s a transformative textbook of medicine, with insights that will give health care providers immense new powers to treat, prevent and cure disease.”

In other words, the genome of any organism contains all the information required to understand its physiological nature, development and evolutionary history. Today, several genomes of different organisms have been determined, and the task for researchers now is to decipher their transcribed genes, or transcriptome, and draw correlations with the complex corresponding protein network, the proteome. Exploration and elucidation of these intricate features of every cell is most commonly known nowadays as functional genomics.

Technologies used in molecular biology have constantly evolved and improved. During the past 25 years, two landmark technologies within the field have been developed. First, two independent methods for DNA sequencing were invented, one by Allan Maxam and Walter Gilbert, the other by Fred Sanger and coworkers (Maxam and Gilbert 1977; Sanger et al. 1977). Second, Kary Mullis devised the polymerase chain reaction (PCR) technique, enabling the rapid multiplication of DNA fragments (Mullis et al. 1986). Like DNA sequencing, PCR has completely revolutionised molecular genetics, enabling a whole new approach for the study and analysis of genes.

An intense collective effort is underway to map the genomes of various organisms, for a number of reasons. For instance, detection of genes (or regions of genes) that are seldom affected by mutations or other changes (conserved domains) may provide new insights into gene function. On the other hand, single-base sequence variations at specific locations, single nucleotide polymorphisms (SNPs), occur throughout the whole genome (Brookes 1999), and genotyping, i.e. comparison of the SNPs between individuals within the same species, can increase our knowledge of, for instance, genetically related diseases. Sequencing of the human genome along with other organisms, from yeast to chimpanzees, has given rise to a growing biological research field called comparative genomics. Although all organisms appear to be different and behave in various ways, all of their genomes are composed of DNA. Comparative genomics offers researchers possibilities to pinpoint the signals that control gene function, and thus new approaches for treating diseases. In addition, the identification of regions of similarity and difference among species may facilitate understanding of the structure and function of genes, and help address questions such as why chimpanzees do not suffer from some of the severe diseases that affect humans, such as HIV, although human and chimp DNA sequences are estimated to be 98.8 % identical. Data on the genomes of more than 800 organisms, representing both completely sequenced organisms and organisms for which sequencing is still in progress, can be found at http://www.ncbi.nlm.nih.gov.

1.1 The Human Genome Project

The human genome includes approximately three billion base pairs, packaged in the 23 pairs of chromosomes. Fifty years after James Watson and Francis Crick proposed a double helical structure of DNA in 1953 (Watson and Crick 1953), the complete sequence of the human genome is now available. To obtain this sequence, an international, collaborative research effort was initiated in 1990, called the Human Genome Project (HGP). The goal of this effort was to create a public database containing genetic information as a resource for scientific discovery within a time limit of 15 years. In February 2001, HGP published a draft version of the human genome sequence (Lander et al. 2001). Simultaneously, another research group led by Craig Venter of Celera Genomics published their less widely available draft version of the human genome sequence (Venter et al. 2001).

More than two years ahead of schedule and for much lower cost than originally estimated, HGP announced that the sequence was finished, and published it, in April 2003. The HGP defines a finished sequence as being highly accurate (with fewer than one error per 10,000 letters) and highly contiguous (the only remaining gaps corresponding to regions where the sequences cannot be reliably resolved with current technology). From the sequence, the human genome is estimated to contain between 30,000 and 40,000 genes, in contrast to earlier estimates ranging from 50,000 to 140,000 genes (Liang et al. 2000). About 99 % of the gene-containing parts of the human genome are covered, with an accuracy of 99.99 %. The major findings from HGP’s enormous efforts are that genes account for a relatively small amount of the human genome and the architecture of human proteins is very complex compared to that of other species. Since the genomic sequence for eukaryotes only contains a small portion of coding regions (around 2 % in the human genome (Lander et al. 2001)), intensive efforts are required to identify and sequence regions with altered expression levels using genomic sequencing. The full sequence of the human genome provides a huge source of information about the structure, organisation and function of the human genes. As Nobel laureate James D. Watson stated:

“The completion of the Human Genome Project is a truly momentous occasion for every human being around the globe.”


2 Global analysis of gene expression

Even before the human genome was completed, many research groups strove to identify genes involved in a wide range of cellular processes. In many cases, exploring genes that are only active under circumstances of specific interest would immensely simplify this procedure. This can be done by investigating the genetic information available at the RNA rather than the DNA level. Through splicing events, all introns and occasionally exons are removed from the transcript. The processed mRNA is then translated into proteins. The function of the proteins produced can be dramatically altered through the production of different splice variants of the transcribed genes (Graveley 2001).

The analysis of gene expression in specific tissues and physiological processes has rapidly developed over the last twenty years, and it is now potentially possible to identify all of the genes expressed in a specific tissue. The introduction of PCR together with other technological improvements (such as microarrays) has simplified the discovery of differentially expressed genes.

The fundamental principle for monitoring gene expression in tissues involves extraction of total RNA followed by isolation of the mRNA fraction and reverse transcription into cDNA, which can be cloned into bacterial plasmids. The result is a cDNA library in which the whole spectrum of mRNAs is represented. Collectively, all the cDNAs in a library represent the frequency of expression of different genes, i.e. the most abundantly expressed genes will generate the most abundant copies of individual clones. It has been estimated that 20 % of the total mRNA population in a human cell is accounted for by fewer than 100 different transcripts (Gibson and Muse 2002). To identify less abundant genes, libraries can be normalized (Soares et al. 1994; Bonaldo et al. 1996). That is, the most abundant genes can be subtracted and discarded through reassociation kinetics, theoretically leaving only the weakly expressed transcripts or single copies of each gene, depending on the normalization strategy chosen.
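The relationship between clone frequency and transcript abundance described above can be illustrated with a small counting sketch. It assumes each sequenced clone has already been assigned to a gene; the gene labels and the example library are hypothetical placeholders, not data from the thesis.

```python
from collections import Counter

def clone_frequencies(clone_genes):
    """Return the fraction of clones in a library that derive from each gene."""
    counts = Counter(clone_genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

def fraction_from_top(clone_genes, top_n):
    """Return the fraction of all clones contributed by the top_n most abundant genes."""
    counts = Counter(clone_genes)
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(top_n)) / total

# Hypothetical non-normalized library: one gene dominates the clone population.
library = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
freqs = clone_frequencies(library)
```

In a library like this, rare transcripts such as the hypothetical `geneC` are only found after many redundant clones have been sampled, which is the motivation for normalization.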


2.1 Expressed sequence tags

One of the most widely used approaches for gene identification and for generating gene expression profiles in various tissues, cell types, or developmental stages is to generate expressed sequence tags (ESTs) (Adams et al. 1991). Briefly, an EST is part of a sequence from a cDNA clone that corresponds to an mRNA. When constructing an EST library, it is often beneficial to utilize the polyA region of the mRNA. Usually, a restriction site (typically NotI) is introduced together with a polyT oligonucleotide to prime the first strand synthesis of cDNA. However, extension of the first strand of cDNA is often incomplete due to the presence of inhibitory secondary structures, so fragments of various lengths are produced. Different approaches to generate full-length clones are available (see, for instance, Carninci and Hayashizaki 1999; Das et al. 2001; Suzuki and Sugano 2001). Following second strand synthesis, adaptors (usually with an EcoRI overhang) are ligated to both ends. Enzyme restriction with NotI then enables directional cloning into a suitable phage or plasmid. In most cases, sequencing of the clones is performed from the 5’-end to avoid problems reading through the polyadenylated 3’-region. To circumvent directed cloning of the 3’-end regions, random hexamers can be utilized in the cDNA synthesis (Dudley et al. 1978; Dias Neto et al. 2000), although this approach is not widely employed. In theory, the complete transcribed region, except for the outermost ends, will then be represented.

By choosing clones for sequencing in a randomised manner, it is possible to construct a profile of the transcriptional activity. Generally, an EST library is sequenced until the yield of novel clones is reduced to less than 10-20 %. The possibility of using normalized libraries could be advantageous in this respect. Once the clones have been sequenced, they are made available through the IMAGE Consortium (http://image.llnl.gov), and the sequences are deposited in electronic databases. The National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) maintains a repository of ESTs. Table 1 displays the number of public EST entries at this date (November, 2003) for a selection of organisms (http://www.ncbi.nlm.nih.gov/dbEST).
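The stopping rule mentioned above (sequence until the yield of novel clones drops below 10-20 %) can be sketched as follows. This is a minimal illustration: it assumes the clones are sequenced in batches and that each read has already been assigned a cluster identifier. The function name, the batch layout and the 0.2 default threshold (the upper end of the quoted range) are choices made for this example.

```python
def sequence_until_saturated(batches, threshold=0.2):
    """Process batches of cluster IDs until the novel fraction falls below threshold.

    Returns the per-batch novelty rates and the set of clusters seen so far.
    """
    seen = set()
    rates = []
    for batch in batches:
        novel = 0
        for cluster_id in batch:
            if cluster_id not in seen:
                seen.add(cluster_id)
                novel += 1
        rate = novel / len(batch)
        rates.append(rate)
        if rate < threshold:
            break  # further sequencing is unlikely to yield many new genes
    return rates, seen

# Hypothetical batches of cluster IDs; the fourth batch is never sequenced.
batches = [
    ["a", "b", "c", "d", "e"],
    ["a", "b", "f", "g", "h"],
    ["a", "b", "c", "f", "g"],
    ["x", "y", "z", "w", "v"],
]
rates, seen = sequence_until_saturated(batches)
```

A normalized library would be expected to keep the novelty rate higher for longer, since abundant transcripts are resampled less often.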


Table 1. Number of public EST entries for a selection of organisms.

Genome projects including various model organisms have taken advantage of EST studies because of their suitability for the discovery of new genes, physical mapping of genomes, and identification of coding regions in genomic sequences (Adams et al. 1991). Also, comparative EST analysis provides a valuable resource for various biological research fields. For example, it allows evaluation of gene expression patterns in response to different biological signals, thereby enhancing the understanding of cellular biology and physiology (Lee et al. 1995).


2.2 Serial analysis of gene expression

In 1995 a new, more rapid means of tag sequencing (Serial Analysis of Gene Expression, SAGE) was described with the potential to significantly increase throughput capacity (Velculescu et al. 1995). In contrast to the EST approach, SAGE allows the sequencing of multiple tags within a single clone, thereby reducing both time and sequencing costs. The major application of SAGE was expected to be comparison of gene expression patterns in different developmental and disease states, but today it is also used in a variety of applications to study functional genomics in different organisms.

Figure 2 shows the principle of SAGE, in which cDNA is reverse transcribed from mRNA using a biotinylated oligo-dT primer. The cDNA is then cleaved with a restriction endonuclease (anchoring enzyme) and the 3’-terminal cDNA fragments are bound to streptavidin-coated beads. The captured cDNA is divided into two aliquots, each of which is ligated to one of two oligonucleotide linkers containing a recognition site for a tagging enzyme, which belongs to the class IIS restriction endonucleases and hence cleaves DNA at a specific distance 3’ to the recognition site. Cleavage with the tagging enzyme yields short (~9-14 bp) tags of cDNA that can be ligated to each other, forming ditags. Ligated tags are used as templates for PCR amplification with linker-specific primers, and the PCR products are cleaved by the anchoring enzyme, concatemerized into long continuous stretches of DNA, cloned into a plasmid vector and then sequenced. This allows for high-throughput sequencing of up to 50 tags per sequence run.
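The tag definition used in SAGE can be mimicked in silico: the tag is the short stretch immediately 3' of the anchoring enzyme site closest to the polyA tail. The sketch below is a simplified illustration, not the laboratory protocol. The recognition sequence `CATG` and the 10 bp tag length are assumptions made for this example (the text above states only that tags are roughly 9-14 bp), and the input is assumed to be the sense-strand cDNA sequence.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: a 4-bp anchoring enzyme recognition site
TAG_LENGTH = 10       # assumption: a value within the ~9-14 bp range

def extract_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag directly 3' of the 3'-most anchoring site, or None."""
    pos = cdna.rfind(anchor)
    if pos == -1:
        return None  # transcript lacks the site and is invisible to SAGE
    start = pos + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def tag_counts(cdnas):
    """Count how often each tag occurs in a collection of cDNA sequences."""
    tags = (extract_tag(seq) for seq in cdnas)
    return Counter(tag for tag in tags if tag is not None)

# Hypothetical transcript with two anchoring sites; only the 3'-most one defines the tag.
example = "GG" + "CATG" + "AAACCCGGGTTTA" + "CATG" + "ACGTACGTAC" + "GGAAAA"
```

Counting tags in this way corresponds to the expression profile that SAGE builds from sequenced concatemers, and it also makes visible why transcripts without the anchoring site produce no tag at all.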

SAGE analysis has a number of unique advantages over other techniques for global gene expression analysis (Velculescu et al. 1997). The rapid, high-throughput sequencing and analysis of tags generates reliable expression profiles and enables the discovery of rare and novel gene transcripts since SAGE theoretically generates a tag for every cellular mRNA. Among the negative aspects of the method are limitations due to the short tag length generated. This problem may lead to failure of a tag to match and uniquely identify sequences in SAGE reference databases, especially if it is situated in a conserved region (Ishii et al. 2000; Kannbley et al. 2003). Other disadvantages are that some of the cDNAs may lack the restriction site used to construct the SAGE library, and the technique provides no information about splice variants. Although SAGE has been widely adopted for global analysis of gene expression, its applicability is limited by the large amount of RNA required. Many research groups have tried to overcome this problem by using either PCR amplification of starting cDNA materials, as in SAGE-lite (Peters et al. 1999) and PCR-SAGE (Neilson et al. 2000), or PCR reamplification of SAGE ditags, as in microSAGE (Datson et al. 1999) and SADE (Virlon et al. 1999). These methods all include an additional amplification step, which may introduce bias in quantitative analysis of gene expression. In the year 2000, miniSAGE was introduced as a modified SAGE protocol that does not require any additional amplification (Ye et al. 2000). This technique allows for gene expression profiling using only 1 µg total RNA, and is still considered to require less starting material than any other SAGE technique. Despite its drawbacks, SAGE is a general and powerful technique allowing not only global gene expression profiling of various eukaryotic organisms, but also the identification of genes that are exclusively expressed under various cellular conditions (Yamamoto et al. 2001).

[Figure 2: flow diagram of the SAGE steps: cleave with anchoring enzyme (AE) and bind to streptavidin beads; divide in half and ligate to linkers (A + B); cleave with tagging enzyme (TE) and blunt end; ligate and amplify with primers A and B to form ditags; cleave with anchoring enzyme, isolate ditags, concatenate and clone.]

Figure 2. Schematic of the SAGE procedure.

2.3 DNA microarrays

Since the DNA microarray technique was first described in 1995 (Schena et al. 1995) it has been extensively employed for large-scale analysis of gene expression in the field of functional genomics. This high-capacity system can be used to measure the relative quantities of specific mRNAs representing tens of thousands of genes in two or more tissue samples in a single experiment. Three types of microarray can be constructed, depending on the source of the immobilized DNA. In this thesis, only the basic methodology concerning spotted microarrays (cDNA or longmers) will be discussed, although both genomic DNA (Forozan et al. 1997) and in situ synthesised oligonucleotides are being used as probes attached to microarray surfaces. An important example of the latter is the Affymetrix platform (Lockhart et al. 1996), where oligonucleotides are synthesised directly on the surface by photolithography and solid-phase chemistry (Fodor et al. 1993) to produce probes 20-25 nucleotides in length. Multiple probe pairs (each pairing one perfect-match oligonucleotide with a mismatch oligonucleotide) for different regions of each gene are designed, allowing for mean values of signal intensities to be calculated.

2.3.1 Spotted arrays (cDNA)

The use of cDNA microarrays to examine numerous genes in parallel is currently one of the most common approaches for gene expression profiling. cDNA fragments are first amplified and sequenced before being spotted onto a suitable surface, generally glass microscope slides, at high density. Relative expression levels of genes represented in the array can be analysed by comparing fluorescence intensities between two fluorescently labelled samples hybridised to the array. Normalization of the collected data is essential for adjustment of fluorescence intensity ratios, and should not be neglected. Various authors, including Priti Hegde and colleagues (Hegde et al. 2000), have optimised several steps involved in the process of constructing reliable and reproducible array platforms. The general procedure of cDNA microarray experiments is outlined in Figure 3. Recently, a single long probe (longmer, 50-70 nucleotides in length) for each gene was introduced as an alternative to the well-established methods using cDNA or in situ synthesized oligonucleotides attached to the arrays (Shoemaker and Linsley 2002). Longmer arrays can be fabricated and analysed in roughly the same standard manner as spotted cDNA arrays. In addition, the longmer strategy saves time since no amplification of cDNA clones is required, and comparison with in situ synthesized 25-mer probes has produced promising results (Barczak et al. 2003).


Figure 3. cDNA microarray flowchart. Steps shown for Sample 1 and Sample 2: mRNA; reverse transcription and labelling to give labelled cDNA; hybridisation to a surface with printed probes.


2.3.1.1 Array fabrication

When creating an array, either cDNAs representing all known genes for the organism studied or a subset of clones representing only the genes of interest to the particular study can be used. In either case, amplified PCR products are spotted onto a slide by a high-speed robotic system, and the array is further processed to attach the DNA sequences to the surface and denature them. Concurrently, both positive and negative control gene sequences should be printed onto the slide to validate the data generated. To avoid intra-slide variations, cDNA clones are generally spotted with duplicates (or triplicates) spread over the slide surface. Even the slightest changes in the micro-environment, such as modifications of the slide surface, spotting buffer, temperature, and relative humidity, may affect the quality of the spotted gene fragments and hybridisation strength (Lander 1999).

2.3.1.2 Target preparation and hybridisation

RNA from two different sources is used for reverse transcription into single-stranded cDNA in the presence of nucleotides labelled with two different fluorescent dyes (typically Cy3 and Cy5), one for each sample. The labelled reaction products are purified, mixed, and hybridised to the array surface, allowing the differentially labelled cDNAs to bind the corresponding nucleic acid molecules spotted onto the surface in a competitive manner. However, prior to hybridisation of fluorescently labelled cDNAs to the array, depending on the slide used, its surface may need blocking or inactivation of active molecules coated on the surface to reduce background signals.

2.3.1.3 Data analysis

High-resolution confocal fluorescence scanning of the array provides data on the relative signal intensities and ratios between the samples for the genes represented on the microarray. This allows relative expression levels to be estimated, and differentially expressed genes to be determined. A number of factors, such as RNA quantity and quality, labelling efficiency, and detection of intensity signals from the different laser wavelengths, all affect the ratios obtained. Therefore, normalization of the data obtained is of utmost importance and a number of different strategies have been developed and utilized for this purpose (Schuchhardt et al. 2000; Kerr and Churchill 2001; Yang et al. 2002).


3 Selective analysis of differential gene expression

The identification of differentially expressed mRNAs has been used to help understand not only gene function, but also the underlying molecular mechanisms of particular biological systems. A more effective approach than exploring the whole content of genes expressed under certain conditions is to study fingerprint assays or to use subtracted cDNA libraries to identify only differentially expressed genes. This can greatly reduce the time, money and effort involved. To do this, a variety of selective techniques have been developed, and some of the most frequently used techniques are described below.

3.1 Differential display and RNA arbitrarily primed PCR

To meet the need for isolating and identifying genes that are differentially expressed in various cells and conditions, Peng Liang and Arthur Pardee developed a new technology called differential display (DD) in 1991 (Liang and Pardee 1992). Briefly, the method is based on primer sets in which the first primer is anchored to mRNA in the polyadenylated region of the 3’-end, while the other is anchored with arbitrary spacing upstream from the first. This yields a subpopulation of mRNAs, which can be reverse transcribed into cDNA, amplified, and resolved on a polyacrylamide gel. Genes that are differentially expressed between two or more samples can then be detected in parallel, and further explored. The principles of DD are schematically illustrated in Figure 4A.


Figure 4. Overview of two fingerprint assays: (A) differential display (DD) and (B) RNA arbitrarily primed PCR (RAP-PCR). Steps shown for Sample 1 and Sample 2: first strand cDNA synthesis, second strand cDNA synthesis, PCR cycling, and comparative gel electrophoresis.


Using an oligo-dT primer with two additional nucleotides at the 3’-end (5’-oligo dT-VN-3’, where V = A, C or G; and N = A, C, G or T) will generate a subpopulation corresponding to 1/12 of the total mRNA population. Such primers permit the initiation of reverse transcription of only this arbitrary set of mRNAs. As a 5’ primer, a short oligonucleotide with an arbitrary base sequence is used. After amplification, this will yield products of various sequence lengths corresponding to different mRNAs. The gene fragments from each sample can then be separated and visualized by gel electrophoresis. Differentially expressed genes can then be further explored by excision, cloning, and sequencing of bands that differ between samples. Using other sets of primers will obviously generate a different subpopulation of gene fragments. Therefore, repeated experiments with other primers are required to cover the complete gene expression profile.
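The 1/12 fraction follows from counting the possible VN dinucleotides (3 choices for V times 4 choices for N). The short sketch below enumerates them; it assumes, as an idealisation, that the two bases adjacent to the poly(A) tail are uniformly distributed among mRNAs, and the oligo-dT length `dt_length` is a hypothetical value chosen only for illustration.

```python
from itertools import product

V_BASES = "ACG"   # V = A, C or G (any base except T)
N_BASES = "ACGT"  # N = any of the four bases


def anchored_primers(dt_length=11):
    """Return all 5'-oligo(dT)-VN-3' anchored primers.

    dt_length is a hypothetical oligo-dT length used for illustration only.
    """
    return ["T" * dt_length + v + n for v, n in product(V_BASES, N_BASES)]


primers = anchored_primers()
# Under the uniform-base assumption, each primer selects 1/len(primers)
# of the mRNA population.
fraction_per_primer = 1 / len(primers)
print(len(primers), fraction_per_primer)
```

Each of the twelve primers would, under this assumption, prime a different twelfth of the mRNA population, which is why several primer combinations are needed for full coverage.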

Over the years, DD has been criticised for the high number of false positives it produces, for example by Dana Crawford and colleagues (Crawford et al. 2002), who also discuss the statistically predicted need for 240 different sets of primers to cover all mRNAs in a cell. However, intense efforts have been made by a number of research groups to resolve the problems and to refine and improve DD from a technological perspective (Liang 2002; Stein and Liang 2002).

An additional technique, the closely related RNA arbitrarily primed PCR (RAP-PCR) protocol, was developed by John Welsh and colleagues (Welsh et al. 1992) (Figure 4B). The only real difference between this approach and DD is that arbitrary primers are used for both first and second strand cDNA synthesis, so that the subpopulation of gene fragments generated is spread throughout the genes rather than confined to the 3’-ends. This also enables analysis of non-polyadenylated RNA species.


3.2 Suppression subtractive hybridisation

A number of techniques have been developed to study differential gene expression between two different sources. One such method is to generate subtracted cDNA libraries. Suppression subtractive hybridisation (SSH), published in 1996 by Luda Diatchenko et al. (Diatchenko et al. 1996), is a PCR-based cDNA subtraction method that combines normalization of fragments with both high and low abundance, and subtraction of gene fragments present in the two cDNA populations. A schematic view of the SSH procedure is outlined in Figure 5.

One cDNA population containing differentially expressed gene fragments of interest is termed the “tester” population, while the other is termed the “driver”. First, both tester and driver cDNAs are digested by restriction enzymes, and the tester is subdivided into two batches that are equal in all respects. Different sets of linkers containing long, inverted terminal repeats (Lukyanov et al. 1995) are ligated onto the cDNAs in the two batches, resulting in two tester populations. Tester and excess driver are mixed, heat-denatured, and annealed in a first hybridisation, leading to normalization, i.e. equalization of the abundance of the cDNAs in the tester. A second hybridisation step is then performed with a mixture of both tester populations and fresh driver, allowing double-stranded DNA to form from strands originating from the two different tester populations. These fragments, which carry different linkers at the 3’- and 5’-ends, are favoured in the following PCR amplification by using a pair of primers corresponding to the outer part of the two linkers. Hybridisation products consisting of tester/tester duplexes with the same linker (i.e. the same long, inverted terminal repeats) at the ends will form stable hairpin-like structures after each denaturation-annealing PCR step, preventing linker-specific primers from annealing. In this manner, only rare target fragments are enriched, although the number of false positives obtained using the SSH method is relatively high.


Figure 5. Scheme of the SSH method (suppression subtractive hybridisation). Steps shown: first hybridisation of Sample 1 (the tester, with linker 1 or linker 2) with Sample 2 (the driver); second hybridisation, in which the samples are mixed and new driver is added, giving one "new" product; fill-in of ends and addition of primers for amplification; outcomes labelled exponential amplification and no amplification.


3.3 Representational difference analysis

Representational difference analysis (RDA) is a PCR-coupled subtractive enrichment procedure originally developed for detecting differences between two genomes (Lisitsyn and Wigler 1993). To avoid the complexity of using genomic DNA as starting material, Michael Hubank and colleagues (Hubank and Schatz 1994) adapted the protocol for use with differentially expressed genes. The method relies on restriction digestion of cDNA and ligation of adapters to a PCR-amplified subset of all gene sequences, and is outlined in Figure 6.

Endonuclease restriction of the cDNA, followed by linker ligation and PCR amplification, will generate a representation of the transcribed genes originating from the mRNA population. The cDNA representation from which unique sequences are sought is designated the “tester”, and the cDNA representation that is used to subtract sequences common to the two populations is designated the “driver”. To enrich differentially expressed genes, the linkers are removed by restriction digestion and another set of linkers is ligated to the tester fragments. The tester and driver are subjected to cross-hybridisation with an excess of driver to “drive out” fragments that are also present in the tester, leaving three variants of hybridisation products: driver-specific fragments with no linkers, fragments common to both tester and driver with just one linker, and tester-specific fragments containing linkers at both the 5’- and 3’-ends, allowing for exponential amplification with linker-specific primers. The resulting pool of gene fragments after PCR amplification is denoted the first difference product (DP). Further rounds of linker removal and ligation of new sets of linkers, cross-hybridisations with more stringent ratios, and PCR amplifications are required to enrich differentially expressed genes with a low background of false positives. Both upregulated and downregulated genes in a model system can be identified in two parallel experiments by interchanging the tester and driver sources.
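The enrichment described above rests on the different amplification kinetics of the three hybridisation products. The following is a minimal sketch under idealised assumptions (perfect PCR efficiency, no reagent limitation); the function name `copies_after_pcr` and the cycle number are hypothetical and chosen only to illustrate the principle.

```python
def copies_after_pcr(start_copies, cycles, linkers):
    """Idealised copy number after PCR for a duplex carrying 0, 1 or 2 linkers.

    Assumes 100 % efficiency per cycle, which real reactions do not reach.
    """
    if linkers == 2:
        # Tester/tester duplex: both strands are primed, so copies double.
        return start_copies * 2 ** cycles
    if linkers == 1:
        # Tester/driver duplex: only one strand is primed, so one new copy
        # is made per original template per cycle (linear growth).
        return start_copies * (1 + cycles)
    if linkers == 0:
        # Driver/driver duplex: no primer site, no amplification.
        return start_copies
    raise ValueError("linkers must be 0, 1 or 2")


cycles = 10  # hypothetical cycle number
for label, n_linkers in [("tester-specific", 2), ("common", 1), ("driver-specific", 0)]:
    print(label, copies_after_pcr(1, cycles, n_linkers))
```

Even with this small cycle number the tester-specific product outnumbers the common product by roughly two orders of magnitude, which is the basis for the enrichment in each round.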


Figure 6. Schematic diagram of cDNA RDA (representational difference analysis). Steps shown for Sample 1 (the tester) and Sample 2 (the driver): double-stranded cDNA; fragmentation with restriction enzyme; linker ligation and PCR amplification; linker cleavage; new linker ligation on the tester; subtractive hybridisation and PCR amplification, with products undergoing linear amplification, exponential amplification or no amplification; repeated rounds of subtraction and amplification.


In contrast to other methods, RDA makes it possible to rapidly reduce the number of genes represented to a few of the most potentially interesting by eliminating fragments present in roughly equal proportions in the two populations and leaving only those that are differentially expressed. Thus, reductions are achieved in the number of sequencing and data analysis steps, and consequently in the time involved. However, there are some disadvantages. There is a high likelihood that two restriction sites will be present in an mRNA of average length, and thus the intervening gene sequence will be amplified (although it should be noted that the sequences will not all be amplified equally, so there will be sequence bias during the multiple rounds of PCR amplification, resulting in loss of some PCR products). Furthermore, some mRNAs may harbour only one site for the restriction enzyme selected, resulting in loss of that particular gene fragment. One way to overcome this problem would be to repeat the RDA protocol on the same cDNA populations using a different restriction enzyme, and to compare the results. RDA is not the method of choice when many differences are expected between two samples. Under such circumstances, RDA will probably enrich the fragments that are most efficiently amplified, and not necessarily the differentially expressed fragments of interest (Hubank and Schatz 1999).

In recent years, many research groups have optimised and modified the RDA protocol in diverse ways (most of which are not discussed in this thesis). For example, Jacob Odeberg and colleagues (Odeberg et al. 2000) further developed the RDA protocol, taking advantage of solid-phase technology to simplify removal of digested linkers and uncleaved fragments, and to enable low amounts of starting material to be used. This modified protocol was used as a basis for the work presented in this thesis.


4 Tools for gene expression sequence tag analysis

Regardless of the strategy chosen for gene expression profiling, it will generate a massive amount of sequence data. To facilitate management and analysis of the data obtained, powerful computational resources are required. The process for data analysis of gene sequences obtained from EST sequencing efforts or shotgun sequencing of subtracted approaches can be broadly divided into three steps: pre-processing of the sequences, assembly, and annotation. For each of the three stages, a wide range of software tools have been developed, but here, only the main principles are described.

4.1 Preprocessing of sequences

Raw data from sequencing instruments needs to be passed through several processes before being entered into a subsequent assembly program. These processes include screening for the vector sequence, quality evaluation, and conversion of data formats. Manual editing of each sequence in a graphical user interface can have a powerful impact on the resulting sequences, although it is very time consuming, especially for large-scale sequencing projects. Instead, batches of sequences can be passed through the system in an automatic way. Neglecting the pre-processing step will generate sequences with poor quality, possibly leading to incorrect annotation of genes.
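As an illustration of automated pre-processing, the sketch below removes a leading vector-derived motif and trims low-quality bases from both ends of a read. It is a simplified assumption of how such a step might look, not the software used in this work; the function name `preprocess`, the motif, and the thresholds `min_q` and `min_len` are all hypothetical.

```python
def preprocess(seq, quals, vector_motif, min_q=20, min_len=5):
    """Strip vector sequence, trim low-quality ends, and filter short reads.

    seq          -- base calls as a string
    quals        -- one quality value per base (same length as seq)
    vector_motif -- hypothetical vector-derived sequence marking the insert start
    Returns the cleaned sequence, or None if it is shorter than min_len.
    """
    # Vector screening: discard everything up to and including the motif.
    idx = seq.find(vector_motif)
    if idx != -1:
        cut = idx + len(vector_motif)
        seq, quals = seq[cut:], quals[cut:]

    # Quality evaluation: trim from each end until a base reaches min_q.
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    trimmed = seq[start:end]

    return trimmed if len(trimmed) >= min_len else None


read = "GAATTCACGTACGTAC"
read_quals = [30] * 6 + [10] + [30] * 7 + [5, 5]
cleaned = preprocess(read, read_quals, vector_motif="GAATTC")
print(cleaned)
```

Running every read of a batch through such a function is what allows large sequencing projects to avoid manual editing.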

4.2 Assembly

In any sequencing project, the goal is to assemble all sequences with homology greater than a suitable threshold value into one cluster. This process involves comparison of sequences, finding overlapping regions, and integrating those satisfying pre-set computational criteria. At least two different approaches to achieve this have been developed. In the first, every new sequence entering the assembly program is compared with sequences that have already been integrated. If a sufficient match value is reached, the sequences are merged into one contig and a consensus sequence is created, i.e. a single representative sequence for the cluster. In the second, all sequences are compared to each other simultaneously, and the best-matching sequences are joined first. The second approach is generally preferred, although it demands much higher computational power.
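The first, incremental approach can be sketched as follows. This is a deliberately simplified illustration that assumes error-free reads and exact suffix-prefix overlaps; real assemblers tolerate mismatches and compute a consensus. The names `overlap` and `incremental_assembly` and the threshold `min_len` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def incremental_assembly(reads, min_len=4):
    """Add reads one at a time, merging each into the first contig it overlaps."""
    contigs = []
    for read in reads:
        for i, contig in enumerate(contigs):
            if read in contig:                  # already fully represented
                break
            k = overlap(contig, read, min_len)  # read extends contig to the right
            if k:
                contigs[i] = contig + read[k:]
                break
            k = overlap(read, contig, min_len)  # read extends contig to the left
            if k:
                contigs[i] = read + contig[k:]
                break
        else:
            contigs.append(read)                # no match: start a new contig
    return contigs


reads = ["ATGCGTAC", "GTACCTGA", "TTTTGGGG", "CTGAAATT"]
contigs = incremental_assembly(reads)
print(contigs)
```

Because each read is only compared with what has already been integrated, the result can depend on input order, and two contigs that a later read bridges are not merged in this sketch. This is one reason the all-against-all approach is generally preferred despite its higher computational cost.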

Since sequences originating from the same gene family may be difficult to distinguish from each other, problems in assembly may also arise. In addition, different splice variants as well as chimeric clones, in which two or more gene fragments are brought together before ligation into a vector, may cause problems in the assembling procedure.

4.3 Annotation

When consensus sequences for all gene fragments obtained in a project have been determined, the next step is to find out which genes they represent. Numerous software tools have been developed for this purpose, one of the most common being BLAST (Basic Local Alignment Search Tool) (Altschul et al. 1990) for sequence-to-sequence comparisons against a suitable database. Instead of using consensus sequences, all sequences can be individually used for homology searches, and thus expression profiles can be built up.


5 Tools for microarray analysis

The microarray technology has rapidly evolved, and is now employed in an almost standardised manner, with all reagents, printing robots and scanners commercially available. Analysis of the data obtained is, however, constantly being refined. The analysis of spotted microarray data can be broadly divided into three steps: image analysis, normalization, and selection of differentially expressed genes.

5.1 Image analysis

To evaluate the data obtained from microarray experiments, the array is physically scanned to create a digital image of the red and green fluorescence emissions (Cy5 and Cy3 respectively) from the array. Overlaying the output images of the Cy5 and Cy3 channels reveals physical information, such as spot morphology, hybridisation uniformity, and background artefacts such as dust particles. In addition, overlay images provide rough estimations of differentially expressed genes. After scanning, each spot must be located and linked to a clone ID. The primary purpose of the image analysis step is to calculate a foreground and a background intensity value for each spot, enabling adjustments for local variations in the array. Furthermore, the intensity values can be used to flag unreliable spots. The oldest method for spot intensity value determination is the histogram method (Chen et al. 1997), where a histogram is formed from the intensities of the pixels within a mask covering the spotted surface. Pixels are defined as foreground if their value is greater than a pre-set threshold, otherwise they are defined as background. Other strategies rely on finding spots as joined groups of foreground pixels, by fitting a circle of constant diameter to all spots in the array, or by allowing the circle’s diameter to change for each spot. Using these methods, the background values must be determined separately. One way of doing this is to consider all pixels outside the spots, but inside the bounding box, as local background. Once the intensity values have been estimated, the most common procedure is to subtract the background intensity from the foreground for each spot.
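The threshold-based classification and background subtraction described above can be sketched for a single spot as follows. The choice of the mean for the foreground and the median for the background is an assumption made for this illustration, as are the function name `spot_intensity` and the example pixel values.

```python
from statistics import mean, median


def spot_intensity(pixels, threshold):
    """Classify pixels within a spot mask and return (fg, bg, fg - bg).

    Pixels above the pre-set threshold count as foreground, all others as
    background. Mean/median summaries are assumptions for illustration.
    """
    foreground = [p for p in pixels if p > threshold]
    background = [p for p in pixels if p <= threshold]
    fg = mean(foreground) if foreground else 0.0
    bg = median(background) if background else 0.0
    return fg, bg, fg - bg


# Hypothetical pixel intensities inside the mask of one spot.
pixels = [10, 12, 11, 200, 220, 210, 9, 13]
fg, bg, corrected = spot_intensity(pixels, threshold=100)
print(fg, bg, corrected)
```

The same calculation would be repeated per spot and per channel (Cy5 and Cy3) before any ratios are formed.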


5.2 Normalization

The purpose of normalization is to adjust the individual hybridisation intensities so that relevant biological information can be obtained. Most normalization algorithms can be applied either to the whole array (globally) or to a subset of genes represented in the array (locally). Common factors that introduce red-green bias in a spotted microarray experiment are those related to labelling efficiencies and scanning properties. In addition, variations may occur between different positions of the spotted area, or even between different slides. Hence, normalization of the data obtained must be performed prior to any calculations of relative expression levels for the genes analysed, enabling further exploration of biologically relevant expression patterns. There are many approaches for normalization of spotted microarray data, some of which are reviewed by John Quackenbush (Quackenbush 2002) and Gordon Smyth and colleagues (Smyth et al. 2003).
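One of the simplest global strategies is to centre the log ratios so that their median is zero. The sketch below illustrates this under the assumption that most genes on the array are not differentially expressed; it is only one of the many approaches covered in the reviews cited above, and the function name and intensity values are hypothetical.

```python
from math import log2
from statistics import median


def global_median_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero.

    Assumes background-corrected, strictly positive intensities and that
    the majority of genes are not differentially expressed.
    """
    log_ratios = [log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)
    return [m - shift for m in log_ratios]


# Hypothetical Cy5 (red) and Cy3 (green) intensities for three spots,
# with an overall two-fold red bias.
red = [200, 400, 800]
green = [100, 200, 100]
normalized = global_median_normalize(red, green)
print(normalized)
```

After the shift, the first two spots show no change while the third retains a four-fold difference, so the dye bias is removed without masking the real difference. Local variants apply the same idea within print-tip groups or intensity ranges.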

5.3 Selection of differentially expressed genes

The data obtained from microarray experiments are often used to screen for genes that are differentially expressed in one or more sample pairs. The general procedure is to choose a statistical method for ranking genes from high to low evidence of differential expression, followed by choosing a cut-off value above which the genes are considered to be significantly differentially expressed (as reviewed by Smyth et al. 2003). When differentially expressed genes have been identified, a common approach is to group genes with similar expression profiles into clusters (Eisen et al. 1998), potentially revealing co-regulated genes with correlations not detected otherwise.
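The rank-then-threshold procedure can be sketched with a one-sample t-statistic computed on replicate log ratios. The statistic, the cut-off value and the gene names are assumptions chosen for illustration; the reviews cited above discuss more robust alternatives.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(log_ratios):
    """One-sample t-statistic testing whether the mean log ratio differs from zero.

    Requires at least two replicate values.
    """
    m = mean(log_ratios)
    s = stdev(log_ratios)
    if s == 0:
        return float("inf") if m != 0 else 0.0
    return m / (s / sqrt(len(log_ratios)))


def select_genes(replicates, cutoff):
    """Rank genes by |t| (highest first) and keep those above the cut-off."""
    ranked = sorted(
        ((abs(t_statistic(values)), gene) for gene, values in replicates.items()),
        reverse=True,
    )
    return [gene for score, gene in ranked if score > cutoff]


# Hypothetical normalized log2 ratios from three replicate hybridisations.
replicates = {
    "geneA": [2.0, 2.2, 1.8],
    "geneB": [0.1, -0.1, 0.0],
    "geneC": [-1.5, -1.7, -1.3],
}
selected = select_genes(replicates, cutoff=4.0)
print(selected)
```

Taking the absolute value lets both upregulated and downregulated genes pass the same cut-off; the genes selected this way are the usual input for the clustering step.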


Present investigation


6 Pathogenesis of atherosclerosis

Atherosclerosis is an inflammatory disease that is believed to be the principal cause of death in modern society. Large and medium-sized blood vessels are mainly affected, and the atherosclerotic lesions tend to occur in regions with turbulent blood flow, such as branches, bifurcations and curved sections (Davies 1997; Ross 1999). One of many risk factors associated with atherosclerosis is elevated cholesterol levels in the vascular system. Under normal circumstances, cholesterol and its derivatives function as membrane lipids or are stored as lipid droplets in cells for later use. The major carriers of blood cholesterol, low density lipoproteins (LDLs), constantly circulate in the vascular system and are associated with the buildup of cholesterol in atherosclerotic plaques. Endothelial cells that line arteries transport LDLs into vessel walls (Kruth 2001), in both atherosclerotic and non-atherosclerotic states. Small and dense LDL particles have a high affinity for matrix proteins, like collagens, that are expressed following injury to endothelial cells (Heeneman et al. 2003) and are therefore easily trapped upon entering the arterial wall, causing diffuse arterial intimal thickening and progression into fatty streak lesions. In response, monocytes migrate and adhere to the surface of the thickened area, then transmigrate through the cells (Figure 7).

Figure 7. Diagrammatic representation of monocyte migration, differentiation, and foam cell formation in atherosclerosis. (1) Monocyte chemotaxis. (2) Cell adhesion of monocytes to vascular endothelial cells. (3) Transmigration. (4) Differentiation of monocytes into macrophages. (5) Macrophage proliferation. (6) Expression of scavenger receptors. (7) Transformation of macrophages into foam cells. (8) Apoptosis. Image kindly provided by Med Electron Microsc (2002) 35:180 (Fig. 1). © Springer-Verlag

The monocytes differentiate into macrophages, which express scavenger receptors on their surface, attract oxidised LDL (oxLDL) and are further transformed into lipid-rich foam cells. Foam cells also originate from vascular smooth muscle cells that have undergone phenotypic conversion into macrophage-like cells, thus mimicking their progression and transformation (Ricciarelli et al. 2000). Foam cells are the major components of the atherosclerotic plaques and, regardless of their origin, they tend to die as a result of apoptosis, because of the intracellular accumulation of free cholesterol (Kellner-Weibel et al. 1998). Macrophage-derived foam cells may also escape from the lesions into the peripheral circulation and die elsewhere through apoptosis (Takahashi et al. 2002). As they mature, atherosclerotic plaques may protrude into the lumen, narrowing the artery. This may lead to ischaemic symptoms, although the most severe consequences are plaque rupture and thrombosis as a result of superficial erosion of the endothelium or uneven thinning and rupture of the lesion (Lee and Libby 1997; Rosenfeld 2000).


7 Differential gene expression in atherosclerosis

7.1 Treatment with a therapeutic drug candidate (Paper I)

In attempts to develop treatments for severe diseases, the pharmaceutical industries worldwide are constantly aiming to discover new drugs. The molecular mechanisms involved in the co-regulation of multiple genes that affect responses to these drugs are generally poorly understood. A possible way to elucidate the complex molecular interactions involved is via differential gene expression profiling. cDNA tag sequencing methods are preferable to microarray-based gene expression analysis for this purpose, since they provide absolute estimates of gene expression frequencies. Even better are techniques that focus on key fractions of genes being expressed, using “selective” techniques.

In Paper I, we describe how a solid-phase RDA technique can be used to elucidate the molecular effects of N,N’-Diacetyl-L-cystine (DiNAC) (Sarnstrand et al. 1999), an anti-atherosclerotic drug candidate. As a test system for this purpose, a monocytic cell line (THP-1) (Auwerx 1991) was used. The THP-1 cell line can be activated by various stimulants to differentiate into phenotypes mimicking macrophages in atherosclerosis. Here, THP-1 cells were activated with lipopolysaccharide (LPS), and compared with identical LPS-activated cells exposed to DiNAC.

Total RNA was extracted from the cells, mRNA was isolated using oligo-dT paramagnetic beads and cDNA was synthesised. The double-stranded cDNA, from both non-treated and drug-treated cells, was digested and used as starting material for multiple PCR reactions to obtain approximately 150 µg DNA serving as representations in the RDA analyses. Three consecutive rounds of subtractive hybridisation were performed to obtain three difference products for both drug-treated and non-treated cells.


In this study we evaluated two alternative approaches for identifying differentially expressed genes after iterative RDA subtraction cycles. Previously, the most commonly used approach to select and isolate RDA fragments was to separate them electrophoretically and then excise, purify and clone them, assuming that fragments generated in this way are a representative set of the genes that are differentially expressed in the cells or tissues under examination. Here, we used two different procedures to identify genes for which expression levels differed between the two materials. The first was the commonly used selection strategy, whereby we excised both distinct bands and band-patterned smears (size selection strategy), and the second was a shotgun approach in which the entire contents of the third set of difference products were cloned without any prior selection. A high number of different contigs (150 out of 197) were obtained from the size-selected fragments, demonstrating that the gene fragments in these products display a high degree of diversity. The shotgun approach yielded 54 different contigs out of 309. These results suggest that the separation of a complex mixture of fragments in the electrophoresis step may be inadequate to give a true reflection of quantitative differences between the test materials, and conclusions based on such separations may be somewhat misleading.
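The contrast between the two cloning strategies can be made concrete with a little arithmetic. The minimal sketch below assumes that the denominators quoted above (197 and 309) are the numbers of sequenced clones in each data set; the function name is ours and purely illustrative.

```python
def distinct_fraction(distinct_contigs: int, sequenced_clones: int) -> float:
    """Fraction of sequenced clones that ended up as distinct contigs."""
    if sequenced_clones <= 0:
        raise ValueError("sequenced_clones must be positive")
    return distinct_contigs / sequenced_clones


# Counts taken from the text; their reading as "contigs per clone" is an assumption.
size_selected = distinct_fraction(150, 197)
shotgun = distinct_fraction(54, 309)

print(f"size selection: {size_selected:.1%} distinct")
print(f"shotgun:        {shotgun:.1%} distinct")
```

A high distinct fraction means that almost every clone is seen only once, so clone counts carry little information about relative abundance. The lower fraction in the shotgun data is what allows frequencies to be estimated at all.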

The obtained sequences were compared by BLAST (Altschul et al. 1990) to the nucleotide sequences included in UniGene (build 89) and the Expressed Gene Anatomy Database (EGAD). To verify that the obtained gene frequencies reflected genuine quantitative differences, real-time PCR was performed on a selection of gene fragments using the cDNA representations as templates.

The quality of the overall results of an RDA experiment clearly depends on the cloning strategy chosen to obtain the difference products, and our results suggested that the shotgun procedure has clear advantages. Hence, shotgun cloning was adopted in the studies reported in all the following papers.


7.2 Foam cell formation in atherosclerotic lesions (Papers II and III)

Transcript profiling represents an important first step in understanding the diversity of cellular roles and mechanisms of genes. Several different methodologies have recently been developed for this purpose. Here, we used an RDA technique to analyse the early gene expression in macrophages accompanying the phenotypic changes into foam cells upon exposure to oxLDL. We have shown that shotgun RDA combined with large-scale DNA sequencing can be an attractive approach to monitor differential expression, and that the difference products can be analysed, to a certain extent, with high-throughput microarray techniques.

The monocytic THP-1 cell line was again used (Papers II and III) as a model system to study differential gene expression. This time, however, the cells were stimulated with phorbol 12-myristate 13-acetate (PMA) to establish a macrophage phenotype, and then with oxLDL to trigger macrophage differentiation into foam cells. Total RNA was extracted from both oxLDL-treated and non-treated cells, mRNA was isolated using oligo-dT paramagnetic beads, and cDNA was synthesised. A solid-phase RDA protocol was applied to the two materials (treated and non-treated cells) with three consecutive rounds of hybridisation. The six sets of difference products were shotgun cloned, and approximately 300 clones per difference product were randomly chosen and sequenced. The obtained sequences were compared by BLAST (Altschul et al. 1990) to the nucleotide sequences included in UniGene and EGAD. In parallel, a non-redundant set of clones from each data set was printed in triplicate onto amino-silane coated glass slides together with positive and negative control genes from the human and Arabidopsis thaliana genomes. Labelled targets (difference products) for hybridisation were generated by PCR in the presence of Cy3- or Cy5-labelled dCTP. Scanning was performed using a confocal laser scanner, and the resulting images were analysed with GenePix Pro 3.0 software.


The results revealed that around 70 % of the assembled contigs of the third difference products comprised unique sequences (singletons). Also, approximately 20 % of all gene fragments in the final difference products represented novel transcripts that had not been detected in the previous rounds of subtractive hybridisation. The substantial number of different gene fragments present after three rounds of subtractive enrichment demonstrates the complexity of gene regulatory events, as well as the ability of the RDA technique to detect rare transcripts. However, the relative expression levels derived using RDA may be misleading if an insufficient number of clones is sequenced and analysed. One way to obtain more exact estimates of expression rates could be to combine the RDA technique with large-scale microarray analysis. Microarray technology is a powerful tool, enabling expression profiles to be determined for thousands of genes simultaneously, although the detector’s sensitivity limits the ability to detect differences in the abundance of weakly expressed transcripts. It also requires prefabricated arrays harbouring spots representing all genes of interest, unless an array representing the total transcriptome of the organism is used. Until such microarrays are available and in use for the organism under study, transcript-profiling methods that allow gene discovery (such as RDA) will yield information that would otherwise be missed.
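The singleton share quoted above is simply the fraction of contigs that contain a single read. As a minimal sketch, assuming the assembly is available as a mapping from read name to contig name (the read and contig names below are made up for illustration), it could be computed as follows:

```python
from collections import Counter
from typing import Dict


def singleton_fraction(contig_of_read: Dict[str, str]) -> float:
    """Fraction of contigs that are built from exactly one read."""
    if not contig_of_read:
        raise ValueError("no reads supplied")
    contig_sizes = Counter(contig_of_read.values())
    singletons = sum(1 for size in contig_sizes.values() if size == 1)
    return singletons / len(contig_sizes)


# Hypothetical assembly: seven reads falling into four contigs.
example_assembly = {
    "read1": "contigA", "read2": "contigA",
    "read3": "contigB",
    "read4": "contigC", "read5": "contigC", "read6": "contigC",
    "read7": "contigD",
}
example_fraction = singleton_fraction(example_assembly)
print(f"{example_fraction:.0%} of contigs are singletons")
```

The more singletons a data set contains, the more clones need to be sequenced before clone counts say anything reliable about relative expression levels.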

The performance of the microarray assay in this study demonstrated both high specificity (no cross-hybridisation to negative control genes) and very high sensitivity, since 97 % of the microarray elements repeatedly gave signals above the intensity threshold we set (local background plus two standard deviations). Also, as expected, the majority of the spots were red or green, indicating that they were differentially expressed, even though the expression levels of barely 32 % of all replicates exceeded the minimum twofold expression ratio nominally required to accept expression as being differential.
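The two criteria mentioned above, a detection threshold of local background plus two standard deviations and a minimum twofold expression ratio, can be sketched as below. This is an illustration under stated assumptions, not the GenePix Pro procedure: the Spot fields are hypothetical, a single local background is assumed for both channels, and spots in which either channel is at background are simply left uncalled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spot:
    cy5: float      # foreground intensity, Cy5 channel
    cy3: float      # foreground intensity, Cy3 channel
    bg_mean: float  # local background mean (assumed shared by both channels)
    bg_sd: float    # local background standard deviation


def is_detected(spot: Spot, n_sd: float = 2.0) -> bool:
    """True if both channels exceed local background + n_sd standard deviations."""
    threshold = spot.bg_mean + n_sd * spot.bg_sd
    return spot.cy5 > threshold and spot.cy3 > threshold


def fold_change(spot: Spot) -> Optional[float]:
    """Background-subtracted Cy5/Cy3 ratio, or None for undetected spots."""
    if not is_detected(spot):
        return None
    return (spot.cy5 - spot.bg_mean) / (spot.cy3 - spot.bg_mean)


def is_differential(spot: Spot, min_fold: float = 2.0) -> bool:
    """True if the ratio is at least min_fold in either direction."""
    ratio = fold_change(spot)
    return ratio is not None and (ratio >= min_fold or ratio <= 1.0 / min_fold)


# Made-up intensities for three spots.
strong = Spot(cy5=1000.0, cy3=300.0, bg_mean=100.0, bg_sd=20.0)
weak_change = Spot(cy5=500.0, cy3=400.0, bg_mean=100.0, bg_sd=20.0)
faint = Spot(cy5=130.0, cy3=120.0, bg_mean=100.0, bg_sd=20.0)

for name, spot in [("strong", strong), ("weak_change", weak_change), ("faint", faint)]:
    print(name, is_detected(spot), is_differential(spot))
```

A spot can thus be clearly detected and visibly coloured while still falling short of the twofold ratio, which is consistent with the gap between the 97 % and 32 % figures reported above.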

The biological data derived in this study include information on genes that play crucial roles in cell cycle control and proliferation, inflammatory responses, several pathways that had not previously been implicated in atherosclerosis, and the peroxisome proliferator-activated receptor gamma (PPARgamma) pathway, which has previously been implicated in the initiation and progression of atherosclerosis. Accumulating data suggest that PPARgamma plays a central role in the macrophage response to high extracellular concentrations of oxLDL (Tontonoz et al. 1998; Kersten et al. 2000). Several previously known PPARgamma target genes, e.g. the gene encoding adipophilin (Pelton et al. 1999), were identified and their up-regulation in the oxLDL-treated cells was confirmed. This was also the case for the class B scavenger receptor CD36, which is considered to play a critical role in atherosclerotic foam cell formation by mediating the uptake of ligands such as oxidized lipoproteins (Tontonoz et al. 1998), apoptotic cells, and collagens.

In conclusion, we show that random sequencing of the difference products generated an accurate transcript profile and that the regulation of the obtained gene fragments can be confirmed by large-scale microarray analysis. The combination of these techniques enables significant differences in gene expression to be detected, even for weakly expressed genes, and the results to be reliably validated in a high-throughput manner.

7.3 Focal localisation of atherosclerotic plaques (Paper IV)

When the vascular endothelium is exposed to sustained haemodynamic forces as a result of rapid (and, especially, turbulent) blood flow, it changes in both structure and function. These changes in the vessel walls have great impact on the initiation and progression of atherosclerosis. It has been known for quite some time that regions with turbulent blood flow are more likely to develop atherosclerotic plaques than regions with more uniform blood flow (Figure 8).

In the studies reported in Paper IV, we used a solid-phase RDA protocol to investigate the focal nature of atherosclerotic lesions and gene expression profiling in vivo. The investigations were based on a comparison between localisations that are likely, and others that are unlikely, to develop atherosclerotic plaques in the aorta of ApoE-/- and LDLR-/- mice. The aorta of each of these mice was cleaned of adipose tissue and dissected into plaque-prone localisations (the aortic arch and proximal part of the abdominal aorta) and plaque-resistant localisations (the descending thoracic aorta and distal part of the abdominal aorta). The tissues were snap-frozen in liquid nitrogen, total RNA was extracted and cDNA was synthesised using just 6 µg total RNA. The double-stranded cDNA from the two materials was used as starting material in multiple PCR reactions to obtain a total of approximately 500 µg DNA for each material, serving as representations in the RDA protocol.

These representations, together with the first and second difference products generated by the RDA technique, were shotgun cloned and more than 400 clones from each data set were sequenced. Each sequence was manually edited and clustered into contigs using the Staden software package (Staden 1996). This revealed that the number of clusters successively increased during the RDA procedure, showing enrichment of differentially expressed gene fragments (Table 2). Almost 2800 gene fragments potentially involved in the development of atherosclerotic lesions were compared by BLAST (E-value < 10^-20) to the representative nucleotide sequences included in UniGene (build 100); 52 % of them represented novel transcripts. To independently confirm the differential expression identified by RDA, a small subset of clones was selected for confirmation with real-time PCR using the cDNA representations as template. The results confirmed eleven out of twelve transcripts to be differentially expressed, showing the sensitivity and reliability of the RDA technique.
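The database comparison described above amounts to keeping, for each clone, the best hit below the E-value cutoff and treating clones without such a hit as candidates for novel transcripts. The sketch below illustrates this on the standard 12-column tab-separated BLAST output, in which the E-value is the eleventh column. The clone names and cluster identifiers are invented, and equating "no hit below the cutoff" with "novel" is our assumption about the criterion, not a statement of the exact procedure used in Paper IV.

```python
from typing import Dict, Iterable, Optional, Tuple

E_VALUE_CUTOFF = 1e-20  # threshold quoted in the text


def best_hits(tabular_lines: Iterable[str],
              queries: Iterable[str],
              cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Optional[str]]:
    """Map each query to its best subject below the cutoff, or None if it has none."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return {q: (best[q][0] if q in best else None) for q in queries}


# Hypothetical clone names and cluster identifiers, for illustration only.
rows = [
    "\t".join(["clone_001", "Mm.0001", "98.5", "200", "3", "0",
               "1", "200", "50", "249", "1e-95", "350"]),
    "\t".join(["clone_002", "Mm.0002", "85.0", "60", "9", "0",
               "1", "60", "10", "69", "2e-08", "60"]),
]
calls = best_hits(rows, ["clone_001", "clone_002", "clone_003"])
novel_fraction = sum(hit is None for hit in calls.values()) / len(calls)
print(calls)
```

Here clone_002 has a hit, but its E-value is above the cutoff, so it is treated in the same way as clone_003, which has no hit at all.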

Figure 8. Atherosclerotic plaques primarily develop at branch points and curves in arteries (above, indicated as darker patches). Image modified from http://focus.hms.harvard.edu/2001/May4_2001/pathology.html


The expression levels of several of the obtained differential transcripts appear to be modulated by shear stress in the arteries. Such mechanotransduction preferentially occurs at specialised invaginated microdomains in the endothelial membrane, called caveolae. The function of caveolae has been debated, but it now seems clear that they are stable membrane domains that are kept in place by the actin cytoskeleton (van Deurs et al. 2003). Caveolae are important in the organisation of cell surface receptors and the regulation of various signal transduction systems, such as the system regulating cholesterol uptake. In this study we found increased expression of caveolin, the major structural element of caveolae, as well as cofilin, an actin-binding protein, in the vessel localisations thought to be especially susceptible to plaque formation. Another up-regulated membrane protein, co-localised with caveolin, is CD36, which was also detected in the studies reported in Paper III.

[Bar chart: percentage of contigs (0 to 100 %) that are clusters or singletons in the representation (repr) and in the first and second difference products (DP1, DP2).]

Table 2. The distribution of clustered sequences, showing the enrichment of commonly expressed gene fragments. Data from upregulated genes in plaque-prone regions of the mouse aorta (Paper IV).


8 Signature Tag RDA

As discussed above, RDA is a powerful technique for differential gene expression profiling, although it has several disadvantages. Since RDA relies on endonuclease restriction of a pool of unknown cDNAs at specific restriction enzyme sites, fragments that harbour only one such site may be lost. Also, the majority of sequences in many databases represent the 3’- or 5’-ends of the cDNAs, while RDA generates fragments that are scattered throughout the cDNA, except for the parts closest to its ends. Thus, identifying the obtained gene sequences may be problematic.

To address this problem we are developing a method (“Signature Tag RDA”) for identifying differentially expressed genes based solely on the 3’-ends of cDNAs. To do this we have combined RDA with a strategy developed for the amplification of cDNA tags (“signature tags”), in which the cDNAs are randomly fragmented into short tags of similar length, and the 3’-end (“signature tag”) population is then isolated and amplified by PCR (Sievertzon et al. 2003). The strategy for the non-biased PCR amplification of 3’-end signature tags is outlined in Figure 9.

To study differential gene expression using this approach, the lateral hypothalamic area (LHA) of very overweight and slightly overweight rats has been used as a model system. First, mRNA was isolated from LHA tissue with a specially designed 5’-biotinylated oligo-dT primer containing the restriction enzyme sites needed in subsequent steps of the protocol. cDNA synthesis was then performed, followed by random fragmentation of the cDNAs by sonication into 100-600 bp fragments. Biotinylated 3’-end signature tags from the fragmented cDNA population were isolated on paramagnetic streptavidin-coated beads and the non-biotinylated fragments were removed. The ends of the immobilised signature tags were repaired, and blunt-end adaptors containing PCR primer sites and restriction enzyme sites suitable for RDA were ligated onto the 3’-end signature tags. The signature tags were released from the magnetic beads by NotI restriction digestion and then subjected to nested PCR amplification using primers designed in-house. The obtained pools of amplified cDNA (for very overweight and slightly overweight rats), which should theoretically represent the original transcripts expressed, then served as representations for the RDA technique. With such representations in hand, the RDA technique can be used as previously described, except that all primers and linkers have to be designed and the PCR conditions adjusted accordingly.

[Flow diagram, panel labels: nucleus; mRNA isolation; cDNA synthesis (oligo-dT primer carrying NotI and RDA sites); fragmentation by sonication; immobilisation onto streptavidin-coated support; (blunt-end) adaptor ligation; release of 3’-tags by NotI restriction; nested PCR amplification; RDA.]

Figure 9. The principle of Signature Tag RDA, based on identification of differentially expressed genes utilizing 3’-end signature tags.

Using the signature tag strategy avoids the problems with RDA discussed earlier. The strategy relies on random fragmentation of cDNA populations followed by ligation of adaptors suitable for RDA. Hence, the risk of losing fragments that harbour only one specific restriction site is avoided. Focusing entirely on the 3’-ends of the transcripts provides another major advantage, since these are among the regions most often represented in sequence databases, which simplifies identification of the obtained fragments. Furthermore, fragmentation of the cDNA populations into tags of similar length minimizes the risk of biased amplification that arises when templates of several different sizes are amplified in parallel.
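The first of these advantages can be illustrated with a toy model. In the sketch below, a cDNA is treated as a plain string; restriction-based representations keep only fragments flanked by a site on both sides, whereas the signature tag approach always keeps the fragment that retains the 3’-end, wherever the random breakpoints fall. The recognition site GATC and the sequences are chosen purely for illustration (this section does not name the enzyme), and the function names are hypothetical.

```python
from typing import List


def internal_restriction_fragments(cdna: str, site: str) -> List[str]:
    """Fragments flanked by the recognition site on both sides.

    Cutting is modelled at the start of each site occurrence.
    """
    positions = []
    start = cdna.find(site)
    while start != -1:
        positions.append(start)
        start = cdna.find(site, start + 1)
    return [cdna[a:b] for a, b in zip(positions, positions[1:])]


def three_prime_tag(cdna: str, breakpoints: List[int]) -> str:
    """Fragment that retains the 3' end after breaking at the given positions."""
    valid = [b for b in breakpoints if 0 < b < len(cdna)]
    return cdna[max(valid, default=0):]


# Toy cDNA with a single GATC site and a poly(A) tail.
one_site_cdna = "ATGCCGATCTTGACCAAGGTTAAAAAAAA"
lost = internal_restriction_fragments(one_site_cdna, "GATC")
tag = three_prime_tag(one_site_cdna, [10, 18])

# Toy cDNA with two GATC sites, which does yield an internal fragment.
two_site_cdna = "GATCAAAGATCTT"
kept = internal_restriction_fragments(two_site_cdna, "GATC")

print(lost, tag, kept)
```

In this model the single-site transcript contributes nothing to a restriction-based representation, yet it still yields a 3’-end tag.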

9 Concluding remarks

The work in this thesis describes further developments of, in particular, the representational difference analysis (RDA) technique for selective differential gene expression analysis. RDA can be used independently as a tool for gene expression profiling, but has recently also been combined with global microarray analysis (Andersson et al. 2002), which indicates that combining technologies can be an important complement in future efforts to identify differentially expressed genes.


Acknowledgements

First and foremost, a big thank you to everyone I may forget to list here (horrible thought…).

Then to the rest:

Joakim, for taking me on. For your precise boundary between gossip and secrecy, and for believing, more than I did, that this was possible.

Mathias, for creating this group and making it what it is.

Sophia, Per-Åke and Stefan, for always taking the time for my questions when I had no one else to turn to.

Jacob, who taught me a lot about a lot.

Anna, for always letting me borrow your brain when my own had broken down completely.

Nina and Malin, for all the in-depth conversations about bellies and the children who eventually come out of them.

Tove, Lotta and Maria, for all the happy laughs we shared while doing our crafts. I will never forget Greta Garbo and Mathias. Nor that there is so much pumpkin inside a pumpkin. Oh, and I must not forget to thank those of you who came to my bird breakfasts.

Anders Thelin, for all your help and for teaching me to pronounce “anrika” (“enrich”).

Tove, my lab mate, for all the time we have shared on trips and all the shop talk that one actually has to have now and then.

Lotta, for your cheerful personality and for letting me get you onto the birdwatching track.

PerU and Valtteri, for always taking the time for all my questions and for actually doing what you could to solve my problems as quickly as possible. Not an entirely common quality; thank you.


To Karin, Jenny, Anna and Ingrid: we will have to barbecue next summer instead.

Annica and Anna W, for enormous support at the times I needed it.

My wonderful friends out in the real world, above all Pernilla, Robert, Lisa, Mats, Camilla, Lasse and Josefin. What would I do without you? I will be sociable again soon, I promise.

Karin and Petra, for all our wonderful dinners.

The K93 gang: Anna, Henrik, Martin and Kristofer. OK, I finished last, I know.

Cloetta, for making Kexchoklad.

Anders, for babysitting and for being the best big brother anyone could have.

Parents and parents-in-law, because you are the best! I love you all.

And of course my own little family. Peter, for wanting to share your life with me, with all that it entails. My beloved kids, Andreas and of course the little bundle in my belly, who still has the sense to stay in there even though mum has been a bit stressed…


References

Adams, M. D., J. M. Kelley, J. D. Gocayne, M. Dubnick, M. H. Polymeropoulos, H. Xiao, C. R. Merril, A. Wu, B. Olde, R. F. Moreno et al. (1991). “Complementary DNA sequencing: expressed sequence tags and human genome project.” Science 252(5013): 1651-6.

Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman (1990). “Basic local alignment search tool.” J Mol Biol 215(3): 403-10.

Andersson, T., P. Unneberg, P. Nilsson, J. Odeberg, J. Quackenbush and J. Lundeberg (2002). “Monitoring of representational difference analysis subtraction procedures by global microarrays.” Biotechniques 32(6): 1348-50, 1352, 1354-6, 1358.

Auwerx, J. (1991). “The human leukemia cell line, THP-1: a multifacetted model for the study of monocyte-macrophage differentiation.” Experientia 47(1): 22-31.

Barczak, A., M. W. Rodriguez, K. Hanspers, L. L. Koth, Y. C. Tai, B. M. Bolstad, T. P. Speed and D. J. Erle (2003). “Spotted long oligonucleotide arrays for human gene expression analysis.” Genome Res 13(7): 1775-85.

Bonaldo, M. F., G. Lennon and M. B. Soares (1996). “Normalization and subtraction: two approaches to facilitate gene discovery.” Genome Res 6(9): 791-806.

Brookes, A. J. (1999). “The essence of SNPs.” Gene 234(2): 177-86.

Carninci, P. and Y. Hayashizaki (1999). “High-efficiency full-length cDNA cloning.” Methods Enzymol 303: 19-44.

Chen, Y., E. R. Dougherty and M. L. Bittner (1997). “Ratio based decisions and the quantitative analysis of cDNA microarray images.” J Biomed Opt 2(4): 364-374.

Crawford, D. R., J. C. Kochheiser, G. P. Schools, S. L. Salmon and K. J. Davies (2002). “Differential display: a critical analysis.” Gene Expr 10(3): 101-7.

Das, M., I. Harvey, L. L. Chu, M. Sinha and J. Pelletier (2001). “Full-length cDNAs: more than just reaching the ends.” Physiol Genomics 6(2): 57-80.

Datson, N. A., J. van der Perk-de Jong, M. P. van den Berg, E. R. de Kloet and E. Vreugdenhil (1999). “MicroSAGE: a modified procedure for serial analysis of gene expression in limited amounts of tissue.” Nucleic Acids Res 27(5): 1300-7.

Davies, P. F. (1997). “Mechanisms involved in endothelial responses to hemodynamic forces.” Atherosclerosis 131 Suppl: S15-7.

Dias Neto, E., R. G. Correa, S. Verjovski-Almeida, M. R. Briones, M. A. Nagai, W. da Silva, Jr., M. A. Zago, S. Bordin, F. F. Costa, G. H. Goldman, A. F. Carvalho, A. Matsukuma, G. S. Baia, D. H. Simpson, A. Brunstein, P. S. de Oliveira, P. Bucher, C. V. Jongeneel, M. J. O’Hare, F. Soares, R. R. Brentani, L. F. Reis, S. J. de Souza and A. J. Simpson (2000). “Shotgun sequencing of the human transcriptome with ORF expressed sequence tags.” Proc Natl Acad Sci U S A 97(7): 3491-6.

Diatchenko, L., Y. F. Lau, A. P. Campbell, A. Chenchik, F. Moqadam, B. Huang, S. Lukyanov, K. Lukyanov, N. Gurskaya, E. D. Sverdlov and P. D. Siebert (1996). “Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries.” Proc Natl Acad Sci U S A 93(12): 6025-30.

Dudley, J. P., J. S. Butel, S. H. Socher and J. M. Rosen (1978). “Detection of mouse mammary tumor virus RNA in BALB/c tumor cell lines of nonviral etiologies.” J Virol 28(3): 743-52.

Eisen, M. B., P. T. Spellman, P. O. Brown and D. Botstein (1998). “Cluster analysis and display of genome-wide expression patterns.” Proc Natl Acad Sci U S A 95(25): 14863-8.

Fodor, S. P., R. P. Rava, X. C. Huang, A. C. Pease, C. P. Holmes and C. L. Adams (1993). “Multiplexed biochemical assays with biological chips.” Nature 364(6437): 555-6.

Forozan, F., R. Karhu, J. Kononen, A. Kallioniemi and O. P. Kallioniemi (1997). “Genome screening by comparative genomic hybridization.” Trends Genet 13(10): 405-9.

Gibson, G. and S. Muse (2002). A primer of genome science, Sinauer Associates, Inc.

Graveley, B. R. (2001). “Alternative splicing: increasing diversity in the proteomic world.” Trends Genet 17(2): 100-7.

Heeneman, S., J. P. Cleutjens, B. C. Faber, E. E. Creemers, R. J. van Suylen, E. Lutgens, K. B. Cleutjens and M. J. Daemen (2003). “The dynamic extracellular matrix: intervention strategies during heart failure and atherosclerosis.” J Pathol 200(4): 516-25.

Hegde, P., R. Qi, K. Abernathy, C. Gay, S. Dharap, R. Gaspard, J. E. Hughes, E. Snesrud, N. Lee and J. Quackenbush (2000). “A concise guide to cDNA microarray analysis.” Biotechniques 29(3): 548-50, 552-4, 556 passim.

Hubank, M. and D. G. Schatz (1994). “Identifying differences in mRNA expression by representational difference analysis of cDNA.” Nucleic Acids Res 22(25): 5640-8.

Hubank, M. and D. G. Schatz (1999). “cDNA representational difference analysis: a sensitive and flexible method for identification of differentially expressed genes.” Methods Enzymol 303: 325-49.

Ishii, M., S. Hashimoto, S. Tsutsumi, Y. Wada, K. Matsushima, T. Kodama and H. Aburatani (2000). “Direct comparison of GeneChip and SAGE on the quantitative accuracy in transcript profiling analysis.” Genomics 68(2): 136-43.

Kannbley, U., K. Kapinya, U. Dirnagl and G. Trendelenburg (2003). “Improved protocol for SAGE tag-to-gene allocation.” Biotechniques 34(6): 1212-4, 1216-9.

Kellner-Weibel, G., W. G. Jerome, D. M. Small, G. J. Warner, J. K. Stoltenborg, M. A. Kearney, M. H. Corjay, M. C. Phillips and G. H. Rothblat (1998). “Effects of intracellular free cholesterol accumulation on macrophage viability: a model for foam cell death.” Arterioscler Thromb Vasc Biol 18(3): 423-31.

Kerr, M. K. and G. A. Churchill (2001). “Statistical design and the analysis of gene expression microarray data.” Genet Res 77(2): 123-8.

Kersten, S., B. Desvergne and W. Wahli (2000). “Roles of PPARs in health and disease.” Nature 405(6785): 421-4.

Kruth, H. S. (2001). “Lipoprotein cholesterol and atherosclerosis.” Curr Mol Med 1(6): 633-53.

Lander, E. S. (1999). “Array of hope.” Nat Genet 21(1 Suppl): 3-4.

Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J. P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, N. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier, J. D. McPherson, M. A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L. Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L. L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, R. A. Gibbs, D. M. Muzny, S. E. Scherer, J. B. Bouck, E. J. Sodergren, K. C. Worley, C. M. Rives, J. H. Gorrell, M. L. Metzker, S. L. Naylor, R. S. Kucherlapati, D. L. Nelson, G. M. Weinstock, Y. Sakaki, A. Fujiyama, M. Hattori, T. Yada, A. Toyoda, T. Itoh, C. Kawagoe, H. Watanabe, Y. Totoki, T. Taylor, J. Weissenbach, R. Heilig, W. Saurin, F. Artiguenave, P. Brottier, T. Bruls, E. Pelletier, C. Robert, P. Wincker, D. R. Smith, L. Doucette-Stamm, M. Rubenfield, K. Weinstock, H. M. Lee, J. Dubois, A. Rosenthal, M. Platzer, G. Nyakatura, S. Taudien, A. Rump, H. Yang, J. Yu, J. Wang, G. Huang, J. Gu, L. Hood, L. Rowen, A. Madan, S. Qin, R. W. Davis, N. A. Federspiel, A. P. Abola, M. J. Proctor, R. M. Myers, J. Schmutz, M. Dickson, J. Grimwood, D. R. Cox, M. V. Olson, R. Kaul, N. Shimizu, K. Kawasaki, S. Minoshima, G. A. Evans, M. Athanasiou, R. Schultz, B. A. Roe, F. Chen, H. Pan, J. Ramser, H. Lehrach, R. Reinhardt, W. R. McCombie, M. de la Bastide, N. Dedhia, H. Blocker, K. Hornischer, G. Nordsiek, R. Agarwala, L. Aravind, J. A. Bailey, A. Bateman, S. Batzoglou, E. Birney, P. Bork, D. G. Brown, C. B. Burge, L. Cerutti, H. C. Chen, D. Church, M. Clamp, R. R. Copley, T. Doerks, S. R. Eddy, E. E. Eichler, T. S. Furey, J. Galagan, J. G. Gilbert, C. Harmon, Y. Hayashizaki, D. Haussler, H. Hermjakob, K. Hokamp, W. Jang, L. S. Johnson, T. A. Jones, S. Kasif, A. Kaspryzk, S. Kennedy, W. J. Kent, P. Kitts, E. V. Koonin, I. Korf, D. Kulp, D. Lancet, T. M. Lowe, A. McLysaght, T. Mikkelsen, J. V. Moran, N. Mulder, V. J. Pollara, C. P. Ponting, G. Schuler, J. Schultz, G. Slater, A. F. Smit, E. Stupka, J. Szustakowski, D. Thierry-Mieg, J. Thierry-Mieg, L. Wagner, J. Wallis, R. Wheeler, A. Williams, Y. I. Wolf, K. H. Wolfe, S. P. Yang, R. F. Yeh, F. Collins, M. S. Guyer, J. Peterson, A. Felsenfeld, K. A. Wetterstrand, A. Patrinos, M. J. Morgan, J. Szustakowki, P. de Jong, J. J. Catanese, K. Osoegawa, H. Shizuya, S. Choi and Y. J. Chen (2001). “Initial sequencing and analysis of the human genome.” Nature 409(6822): 860-921.

Lee, N. H., K. G. Weinstock, E. F. Kirkness, J. A. Earle-Hughes, R. A. Fuldner, S. Marmaros, A. Glodek, J. D. Gocayne, M. D. Adams, A. R. Kerlavage et al. (1995). “Comparative expressed-sequence-tag analysis of differential gene expression profiles in PC-12 cells before and after nerve growth factor treatment.” Proc Natl Acad Sci U S A 92(18): 8303-7.

Lee, R. T. and P. Libby (1997). “The Unstable Atheroma.” Arterioscler Thromb Vasc Biol 17(10): 1859-1867.


Liang, F., I. Holt, G. Pertea, S. Karamycheva, S. L. Salzberg and J. Quackenbush (2000).“Gene index analysis of the human genome estimates approximately 120,000 ge-nes.” Nat Genet 25(2): 239-40.

Liang, P. and A. B. Pardee (1992). “Differential display of eukaryotic messenger RNA bymeans of the polymerase chain reaction.” Science 257(5072): 967-71.

Liang, P. (2002). “A decade of differential display.” Biotechniques 33(2): 338-44, 346.Lisitsyn, N. and M. Wigler (1993). “Cloning the differences between two complex geno-

mes.” Science 259(5097): 946-51.Lockhart, D. J., H. Dong, M. C. Byrne, M. T. Follettie, M. V. Gallo, M. S. Chee, M. Mittmann,

C. Wang, M. Kobayashi, H. Horton and E. L. Brown (1996). “Expression monitoringby hybridization to high-density oligonucleotide arrays.” Nat Biotechnol 14(13):1675-80.

Lukyanov, K. A., G. A. Launer, V. S. Tarabykin, A. G. Zaraisky and S. A. Lukyanov (1995). “Inverted terminal repeats permit the average length of amplified DNA fragments to be regulated during preparation of cDNA libraries by polymerase chain reaction.” Anal Biochem 229(2): 198-202.

Maxam, A. M. and W. Gilbert (1977). “A new method for sequencing DNA.” Proc Natl Acad Sci U S A 74(2): 560-4.

Mullis, K., F. Faloona, S. Scharf, R. Saiki, G. Horn and H. Erlich (1986). “Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction.” Cold Spring Harb Symp Quant Biol 51 Pt 1: 263-73.

Neilson, L., A. Andalibi, D. Kang, C. Coutifaris, J. F. Strauss, 3rd, J. A. Stanton and D. P. Green (2000). “Molecular phenotype of the human oocyte by PCR-SAGE.” Genomics 63(1): 13-24.

Odeberg, J., T. Wood, A. Blucher, J. Rafter, G. Norstedt and J. Lundeberg (2000). “A cDNA RDA protocol using solid-phase technology suited for analysis in small tissue samples.” Biomol Eng 17(1): 1-9.

Pelton, P. D., L. Zhou, K. T. Demarest and T. P. Burris (1999). “PPARgamma activation induces the expression of the adipocyte fatty acid binding protein gene in human monocytes.” Biochem Biophys Res Commun 261(2): 456-8.

Peters, D. G., A. B. Kassam, H. Yonas, E. H. O’Hare, R. E. Ferrell and A. M. Brufsky (1999). “Comprehensive transcript analysis in small quantities of mRNA by SAGE-lite.” Nucleic Acids Res 27(24): e39.

Quackenbush, J. (2002). “Microarray data normalization and transformation.” Nat Genet 32 Suppl: 496-501.

Ricciarelli, R., J. M. Zingg and A. Azzi (2000). “Vitamin E reduces the uptake of oxidized LDL by inhibiting CD36 scavenger receptor expression in cultured aortic smooth muscle cells.” Circulation 102(1): 82-7.

Rosenfeld, M. E. (2000). “An overview of the evolution of the atherosclerotic plaque: from fatty streak to plaque rupture and thrombosis.” Z Kardiol 89 Suppl 7: 2-6.

Ross, R. (1999). “Atherosclerosis—an inflammatory disease.” N Engl J Med 340(2):115-26.

Sanger, F., S. Nicklen and A. R. Coulson (1977). “DNA sequencing with chain-terminating inhibitors.” Proc Natl Acad Sci U S A 74(12): 5463-7.

Sarnstrand, B., A. H. Jansson, G. Matuseviciene, A. Scheynius, S. Pierrou and H. Bergstrand (1999). “N,N’-Diacetyl-L-cystine-the disulfide dimer of N-acetylcysteine-is a potent modulator of contact sensitivity/delayed type hypersensitivity reactions in rodents.” J Pharmacol Exp Ther 288(3): 1174-84.

Schena, M., D. Shalon, R. W. Davis and P. O. Brown (1995). “Quantitative monitoring of gene expression patterns with a complementary DNA microarray.” Science 270(5235): 467-70.

Schuchhardt, J., D. Beule, A. Malik, E. Wolski, H. Eickhoff, H. Lehrach and H. Herzel (2000). “Normalization strategies for cDNA microarrays.” Nucleic Acids Res 28(10): E47.

Shoemaker, D. D. and P. S. Linsley (2002). “Recent developments in DNA microarrays.” Curr Opin Microbiol 5(3): 334-7.

Sievertzon, M., L. Agaton, P. Nilsson and J. Lundeberg (2003). “Amplification of mRNA populations by a cDNA tag strategy.” Biotechniques In press.

Smyth, G. K., Y. H. Yang and T. Speed (2003). “Statistical issues in cDNA microarray data analysis.” Methods Mol Biol 224: 111-36.

Soares, M. B., M. F. Bonaldo, P. Jelene, L. Su, L. Lawton and A. Efstratiadis (1994). “Construction and characterization of a normalized cDNA library.” Proc Natl Acad Sci U S A 91(20): 9228-32.

Staden, R. (1996). “The Staden sequence analysis package.” Mol Biotechnol 5(3): 233-41.

Stein, J. and P. Liang (2002). “Differential display technology: a general guide.” Cell Mol Life Sci 59(8): 1235-40.

Suzuki, Y. and S. Sugano (2001). “Construction of full-length-enriched cDNA libraries. The oligo-capping method.” Methods Mol Biol 175: 143-53.

Takahashi, K., M. Takeya and N. Sakashita (2002). “Multifunctional roles of macrophages in the development and progression of atherosclerosis in humans and experimental animals.” Med Electron Microsc 35(4): 179-203.

Tontonoz, P., L. Nagy, J. G. Alvarez, V. A. Thomazy and R. M. Evans (1998). “PPARgamma promotes monocyte/macrophage differentiation and uptake of oxidized LDL.” Cell 93(2): 241-52.

van Deurs, B., K. Roepstorff, A. M. Hommelgaard and K. Sandvig (2003). “Caveolae: anchored, multifunctional platforms in the lipid ocean.” Trends Cell Biol 13(2): 92-100.

Watson, J. D. and F. H. Crick (1953). “Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid.” Nature 171(4356): 737-8.

Velculescu, V. E., L. Zhang, B. Vogelstein and K. W. Kinzler (1995). “Serial analysis of gene expression.” Science 270(5235): 484-7.

Velculescu, V. E., L. Zhang, W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein and K. W. Kinzler (1997). “Characterization of the yeast transcriptome.” Cell 88(2): 243-51.

Welsh, J., K. Chada, S. S. Dalal, R. Cheng, D. Ralph and M. McClelland (1992). “Arbitrarily primed PCR fingerprinting of RNA.” Nucleic Acids Res 20(19): 4965-70.

Venter, J. C., M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M.
Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J. Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke, A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh and X. Zhu (2001). “The sequence of the human genome.” Science 291(5507): 1304-51.

Virlon, B., L. Cheval, J. M. Buhler, E. Billon, A. Doucet and J. M. Elalouf (1999). “Serial microanalysis of renal transcriptomes.” Proc Natl Acad Sci U S A 96(26): 15286-91.

Yamamoto, M., T. Wakatsuki, A. Hada and A. Ryo (2001). “Use of serial analysis of gene expression (SAGE) technology.” J Immunol Methods 250(1-2): 45-66.

Yang, Y. H., S. Dudoit, P. Luu, D. M. Lin, V. Peng, J. Ngai and T. P. Speed (2002). “Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation.” Nucleic Acids Res 30(4): e15.

Ye, S. Q., L. Q. Zhang, F. Zheng, D. Virgil and P. O. Kwiterovich (2000). “miniSAGE: gene expression profiling using serial analysis of gene expression from 1 microg total RNA.” Anal Biochem 287(1): 144-52.