

Saulo Augusto de Paula Pinto 1, 2 {saulo@pucminas.br}

1 Introduction

To identify a possible common framework of gene expression across samples, we analyzed 418 samples from 13 NCBI GEO series generated on the Affymetrix GeneChip platform, as well as 31 SAGE Genie libraries.

2 Methodology

An algorithm was developed to find a weak framework: a set of gene pairs in which the first gene of each pair is more expressed than the second in every analyzed sample.

3 Results

In every sample, expression values follow an exponential-like decay from the most to the least expressed sequence, regardless of the technology, the number of distinct sequences in the sample, the organism, or the tissue type.
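One simple way to check the exponential-like decay reported in the results is to fit a straight line to the logarithm of the expression values against their rank. The sketch below is only an illustration: the function name `log_linear_fit` and the synthetic values are assumptions, not the procedure or data used in this work.

```python
import math

def log_linear_fit(values):
    """Least-squares fit of ln(value) = intercept + slope * rank.

    `values` must be positive and sorted from most to least expressed.
    Returns (slope, intercept, r_squared).
    """
    xs = list(range(len(values)))
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared

# Synthetic, hypothetical sample: a perfect exponential decay over 200 ranks.
values = [1000.0 * math.exp(-0.05 * rank) for rank in range(200)]
slope, intercept, r_squared = log_linear_fit(values)
print(slope, intercept, r_squared)
```

A negative slope with an r_squared close to 1 indicates that the ranked values are well described by an exponential decay; real samples would be expected to deviate from the perfect fit of this synthetic input.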

1 Laboratório de Biodados, Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, UFMG

5 Conclusions

The results point to the existence of a gene expression framework: a set of genes that keep their relative expression order across a wide variety of tissues.

Figure: Part of the weak framework found for the 36 normal human tissue samples, considering only the 20 most expressed sequences (MESs) from each sample. Each directed edge points from the more expressed gene (source) to the less expressed gene (target).

2 Instituto de Informática, PUC MINAS BARREIRO

Toward the Identification of a Gene Expression Framework in Different Types of Tissues and Organisms

This finding suggests that the ordering of gene expression, and not only the set of genes expressed, plays a determining role in the character of a tissue or organism.

Results are shown for two data series: one of 36 normal human tissue samples and one of 11 A. thaliana tissue samples (GEO accessions GSE2361 and GSE607, respectively).

The expression ordering is conserved well enough that the weak framework rate between a pair of samples (the percentage of sequence pairs whose order is conserved in both) can even be used to cluster a set of gene expression samples.

Highly physiologically related tissue pairs such as [amygdala, hippocampus] and [prostate, bladder], and sample replicates such as [leaf_gh1, leaf_gh2], conserve as many as 94.7%, 89.7%, and 94.12% of their sequence pairs, respectively.

In contrast, H. sapiens pairs of dissimilar tissues, such as those combining bone marrow, liver, and central nervous system tissues, conserve expression ordering poorly (< 22%).
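The percentages above compare two samples at a time. A minimal sketch of such a pairwise rate follows, assuming that the denominator is every pair of sequences present in both samples and that each sample is a list sorted from most to least expressed. The function name `conserved_pair_rate` and the toy samples are hypothetical, and the exact definition used for the reported numbers may differ.

```python
from itertools import combinations

def conserved_pair_rate(sample_a, sample_b):
    """Percentage of sequence pairs with the same relative order in both samples.

    Each sample is a list of sequence identifiers sorted from most to
    least expressed. Only sequences present in both samples are compared
    (an assumption of this sketch).
    """
    rank_b = {seq: pos for pos, seq in enumerate(sample_b)}
    shared = [seq for seq in sample_a if seq in rank_b]
    total = 0
    conserved = 0
    # combinations() keeps the order of sample_a, so `first` is always
    # more expressed than `second` in sample_a.
    for first, second in combinations(shared, 2):
        total += 1
        if rank_b[first] < rank_b[second]:
            conserved += 1
    return 100.0 * conserved / total if total else 0.0

# Toy, hypothetical samples: only the pair (B, C) changes order.
rate = conserved_pair_rate(["A", "B", "C", "D"], ["A", "C", "B", "D"])
print(round(rate, 2))
```

With four shared sequences there are six pairs, of which five keep their order in this toy input.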

4 Discussion

Considering all 36 H. sapiens tissues together, 28.5% of the 3,064,841 possible pairs were conserved. For A. thaliana, stem and flower conserved the least (< 47%), and the 11 samples together conserved 55.45% of pairs (22,892,007 of 41,286,376), as expected for a less complex organism with less tissue diversity.

INPUT: a set of samples, each sorted by gene expression so that the most expressed gene (sequence) is at position 0 and the least expressed is at position N-1, where N is the number of genes (sequences) in each sample.

OUTPUT: a list of gene (sequence) pairs in which the first member is more expressed than the second in every sample.

1) Choose a reference sample, used only to build the pairs;

2) For each pair of genes [G_C, G_L] in the reference sample where G_C is more expressed than G_L, do

2.1) If G_C is more expressed than G_L in every sample, then include the pair [G_C, G_L] in the weak framework; else discard the pair.
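The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the original implementation: the function name `weak_framework`, the `reference` parameter, and the toy samples are assumptions, and every sample is assumed to contain the same set of genes with no ties.

```python
from itertools import combinations

def weak_framework(samples, reference=0):
    """Return the gene pairs whose order holds in every sample.

    `samples` is a list of lists of gene identifiers, each sorted from
    most expressed (position 0) to least expressed (position N-1).
    `reference` is the index of the sample used only to build the pairs.
    """
    # Position of each gene in each sample, for constant-time lookups.
    ranks = [{gene: pos for pos, gene in enumerate(sample)} for sample in samples]
    framework = []
    # combinations() preserves the reference order, so g_c is always
    # more expressed than g_l in the reference sample (step 2).
    for g_c, g_l in combinations(samples[reference], 2):
        # Step 2.1: keep the pair only if the order holds in every sample.
        if all(rank[g_c] < rank[g_l] for rank in ranks):
            framework.append((g_c, g_l))
    return framework

# Toy, hypothetical input with three samples of four genes each.
samples = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
framework = weak_framework(samples)
print(framework)
```

Because a pair is kept only when its order holds in all samples, the resulting set does not depend on which sample is chosen as the reference, as long as all samples contain the same genes. The loop visits N(N-1)/2 pairs, so for large N a practical implementation would need a more economical strategy.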

J. Miguel Ortega 1 {[email protected]}

Table: Most and least conserved H. sapiens tissue pairs (% of sequence pairs conserved).

Most conserved pair                  %       Least conserved pair                  %
Amygdala / Hippocampus               94.7    S Muscle / Small Intestine            21.5
Caudate Nucleus / Hippocampus        91.2    Cerebellum / Liver                    21.3
Amygdala / Thalamus                  91.1    Small Intestine / Fetal Brain         20.8
Amygdala / Caudate Nucleus           91.0    S Muscle / Caudate Nucleus            20.4
Corpus / Spinal Cord                 90.7    Heart / Salivary Gland                20.1
Hippocampus / Thalamus               90.4    Pancreas / Caudate Nucleus            19.6
Caudate Nucleus / Thalamus           89.8    Heart / Liver                         19.3
Prostate / Bladder                   89.7    Caudate Nucleus / Fetal Liver         19.1
Brain / Thalamus                     89.6    S Muscle / Liver                      18.9
Brain / Amygdala                     89.5    Salivary Gland / Caudate Nucleus      18.8
Brain / Hippocampus                  89.1    Small Intestine / Caudate Nucleus     18.6
Ovary / Bladder                      88.5    Bone Marrow / Thalamus                18.5
Small Intestine / Colon              88.2    Salivary Gland / Liver                18.1
Thalamus / Spinal Cord               88.0    Fetal Brain / Liver                   17.7
Bladder / Breast                     87.9    Salivary Gland / Bone Marrow          17.4
Caudate Nucleus / Spinal Cord        87.7    Caudate Nucleus / Liver               16.4
Caudate Nucleus / Corpus             87.7    Bone Marrow / Liver                   15.0
Cerebellum / Brain                   87.5    S Muscle / Bone Marrow                14.1
Bladder / Lung                       87.3    Heart / Bone Marrow                   12.8
Bladder / Trachea                    87.2    Bone Marrow / Caudate Nucleus         12.1
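To illustrate how pairwise percentages like those listed above could drive a grouping of samples, the sketch below links two tissues whenever their rate reaches a threshold and reports the connected groups. This is a deliberately simple stand-in, not the clustering procedure used in this work: the function name `group_by_threshold` and the 85% threshold are assumptions, and only a handful of the listed pairs are included.

```python
def group_by_threshold(pair_rates, threshold):
    """Group tissues connected by pairs whose rate is >= threshold.

    `pair_rates` maps (tissue_a, tissue_b) to a conserved-pair percentage.
    Returns a list of sorted groups, ordered by their first member.
    """
    parent = {}

    def find(tissue):
        # Union-find lookup with path halving.
        parent.setdefault(tissue, tissue)
        while parent[tissue] != tissue:
            parent[tissue] = parent[parent[tissue]]
            tissue = parent[tissue]
        return tissue

    for (tissue_a, tissue_b), rate in pair_rates.items():
        root_a, root_b = find(tissue_a), find(tissue_b)
        if rate >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for tissue in list(parent):
        groups.setdefault(find(tissue), set()).add(tissue)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])

# A few of the pair percentages listed above.
pair_rates = {
    ("Amygdala", "Hippocampus"): 94.7,
    ("Amygdala", "Thalamus"): 91.1,
    ("Prostate", "Bladder"): 89.7,
    ("Ovary", "Bladder"): 88.5,
    ("Small Intestine", "Colon"): 88.2,
    ("Heart", "Liver"): 19.3,
    ("Bone Marrow", "Liver"): 15.0,
    ("Bone Marrow", "Thalamus"): 18.5,
}
groups = group_by_threshold(pair_rates, threshold=85.0)
for group in groups:
    print(group)
```

With this subset, the central nervous system tissues fall into one group, while bone marrow, heart, and liver each remain alone because none of their listed pairs reaches the threshold.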

Support: FAPEMIG