rna-sequencingdatawithprotein–proteininteractionnetworks ... · seven lowest p-values (e.g.,...

1
Functional module detection through integration of single-cell RNA-sequencing data with protein–protein interaction networks F LORIAN K LIMM 1 ,E NRIQUE M. T OLEDO 2 ,T HOMAS M ONFEUGA 2 ,F ANG Z HANG 2 ,C HARLOTTE M. D EANE 1 , AND G ESINE R EINERT 1 1 Department of Statistics, University of Oxford, 2 Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford BIO R XIV: HTTPS :// DOI . ORG /10.1101/698647 S UMMARY Protein–protein interaction networks (PPINs) are abstract representations of inter- actions between proteins. While there has been considerable insight from investigating PPINs alone [1], analysing them in an inte- grated way, together with gene-expression data provides biological context [3]. In recent years, much attention has been given to single-cell RNA-sequencing (scRNA-seq) techniques as they allow researchers to study and characterise tissues at single- cell resolution [4]. We present SC PPIN, a method to integrate such scRNA-seq data with PPINs to detect active modules in cells of different transcriptional states [5]. As a case study, we investigate scRNA-seq from human liver spheroids but these techniques are also applicable to other organisms and tissues. This novel method allows us to iden- tify proteins important in liver metabolism that could not be detected from gene ex- pression data alone. Furthermore, we can associate cells in a given transcriptional state with enriched biological pathways. M ETHOD 1. Clustering of scRNA-seq data (e.g., with S EURAT [2]). 2. Computation of p-values of differential expression for each cluster. 3. Estimation of node scores by fitting a beta-uniform model with a maximum-likelihood approach [3]. 4. Combination of these node scores with a reference PPIN to construct node-weighted PPINs for each cluster. 5. Computation of functional modules with maximal change in gene expression as maximum-weight connected subgraphs with DAPCSTP [6]. L IVER D ATA A PPLICATION −20 0 20 40 −40 −20 0 20 Clustering DEGs analysis Functional module APP EGFR RPS27 PPP1R3C SCD ALDOB RPL41 ADCK3 SEPP1 log10(p−values) 2 rank log(p-value) protein/gene 1 -63 PPP1R3C 2 -54 SEPP1 3 -53 HSD17B13 4 -47 ADH1B 5 -43 S100A6 6 -40 ALDOB 7 -36 SCD . . . 373 -7 APP . . . . 1800 -2 EGFR . . t-SNE Dimension 1 t-SNE Dimension 2 Cell type cholangiocytes macrophage hepatocyte stellated Cluster number 0 1 2 3 4 5 6 7 8 9 63 As a case study, we measure scRNA-seq in human primary hepatocytes. We use S EURAT to cluster the cells and to identify differentially expressed genes (DEG). We construct a PPIN from the publicly available B IO G RID database, version 3.5.166. We use SC PPIN to detect functional modules for each cluster of hepatocytes. Here, we show one example: A functional module consisting of nine proteins. This includes the proteins expressed by the genes that have the seven lowest p-values (e.g., PPP1R3C) but also two additional proteins: APP and EGFR, which are both membrane-bound receptors. If we use SC PPIN to detect functional modules in different cell clusters, we obtain different functional modules that are associated with different biological functions in the liver, e.g., ‘response to inflammation’ or ‘alcohol metabolic process’. B ENEFITS OF THIS METHOD a b d e g i j c f h active modules in two different transcriptional states protein is not differently expressed but has a connector function 1. We associate each cell cluster (which represent cell types or transcriptional states) with an active module in the PPIN. 2. This identifies proteins that are not dif- ferentially expressed and thus would have been missed by a differential ex- pression analysis alone. R EFERENCES [1] Waqar Ali, Charlotte M. Deane, and Gesine Reinert. Protein interaction networks and their statistical analysis. In Michael P. H. Stumpf, David J. Balding, and Mark Girolami, editors, Handbook of Statistical Systems Biology, pages 200–234. John Wiley & Sons, Ltd Chichester, UK, 2011. [2] Andrew Butler, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. Integrating single-cell tran- scriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5):411, 2018. [3] Marcus T Dittrich, Gunnar W Klau, Andreas Rosenwald, Thomas Dandekar, and Tobias Müller. Identifying func- tional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics, 24(13):i223– i231, 2008. [4] Byungjin Hwang, Ji Hyun Lee, and Duhee Bang. Single-cell RNA sequencing technologies and bioinformatics pipelines. Experimental & Molecular Medicine, 50(8):96, 2018. [5] Florian Klimm, Enrique M Toledo, Thomas Monfeuga, Fang Zhang, Charlotte M Deane, and Gesine Reinert. Func- tional module detection through integration of single-cell RNA sequencing data with protein-protein interaction networks. bioRxiv, 2019. [6] Markus Leitner, Ivana Ljubi´ c, Martin Luipersbeck, and Markus Sinnl. A dual ascent-based branch-and-bound framework for the prize-collecting steiner tree and related problems. INFORMS Journal on Computing, 30(2):402– 420, 2018. A CKNOWLEDGEMENTS This project is funded by a Novo Nordisk– University of Oxford pump priming award under the Strategic Alliance between the two partners and the EPSPRC. A VAILABILITY The SC PPIN method is available as an R li- brary and as an online-tool under github.com/floklimm/scPPIN

Upload: others

Post on 08-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RNA-sequencingdatawithprotein–proteininteractionnetworks ... · seven lowest p-values (e.g., PPP1R3C) but also two additional proteins: APP and EGFR, which are both membrane-bound

Functional module detection through integration of single-cellRNA-sequencing data with protein–protein interaction networks

FLORIAN KLIMM1 , ENRIQUE M. TOLEDO2, THOMAS MONFEUGA2, FANG ZHANG2, CHARLOTTE M. DEANE1 ,AND GESINE REINERT1

1Department of Statistics, University of Oxford, 2Discovery Technology and Genomics, Novo Nordisk Research Centre OxfordBIORXIV: HTTPS://DOI.ORG/10.1101/698647

SUMMARYProtein–protein interaction networks

(PPINs) are abstract representations of inter-actions between proteins. While there hasbeen considerable insight from investigatingPPINs alone [1], analysing them in an inte-grated way, together with gene-expressiondata provides biological context [3]. Inrecent years, much attention has been givento single-cell RNA-sequencing (scRNA-seq)techniques as they allow researchers tostudy and characterise tissues at single-cell resolution [4]. We present SCPPIN, amethod to integrate such scRNA-seq datawith PPINs to detect active modules in cellsof different transcriptional states [5]. As acase study, we investigate scRNA-seq fromhuman liver spheroids but these techniquesare also applicable to other organisms andtissues. This novel method allows us to iden-tify proteins important in liver metabolismthat could not be detected from gene ex-pression data alone. Furthermore, we canassociate cells in a given transcriptional statewith enriched biological pathways.

METHOD

1. Clustering of scRNA-seq data (e.g., with SEURAT [2]).

2. Computation of p-values of differential expression for each cluster.

3. Estimation of node scores by fitting a beta-uniform model with a maximum-likelihoodapproach [3].

4. Combination of these node scores with a reference PPIN to construct node-weightedPPINs for each cluster.

5. Computation of functional modules with maximal change in gene expression asmaximum-weight connected subgraphs with DAPCSTP [6].

LIVER DATA APPLICATION

−20

0

20

40

−40 −20 0 20

Clustering DEGs analysis Functional module

APP

EGFR

RPS27

PPP1R3C

SCD

ALDOB

RPL41

ADCK3SEPP1

log10(p−values)

−2

rank log(p-value) protein/gene

1 -63 PPP1R3C2 -54 SEPP13 -53 HSD17B134 -47 ADH1B5 -43 S100A66 -40 ALDOB7 -36 SCD...373 -7 APP....

1800 -2 EGFR..t-SNE Dimension 1

t-SN

E D

imen

sion

2

Cell typecholangiocytesmacrophagehepatocytestellated

Cluster number0123456789

−63

As a case study, we measure scRNA-seq in human primary hepatocytes. We use SEURAT tocluster the cells and to identify differentially expressed genes (DEG). We construct a PPIN fromthe publicly available BIOGRID database, version 3.5.166. We use SCPPIN to detect functionalmodules for each cluster of hepatocytes. Here, we show one example: A functional moduleconsisting of nine proteins. This includes the proteins expressed by the genes that have theseven lowest p-values (e.g., PPP1R3C) but also two additional proteins: APP and EGFR, whichare both membrane-bound receptors.

If we use SCPPIN to detect functional modules in different cell clusters, we obtain differentfunctional modules that are associated with different biological functions in the liver, e.g.,‘response to inflammation’ or ‘alcohol metabolic process’.

BENEFITS OF THIS METHOD

a

b

d

e

g

i

jc

f

h

active modules in two different

transcriptional states

protein is not differently expressed

but has a connector function

1. We associate each cell cluster (whichrepresent cell types or transcriptionalstates) with an active module in thePPIN.

2. This identifies proteins that are not dif-ferentially expressed and thus wouldhave been missed by a differential ex-pression analysis alone.

REFERENCES

[1] Waqar Ali, Charlotte M. Deane, and Gesine Reinert. Protein interaction networks and their statistical analysis. InMichael P. H. Stumpf, David J. Balding, and Mark Girolami, editors, Handbook of Statistical Systems Biology, pages200–234. John Wiley & Sons, Ltd Chichester, UK, 2011.

[2] Andrew Butler, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. Integrating single-cell tran-scriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5):411, 2018.

[3] Marcus T Dittrich, Gunnar W Klau, Andreas Rosenwald, Thomas Dandekar, and Tobias Müller. Identifying func-tional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics, 24(13):i223–i231, 2008.

[4] Byungjin Hwang, Ji Hyun Lee, and Duhee Bang. Single-cell RNA sequencing technologies and bioinformaticspipelines. Experimental & Molecular Medicine, 50(8):96, 2018.

[5] Florian Klimm, Enrique M Toledo, Thomas Monfeuga, Fang Zhang, Charlotte M Deane, and Gesine Reinert. Func-tional module detection through integration of single-cell RNA sequencing data with protein-protein interactionnetworks. bioRxiv, 2019.

[6] Markus Leitner, Ivana Ljubic, Martin Luipersbeck, and Markus Sinnl. A dual ascent-based branch-and-boundframework for the prize-collecting steiner tree and related problems. INFORMS Journal on Computing, 30(2):402–420, 2018.

ACKNOWLEDGEMENTSThis project is funded by a Novo Nordisk–University of Oxford pump priming awardunder the Strategic Alliance between the twopartners and the EPSPRC.

AVAILABILITYThe SCPPIN method is available as an R li-brary and as an online-tool under

github.com/floklimm/scPPIN