original article - yale school of medicine...emphysema phenotype, and n=20 histologically normal...

13
&get_box_var; ORIGINAL ARTICLE Integrated Genomics Reveals Convergent Transcriptomic Networks Underlying Chronic Obstructive Pulmonary Disease and Idiopathic Pulmonary Fibrosis Rebecca L. Kusko 1 *, John F. Brothers II 1 *, John Tedrow 2 , Kusum Pandit 2 , Luai Huleihel 2 , Catalina Perdomo 1 , Gang Liu 1 , Brenda Juan-Guardela 3 , Daniel Kass 2 , Sherry Zhang 1 , Marc Lenburg 1 , Fernando Martinez 4 , John Quackenbush 5 , Frank Sciurba 2 , Andrew Limper 6 , Mark Geraci 7 , Ivana Yang 7 , David A. Schwartz 7 , Jennifer Beane 1 , Avrum Spira 1, and Naftali Kaminski 31 Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts; 2 Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; 3 Pulmonary, Critical Care, and Sleep Medicine, Yale School of Medicine, New Haven, Connecticut; 4 Pulmonary & Critical Care Medicine, University of Michigan, Ann Arbor, Michigan; 5 Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts; 6 Mayo Clinic, Rochester, Minnesota; and 7 Pulmonary Sciences and Critical Care Medicine, UC Denver, Denver, Colorado ORCID IDs: 0000-0003-1467-0489 (A.S.); 0000-0001-5917-4601 (N.K.). Abstract Rationale: Despite shared environmental exposures, idiopathic pulmonary brosis (IPF) and chronic obstructive pulmonary disease are usually studied in isolation, and the presence of shared molecular mechanisms is unknown. Objectives: We applied an integrative genomic approach to identify convergent transcriptomic pathways in emphysema and IPF. Methods: We dened the transcriptional repertoire of chronic obstructive pulmonary disease, IPF, or normal histology lungs using RNA-seq (n = 87). Measurements and Main Results: Genes increased in both emphysema and IPF relative to control were enriched for the p53/hypoxia pathway, a nding conrmed in an independent cohort using both gene expression arrays and the nCounter Analysis System (n = 193). Immunohistochemistry conrmed overexpression of HIF1A, MDM2, and NFKBIB members of this pathway in tissues from patients with emphysema or IPF. Using reads aligned across splice junctions, we determined that alternative splicing of p53/hypoxia pathwayassociated molecules NUMB and PDGFA occurred more frequently in IPF or emphysema compared with control and validated these ndings by quantitative polymerase chain reaction and the nCounter Analysis System on an independent sample set (n = 193). Finally, by integrating parallel microRNA and mRNA-Seq data on the same samples, we identied MIR96 as a key novel regulatory hub in the p53/hypoxia gene-expression network and conrmed that modulation of MIR96 in vitro recapitulates the disease-associated gene-expression network. Conclusions: Our results suggest convergent transcriptional regulatory hubs in diseases as varied phenotypically as chronic obstructive pulmonary disease and IPF and suggest that these hubs may represent shared key responses of the lung to environmental stresses. Keywords: network; COPD; ILD; IPF; transcriptome Chronic lung diseases affect a signicant portion of the population and account for more than 100,000 deaths a year (1). Although most of these deaths can be attributed to chronic obstructive pulmonary disease (COPD), the major smoking-induced lung disease, more than 15,000 deaths a year can be attributed to idiopathic pulmonary brosis (IPF), a relentless, nearly always fatal brotic lung disease also associated with smoking (2). COPD, dened by the Global Initiative for ( Received in original form October 18, 2015; accepted in final form March 27, 2016 ) *Co–first authors. Co–senior authors. Correspondence and requests for reprints should be addressed to either Avrum Spira, M.D., 72 East Concord Street, E631, Boston, MA 02118. E-mail: [email protected] or to Naftali Kaminski, M.D., 300 Cedar Street, Ste S441D, New Haven, CT 06519. E-mail: [email protected] This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org Am J Respir Crit Care Med Vol 194, Iss 8, pp 948–960, Oct 15, 2016 Copyright © 2016 by the American Thoracic Society Originally Published in Press as DOI: 10.1164/rccm.201510-2026OC on April 22, 2016 Internet address: www.atsjournals.org 948 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Upload: others

Post on 18-Jan-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

&get_box_var;ORIGINAL ARTICLE

Integrated Genomics Reveals Convergent Transcriptomic NetworksUnderlying Chronic Obstructive Pulmonary Disease and IdiopathicPulmonary FibrosisRebecca L. Kusko1*, John F. Brothers II1*, John Tedrow2, Kusum Pandit2, Luai Huleihel2, Catalina Perdomo1,Gang Liu1, Brenda Juan-Guardela3, Daniel Kass2, Sherry Zhang1, Marc Lenburg1, Fernando Martinez4,John Quackenbush5, Frank Sciurba2, Andrew Limper6, Mark Geraci7, Ivana Yang7, David A. Schwartz7,Jennifer Beane1, Avrum Spira1‡, and Naftali Kaminski3‡

1Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts; 2Department of Medicine, Universityof Pittsburgh, Pittsburgh, Pennsylvania; 3Pulmonary, Critical Care, and Sleep Medicine, Yale School of Medicine, New Haven,Connecticut; 4Pulmonary & Critical Care Medicine, University of Michigan, Ann Arbor, Michigan; 5Department of Biostatistics,Harvard School of Public Health, Boston, Massachusetts; 6Mayo Clinic, Rochester, Minnesota; and 7Pulmonary Sciences and CriticalCare Medicine, UC Denver, Denver, Colorado

ORCID IDs: 0000-0003-1467-0489 (A.S.); 0000-0001-5917-4601 (N.K.).

Abstract

Rationale: Despite shared environmental exposures, idiopathicpulmonary fibrosis (IPF) and chronic obstructive pulmonarydisease are usually studied in isolation, and the presence of sharedmolecular mechanisms is unknown.

Objectives:Weapplied an integrative genomic approach to identifyconvergent transcriptomic pathways in emphysema and IPF.

Methods:We defined the transcriptional repertoire of chronicobstructive pulmonary disease, IPF, or normal histology lungs usingRNA-seq (n = 87).

Measurements and Main Results: Genes increased in bothemphysema and IPF relative to control were enriched for thep53/hypoxia pathway, a finding confirmed in an independent cohortusing both gene expression arrays and the nCounter Analysis System(n = 193). Immunohistochemistry confirmed overexpression ofHIF1A,MDM2, andNFKBIBmembers of this pathway in tissues from

patients with emphysema or IPF. Using reads aligned across splicejunctions, we determined that alternative splicing of p53/hypoxiapathway–associated molecules NUMB and PDGFA occurred morefrequently in IPF or emphysema compared with control and validatedthese findings by quantitative polymerase chain reaction and thenCounter Analysis System on an independent sample set (n = 193).Finally, by integrating parallel microRNA andmRNA-Seq data on thesame samples, we identifiedMIR96 as a key novel regulatory hub inthe p53/hypoxia gene-expression network and confirmed thatmodulation of MIR96 in vitro recapitulates the disease-associatedgene-expression network.

Conclusions: Our results suggest convergent transcriptionalregulatory hubs in diseases as varied phenotypically as chronicobstructive pulmonary disease and IPF and suggest that these hubsmay represent shared key responses of the lung to environmentalstresses.

Keywords: network; COPD; ILD; IPF; transcriptome

Chronic lung diseases affect a significantportion of the population and accountfor more than 100,000 deaths a year (1).Although most of these deaths can

be attributed to chronic obstructivepulmonary disease (COPD), the majorsmoking-induced lung disease, more than15,000 deaths a year can be attributed to

idiopathic pulmonary fibrosis (IPF), arelentless, nearly always fatal fibrotic lungdisease also associated with smoking (2).COPD, defined by the Global Initiative for

(Received in original form October 18, 2015; accepted in final form March 27, 2016 )

*Co–first authors.‡Co–senior authors.

Correspondence and requests for reprints should be addressed to either Avrum Spira, M.D., 72 East Concord Street, E631, Boston, MA 02118.E-mail: [email protected] or to Naftali Kaminski, M.D., 300 Cedar Street, Ste S441D, New Haven, CT 06519. E-mail: [email protected]

This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org

Am J Respir Crit Care Med Vol 194, Iss 8, pp 948–960, Oct 15, 2016

Copyright © 2016 by the American Thoracic Society

Originally Published in Press as DOI: 10.1164/rccm.201510-2026OC on April 22, 2016

Internet address: www.atsjournals.org

948 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Page 2: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

Chronic Obstructive Lung Disease as a diseasestate characterized by exposure to a noxiousagent (which is almost entirely cigarette smokein the United States) resulting in airflowlimitation that is not fully reversible (3), isthought to result from recruitment ofinflammatory cells in response cigarettesmoke. A subset of patients sustain destructionof lung elastin and other extracellular matrixproteins, death of alveolar cells by apoptosis,or repair failure that leads to airspaceenlargement characteristic of emphysema.Often contrasted with emphysema, IPFis characterized by the findings of usualinterstitial pneumonia, including theinterposition of patchy foci of active fibroblastproliferation associated with minimalinflammation, extracellular matrix deposition,and abnormal alveolar remodeling (2).In recent years there has been significantprogress in understanding of the molecularmechanisms that underlie the lung response to

injury and lead to fibrosis or emphysema inanimal models of disease, but there is still littleunderstanding of the molecular mechanismsthat lead to the abnormal lung phenotype inboth diseases in humans.

The emergence of high throughputtranscript technologies allows investigatorsto glean mechanisms from diseased humantissues. In chronic lung disease, therehave been only a few high-throughputgene expression microarray studies ofCOPD (4–7) or IPF (8–11) and studies ofmiRNA expression in COPD (12, 13) orIPF (14, 15). Despite common risk factors,evidence that parallel pathways may beactivated and suggestions that directcomparisons of the diseases may bemechanistically informative (16), therehas been no effort to determine whetherconvergent molecular pathways could beidentified in IPF and COPD. Thus, inthis study we examined IPF, COPD, andnormal lungs in parallel, aiming toprovide an unbiased description of thetranscriptional repertoire of the lung and itsresponse to chronic injury and to identifyconvergent transcriptional regulatorynetworks in COPD and IPF usingintegrative genomic approaches.

Accordingly, we performed mRNA-Seqand mRNA and miRNA profiling usingmicroarrays on 89 lung tissue samplesrepresenting IPF, COPD, and control fromsubjects obtained through the NHLBI LungTissue Research Consortium (LTRC) as partof the Lung Genomic Research Consortium(LGRC). With the objective of identifyingconvergent disease-associated alterationsin gene expression and splicing, wecharacterized the lung transcriptome andidentifiedmolecular alterations shared by bothchronic lung diseases. The p53/hypoxiapathway was up-regulated in both COPD and

IPF compared with normal histology controls,a finding we validated in an independentcohort of samples and localized to theepithelial layer by immunohistochemistry.In addition, we identified both miRNAs andalterative splicing events related to thep53/hypoxia pathway. Our work providesthe first RNA-seq study of the adult humanlung transcriptional repertoire in health orresponse to chronic injury as representedby COPD and IPF together. It alsorepresents the first time that commongenome scale transcript alterationalterations have been identified betweenIPF and COPD. Our approach (Figure 1)and findings provide insights intoconvergent mechanisms that underlieboth diseases and highlight a relativelyunrecognized central role of thep53/hypoxia pathway in both diseases.

Methods

Patient PopulationLung tissue samples were obtained fromthe NHLBI funded LTRC as previouslydescribed (17). We evaluated the initial 89samples for potential field of cancerizationeffects, because some samples werecollected from areas adjacent to cancer.This evaluation included mining forabnormal gene fusions and cytogeneticabnormalities in the lung and blood allelebalance. Two control samples containedbetween 12% and 25% abnormal cells byallele balance via the Illumina (San Diego,CA) Infinium assay and were thus removedfrom further analysis. Remaining weren = 19 subjects with COPD with predominantemphysema phenotype, n = 17 subjectswith COPD without predominantemphysema phenotype (suspected airways

Supported by National Institutes of Health/NHLBI grants R01-HL095393 and U01HL108642 and Lung Genomics Research Consortium grantRC2-HL101715.

Author Contributions: R.L.K. participated in design of work, analyzed and interpreted data, drafted the work, and revised the work; J.F.B. participated in designof work, analyzed data, interpreted work, drafted the work, and revised the work; J.T. acquired and annotated samples, interpreted results, and revised thework; K.P. led microRNA experiments, performed key experiments, analyzed the work, and revised the work; L.H. performed key validation experiments,analyzed the work, and revised the work; C.P. led experimental validation design and implementation, analyzed the work, and revised the work; G.L. acquiredsamples, led sequencing effort, and revised the work; B.J.-G. generated microarray data, interpreted results, and revised the work; D.K. analyzed thework and revised the work; S.Z. conceived of the work, acquired samples, and revised the work; M.L. conceived of and designed the work, interpreteddata, and revised the work; F.M. participated in study design, acquired samples, and interpreted and revised the work; J.Q. participated in studydesign, interpreted data, and revised the work; F.S. participated in study design, interpreted data, and revised the work; A.L. participated in studydesign, interpreted data, and revised the work; M.G. contributed to data interpretation, study design, and manuscript generation; I.Y. contributed to datainterpretation, study design, and manuscript generation; D.A.S. was involved in all stages of work from initial conceptualization, study design, data generationand interpretation, experimental validation, and manuscript generation; J.B. led the sequencing effort, conceived of and designed the work, interpreted data,and revised the work; A.S. led all stages of the work from initial conceptualization, study design, data generation and interpretation, and manuscriptgeneration; and N.K. led all stages of work from initial conceptualization, study design, data generation and interpretation, experimental validation, andmanuscript generation.

At a Glance Commentary

Scientific Knowledge on theSubject: Although idiopathicpulmonary fibrosis and chronicobstructive pulmonary disease sharerisk factors, such as cigarette smoking,overlapping disease pathogenesismechanisms have yet to be described.

What This Study Adds to theField: We show that chronicobstructive pulmonary disease andidiopathic pulmonary fibrosis sharetranscriptional mRNA–microRNAhypoxia/p53 regulation and alternativesplicing of p53/hypoxia-associatedgenes NUMB and PDGFA.

ORIGINAL ARTICLE

Kusko, Brothers, Tedrow, et al.: Transcriptomic Networks Underlying COPD and IPF 949

Page 3: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

disease), n = 19 subjects with IPF, n = 13subjects with COPD with intermediateemphysema phenotype, and n = 20histologically normal tissue samples.

Seventy-five of the remaining 87samples were selected by pathologists asdisplaying the most distinct phenotypes(Table 1). The COPD categories weredefined based on the percent emphysema:samples that had less than 10% emphysemaand greater than 30% emphysema wereused to define the COPD airway andemphysema phenotypes, respectively.The interstitial lung disease (ILD) sampleswere subset down to those with “IPF”phenotype as per American ThoracicSociety criteria (18).

RNA ExtractionWe extracted total RNA from all lungsamples using the QIAcube system(QIAGEN Inc., Valencia, CA) with themiRNeasy kit. RNA quality was determinedusing a Bioanalyzer 2100 (AgilentTechnologies, Santa Clara, CA) with aRNA integrity number greater than 7.0 asthe criterion for acceptable quality.

mRNA-SeqLibrary preparation and mRNA sequencingwas performed on each of the 89 LGRCsamples. The library preparation wasdone using Illumina’s mRNA SequencingSample Preparation kit starting with 1 mg

of total RNA (see the METHODS section inthe online supplement for additionalinformation).

mRNA and miRNA MicroarrayProcessingWe hybridized RNA from all LGRC samples(Table 2) to Agilent V2 Human WholeGenome microarrays and then quantilenormalized the data using GeneSpring(see the METHODS section in the onlinesupplement for additional information).

Initial Processing of mRNA-Seq DataTo perform initial data quality control (QC),we used standard Illumina metrics, our owncustom perl scripts, and FastQC (BabrahamBioinformatics, Babraham, UK). We alignedsamples that passed theQC filter to hg19 usingTophat and completed an additional QC stepinvolving review of standard alignmentmetrics. We generated gene level expressionestimates using Cufflinks (19).

Gene FilteringFirst we log2 transformed FPKM geneexpression data from Cufflinks. Using their“on” or “off” status and coefficient ofvariance, we filtered genes. To determinegene status we used a modified version ofthe mixture model in the SCAN.UPCBioconductor package (see the METHODS

section in the online supplement) (20).For a gene to be included in differentialexpression analysis, it had to be classifiedas “on” in at least 25% of samples out of thetwo phenotypic groups being compared.After this, the bottom 20% of genes werefiltered out based on their coefficient ofvariation.

Differential ExpressionWe identified differentially expressed geneswith the limma package. For emphysema,we included only samples with greaterthan 30% emphysema. Out of the ILDpopulation, we included only samples withpure IPF. We included only genes annotatedas “known” in Ensembl. Overall we chosesamples such that age, pack-years, smokingstatus, and sex were not confoundingbetween the groups being compared.Functional enrichment was tested withGene Set Enrichment Analysis (GSEA).

Comparing mRNA Arrays withmRNA-SeqUsing a t test in limma (same as for RNAseqanalysis) we identified disease-associated

10. Hypothesis Testing

In vitro miRNA network perturbations

9. Integrative Transcriptomics

mRNA/miRNA** network analysis

Expression Assays: mRNA Arrays, Nanostring, andPCR Localization Assays: Tissue IHC

8. Pathway Validation

* LTRC: Lung Tissue Research Consortium

** microRNA expression data generated using Aligent microRNA microarrays

6a.DifferentialExpression

6b.AlternativeSplicing

5. Defining the Transcriptome:Transcript Expressed?

Yes No

7. Pathway Analysis

Analysis of RNA-seq Data

Removedfrom further

analysis

Sample Collection and Initial Processing

1. Samples collected from LTRC*2. RNA Isolations

3. mRNA Library Prep4. Illumina RNA-seq (75 nt paired end)

Figure 1. Overview of analysis. IHC = immunohistochemistry; LTRC= Lung Tissue ResearchConsortium; miRNA =microRNA; PCR = polymerase chain reaction.

ORIGINAL ARTICLE

950 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Page 4: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

changes in gene expression. Genes inEnsembl without Agilent probes mappingto them were excluded from analyses.Platforms were compared by evaluatingthe overlap between genes identified asdifferentially expressed. Correlationbetween t-statistics on the two platformswas found using a Pearson correlation.

ImmunohistochemistryWe acquired formalin-fixed paraffin-embedded tissues from the LTRC. Aspreviously described (21), we performedimmunohistochemistry using mousemonoclonal antibodies directed againstMDM2 (Millipore, Temecula, CA), HIF1A(Stressgen, Victoria, BC, Canada), andNFKBIB (ABD Serotec, Raleigh, NC),and a rabbit polyclonal antibodydirected against PDGFA (Santa CruzBiotechnology, Santa Cruz, CA). We tookall brightfield images with an Olympus(Billerica, MA) DP25 camera on anOlympus CH2 microscope (21).

Integrating miRNA and mRNA DataWe integrated miRNA array andmRNA-Seq data with MirConnX (22).This tool combines a prior, static networkcreated from miRNA binding predictionsand literature validation with usersubmitted data to create a transcriptomicgene regulatory network. For eachcondition-control comparison, we filteredto only differentially expressed mRNAs. Weinspected resulting regulatory networks forpotential regulatory hubs (miRNAs with ahigh number of connected mRNAs).

Identification of Alternative SplicingEventsFirst, we ran Tophat (19), allowing onlyknown junctions. Using the junction.bedoutput file, we collected the number ofreads that span splice junctions. We ran alinear model with an interaction term,using the RPM of reads spanning eachsplice junction (see online supplement formodel). We examined switching behaviorand the structure of the significant splice

junctions to identify alternative splicingbetween the disease and control samples.

Data AccessGenomic data (RNA-seq and AgilentmRNA and miRNA expression arrays)and associated clinical data are availablefor download on the LGRC website(https://www.lung-genomics.org/research/);miRNA regulatory networks are available athttp://mirconnx.csb.pitt.edu/job_results?job_id=example10103, http://mirconnx.csb.pitt.edu/job_results?job_id=example10102,and http://mirconnx.csb.pitt.edu/job_results?job_id=example10101.

Results

Study Population from the LTRCWe acquired all lung samples from theLTRC, which were used by the LGRC,both supported by the NHLBI. Patientclinical information, pulmonary functions,demographics, imaging results, pathology,and clinical diagnoses were available.Tables 1–3 provide the demographicsand clinical characteristics of all researchsubjects included in the different phasesof the study (Figure 1). We used 87 samplesto analyze the global lung transcriptionalrepertoire (Table 2). To perform analysis ofconvergent pathways, we used 75 of the87 samples that exhibited relatively distinctphenotypes of emphysema, COPD withoutemphysema, IPF, or normal histologycontrols (Table 1). The patient sampleswere matched for age, smoking history, andsex. Institutional review boards approvedall studies at participating institutions andper LTRC protocol all patients signedinformed consent. All of the transcriptomicresults, and associated clinical data, areavailable for download on the LGRC

Table 1. Demographics of the 75 Lung Tissue Samples Used for Analysis of Differential Expression and Splicing (Figures 3 and 4)

IPFEmphysema

(>30% Emphysema)Airway COPD

(<10% Emphysema) Control

Samples, n 19 19 17 20Age 64.06 9.2 56.36 8.7 68.46 10.4 62.36 10.0Sex 15 M, 4 F 10 M, 9 F 12 M, 5 F 11 M, 9 FRace 19 white 18 white, 1 African American 17 white 18 white, 2 African AmericanSmoking status 17 former, 2 never 18 former, 1 never 15 former (2) 1 current, 15 former, 2 never (2)Pack-years 316 23 (2) 486 27 (1) 516 23 (2) 276 23 (4)% Emphysema 1.76 2.6 (7) 47.56 9.1 2.36 2.0 0.86 1.3 (9)

Definition of abbreviations: COPD = chronic obstructive pulmonary disease; IPF = idiopathic pulmonary fibrosis.These samples are a subset of Table 2. Values in parentheses are missing demographics.

Table 2. Demographics of the 87 Lung Tissue Samples Used to Define TranscriptomicLandscape (Figure 2)

ILD/IPF COPD Control

Samples, n 23 43 21Age 63.56 8.7 (1) 62.06 11.2 (7) 62.06 9.8Sex 17 M, 5 F (1) 22 M, 14 F (7) 12 M, 9 FRace 22 white (1) 35 white, 1 African

American (7)19 white, 2 AfricanAmerican

Smoking status 19 former,3 never (1)

33 former,1 never (9)

1 current, 15 former,3 never (2)

Pack-years 28.76 22.5 (3) 49.26 25.3 (10)* 27.36 22.6 (5)% Emphysema 1.76 2.6 (11) 24.96 22.2* 1.16 1.5 (4)

Definition of abbreviations: COPD = chronic obstructive pulmonary disease; ILD = interstitial lungdisease; IPF = idiopathic pulmonary fibrosis.Values in parentheses are missing demographics.*Significant with P, 0.05 compared with control.

ORIGINAL ARTICLE

Kusko, Brothers, Tedrow, et al.: Transcriptomic Networks Underlying COPD and IPF 951

Page 5: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

website (https://www.lung-genomics.org/research/).

RNA-Sequencing Data Alignment tothe Human GenomeEach sample yielded a range of 18–46million (average, 32 million) 75-ntpaired-end reads, and a high percentage(average, 28 million) of these reads alignedto the genome using conservative alignmentparameters. Specifically, 85.96 6.9% ofreads aligned to the genome, and81.46 3.1% aligned uniquely. Of thealigned reads, 90.36 4.8% were aligned aspaired ends (of which 88.76 3.8% wereproperly paired), and 9.046 4.8% werealigned as singletons. The alignmentstatistics across the samples indicate that theRNA-seq data obtained from the LTRCtissue samples were of high quality (seeFigures E1 and E2 in the online supplement).

Identifying the Core LungTranscriptome in Health and DiseaseTo identify genes reliably expressed acrossall lung tissues, we developed a methodto distinguish between genes with clearexpression signal (called “signal” or“reliably expressed”) and genes with zeroor so few aligned reads that they could notbe distinguished from either statisticalor biologic noise (possibly occurringbecause of leaky transcription) (23) or thecombination of transcription rate andmRNA degradation in the cells (24). Weperformed this analysis on each of the87 samples (Table 2) and each sampleshowed a similar number of genes in thethree categories (signal, noise, zero)(Figure 2A). Next, we examined each geneto see if it was “always reliably expressed,”“never reliably expressed,” or “variablyexpressed” across all 87 samples tocharacterize the landscape of expression

across the lung transcriptome (Figure 2B). Ofthe 24,297 experimentally known Ensembl59genes, 7,767 were constitutively expressed(“reliably expressed” across the 87 samples),8,756 were never reliably expressed, and 7,774were variably expressed.

To better understand thetranscriptional repertoire of the lung, welooked at which genes were always reliablyexpressed in each disease subtype (control,n = 20; COPD with emphysema, n = 19;IPF, n = 19; airway COPD, n = 17) (Table 1)and the overlap across all 87 samples(Table 2). Most of the genes (77.9%) thatwere always reliably expressed in each ofthose subsets overlapped (Figure 2C)and those can be considered the corelung transcriptome. We performed geneontology analysis on the lists of genesidentified as always expressed across allsamples and within each phenotype, whichshowed that most (76%) of the KEGGpathways (25) that were significantlyenriched in at least one group (q, 0.05)were shared across all five groups(Figure 2D). The data suggest that thereis a core lung transcriptome that isreliably expressed in the lung regardlessof disease subtypes.

Differential Expression of Genes inIPF and Emphysema RevealsOverexpression of the p53/HypoxiaPathwayStarting with 10,512 and 10,267 filtered genesfor IPF and emphysema, respectively(intersecting genes, 9,472; union ofgenes, 11,307), we identified 2,490 genessignificantly differentially expressed betweenIPF and control subjects, and 337 genesbetween emphysema and control subjects(P, 0.005; 55 and 53 genes expected bychance for IPF or emphysema, respectively).We validated the RNAseq results using gene

expression microarrays run on the same 87samples (Table 2). The t-statistics weresignificantly correlated between RNAseq andgene expression microarrays (emphysemavs. control, R = 0.75, P, 0.001; IPF vs.control, r = 0.83, P, 0.001)

Importantly, the genes thatdistinguished IPF or emphysema fromnormal histology controls were changedin concordant directions, even if not alwaysat the same magnitude (Figure 3A). Theoverlap of the differentially expressed genescommon to emphysema and IPF wassignificantly higher than what wouldbe expected by chance (214 genes,hypergeometric P, 3.8e262). The 214genes are available in Table E1. GSEA ofIPF and emphysema differential expressionrevealed shared pathways including theKEGG p53 pathway (emphysema vs.control, P = 0.003; IPF vs. control,P = 0.003), Biocarta p53/hypoxia pathway(emphysema vs. control, P = 0.01; IPF vs.control, P = 0.026), and other biologicprocesses outlined in Table E2.

Analysis of a nonoverlapping cohortof lung tissue samples from the LTRC(Table 3) by gene expression microarraysconfirmed that up-regulation of thep53/hypoxia pathway characterized thegenes that distinguished emphysema or IPFfrom normal histology controls. GSEA (26)corroborated significant enrichment ofthe KEGG p53 and Biocarta p53/hypoxialeading edge from GSEA of the initialanalysis set among genes up-regulated inemphysema or IPF tissues compared withnormal histology controls (P, 0.001)(see Figure E3). Because cell typedifferences were a concern, we usedimmunohistochemistry to confirm thelocation of select differentially expressedgenes in the p53 pathway in control,emphysema, and IPF samples (n = 5 for each).

Table 3. Demographics of Independent Cohort Used for Validation

ILD/IPF COPD Control

Samples, n 77 34 82Age 64.46 8.7 60.66 9.5 63.86 11.9Sex 54 M, 23 F 15 M, 19 F 35 M, 47 FRace 69 white, 2 African American, 2 Asian,

1 other (3)33 white, 1 African American 76 white, 1 Hispanic, 1 African American,

3 Asian, 1 otherSmoking status 2 current, 42 former, 29 never (4) 2 current, 32 former 1 current, 43 former, 29 never (9)Pack-years 246 18 (33) 516 27 376 32 (38)% Emphysema 0.96 1.6 (63) 36.66 9.9 (21) 0.66 0.9 (71)

Definition of abbreviations: COPD = chronic obstructive pulmonary disease; ILD = interstitial lung disease; IPF = idiopathic pulmonary fibrosis.Values in parentheses are missing demographics.

ORIGINAL ARTICLE

952 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Page 6: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

q ≥ 0.05q < 0.05

All(n = 87)

Control(n = 20)

IPF(n = 19)

SevereEmp

(n = 19)Minimal

Emp(n = 17)

KEGG Pathway Enrichment inGenes Always Expressed in thefollowing categories:

D

78

45

137244

248

270

102

107

62

190

533

7826

209

179

56

C

Emphysema(n=19)

Control(n=20) IPF

(n=19)

AirwayCOPD(n=17)

B

1.0

8,756 genes wereexpressed in

none ofthe samples

7,767 genes wereexpressed in

all ofthe samples

7,774 geneswere expressed

in some ofthe samples

0.8

0.6

0.4

0.2

0.0

0 5000 10000 15000 20000 25000

Genes ordered by percent of samples they are expressed in

Per

cent

of s

ampl

es (

n =

87)

Num

ber

of g

enes

12000

A

10000

8000

6000

4000

2000

0

signal noise zero

Gly

coly

sis

/ Glu

cone

ogen

esis

Citr

ate

cycl

e (T

CA

cyc

le)

Pen

tose

pho

spha

teF

ruct

ose

and

man

nose

met

abol

ism

Gal

acto

se m

etab

olis

mF

atty

aci

d m

etab

olis

mO

xida

tive

phos

phor

ylat

ion

Pur

ine

met

abol

ism

Pyr

imid

ine

met

abol

ism

Val

ine,

leuc

ine

and

isol

euci

ne d

egra

datio

n

Lysi

ne d

egra

datio

n

Arg

inin

e an

d pr

olin

e m

etab

olis

m

Sele

noam

ino

acid

met

abol

ism

Glu

tath

ione

met

abol

ism

N-G

lyca

n bi

osyn

thes

is

Amin

o su

gar a

nd n

ucle

otid

e su

gar m

etab

olis

m

Chond

roitin

sul

fate

bio

synt

hesis

Kerat

an s

ulfa

te b

iosy

nthe

sis

Inos

itol p

hosp

hate

met

abol

ism

Glycos

ylpho

spha

tidyli

nosit

ol(GPI)-

anch

or b

iosyn

thes

is

Pyruv

ate

met

aboli

sm

Glyoxy

late a

nd di

carb

oxyla

te meta

bolis

m

Propanoate m

etabolism

Terpenoid backbone biosy

nthesis

Aminoacyl-tR

NA biosynthesis

Biosynthesis

of unsa

turated fatty

acids

Ribosome

RNA degradation

RNA polymerase

DNA replication

Spliceosome

Proteasome

Protein export

Base excision repair

Nucleotide excision repair

MAPK signaling

ErbB signaling

Chemokine signaling

Phosphatidylinositol signalingCell cycle

Oocyte meiosisp53 signalingUbiquitin mediated proteolysis

SNARE interactions in vesicular transport

LysosomeEndocytosismTOR signaling

ApoptosisVascular smooth muscle

Wnt signaling

Notch signaling

TGF-beta signaling

Axon guidance

VEGF signaling

Focal adhesion

Adherens junction

Tight junction

Gap junction

Toll-like receptor signaling

NO

D-like receptor signaling

RIG

-I-like receptor signaling

T cell receptor signaling

B cell receptor signaling

Fc

gam

ma

R-m

edia

ted

phag

ocyt

osis

Leuk

ocyt

e tr

anse

ndot

helia

l mig

ratio

n

Long

-ter

m p

oten

tiatio

n

Reg

ulat

ion

of a

ctin

cyt

oske

leto

n

Prog

este

rone

-med

iate

d oo

cyte

mat

urat

ion

Vibrio ch

olerae infecti

on

Epithelia

l cell s

ignaling in

Helico

bacter p

ylori i

nfection

Pathogenic Esch

erichia co

li infectio

nPathways in cancerColorectal cancer

Renal cell carcinomaPancreatic cancer

Endometrial cancerGliomaProstate cancerThyroid cancer

Bladder cancerChronic myeloid leukemia

Acute myeloid leukemia

Small cell lung cancer

Non-small cell lung cancer

Cysteine and methionine metabolism

beta-Alanine metabolism

Basal transcription factors

PPAR signaling pathway

Dorso-ventral axis formation

Melanogenesis

Prion diseases

Glycosaminoglycan degradation

Butanoate metabolism

Methane metabolism

Synthesis and degradation of ketone bodies

One carbon pool by folate

ECM-receptor interaction

Aldosterone-regulated sodium reabsorptionM

elanoma

Steroid biosynthesis

Valine, leucine and isoleucine biosynthesis

Histidine m

etabolism

Tryptophan metabolism

Other glycan degradation

Thiamine m

etabolismM

ismatch repair

Regulation of autophagy

Cardiac m

uscle contractionN

atural killer cell mediated cytotoxicity

Am

yotrophic lateral sclerosis (ALS

)

Jak-STAT signaling

Cytosolic DNA-sensing

Huntin

gton's

Parkin

son's

Alzh

eim

er's

Adip

ocyt

okin

e si

gnal

ing

GnR

H s

igna

ling

Insu

lin s

igna

ling

Neu

rotr

ophi

n si

gnal

ing

Fc

epsi

lon

RI s

igna

ling

Figure 2. Overview of the transcriptomic landscape of Lung Genomic Research Consortium lung tissue samples. (A) Box plot of the number of genescharacterized as real signal, missing, or noise in all sequenced samples (Table 2). (B) Distribution of genes present in all 87 samples (core lung transcriptome),those never expressed, and those that are variably expressed. (C) Venn diagram of genes always present within the unique lung condition signatures (Table 1).(D) Circos plot highlighting the overlap of functional enrichment of genes always expressed in the core lung transcriptome and across the unique lung conditions.COPD= chronic obstructive pulmonary disease; EMP= emphysema; IPF = idiopathic pulmonary fibrosis; KEGG=Kyoto Encyclopedia for Genes and Genomes.

ORIGINAL ARTICLE

Kusko, Brothers, Tedrow, et al.: Transcriptomic Networks Underlying COPD and IPF 953

Page 7: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

The changes in expression of HIF1A, MDM2,and NFKBIB were all confirmed byimmunohistochemistry, with MDM2localizing to the airway epithelium (Figure 3C).

Overlapping Differential Splicing inEmphysema and IPF Identifiesp53-associated NUMB and PDGFAWe have leveraged the RNA-seq data(Table 1) to identify disease-associatedalternative splicing events using aninteraction term linear model (see METHODS

and online supplement). Our analysis bestselected genes with cassette exon skippingor inclusion events (27). Differentialsplicing analysis revealed one gene inemphysema versus control samples and27 genes in IPF versus control samples thathad significant changes associated withboth splice junctions and disease (q, 0.25)compared with controls (see Table E3).

Given that the heterogeneity ofdisease tissue samples and the large numberof splice junctions could leave usunderpowered to detect significantalternative splicing, we decided to relax thesignificance value (P, 0.01) to identify aset of alternative splicing events shared inboth diseases, because these were less likelyto be affected by a change in cell typecontent, given the histologic/pathologicdifferences between IPF and COPD. Toidentify disease-associated splicing events,we focused on genes with splice junctionsthat showed a switching in expressionassociated with disease (i.e., at least onesplice junction was down-regulated withdisease and one splice junction wasup-regulated with disease). Two genes,PDGFA and NUMB (Figures 4A and 4D),showed significant concordant changes inisoform proportions in both emphysemaand IPF samples compared with control.PDGFA and NUMB are likely associatedwith the p53/hypoxia pathway (seeDISCUSSION and supplemental RESULTS).

Analysis of PDFGA showed exclusionof an exon in both IPF and emphysemasamples compared with normal histologycontrols. Specifically, whereas allsamples express both isoforms ofPDGFA, expression of one splicejunction supporting PDGFA isoform 001increased and expression two splicejunctions supporting PDGFA isoform 002were decreased in the disease samples(Figure 4B). Oppositely NUMBdemonstrated preferential inclusion of anexon among both IPF and emphysema

samples compared with control. There wasa difference in ratios of isoform 203 andisoform 202 (contains extra exon 9)observed by changes in expression of onesplice junction versus two splice junctions,respectively (Figure 4D). Although theNUMB coverage plots was more subtle,

there is a significant change (P, 0.01) inthe proportions of these two isoformsreflected in the ratio of reads mapping tothe genomic regions spanning exon 8 andexon 9 in Figure 4E.

We validated these splicing eventsusing the nCounter gene expression analysis

B

C

10

A

5

0

EM

P/C

TR

L

–5

–10

–10 –5 0IPF/CTRL

5 10

Emphysema

Emphysema

IPF

IPF

Control

Control

PD

GF

-AN

FκB

iBH

IF-1α

MD

M2

Figure 3. Convergent gene expression patterns in idiopathic pulmonary fibrosis (IPF) and emphysema.(A) Scatter plot of fold ratios of emphysema to control (y-axis) versus fold ratio of IPF to control (x-axis).Fold ratios for all genes are depicted in log base 2. Spots are colored in increasing shades of purplebased on concurrent directions and yellow if going to the opposite direction. (B) Heatmap of significantshared changes in gene expression in emphysema and IPF. Red indicates relative higher expression,blue means relative lower expression. (C) Immunohistochemistry for NFkBiB, HIF1a, and MDM2 (n = 5per group). For all three targets, greater staining intensity was observed in IPF and emphysema samplescompared with controls. For NFkBiB and HIF1a, there is no clear localization to specific lung celllineages in contrast to MDM2, which seems to be localized to epithelial cells and macrophages. Insetimages show staining with nonimmune control IgG (magnification3100; scale bar=100 mm). Black arrowsindicate epithelial cells, yellow arrows indicate macrophages, and pink arrows indicate cells in lymphoidaggregates. CTRL= control; EMP= emphysema; HIF-1a =hypoxia-inducible factor 1-a; IPF = idiopathicpulmonary fibrosis; MDM2=E3 ubiquitin-protein ligase; PDGF-A=platelet-derived growth factor a.

ORIGINAL ARTICLE

954 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Page 8: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

system (28, 29) across the same cohort ofsamples with RNA-Seq data (subset of52 samples with leftover mRNA) and anindependent replication cohort of samplesfrom the LTRC (n = 193) (Table 3). Theresults on the same subset of samples(n = 52) corroborated that PDGFA isoform001 was up-regulated in emphysema versuscontrol samples and unchanged in IPFversus control samples. In validation,PDGFA isoform 002 was down-regulated inboth IPF and emphysema samples versuscontrol. This verifies that the isoformproportions of PDGFA are significantlydifferent between disease and controlsamples in the original cohort (Figure 4C).

The nCounter results across independentcohort of 193 lung tissue samplesconfirmed that PDGFA is differentiallyspliced in a more general set of COPDsamples compared with control (P, 0.05).However, when moving to a moregeneral set of ILD samples (not limitedto IPF samples), PDGFA seems to besignificantly differentially expressedrather than differentially spliced, suggestingthat the differential splicing is likely tobe more specific to IPF samples thanto the ILD samples (see Figure E4).

For NUMB, the nCounter expressionanalysis on the same subset (n = 52) showedthat NUMB isoform 202 was significantly

up-regulated in both IPF and emphysemasamples versus control (P, 0.05). It alsoconfirmed that NUMB isoform 203 wassignificantly down-regulated in IPF versuscontrol samples (P, 0.05) and notsignificantly changed in COPD withemphysema versus control samples(Figure 4F). This confirmed our RNA-seqresults, which showed that isoformproportions of NUMB were significantlydifferent between chronic lung diseases andcontrol samples (Figure 4E). The nCountervalidation on the independent cohort alsoconfirmed that the NUMB isoform 202 wassignificantly up-regulated across a moregeneral set of ILD versus control samples,

Control Emphysema IPF

D

B C

Iog2

(R

PM

)Io

g2 (

RP

M)

Iog2

(R

PM

)

3.0

3000

Nanostring Results(n=52)

*p < 0.05*

2500

2000

1500

1000

500

Isoform 001

1600140012001000800600400200

0

**

Isoform 002

0

2.0

1.0

0.0

3.0

2.0

1.0

0.0

3.0

2.0

1.0

0.0

E F

Iog2

(R

PM

)Io

g2 (

RP

M)

Iog2

(R

PM

)

350030002500200015001000500

0

*

Nanostring Results(n=52)

p < 0.05*

Isoform 203

1000900800700600500400300200100

0

**

Isoform 202

543210543210543210

A

Isoform 203

4.0

Iog2

(R

PM

+1)

3.02.0

0.01.0

4.03.02.01.00.0

Isoform 202Isoform 001

4.0

Iog2

(R

PM

+1)

3.0

2.0

1.0

4.0

3.0

2.0

1.0

Isoform 002

Figure 4. PDGFA and NUMB are differentially spliced in chronic lung disease. The structure of the two isoforms for both PDGFA and NUMB are shownat the top of A and D, respectively. (A) Box plots of PDGFA isoform 001 and 001 splice junctions. (B) Coverage plots showing the interquartile range(lighter shading) of normalized reads and the mean (darker line) for PDGFA. (C) Barplot of nanostring validation showing the mean and the SE of thenormalized number of times a transcript was counted by the nanostring method. (D) Box plots of the splice junctions in NUMB. (E) Coverage plots ofNUMB. (F) Barplot of NUMB alternative splicing nanostring validation. IPF = idiopathic pulmonary fibrosis; RPM= reads per million.

ORIGINAL ARTICLE

Kusko, Brothers, Tedrow, et al.: Transcriptomic Networks Underlying COPD and IPF 955

Page 9: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

whereas the NUMB isoform 203 wassignificantly down-regulated in ILDsamples (see Figure E4).

Identification and Validation ofMicroRNAs That RegulateTranscriptional Networks Shared inCOPD and IPFIntegrating mRNA-Seq and miRNAmicroarray expression data on the samesamples (Table 1) uncovered additionalinsights into the transcriptomic regulationof the p53/hypoxia pathway in emphysemaand IPF. Using miRconnX (22), we createda data-driven and prior knowledge basedgene/miRNA regulatory network. Initially,we constructed a regulatory network using

genes differentially expressed (P, 0.05) inthe same direction in both emphysemaand IPF to explore shared regulatorymechanisms between the two diseases(http://mirconnx.csb.pitt.edu/job_results?job_id=example10103). The networkcontains 15 miRNA including MIR96and 31 genes. We created two additionalnetworks by submitting emphysema versuscontrol genes (http://mirconnx.csb.pitt.edu/job_results?job_id=example10102) andIPF versus control genes (Figure 5, orhttp://mirconnx.csb.pitt.edu/job_results?job_id=example10101) with the sameP value cutoff. Both of these networksfeatured MIR96 as the most connectedmiRNA.

Because glutamate transporter SCL1A1and BTK inhibitor SH3BP5 were down-regulated in both diseases (see Table E1)and predicted to be repressed by MIR96 inthe shared regulatory network generatedusing mirConnX we tested the effect ofoverexpression of MIR96 on their geneexpression levels in vitro. Overexpression ofMIR96 in primary lung fibroblasts andepithelial cells significantly repressed theexpression of both genes (see Figure E5).To test whether up-regulation of MIR96induced global changes in gene expressionsimilar to those observed in the lung weran gene expression arrays on the samesamples. GSEA analysis revealed thatoverexpression of MIR96 recapitulated

SLC1A1

mir-96

TNS3 SH3BP5GPM6A

CACNA2D2

ARHGAP24

TSC22D3

EFNB2

PHLDB1

= repression (bold = PCR validated)= miRNA up in IPF= miRNA down in IPF= IPF mRNA= p53 associated IPF mRNA= IPF and Emp mRNA= p53 associated IPF and EMP mRNA

GPC3TSPAN9

ANKRD27

SLC10A3

LDB2

TLN1FYN

Figure 5. Shared emphysema (EMP) and idiopathic pulmonary fibrosis (IPF) microRNA (miRNA) regulatory network. Regulatory miRNA–mRNA networkshowing regulation in both diseases. Red lines indicate direction of repression. Bold red lines indicate interactions that were selected and validated bypolymerase chain reaction (PCR).

ORIGINAL ARTICLE

956 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Page 10: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

in vitro some of the global gene expressionchanges observed in IPF lungs relative tocontrols (see Figure E6; P = 0.008).

Discussion

In this study we provide the most in depthprofile of the core lung transcriptome inhealth and disease to date using nextgeneration sequencing. By comparing thetwo radically different but similarlydevastating lung diseases, emphysema andIPF, with healthy, normal histology lungs wediscover that that activation of p53/hypoxiapathway is common to both diseases.Despite their extremely divergentphenotypes, IPF and emphysema sharethe same environmental risk factors andespecially exposure to cigarette smoke. Thisunexpected and novel finding may shedsignificant light on the potential corepathways that initiate lung chronic lungremodeling in response to environmentalinjury. Taking advantage of the full depth ofour transcriptomic analyses and large wellcharacterized cohort we identified andvalidated alternative splicing of p53/hypoxiapathway associated molecules NUMBand a role for MIR96 as a regulatorof the p53/hypoxia lung response toenvironmental injury gene-expressionnetwork.

Given the depth of RNA sequencingcoverage across a relatively large and diverseset of lung tissues at different stages ofhealth and disease, our study provides aview of the core transcriptome of the lungand the wide range of its transcriptionalpotential. Using a mixed modelingapproach, we classified genes as eitherhaving no detected expression by RNA-seq,reliably expressed, and detected but likely tobe transcriptional noise rather than activeexpression. Generally, within eachsample, approximately half of the knowntranscriptome is reliably expressed as shownby Figure 2A. We then classified thetranscriptome by counting the number ofsamples a gene was reliably expressed in:approximately one-third of genes werereliably expressed in all 87 samples, one-third were never reliably expressed in anyof 87 samples, and one-third were variablyexpressed in a subset of the 87 samples asseen in Figure 2B. This relatively largetranscriptional plasticity likely reflects boththe diversity of cell types in the lung, andthe ability of individual cells types to

dramatically change their transcriptionalprofiles, both important characteristics inan organ continuously and unpredictablyexposed to a wide array of inhaledenvironmental exposures.

Aside from its’ novelty the importanceof our description of the lung transcriptomelies in its comprehensive nature. In recentyears there have been multiple efforts totarget drugs to distinct organ systems.Many of these efforts failed because most ofthe information about organ specificity ofgene expression is historical, based on asmall number of samples and usuallycontaining only normal tissues or onedisease state. In contrast our dataset is wide,contains a diverse compendium of lungtissues in ranges from normal histology tofulminant disease, and thus should serveas a unique resource to those seeking tounderstand lung transcriptional networks,or more importantly, develop lung-specificinterventions.

Our initial analysis aimed at identifyingthe common molecular networks inemphysema and IPF was conducted atthe gene-expression level. One of thestriking findings from that analysis was therelatively large number of genes that weredifferentially expressed between IPF andnormal lung, as compared with the numberof genes that are differentially expressedbetween emphysema and normal lung.These findings could potentially stemfrom the distinct cellular changes thatcharacterize fibrotic foci that were profiledas compared with a more heterogeneouscell type composition in emphysema.However, despite the contrasting differencesin cellular content and lung structurebetween IPF and emphysema, we identifieda shared molecular network related to theup-regulation of the p53/hypoxia pathwayin both conditions. When the p53 pathwayis triggered by hypoxia instead of DNAdamage, apoptosis is not triggered. Ourresults confirm previous studies that foundup-regulation of specific members of thep53/hypoxia pathway when studyingthe diseases separately. Up-regulation ofHIF1A, TP53, MDM2, CDKN1A, andBAX were described in IPF (30, 31) andup-regulation of TP53 and BAX weredescribed in emphysema (32). However,these studies did not compare emphysemaand IPF together with control and thusdid not highlight the fact that this pathwaymay serve as a core lung response toenvironmental injury. Similarly, our

analysis uncovered additional transcriptionalregulatory dimensions of this pathwaywere dysregulated in both diseasesincluding alternative splicing andmicroRNA regulation.

Considering the commonality of thefindings between emphysema and IPF, it istempting to hypothesize that these findingsrepresent indeed a core pulmonary injuryresponse, but further studies are neededto evaluate whether this dysregulation isin fact causal. The mechanisms drivingP53/hypoxia pathways in both diseases areunclear. It is possible that this is simply aresponse to the impaired gas exchangetypical for both diseases. However, it couldalso represent a response to injury as hasbeen preciously suggested. Regardless,chronic activation of these pathways hasimportant downstream cellular andinflammatory effects that may sustainthe phenotypic changes observed in thediseases. Detailed mechanistic studies arerequired to answer this question, and toidentify what are the points of divergence forboth diseases.

Another discovery is the identificationof IPF- and emphysema-related splicingevents. Analysis of RNAseq reads thatoverlap transcript splice junctions allowedus to identify disease-associated changesin alternative splicing. As with differentialgene expression, there were moredifferentially spliced genes in IPF versuscontrol lung as compared with emphysemaversus control lung, which may again likelyreflect diversity of cell types. PDGFA andp53-associated NUMB were differentiallyspliced in both emphysema and IPF tissuescompared with normal histology controls,with the change in isoform expression inthe same direction. NUMB has fourprimary isoforms that occur from thealternative splicing of two regions: aphosphotyrosine-binding domain anda proline-rich region (PRR), which is aSH3-binding domain. In IPF andemphysema, we saw evidence for a changein the alternative splicing of the PRR basedon the splice junctions that distinguishbetween the isoforms that contain a 144-nt(48 amino acid) insert in the PRR and theisoforms that do not include that insert inthe PRR. In both diseases, the NUMBisoform with the longer PRR was increasedand the NUMB isoform with the shorterPRR was decreased. This result is ofparticular interest because of its relationto the p53 pathway. NUMB is known to

ORIGINAL ARTICLE

Kusko, Brothers, Tedrow, et al.: Transcriptomic Networks Underlying COPD and IPF 957

Page 11: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

bind to both TP53 and MDM2 separatelyand may potentially form a triplex withTP53 and MDM2, stabilizing andpreventing TP53 from being degraded(33–35). Further experiments are requiredto determine whether the observed splicingevent affects the binding of NUMB to TP53and MDM2.

PDGFA has two primary transcriptproducts (Ensembl isoforms PDGFA-001and PDGFA-002) that are 196 and 211amino acids long, respectively (36, 37). Thelonger (PDGFA-002) version that containsexon 6 is decreased in IPF and emphysema,whereas the shorter (PDGFA-001) does notchange. The function of this extra exon iswell described. It is transcribed into aretention domain composed of a 15 aminoacid long carboxy-terminus that is ensnaredby binding to heparin sulfate in theextracellular matrix on secretion,preventing this isoform from diffusingaway from the cell. The shorter isoform,however, is able to diffuse away and triggerother cells (38–43). Although there islimited understanding how the individualisoforms may play a role in the p53/hypoxiapathway, a number of groups have shownthat PDGFA expression associates withhypoxic lung injury in rodent models oflung disease (44, 45). We have also shownthat there is a significant enrichment ofgenes associated with the p53/hypoxiapathway in a list of genes whose expressioncorrelates significantly with the ratio ofmeasurements of the two different isoformsof PDGFA and is further evidence that thisoverlapping alternatively spliced gene inIPF and COPD might also be related to thep53/hypoxia pathway (see RESULTS in theonline supplement). Although this suggestsa relationship that may affect thep53/hypoxia pathway, such as NUMB, thespecifics of how the two isoforms may affectthe p53/hypoxia pathway are unknownand a potential area of future study.

Exploring the shared molecularnetwork between IPF and emphysemainvolved an integrative analysis of themRNA-seq and microRNA array datagenerated on the same lung tissue samples.This analysis also revealed evidence

suggesting that both diseases share commontranscriptional regulatory motifs, withseveral of the same microRNAs beingincluded in the regulatory networks of both.Specifically of interest is MIR96, which goesup in both diseases and could regulate anumber of genes differentially expressed inboth IPF and emphysema includingSCL1A1, SH3BP5, LDB2, and ARGHAP24.SCL1A1 is a glutamate transporter that isdown-regulated under hypoxic conditions(46, 47). SH3BP5 inhibits BTK (48), whichis a binding partner of hypoxia-inducedmitogenic factor (HIMF) (49). LDB2 bindsto LIM domain-binding proteins (50),which inhibit HIF1A (51). Importantly,our results demonstrate that overexpressionof MIR96 in both lung epithelial cellsand fibroblasts in vitro recapitulatescomponents of the shared emphysema-IPFgene-expression network, providingevidence that MIR96 may regulate a part ofthe shared disease gene-expressionnetwork.

Although our unique study design andcomprehensive transcriptional profilingprovided an unprecedented resolution of thelung transcriptome in health and disease,there are several critical limitations to ourwork. We profiled whole lung tissue andthus some of the differential gene expressionobserved between conditions is driven bydiffering proportions of lung cell types.To address this concern, we pursuedimmunohistochemistry to validate the celltype responsible for expression of a selectnumber of genes. Interestingly, we observedconvergence of molecular pathways betweenIPF and emphysema despite the cleardifference in cell types between theseconditions, an observation that maysuggest that indeed these are core pathwaysthat underlie the lung response toenvironmental injury.

Given the cross-sectional nature of ourstudy, it is difficult to distinguish geneexpression changes that are causal versusconsequence of the disease process. Theintegrative mRNA-miRNA network and thesubsequent functional validation studiesin vitro provide some evidence for a causalrelationship between regulatory miRNA

and the disease-associated gene expressionnetwork. However, further functionalstudies including animal knockout modelsare needed to conclusively test for causality.Some of the control lungs were collectedfrom smokers with adjacent lung cancer,raising the potential for some of thedifferential expression to be driven by thelocal “field cancerization” effect. To limitthis confounding effect, we used matchedsingle-nucleotide polymorphism array fromthe lung and blood to filter out sampleswith abnormal gene fusions and cytogeneticabnormalities. Although the size of ourinitial cohort was limited, we confirmed oursequencing results on multiple levels,including validating and replicating theresults using an additional cohort andadditional techniques. Finally, our studydid not explore the potential for RNA-seqto uncover unannotated alternative splicingevents, novel transcripts associated withdisease, or comparative network analysis,which were beyond the scope of thecurrent paper.

In summary, our paper demonstratesthe potential for next-generation sequencingto provide resolution of the transcriptionpotential of the lung in health and disease.Importantly, by profiling multiple distinctlung diseases in parallel within the samestudy, we uncovered molecular networksthat are shared among smokers with IPF andemphysema. Expanding this approach toother disease of the lung and other organsystems may enable us to begin to redefinethe clinical and pathologic boundaries thathave traditionally divided disease entities tothe molecular pathways that are shared ordistinct. Our study also demonstrates thevalue of integrating diverse molecular datafrom the same specimen to gain insight intothe regular networks that associate withdisease. These networks may not onlyprovide insight into disease pathogenesis(additional function studies are needed), butif proved to be causal may suggest noveldiagnostic biomarkers and therapeutictargets for chronic lung disease. n

Author disclosures are available with the textof this article at www.atsjournals.org.

References

1. Mannino DM, Homa DM, Akinbami LJ, Ford ES, Redd SC. Chronicobstructive pulmonary disease surveillance—United States,1971-2000. Respir Care 2002;47:1184–1199.

2. Selman M, King TE, Pardo A; American Thoracic Society; EuropeanRespiratory Society; American College of Chest Physicians. Idiopathicpulmonary fibrosis: prevailing and evolving hypotheses about itspathogenesis and implications for therapy. Ann Intern Med 2001;134:136–151.

ORIGINAL ARTICLE

958 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016

Page 12: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

3. Pauwels RA, Buist AS, Calverley PMA, Jenkins CR, Hurd SS; GOLDScientific Committee. Global strategy for the diagnosis, management,and prevention of chronic obstructive pulmonary disease.NHLBI/WHO Global Initiative for Chronic Obstructive Lung Disease(GOLD) Workshop summary. Am J Respir Crit Care Med 2001;163:1256–1276.

4. Wang I-M, Stepaniants S, Boie Y, Mortimer JR, Kennedy B, Elliott M,Hayashi S, Loy L, Coulter S, Cervino S, et al. Gene expression profilingin patients with chronic obstructive pulmonary disease and lungcancer. Am J Respir Crit Care Med 2008;177:402–411.

5. Spira A, Beane J, Pinto-Plata V, Kadar A, Liu G, Shah V, Celli B,Brody JS. Gene expression profiling of human lung tissue fromsmokers with severe emphysema. Am J Respir Cell Mol Biol 2004;31:601–610.

6. Ning W, Li C-J, Kaminski N, Feghali-Bostwick CA, Alber SM, Di YP,Otterbein SL, Song R, Hayashi S, Zhou Z, et al. Comprehensive geneexpression profiles reveal pathways related to the pathogenesis ofchronic obstructive pulmonary disease. Proc Natl Acad Sci USA 2004;101:14895–14900.

7. Golpon HA, Coldren CD, Zamora MR, Cosgrove GP, Moore MD, TuderRM, Geraci MW, Voelkel NF. Emphysema lung tissue gene expressionprofiling. Am J Respir Cell Mol Biol 2004;31:595–600.

8. Kass DJ, Kaminski N. Evolving genomic approaches to idiopathicpulmonary fibrosis: moving beyond genes. Clin Transl Sci 2011;4:372–379.

9. Pardo A, Gibson K, Cisneros J, Richards TJ, Yang Y, Becerril C, Yousem S,Herrera I, Ruiz V, Selman M, et al. Up-regulation and profibrotic role ofosteopontin in human idiopathic pulmonary fibrosis. PLoS Med 2005;2:e251.

10. Selman M, Pardo A, Barrera L, Estrada A, Watson SR, Wilson K, Aziz N,Kaminski N, Zlotnik A. Gene expression profiles distinguishidiopathic pulmonary fibrosis from hypersensitivity pneumonitis.Am J Respir Crit Care Med 2006;173:188–198.

11. Zuo F, Kaminski N, Eugui E, Allard J, Yakhini Z, Ben-Dor A, Lollini L,Morris D, Kim Y, DeLustro B, et al. Gene expression analysis revealsmatrilysin as a key regulator of pulmonary fibrosis in mice andhumans. Proc Natl Acad Sci USA 2002;99:6292–6297.

12. Ezzie ME, Crawford M, Cho J-H, Orellana R, Zhang S, Gelinas R,Batte K, Yu L, Nuovo G, Galas D, et al. Gene expression networksin COPD: microRNA and mRNA regulation. Thorax 2012;67:122–131.

13. Van Pottelberge GR, Mestdagh P, Bracke KR, Thas O, van DurmeYMTA, Joos GF, Vandesompele J, Brusselle GG. MicroRNAexpression in induced sputum of smokers and patients with chronicobstructive pulmonary disease. Am J Respir Crit Care Med 2011;183:898–906.

14. Milosevic J, Pandit K, Magister M, Rabinovich E, Ellwanger DC, Yu G,Vuga LJ, Weksler B, Benos PV, Gibson KF, et al. Profibrotic role ofmiR-154 in pulmonary fibrosis. Am J Respir Cell Mol Biol 2012;47:879–887.

15. Pandit KV, Corcoran D, Yousef H, Yarlagadda M, Tzouvelekis A,Gibson KF, Konishi K, Yousem SA, Singh M, Handley D, et al.Inhibition and role of let-7d in idiopathic pulmonary fibrosis.Am J Respir Crit Care Med 2010;182:220–229.

16. Chilosi M, Poletti V, Rossi A. The pathogenesis of COPD and IPF:distinct horns of the same devil? Respir Res 2012;13:3.

17. Yang IV, Pedersen BS, Rabinovich E, Hennessy CE, Davidson EJ,Murphy E, Guardela BJ, Tedrow JR, Zhang Y, Singh MK, et al.Relationship of DNA methylation and gene expression in idiopathicpulmonary fibrosis. Am J Respir Crit Care Med 2014;190:1263–1272.

18. Raghu G, Collard HR, Egan JJ, Martinez FJ, Behr J, Brown KK, ColbyTV, Cordier J-F, Flaherty KR, Lasky JA, et al.; ATS/ERS/JRS/ALATCommittee on Idiopathic Pulmonary Fibrosis. An officialATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis:evidence-based guidelines for diagnosis and management.Am J Respir Crit Care Med 2011;183:788–824.

19. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H,Salzberg SL, Rinn JL, Pachter L. Differential gene and transcriptexpression analysis of RNA-seq experiments with TopHat andCufflinks. Nat Protoc 2012;7:562–578.

20. Piccolo SR, Withers MR, Francis OE, Bild AH, Johnson WE.Multiplatform single-sample estimates of transcriptional activation.Proc Natl Acad Sci USA 2013;110:17778–17783.

21. Kass DJ, Yu G, Loh KS, Savir A, Borczuk A, Kahloon R, Juan-Guardela B,Deiuliis G, Tedrow J, Choi J, et al. Cytokine-like factor 1 geneexpression is enriched in idiopathic pulmonary fibrosis and drives theaccumulation of CD41 T cells in murine lungs: evidence for anantifibrotic role in bleomycin injury. Am J Pathol 2012;180:1963–1978.

22. Huang GT, Athanassiou C, Benos PV. mirConnX: condition-specificmRNA-microRNA network integrator. Nucleic Acids Res 2011;39:W416-23.

23. Hebenstreit D, Fang M, Gu M, Charoensawan V, van Oudenaarden A,Teichmann SA. RNA sequencing reveals two major classes ofgene expression levels in metazoan cells. Mol Syst Biol 2011;7:497.

24. Hayles B, Yellaboina S, Wang D. Comparing transcription rate andmRNA abundance as parameters for biochemical pathway andnetwork analysis. PLoS One 2010;5:e9908.

25. Kanehisa M. The KEGG database. Novartis Found Symp 2002;247:91–101; discussion 101–103, 119–128, 244–252.

26. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA,Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene setenrichment analysis: a knowledge-based approach for interpretinggenome-wide expression profiles. Proc Natl Acad Sci USA 2005;102:15545–15550.

27. Breitbart RE, Andreadis A, Nadal-Ginard B. Alternative splicing: aubiquitous mechanism for the generation of multiple protein isoformsfrom single genes. Annu Rev Biochem 1987;56:467–495.

28. Kulkarni MM. Digital multiplexed gene expression analysis using theNanoString nCounter System. Curr Protoc Mol Biol 2011;94:25B.10.1–25B.10.17.

29. Malkov VA, Serikawa KA, Balantac N, Watters J, Geiss G, Mashadi-Hossein A, Fare T. Multiplexed measurements of gene signatures indifferent analytes using the Nanostring nCounter Assay System.BMC Res Notes 2009;2:80.

30. Thannickal VJ, Horowitz JC. Evolving concepts of apoptosis inidiopathic pulmonary fibrosis. Proc Am Thorac Soc 2006;3:350–356.

31. Nakashima N, Kuwano K, Maeyama T, Hagimoto N, Yoshimi M,Hamada N, Yamada M, Nakanishi Y. The p53-Mdm2 association inepithelial cells in idiopathic pulmonary fibrosis and non-specificinterstitial pneumonia. J Clin Pathol 2005;58:583–589.

32. Morissette MC, Vachon-Beaudoin G, Parent J, Chakir J, Milot J.Increased p53 level, Bax/Bcl-x(L) ratio, and TRAIL receptorexpression in human emphysema. Am J Respir Crit Care Med 2008;178:240–247.

33. Colaluca IN, Tosoni D, Nuciforo P, Senic-Matuglia F, Galimberti V,Viale G, Pece S, Di Fiore PP. NUMB controls p53 tumour suppressoractivity. Nature 2008;451:76–80.

34. Carter S, Vousden KH. A role for Numb in p53 stabilization. GenomeBiol 2008;9:221.

35. Yogosawa S, Miyauchi Y, Honda R, Tanaka H, Yasuda H. MammalianNumb is a target protein of Mdm2, ubiquitin ligase. Biochem BiophysRes Commun 2003;302:869–872.

36. Bonthron DT, Morton CC, Orkin SH, Collins T. Platelet-derived growthfactor A chain: gene structure, chromosomal location, and basis foralternative mRNA splicing. Proc Natl Acad Sci USA 1988;85:1492–1496.

37. Rorsman F, Bywater M, Knott TJ, Scott J, Betsholtz C. Structuralcharacterization of the human platelet-derived growth factor A-chaincDNA and gene: alternative exon usage predicts two differentprecursor proteins. Mol Cell Biol 1988;8:571–577.

38. Raines EW, Ross R. Compartmentalization of PDGF on extracellularbinding sites dependent on exon-6-encoded sequences. J Cell Biol1992;116:533–543.

39. Pollock RA, Richardson WD. The alternative-splice isoforms ofthe PDGF A-chain differ in their ability to associate with theextracellular matrix and to bind heparin in vitro. Growth Factors1992;7:267–277.

40. Kelly JL, Sanchez A, Brown GS, Chesterman CN, Sleigh MJ.Accumulation of PDGF B and cell-binding forms of PDGF A in theextracellular matrix. J Cell Biol 1993;121:1153–1163.

ORIGINAL ARTICLE

Kusko, Brothers, Tedrow, et al.: Transcriptomic Networks Underlying COPD and IPF 959

Page 13: ORIGINAL ARTICLE - Yale School of Medicine...emphysema phenotype, and n=20 histologically normal tissue samples. Seventy-five of the remaining 87 samples were selected by pathologists

41. Somasundaram R, Schuppan D. Type I, II, III, IV, V, and VI collagensserve as extracellular ligands for the isoforms of platelet-derivedgrowth factor (AA, BB, and AB). J Biol Chem 1996;271:26884–26891.

42. Andersson M, Ostman A, Westermark B, Heldin CH. Characterizationof the retention motif in the C-terminal part of the long splice form ofplatelet-derived growth factor A-chain. J Biol Chem 1994;269:926–930.

43. AnderssonM, Ostman A, Backstrom G, Hellman U, George-Nascimento C,Westermark B, Heldin CH. Assignment of interchain disulfidebonds in platelet-derived growth factor (PDGF) and evidence foragonist activity of monomeric PDGF. J Biol Chem 1992;267:11260–11266.

44. Mizuno S, Bogaard HJ, Kraskauskas D, Alhussaini A, Gomez-Arroyo J,Voelkel NF, Ishizaki T. p53 Gene deficiency promotes hypoxia-induced pulmonary hypertension and vascular remodeling in mice.Am J Physiol Lung Cell Mol Physiol 2011;300:L753–L761.

45. Katayose D, Ohe M, Yamauchi K, Ogata M, Shirato K, Fujita H,Shibahara S, Takishima T. Increased expression of PDGF A- andB-chain genes in rat lungs with hypoxic pulmonary hypertension.Am J Physiol 1993;264:L100–L106.

46. Bianchi MG, Bardelli D, Chiu M, Bussolati O. Changes in the expressionof the glutamate transporter EAAT3/EAAC1 in health and disease.Cell Mol Life Sci 2014;71:2001–2015.

47. Fujita H, Sato K, Wen T-C, Peng Y, Sakanaka M. Differentialexpressions of glycine transporter 1 and three glutamate transportermRNA in the hippocampus of gerbils with transient forebrainischemia. J Cereb Blood Flow Metab 1999;19:604–615.

48. Ortutay C, Nore BF, Vihinen M, Smith CIE. Phylogeny of Tec familykinases: identification of a premetazoan origin of Btk, Bmx, Itk, Tec,Txk, and the Btk regulator SH3BP5. In: Hall JC, editor. Advances ingenetics. Amsterdam, the Netherlands: Academic Press; 2008. pp. 51–80.

49. Su Q, Zhou Y, Johns RA. Bruton’s tyrosine kinase (BTK) is a bindingpartner for hypoxia induced mitogenic factor (HIMF/FIZZ1) andmediates myeloid cell chemotaxis. FASEB J 2007;21:1376–1382.

50. Matthews JM, Visvader JE. LIM-domain-binding protein 1: amultifunctional cofactor that interacts with diverse proteins.EMBO Rep 2003;4:1132–1137.

51. Lin J, Qin X, Zhu Z, Mu J, Zhu L, Wu K, Jiao H, Xu X, Ye Q. FHL familymembers suppress vascular endothelial growth factor expressionthrough blockade of dimerization of HIF1a and HIF1b. IUBMB Life2012;64:921–930.

ORIGINAL ARTICLE

960 American Journal of Respiratory and Critical Care Medicine Volume 194 Number 8 | October 15 2016