
Published on Office of Cancer Genomics (https://ocg.cancer.gov)

Issue 22: August, 2019

CTD² GUEST EDITORIAL
Dissecting Cellular Heterogeneity using Single-cell RNA Sequencing

Single-cell RNA sequencing enables analysis of the transcriptome of individual cells, provides information on cell states, and allows high-resolution characterization of the heterogeneous tumor microenvironment. It can be used to discover therapeutic targets and can enable a mechanistic understanding of target inhibition.

HCMI PROGRAM HIGHLIGHTS
Scientific Applications of Next-generation Cancer Models

HCMI is providing the scientific community with next-gen cancer models that more closely resemble primary tumors and that are annotated with genomic and clinical data. The article provides examples of how next-gen models have been applied in research.

CTD² GUEST EDITORIAL
GenePattern Notebook: Integration of Electronic Notebooks with Bioinformatics Tools for Genomic Data Analysis

The GenePattern Notebook is an electronic notebook that enables integrative genomic analyses. These analyses are displayed in a user-friendly form that allows scientists, even those without programming experience, to share, collaborate on, and publish the results.

HCMI PROGRAM HIGHLIGHTS
Collecting Uniform Clinical Data for a Community Resource

HCMI’s clinical Case Report Forms (CRFs) standardize the clinical data that are collected from participating Tissue Source Sites (TSSs) collaborating with HCMI. The article discusses the process of developing cancer type-specific CRFs to ensure uniformity and compatibility for use at TSSs across the globe.

OCG PERSPECTIVE
Leveraging a Genomics Background to Facilitate Molecular Characterization of HCMI Models

This perspective article introduces Dr. Lauren Hurd, a new Scientific Program Manager for the Human Cancer Models Initiative. Dr. Hurd discusses her background in genomics and its applications in her current role.

CTD² GUEST EDITORIAL
Dissecting Cellular Heterogeneity using Single-cell RNA Sequencing
Anuja Sathe, M.B.B.S., Ph.D. and Hanlee P. Ji, M.D.
Stanford University

Gene expression analysis using RNA sequencing has contributed immensely to our knowledge of cancer. However, bulk sequencing methods average signals by pooling information from a mass of cells. These methods are thus unable to fully resolve the complexity of intra-tumor heterogeneity in cancer¹. Single-cell RNA sequencing (scRNA-seq) enables the analysis of the transcriptome of each individual cell and allows a high-resolution characterization of tumors. Moreover, tumors are not isolated masses of cancer cells but are surrounded by a unique microenvironment composed of different cell types. scRNA-seq enables the analysis of this heterogeneous tumor microenvironment (TME), which is increasingly important in improving treatment strategies such as immunotherapy. Unlike other single-cell analysis methods such as mass cytometry (CyTOF), scRNA-seq allows an unbiased assessment of cellular phenotypes. This not only enables the identification of heterogeneous cell types but also provides information on individual cell states.

In recent years, scRNA-seq has been used to construct single-cell atlases of several tumor types². These studies have revealed novel targets in sub-populations of cancer cells as well as in the TME. We have successfully used scRNA-seq to characterize the lymphoma TME using patient biopsies and in an organoid mouse model of gastric cancer³,⁴. We are applying it to understand the TME of gastrointestinal cancers from fresh surgical specimens, as well as in cell line models to delineate clonal heterogeneity⁵.


Several technologies have been developed for scRNA-seq⁶. They differ in their method of cell isolation (e.g., plate- or microfluidics-based), the portion of the transcript that is captured (full-length, 3’ or 5’ end), and the chemistry used for reverse transcription and amplification. Choosing a particular platform for an experiment depends on the question being investigated. For example, microfluidics-based techniques offer higher throughput than plate-based ones. Methods that capture only the 3’ or 5’ end of the transcript do not allow the detection of splicing events or isoforms, or the quantification of allelic expression.

Following sequencing, a typical analytical workflow begins with data matrices containing the molecular counts for each gene in each cell, with genes and cells arranged along the rows and columns. This high-dimensional data is generally analyzed using dimensionality reduction (e.g., principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE)), followed by clustering of cells with similar transcriptional profiles (Figure). Differential expression testing then helps to explain the differences across clusters.

Figure: Schematic representation of a microfluidics-based scRNA-seq workflow. Tissue specimens are dissociated into a single-cell suspension, followed by a microfluidics-based scRNA-seq protocol. cDNA from each individual cell is tagged with a unique barcode and made into a sequencing library. The resulting data is a matrix of molecular counts per transcript per cell. High-dimensional data is processed using dimensionality reduction and clustering approaches. (Image credit: Created with BioRender)
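As a rough illustration of the analytical workflow described above, the following Python sketch normalizes a toy count matrix, finds the first principal component by power iteration, and splits the cells by the sign of their score on that component. The count values, the scaling factor, and the sign-based split are all assumptions made for illustration; real analyses typically rely on dedicated scRNA-seq packages and more robust clustering.

```python
import math

# Toy count matrix (made-up values): rows are cells, columns are genes.
counts = [
    [10, 8, 0, 1],
    [12, 9, 1, 0],
    [9, 11, 0, 0],
    [0, 1, 10, 12],
    [1, 0, 9, 8],
    [0, 0, 11, 10],
]

def normalize(matrix, scale=10_000):
    """Scale each cell to a common total count, then log-transform."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([math.log1p(v / total * scale) for v in row])
    return out

def first_principal_component(matrix, iterations=100):
    """Leading PCA axis via power iteration on the gene-gene covariance."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    means = [sum(row[g] for row in matrix) / n_cells for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in matrix]
    cov = [[sum(r[i] * r[j] for r in centered) / (n_cells - 1)
            for j in range(n_genes)] for i in range(n_genes)]
    vec = [1.0] + [0.0] * (n_genes - 1)  # fixed start keeps the run deterministic
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(n_genes))
               for i in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Project every cell onto the leading axis.
    scores = [sum(r[g] * vec[g] for g in range(n_genes)) for r in centered]
    return vec, scores

log_norm = normalize(counts)
pc1, scores = first_principal_component(log_norm)

# Crude one-dimensional "clustering": split cells by the sign of the PC1 score.
clusters = [0 if s < 0 else 1 for s in scores]
```

In this toy example the first three cells and the last three cells fall into different groups; which group is labeled 0 or 1 depends on the arbitrary sign of the principal component.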

A major challenge in scRNA-seq analysis results from the noise and variability of the assay owing to a high number of zero transcript counts. These dropouts can be biological, related to the stochasticity of mRNA expression, or technical, owing to the low amount of transcript material and the limited efficiency of its capture. This can result in distorted false-negative or false-positive profiles. scRNA-seq therefore requires careful attention to quality control and the use of computational methods suited to such data distributions⁷. Another limitation of scRNA-seq is that it does not retain any spatial information. Moreover, the disaggregation methods required to produce a single-cell suspension could introduce artifacts in the gene expression program. Commercial assays and instrumentation can also be expensive.
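A minimal Python sketch of the kind of quality control mentioned above: it measures the fraction of zero counts per cell and keeps only cells that pass two simple filters. The thresholds and the toy counts are hypothetical and would need to be tuned to each dataset and platform.

```python
def zero_fraction(cell_counts):
    """Fraction of genes with a zero count in one cell."""
    return sum(1 for c in cell_counts if c == 0) / len(cell_counts)

def filter_cells(matrix, min_genes_detected=2, min_total_counts=5):
    """Return indices of cells passing two illustrative QC thresholds."""
    kept = []
    for idx, row in enumerate(matrix):
        detected = sum(1 for c in row if c > 0)
        if detected >= min_genes_detected and sum(row) >= min_total_counts:
            kept.append(idx)
    return kept

# Toy matrix (made-up values): rows are cells, columns are genes.
cells = [
    [5, 0, 3, 0],  # moderately sparse cell
    [0, 0, 1, 0],  # nearly empty, likely a low-quality cell
    [2, 4, 0, 6],
]

fractions = [zero_fraction(row) for row in cells]
passing = filter_cells(cells)
```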

A number of technical and computational approaches are enabling improvements in scRNA-seq that overcome many of these challenges. For example, conducting parallel experiments such as RNA in situ hybridization, imaging mass cytometry, or spatial transcriptomics can enable integration of spatial information. As sequencing costs fall, the current cost of an scRNA-seq experiment with a microfluidics-based platform works out to less than 1 USD/cell (https://satijalab.org/costpercell [2]). This is additionally aided by modifications to the assay that use antibodies to tag each cell, enabling sample multiplexing⁸. Several quality control and imputation methodologies are also being developed to improve data analytics⁹,¹⁰. Novel assay developments such as single-cell DNA sequencing, single-cell Assay for Transposase-Accessible Chromatin (ATAC) sequencing, single-cell T Cell Receptor (TCR) sequencing, and single-cell epitope sequencing can also be integrated with scRNA-seq to reveal a wealth of information at high granularity.
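The antibody-based multiplexing mentioned above can be pictured with a small Python sketch: each cell carries counts for the antibody tags of every pooled sample, and the cell is assigned to the sample with the dominant tag. The function name, the thresholds, and the counts are assumptions for illustration only; published demultiplexing methods use statistical models rather than fixed cutoffs.

```python
def assign_sample(hashtag_counts, min_count=10, doublet_ratio=0.5):
    """Classify one cell from its per-sample antibody tag counts.

    Expects counts for at least two samples. The thresholds are
    illustrative assumptions, not published defaults.
    """
    ranked = sorted(hashtag_counts.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top), (_, second) = ranked[0], ranked[1]
    if top < min_count:
        return "negative"   # no tag clearly detected
    if second >= doublet_ratio * top:
        return "doublet"    # tags from two samples share one barcode
    return top_name

cells = {
    "cell_a": {"sample_1": 120, "sample_2": 3, "sample_3": 5},
    "cell_b": {"sample_1": 80, "sample_2": 75, "sample_3": 2},
    "cell_c": {"sample_1": 1, "sample_2": 2, "sample_3": 0},
}

calls = {name: assign_sample(tags) for name, tags in cells.items()}
```

Pooling several tagged samples into one run spreads the fixed cost of a run across samples, and cells carrying two tags can be flagged as likely doublets.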

scRNA-seq is thus equipped to answer several important questions in cancer biology, including target discovery, and can enable a mechanistic understanding of target inhibition. It also has tremendous potential in translational applications such as longitudinal patient monitoring of treatment response¹¹.

References

1. Andor N, Graham TA, Jansen M, et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat Med. 2016 Jan;22(1):105-13. (PMID: 26618723 [3])

2. Valdes-Mora F, Handler K, Law AMK, et al. Single-Cell Transcriptomics in Cancer Immunobiology: The Future of Precision Oncology. Front Immunol. 2018 Nov 12;9:2582. (PMID: 30483257 [4])

3. Chen J, Lau BT, Andor N, et al. Single-cell transcriptome analysis identifies distinct cell types and niche signaling in a primary gastric organoid model. Sci Rep. 2019 Mar 14;9(1):4536. (PMID: 30872643 [5])

4. Andor N, Simonds EF, Czerwinski DK, et al. Single-cell RNA-Seq of follicular lymphoma reveals malignant B-cell types and coexpression of T-cell immune checkpoints. Blood. 2019 Mar 7;133(10):1119-1129. (PMID: 30591526 [6])

5. Andor N, Lau BT, Catalanotti C, et al. Joint single cell DNA-Seq and RNA-Seq of gastric cancer reveals subclonal signatures of genomic instability and gene expression. bioRxiv. 2018 Oct 17. doi: 10.1101/445932

6. Haque A, Engel J, Teichmann SA, et al. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 2017 Aug 18;9(1):75. (PMID: 28821273 [7])

7. Chen G, Ning B, Shi T. Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Front Genet. 2019 Apr 5;10:317. (PMID: 31024627 [8])

8. Stoeckius M, Zheng S, Houck-Loomis B, et al. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 2018 Dec 19;19(1):224. (PMID: 30567574 [9])

9. Huang M, Wang J, Torre E, et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat Methods. 2018 Jul;15(7):539-542. (PMID: 29941873 [10])

10. Ilicic T, Kim JK, Kolodziejczyk AA, et al. Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 2016 Feb 17;17:29. (PMID: 26887813 [11])

11. Shalek AK, Benson M. Single-cell analyses to tailor treatments. Sci Transl Med. 2017 Sep 20;9(408). (PMID: 28931656 [12])


HCMI PROGRAM HIGHLIGHTS
Scientific Applications of Next-generation Cancer Models
Cindy Kyi, Ph.D.
Office of Cancer Genomics, NCI

As a contribution to efforts in precision oncology [13], the National Cancer Institute’s (NCI) Human Cancer Models Initiative [14] (HCMI) is developing next-generation (next-gen) cancer models, which include 3D organoids, neurospheres, 2D adherent cells, and conditionally reprogrammed cells. These next-gen models are derived from a variety of cancer types, including poor-outcome cancers, rare cancers, and cancers from ethnic and racial minorities, as well as pediatric populations. HCMI was founded by the NCI, Cancer Research UK, Hubrecht Organoid Technology, and Wellcome Sanger Institute to develop about 1,000 next-gen cancer models. The next-gen cancer models are annotated with molecular characterization data, as well as clinical data, to address the challenges of traditional cancer cell lines. The goal of HCMI is to provide the research community with a rich resource of diverse, fully annotated next-gen cancer models (NGCMs) to better study disease biology.

Historically, traditional cancer cell lines have provided a platform for large-scale studies, such as investigating the molecular regulation of cancer cell growth and progression, identifying genetic and biological markers, and predicting drug sensitivities. The Cancer Cell Line Encyclopedia [15], a repository of cancer cell lines with associated molecular data and analyses, is a valuable resource of over 1,100 cell lines generated from numerous cancer types. Another resource is the Cancer Therapeutics Response Portal (CTRP) [16], which houses a large dataset of quantitative small-molecule sensitivity data for cancer cell lines. This resource can be mined for lineages or mutations [17] that are enriched among cell lines sensitive to small molecules, helping to identify new therapeutic vulnerabilities. The next-gen cancer models aim to address limitations of most available cell lines, such as poor or unknown representation of the cellular architecture of the original tumor, the heterogeneity of cell types, and the genetic drivers of cancer subtypes¹.
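To make the idea of mining for enriched mutations concrete, here is a hedged Python sketch of a one-sided hypergeometric test, one common way to ask whether a mutation is over-represented among compound-sensitive cell lines. The function name and all of the numbers are hypothetical; this is not a description of how CTRP itself performs its analyses.

```python
from math import comb

def enrichment_p_value(n_lines, n_with_feature, n_sensitive, n_overlap):
    """One-sided hypergeometric P(overlap >= n_overlap) under random draws."""
    total = comb(n_lines, n_sensitive)
    upper = min(n_with_feature, n_sensitive)
    tail = sum(
        comb(n_with_feature, k) * comb(n_lines - n_with_feature, n_sensitive - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical screen: 20 cell lines, 5 carry a mutation, 6 are sensitive
# to a compound, and 4 of the sensitive lines carry the mutation.
p = enrichment_p_value(20, 5, 6, 4)
```

A small p-value suggests that the overlap is unlikely to arise by chance, although a real analysis would also correct for testing many mutations and compounds.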

Next-generation models

The successful culture and expansion of organoids from murine small intestinal tissue paved the way for early organoid models generated from mouse colon and human small intestine and colonic epithelium². Human intestinal organoids were observed to mimic in vivo [18] cellular differentiation; however, adaptations of the culture media were needed to successfully grow organoids from different tissue types. Sato and colleagues reported optimized cell culture methods utilizing growth factors, Notch protein [19] inhibitors, nicotinamide [20], and kinase enzyme inhibitors for culturing primary human epithelial [21] cells from small intestine, colon, adenoma [22], adenocarcinomas [23], and Barrett’s esophagus [24]. The models were hypothesized to be more representative of the tumor biology than colon cancer cell lines². The group recently published their protocols for generating next-gen cancer models (NGCMs) from normal and tumor breast tissues³.

Identifying an optimized culture medium for each cancer and tissue type is critical for successfully growing next-gen models that retain the characteristics of their originating tumors. According to Ince and colleagues, ovarian tumor cell lines grown in standard culture media: “(1) had very low success rate (less than one percent) [sic of being established in culture], (2) had long lag times for the first passage, (3) could only be propagated for up to 15 passages and (4) lacked the phenotype of original tumor”⁴. The authors developed 25 diverse ovarian cell lines using optimized culture media compositions for each human ovarian cancer subtype. The resulting ovarian cancer cell lines retained the genomic landscape, histopathology, and molecular characteristics of the original tumors from which they were derived. The expression profiles and drug responses of these cell lines were also found to correlate with patient outcomes⁴.

Applications of next-generation models

Next-gen models have been shown to be excellent research tools for ex vivo [25] experiments because they recapitulate the biology and tissue architecture of primary tumors. Examples of applications of next-gen models in research include studying disease progression, identifying genomic and molecular drivers of disease, and screening compounds or small molecules for treatment sensitivity and/or resistance.

Studying disease progression in pancreatic cancers can be challenging due to the lack of patient-derived models that cover the full spectrum of disease progression, the lack of clinical correlations, and the accumulation of genetic aberrations. Boj and colleagues were able to generate human-derived organoids from normal and neoplastic ductal cells using modified culture conditions⁵. Through targeted sequencing of cancer-associated genes in organoids derived from human normal and tumor tissues, the authors identified oncogenic KRAS mutations in the majority of the tumor-derived models, indicating that the organoids represented the cancer driver mutations observed in the originating human tumors.

Patient-derived organoid models of pancreatic ductal adenocarcinoma (PDAC) were used to test the chemosensitivity [26] and chemoresistance of individual tumors. One of the limitations of traditional cell lines is that, due to genetic drift after multiple passages, the derived cell lines differ from the original tumor in features such as copy number alterations, DNA methylation [27], molecular subtypes, and resulting phenotypes. Tiriac and colleagues found that the patient-derived PDAC organoids harbored genetic alterations that are consistent with known pathogenic mutations in PDAC⁶. The authors concluded that the organoids recapitulated the mutational spectrum and molecular subtypes of primary pancreatic cancer and, therefore, are excellent models to accurately examine and predict responses to chemotherapeutic agents⁶.

The lack of model systems that reflect the pathology of the primary disease and responses to therapy presents a challenge in studying esophageal adenocarcinoma [28] (EAC)¹. Using patient tumor-derived organoid models, Li and colleagues could identify tumor drivers of EAC through histological and genomic characterization¹. The molecular annotation of the EAC organoids showed that they retained patient-specific gene expression, disrupted cellular polarity, intra-tumor heterogeneity, and drug sensitivity¹. Based on these findings, patient-derived organoids provide model systems to accurately study disease.

The variability in drug response due to cellular heterogeneity presents another challenge in cancer research using traditional cell lines. Next-gen cancer models were shown to produce reliable responses that resemble those of the originating tumor when screening targeted therapy compounds⁷. Similar to the responses found in mouse organoids, treatment of patient-derived organoids with itraconazole, a cell cycle inhibitor compound⁸, led to inhibited organoid growth and cell death⁷. Buczacki and colleagues suggested a therapeutic potential for itraconazole in inducing cancer cell death and preventing late recurrence in colorectal cancer⁷. Hence, patient-derived organoids could be used to identify novel drug targets.

The lack of cellular architecture to mirror the tumor microenvironment and stromal response presents difficulty in studying immune interactions in cancer. It is challenging to elicit tumor-immune specific responses using traditional cell lines without a tumor microenvironment. Studying tumors using patient-derived xenografts (PDXs) in immunocompromised mice also does not reflect the immune interactions of human tumors due to the lack of an immune response⁹. To overcome these challenges, Neal and colleagues generated patient-derived 3D models from various tumor types using the air-liquid interface (ALI) method. The models included a mixture of cancer cells and several immune cell types, the latter expressing the immune checkpoint surface receptor programmed cell death protein-1 (PD-1) [29]. These models enabled the study of immune interactions with cancer cells by providing an in vitro tumor microenvironment¹⁰. To test for anti-PD-1-dependent tumor cell killing, the ALI models were treated with a PD-1 blocking antibody [30]. The tumor-infiltrating lymphocytes in the patient-derived models were found to model immune checkpoint blockade, resulting in cancer cell death¹⁰. Hence, the use of NGCMs can aid in mirroring the tumor microenvironment in vitro and provide models for immuno-oncology research.

The results summarized above are just a few examples of important outcomes achieved using NGCMs. The models have great potential in research to improve our understanding of cancer etiology and to improve treatment outcomes.

HCMI next-gen models

The advantages of using HCMI’s NGCMs over traditional cell lines include the availability of clinical data such as patient and tumor information, histopathological biomarkers, and molecular characterization data. The comprehensive data increase accuracy in identifying driver mutations and targeted therapies in various cancer subtypes. Researchers interested in using the next-gen models developed by HCMI can visit the HCMI Searchable Catalog [31] to query and browse available cancer models based on a subset of clinical and molecular data. Currently, the catalog includes 35 NGCMs from the brain, bone, bronchus and lung, colon, pancreas, stomach, and rectum. Standardized protocols and optimized cell culture media formulations for the growth and expansion of each model are available through the third-party distributor, American Type Culture Collection [32]. These HCMI resources should facilitate repeatability and reproducibility in growing human cancer models specific to each cancer type. The model-associated genomic and clinical data are quality-controlled and harmonized at multiple checkpoints to ensure their reliability, and are publicly available at NCI’s Genomic Data Commons [33]. HCMI next-gen models with associated data provide the research community with a valuable resource to accelerate the translation of research findings to precision oncology.

References

1. Li X, Francies HE, Secrier M, et al. Organoid cultures recapitulate esophageal adenocarcinoma heterogeneity providing a model for clonality studies and precision therapeutics. Nature Communications. 2018 Jul 30;9(1):2983. (PMID: 30061675 [34])

2. Sato T, Stange DE, Ferrante M, et al. Long-term expansion of epithelial organoids from human colon, adenoma, adenocarcinoma, and Barrett's epithelium. Gastroenterology. 2011 Nov;141(5):1762-72. (PMID: 21889923 [35])

3. Sachs N, de Ligt J, Kopper O, et al. A Living Biobank of Breast Cancer Organoids Captures Disease Heterogeneity. Cell. 2018 Jan 11;172(1-2):373-386. (PMID: 29224780 [36])

4. Ince TA, Sousa AD, Jones MA, et al. Characterization of twenty-five ovarian tumour cell lines that phenocopy primary tumours. Nature Communications. 2015 Jun 17;6:7419. (PMID: 26080861 [37])

5. Boj SF, Hwang CI, Baker LA, et al. Organoid models of human and mouse ductal pancreatic cancer. Cell. 2015 Jan 15;160(1-2):324-38. (PMID: 25557080 [38])

6. Tiriac H, Belleau P, Engle DD, et al. Organoid Profiling Identifies Common Responders to Chemotherapy in Pancreatic Cancer. Cancer Discovery. 2018 Sep;8(9):1112-1129. (PMID: 29853643 [39])

7. Buczacki SJA, Popova S, Biggs E, et al. Itraconazole targets cell cycle heterogeneity in colorectal cancer. The Journal of Experimental Medicine. 2018 Jul 2;215(7):1891-1912. (PMID: 29853607 [40])

8. Pantziarka P, Sukhatme V, Bouche G, Meheus L, Sukhatme VP. Repurposing Drugs in Oncology (ReDO)-itraconazole as an anti-cancer agent. ecancermedicalscience. 2015 Apr 15;9:521. (PMID: 25932045 [41])

9. Baker LA, Tiriac H, Clevers H, Tuveson DA. Modeling pancreatic cancer with organoids. Trends in Cancer. 2016 Apr;2(4):176-190. (PMID: 27135056 [42])

10. Neal JT, Li X, Zhu J, et al. Organoid Modeling of the Tumor Immune Microenvironment. Cell. 2018 Dec 13;175(7):1972-1988. (PMID: 30550791 [43])


CTD² GUEST EDITORIAL
GenePattern Notebook: Integration of Electronic Notebooks with Bioinformatics Tools for Genomic Data Analysis
Michael Reich
University of California, San Diego

Over the past several years, the electronic analysis notebook has emerged as an effective and versatile tool for the authoring, publishing, and sharing of scientific research. It allows scientists to combine the scientific exposition – text, images, and even multimedia – with the actual code that runs the analysis, creating a single “research narrative” document that is reproducible, containing all of the computational steps in an analysis; adaptable by other scientists to their own research; comprehensive, conveying research in a high level of detail, without the limitations of publications or paper media; and accessible, often requiring only a web browser to view and run.

The Jupyter Notebook system¹ has become a de facto standard notebook environment in data science and genomic analysis. The community of Jupyter users extends well beyond these fields, reaching areas of science as diverse as physics, economics, and linguistics. However, the Jupyter notebook format assumes familiarity with a programming language in order to access analyses, and even text must be formatted using a programming-style language.

To extend the capabilities of notebooks to the needs of researchers at all levels of programming expertise, we developed the GenePattern Notebook environment² with funding from the National Cancer Institute’s Informatics Technologies in Cancer Research (ITCR) program. GenePattern Notebook (http://genepattern-notebook.org/ [44]) integrates Jupyter’s research narrative capabilities with the hundreds of genomic analysis and visualization tools available through the GenePattern³ platform. The GenePattern Notebook workspace is available for public use and requires registration to gain access. This tool allows scientists to develop, share, collaborate on, and publish their notebooks, requiring only a web browser. In this environment, investigators can design their in silico experiments, perform and refine analyses, launch compute-intensive analyses on cloud-based and high-performance compute resources, and publish their results as electronic notebooks that other scientists can adopt to reproduce the original analyses and modify for their own work.

The GenePattern Notebook environment provides capabilities beyond standard notebook platforms:

Access to a wide range of genomic analyses within a notebook. GenePattern provides hundreds of analyses, from machine learning techniques such as clustering, classification, and dimension reduction, to omic-specific methods for gene expression analysis, proteomics, flow cytometry, sequence variation analysis, pathway analysis, and others. The analyses are launched from a user-friendly form in the notebook and run on a remote GenePattern server, which may be on a cloud provider or hosted at a high-performance compute site. This allows compute-intensive analyses to run in the environment to which they are best suited. Results are available within the notebook and may easily be used in other analysis steps.


Featured notebooks. A library of featured genomic analysis notebooks is available on the GenePattern Notebook workspace (Figure 1). These notebooks include templates for common analysis tasks (e.g., hierarchical clustering of RNA-seq data, gene set enrichment analysis, non-negative matrix factorization (NMF)), as well as disease-specific research scenarios and compute-intensive methods. Featured notebooks also include those that were developed in collaboration with research labs as a means to disseminate their analysis methods, including the Coordinated Gene Activity in Pattern Sets (CoGAPS) Bayesian NMF method for inference of biological process activity⁴ and the AMARETTO multi-omics tool for inference of regulatory networks in cancer and other diseases⁵. A cancer-focused notebook, “Genomic Discovery to Translation”, is aimed at providing insights into candidate drugs for patient therapy. This method combines RNA-Seq profiling data with expression and viability data from cell lines to identify compounds as candidate therapeutics. It uses publicly available data resources, including the Cancer Cell Line Encyclopedia [15] (CCLE), Sanger Cell Line Project [45] (SCLP), Cancer Therapeutics Response Portal [16] (CTRP), and Genomics of Drug Sensitivity in Cancer [46] (GDSC).
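One simple way that expression and viability data from cell lines could be combined is sketched below in Python: for each compound, correlate the expression of a gene of interest across cell lines with cell viability after treatment, then rank compounds so that those killing high-expressing lines come first. This is an illustrative assumption, not the algorithm used by the “Genomic Discovery to Translation” notebook, and all names and values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data for five cell lines: expression of a gene that is
# up-regulated in a patient tumor, and viability after each compound.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
viability = {
    "compound_A": [0.9, 0.8, 0.6, 0.4, 0.2],  # high expressers lose viability
    "compound_B": [0.5, 0.6, 0.5, 0.6, 0.5],  # no clear relationship
}

# Most negative correlation first: candidate compounds for follow-up.
ranked = sorted(viability, key=lambda c: pearson(expression, viability[c]))
```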

[47]

Figure 1: GenePattern Notebook Workspace, showing library of featured analysis notebooks.

Scientists can easily copy these notebooks, use them as is, or adapt them for their research purposes. Users with computational experience can modify their own versions of the notebook, for example to try alternative analysis methods, additional data resources, or other ‘omics’ data types. An example notebook for the analysis of copy number variation in methylation array data is shown in Figure 2. The GenePattern analysis shown there replaces a considerable amount of code and facilitates analysis for non-programming scientists. Researchers can upload and store up to 30 GB of data, and the GenePattern development team can increase this limit if additional space is required for an analysis.


[48]

Figure 2: GenePattern Notebook for performing copy number variation analysis on Illumina 450k/EPIC methylation array data.

Notebook enhancements. GenePattern Notebooks have several features that enhance the original standard Jupyter notebook interface. First, a rich text editor allows scientists to enter and format text without knowing a text formatting language such as Markdown or LaTeX. Second, users can create a table of contents from the headings in a notebook, which updates automatically as headings are added or changed. It can either be embedded in the notebook or float alongside, allowing easy navigation to any point in a notebook. Third, a user interface-building tool (Figure 3) allows notebook developers to wrap their code so that it is displayed as a web form, with only the necessary inputs exposed. Users of the notebook are presented with a simplified display that allows them to run the analyses without needing to interact with the code behind them.

Figure 3: User Interface (UI) Builder: (a) A Python cell containing code to execute an analysis. (b) The UI Builder display hides the Python code, displaying only the required inputs.
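The general idea behind wrapping code as a form can be sketched in plain Python: inspect a function's signature and derive a description of the inputs that a form would need to expose. The helper `describe_form` and the example function `cluster_samples` are hypothetical stand-ins written for this illustration; they are not part of the GenePattern Notebook API.

```python
import inspect

def describe_form(func):
    """Derive a minimal form description from a function's signature."""
    fields = []
    for name, param in inspect.signature(func).parameters.items():
        required = param.default is inspect.Parameter.empty
        fields.append({
            "name": name,
            "required": required,
            "default": None if required else param.default,
        })
    return {"title": func.__name__, "fields": fields}

def cluster_samples(expression_file, n_clusters=3):
    """Placeholder analysis; a real notebook cell would do the work here."""
    return f"clustering {expression_file} into {n_clusters} groups"

form = describe_form(cluster_samples)
```

A notebook front end could render each entry in `form["fields"]` as an input widget and call the function with the submitted values, so that users never need to read or edit the underlying code.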

Publication and collaborative editing. Notebook developers often wish to share their notebooks, either with the research community or among collaborators. To make a notebook publicly accessible, the author selects the “publish” feature and adds descriptive information and tags to make the notebook easy to find in a search query. The notebook is then made available on the “community” section of the workspace. An author can include a web link to a public notebook in a publication, and users who follow the link will see a read-only version of the notebook, with the option to log in to the workspace, where they can run, copy, and edit their own version. For collaborative editing, an author can send a sharing invitation to colleagues, who can then also view, run, and edit the notebook prior to its publication.

The GenePattern Notebook environment is freely available at http://genepattern-notebook.org/ [44]. Researchers can make their tools available for public use through the GenePattern server or GenePattern Archive [49] (GParc), a community repository. A related GenePattern Notebook resource, the Human Cell Atlas Notebook Workspace, https://hca.genepattern.org/ [50], is dedicated to the Human Cell Atlas⁶ and features a growing collection of notebooks providing single-cell analysis tools. For more details about the GenePattern Notebook, view the video tutorial at https://genepattern-notebook.org/tutorials/ [51]. If you have any further questions, please contact [email protected] [52].

References

1. Kluyver T, Ragan-Kelley B, Pérez F, et al. Jupyter Notebooks-a publishing format for reproducible computational workflows. In ELPUB. 2016 May 26; pp. 87-90.

2. Reich M, Tabor T, Liefeld T, et al. The GenePattern Notebook Environment. Cell Systems. 2017 Aug 23;5(2):149-151.e1. (PMID: 28822753 [53])

3. Reich M, Liefeld T, Gould J, et al. GenePattern 2.0. Nature Genetics. 2006 May;38(5):500-1. (PMID: 16642009 [54])

4. Fertig EJ, Ding J, Favorov AV, et al. CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data. Bioinformatics. 2010 Nov 1;26(21):2792-2793. (PMID: 20810601 [55])

5. Champion M, Brennan K, Croonenborghs T, et al. Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response. EBioMedicine. 2018 Jan;27:156-166. (PMID: 29331675 [56])

6. Regev A, Teichmann SA, Lander ES, et al. The Human Cell Atlas. Elife. 2017 Dec 5;6. pii: e27041. (PMID: 29206104 [57])


HCMI PROGRAM HIGHLIGHTS

Collecting Uniform Clinical Data for a Community Resource

Eva Tonsing-Carter, Ph.D.

Office of Cancer Genomics, NCI


The use of cancer cell lines to model diverse cancer types has several challenges. Most of the commonly used cancer cell lines in research do not have clinical or therapeutic outcome data of the participant from which the cell line was derived. The genomic relatedness of the cell line to the parent tumor is unknown, and molecular characterization, including assessment of the genomes and transcriptomes of these cell lines, was until recently mostly unavailable. Diverse racial and ethnic groups, as well as rare cancers, are seldom represented in currently available cell lines. Additionally, cancer cell lines often do not represent certain molecular subtypes. To address these challenges, the Human Cancer Models Initiative [14] (HCMI) was formed. HCMI is an international consortium that focuses on generating novel, next-generation tumor-derived cancer models annotated with clinical, genomic, and molecular data as a community resource. The HCMI consortium founders include the National Cancer Institute (NCI), Wellcome Sanger Institute, Cancer Research UK, and Hubrecht Organoid Technology. The NCI arm of the consortium supports HCMI next-generation model development at four Cancer Model Development Centers (CMDCs).

To advance cancer research and further understand the relationship between in vitro findings and clinical biology, HCMI next-generation cancer models are associated with clinical data as well as molecular characterization data. To collect the associated clinical data, a cancer type-specific Case Report Form (CRF) is developed for each cancer type for which HCMI models are generated. As of June 2019, eighteen different enrollment and follow-up CRFs for glioblastoma, breast, colorectal, pancreas, pediatric cancers, and others have been developed.

The CRFs standardize the clinical data that are collected from participating Tissue Source Sites (TSSs) and are composed of standardized NCI common data elements (CDEs) with a controlled vocabulary or permissible values (PVs). Clinical Data Working Groups (CDWGs) consist of cancer type-specific clinical experts including pathologists, oncologists, and surgeons from the United States, United Kingdom, and the Netherlands. They contribute to the content of the CRFs used to collect clinical data. A preliminary list of CDEs is generated according to guidelines from the World Health Organization, International Classification of Diseases for Oncology, and American Joint Committee on Cancer. The CDEs are used as a base on which the working groups build a final CRF. The clinical data include the participant’s demographics, prognostic factors, and specifics of the tumor including histological subtype, pathological staging, and grade. Each cancer type-specific CRF includes the current neoadjuvant and adjuvant therapy information and prognostic/predictive/lifestyle feature information based on the feedback from these experts.

A few patients donated tumor tissues from more than one anatomic site for model generation. Examples include (a) a primary tumor and a metastatic lesion, (b) a primary tumor and a recurrence, (c) multiple metastases, or (d) a pre-malignant lesion and a primary tumor. With successful generation of multiple models per donor, the CRFs required updating to keep all the associated data in a single form. These CRFs utilize new CDEs to collect clinical data for distinct tissue types including “primary”, “metastatic/recurrent”, or “other” tissue as well as information linking a specific model to its corresponding originating tissue sample. The multiple-model CRF design is available for several cancer types including melanoma, brain, hepatocellular, and rare cancers. It is not known whether all cancer types will have donors with multiple models; however, the multiple-model CRFs are available on the HCMI Resources [58] webpage.

Because HCMI is an international consortium, differences in clinical data collection are also considered during the CDWG meetings. Differences in terminology are discussed and incorporated into the CDEs. The Office of Cancer Genomics [59] (OCG) works closely with the NCI’s Cancer Data Standards Registry and Repository [60] (caDSR) to ensure that the clinical data elements and metadata used in the HCMI CRFs adhere to a uniform vocabulary and meaning. This data standardization will enable HCMI clinical data to be compatible not only across the global HCMI TSS locations but also across different groups and programs. The use of PVs allows for mapping of the clinical data to the data dictionary of NCI’s Genomic Data Commons [61] (GDC), enabling users to search and filter the data.

Once the CDEs have been finalized and registered, updated, or modified in NCI’s caDSR, they are submitted to the Clinical Data Center (CDC) at Information Management Systems. The CDEs are compiled into an interactive web-based electronic CRF (eCRF) where TSSs may submit the HCMI clinical data. The clinical data submitted by the TSSs are quality checked (QC’d) to ensure that no personal health information (PHI) is inadvertently submitted, and that the information submitted conforms to the PVs and type of information expected (e.g. numeric values for time interval questions). Once the clinical data are QC’d and any errors are addressed, the clinical data are approved and mapped to the GDC dictionary before submission. The finalized clinical data are submitted and stored at the GDC for access by the scientific community.
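The two conformance checks named above (values must match the PVs, and time-interval answers must be numeric) can be sketched in a few lines of Python. This is only an illustration, not the CDC's actual QC procedure: the `validate_record` helper, the field names, and the `vital_status` values are hypothetical, and only the three tissue-type values are taken from the CRF description in this article. Screening for PHI is not covered by the sketch.

```python
# Hypothetical CDE definitions for illustration only; the real HCMI data
# elements and permissible values are registered in caDSR.
PERMISSIBLE_VALUES = {
    "tissue_type": {"primary", "metastatic/recurrent", "other"},
    "vital_status": {"alive", "dead", "unknown"},  # assumed values
}
NUMERIC_FIELDS = {"days_to_follow_up"}  # assumed time-interval field


def validate_record(record):
    """Return a list of QC error messages for one submitted record."""
    errors = []
    # Check 1: every controlled field must hold one of its permissible values.
    for field, allowed in PERMISSIBLE_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not a permissible value")
    # Check 2: time-interval fields must be numeric.
    for field in NUMERIC_FIELDS:
        value = record.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{field}: expected a numeric value, got {value!r}")
    return errors


clean = {"tissue_type": "primary", "vital_status": "alive",
         "days_to_follow_up": 120}
flawed = {"tissue_type": "relapse", "vital_status": "alive",
          "days_to_follow_up": "four months"}

print(validate_record(clean))   # no errors reported
for message in validate_record(flawed):
    print(message)
```

A record would be approved for mapping only once a check of this kind returns no errors, which mirrors the "QC'd and any errors are addressed" step in the text.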

PDF versions of the CRFs are also generated and accessible on the HCMI Resources page: HCMI Case Report Forms [58]. These CRFs can be utilized by anyone interested in collecting clinical data associated with cancer tissues for other projects. Check the HCMI Resources page for updates to the CRFs as new cancer types are added. The HCMI models and associated high-quality clinical data coupled with molecular characterization data provide the scientific community with a valuable resource for cancer research.


OCG PERSPECTIVE

Leveraging a Genomics Background to Facilitate Molecular Characterization of HCMI Models

Lauren Hurd, Ph.D.

Office of Cancer Genomics, NCI


My name is Lauren Hurd and I am a Scientific Program Manager for the Human Cancer Models Initiative (HCMI) within the Office of Cancer Genomics (OCG). The National Cancer Institute (NCI), together with other consortium members [62], co-founded HCMI to provide a resource of ~1,000 clinically and molecularly characterized next-generation patient-derived cancer models to the research community. The cancer models are generated from parent tumors that span a range of subtypes originating from individuals of diverse ethnic and racial backgrounds as well as rare adult and pediatric tumors. The goal of this initiative is to provide a resource of diverse, fully annotated models which more accurately recapitulate the biology of their parent tumors. As a scientific program manager, I work with Daniela Gerhard, the Director of OCG, to ensure that this goal can be met for the United States' (U.S.) contribution to the HCMI.

For the last eight years, I have been working within the field of genomics in dynamic and challenging roles. My interest in genomics was cultivated in graduate school, where I focused on understanding the relationship between pathogenic variants in a nonselective cation channel and an autosomal dominant [63] form of skeletal dysplasia, a disorder that affects the growth of bone and cartilage. I continued my career in genomics during my postdoctoral fellowship, where I led the technical direction of molecular diagnostic testing for rare pediatric disorders in a clinical sequencing lab. It was in this role that I gained extensive knowledge of both genomic testing and genomic data. I developed Sanger and NGS panel testing from the ground up, interpreted variants identified during testing, and reported on clinically actionable results. Meaningful interpretation or curation of the variants identified through the various sequencing technologies quickly became one of my favorite parts of the job. Prior to joining NCI, I managed a team of variant curation scientists who were responsible for interpreting variants identified during routine carrier screening of autosomal recessive [64] and X-linked pediatric disorders. I also collaborated with a multi-disciplinary team from bioinformatics, marketing, and clinical operations to develop new versions of the carrier screening panel, which required critical management of many deliverables on strict timelines.

HCMI models within the U.S. are molecularly characterized with genomic and transcriptomic data for the model as well as the associated normal and parent tumor. Annotating the models with these data, to the best of our ability, is a complex process. I leverage my strong background in molecular biology and genetics as well as my experience in project management to facilitate this process (Figure). I work closely with the Biospecimen Processing Center (BPC) to ensure that quality nucleic acid samples from model, normal, and parent tumor can be provided to the Genomic Characterization Centers (GCCs) for downstream sequencing and subsequently harmonized by the Genomic Data Commons (GDC). This involves monitoring hundreds of samples through the molecular characterization pipeline, tracking histopathology and nucleic acid metrics, and ensuring sequencing strategies are adjusted as needed. I also collaborate with these teams to identify potential challenges in the molecular characterization process and develop resources which address those challenges. I’ve developed standard operating procedures (SOPs) for the Data Coordinating Center (DCC) to increase the interoperability of model QC data and for the BPC to streamline the nucleic acid isolation process. HCMI models are most valuable when they contain comprehensive and standardized datasets, and I do the best I can to ensure that we provide these datasets to the research community.

Figure: Programmatic Oversight of NCI HCMI Molecular Characterization: (A) Sample Collection (Image credit: Integrated Biobank of Luxembourg on Flickr [65] (CC BY-NC-ND 2.0)), (B) Biospecimen Processing (Image credit: Dr. Cecil Fox, NCI), (C) Sequencing (Image credit: Colleen Dundas, NIAMS (CC BY-NC 2.0)), (D) Harmonization & Analysis (Image credit: Louis M. Staudt, NCI (CC BY-SA 3.0))

While I am extremely familiar with pediatric and rare disease genomics, the field of cancer genomics is mostly uncharted territory for me. The HCMI program has provided a whole new perspective on learning about genomics. Models annotated with clinical, genomic, and transcriptomic datasets give researchers a crucial resource to address large questions within the field. Study of the models will advance our basic understanding of tumor heterogeneity, genomic stability, tumor-immune microenvironment, and mutational signatures on a more comprehensive scale. Large repositories of molecularly diverse models from the same tumor subtype will also allow for greater representation when screening for novel therapeutic targets as well as assessing drug sensitivities. All of this will certainly culminate in the advancement of both novel and improved precision therapies. It is exciting to be a part of this initiative where the possibilities for discovery seem endless.

The human nuclear genome consists of approximately 3.2 billion nucleotides. An extraordinary amount of biological information is brought together both in precise order and time to form the foundation of human life. It’s fascinating and logical, yet puzzling, all at the same time. It’s certainly what has driven me to pursue a career in genomics. There are so many ways in which one can work in the field of genomics and I have been fortunate to work in this field in many capacities. While the career path I have taken is varied in its relationship to the field of genomics, one commonality remains. I work with teams and initiatives who excel at providing answers to the greater scientific community and the HCMI is no exception.


Source URL: https://ocg.cancer.gov/news-publications/e-newsletter-issue/issue-22

Links
[1] https://ocg.cancer.gov/printpdf/taxonomy/term/593
[2] https://satijalab.org/costpercell
[3] https://pubmed.ncbi.nlm.nih.gov/26618723/
[4] https://pubmed.ncbi.nlm.nih.gov/30483257/
[5] https://pubmed.ncbi.nlm.nih.gov/30872643/
[6] https://pubmed.ncbi.nlm.nih.gov/30591526/
[7] https://pubmed.ncbi.nlm.nih.gov/28821273/
[8] https://pubmed.ncbi.nlm.nih.gov/31024627/
[9] https://pubmed.ncbi.nlm.nih.gov/30567574/
[10] https://pubmed.ncbi.nlm.nih.gov/29941873/
[11] https://pubmed.ncbi.nlm.nih.gov/26887813/
[12] https://pubmed.ncbi.nlm.nih.gov/28931656/
[13] https://www.cancer.gov/research/areas/treatment/pmi-oncology
[14] https://ocg.cancer.gov/programs/HCMI
[15] https://portals.broadinstitute.org/ccle
[16] https://portals.broadinstitute.org/ctrp/
[17] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/mutation
[18] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/in-vivo
[19] https://www.ncbi.nlm.nih.gov/gene/4851
[20] https://pubchem.ncbi.nlm.nih.gov/compound/Nicotinamide
[21] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/epithelial
[22] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/adenoma
[23] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/adenocarcinoma
[24] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/barrett-esophagus
[25] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/ex-vivo
[26] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/chemosensitivity
[27] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/methylation
[28] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/esophageal-cancer
[29] https://www.ncbi.nlm.nih.gov/gene/5133
[30] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/antibody
[31] https://hcmi-searchable-catalog.nci.nih.gov/
[32] https://www.atcc.org:443/hcmi
[33] https://portal.gdc.cancer.gov/projects/HCMI-CMDC
[34] https://pubmed.ncbi.nlm.nih.gov/30061675/
[35] https://pubmed.ncbi.nlm.nih.gov/21889923/
[36] https://pubmed.ncbi.nlm.nih.gov/29224780/
[37] https://pubmed.ncbi.nlm.nih.gov/26080861/
[38] https://pubmed.ncbi.nlm.nih.gov/25557080/
[39] https://pubmed.ncbi.nlm.nih.gov/29853643/
[40] https://pubmed.ncbi.nlm.nih.gov/29853607/
[41] https://pubmed.ncbi.nlm.nih.gov/25932045/
[42] https://pubmed.ncbi.nlm.nih.gov/27135056/
[43] https://pubmed.ncbi.nlm.nih.gov/30550791/
[44] http://genepattern-notebook.org/
[45] https://cancer.sanger.ac.uk/cell_lines
[46] https://www.cancerrxgene.org/
[47] https://ocg.cancer.gov/sites/default/files/GenePattern-Notebook_Figure-1_Updated-Image.png
[48] https://ocg.cancer.gov/sites/default/files/GenePattern-Notebook_Combined-Image_v5.png
[49] http://www.gparc.org
[50] https://hca.genepattern.org/
[51] https://genepattern-notebook.org/tutorials/
[52] mailto:[email protected]
[53] https://pubmed.ncbi.nlm.nih.gov/28822753/
[54] https://pubmed.ncbi.nlm.nih.gov/16642009/
[55] https://pubmed.ncbi.nlm.nih.gov/20810601/
[56] https://pubmed.ncbi.nlm.nih.gov/29331675/
[57] https://pubmed.ncbi.nlm.nih.gov/29206104/
[58] https://ocg.cancer.gov/programs/hcmi/resources
[59] https://ocg.cancer.gov/
[60] https://wiki.nci.nih.gov/display/caDSR/caDSR+Wiki
[61] https://docs.gdc.cancer.gov/Data_Dictionary/
[62] https://ocg.cancer.gov/programs/hcmi/hcmi-organization
[63] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/793860
[64] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/339339
[65] https://www.flickr.com/photos/ibbl/6841093489/in/photolist-TkaJb-24BUiLx-bqwoKK-LhW5pn-qkVX7-24Fsc92-qkVxq-HHNAk4-ouxCn5-bAUDQC-odiJRQ-ovpXZS-bAUDSy-ZJSyv-od7Wcf-odnfPY-2ffWbmV-2dU8RmK-T63h5m-24BUiKk-RuLSXR-24BUiKR-PSj1DF-wSjZRe-uWZjuj-xhq359-x3SVq2-ovvhd5-xvkPZ9-w6ZU8B-of1r9L-xCuE72-tnC8Af-xiqcVr-xqULou-x371GG-w5tN73-xDzA2S-xDzznL-xBTtjK-x1kPW4-wNcFN9-x1A5Vw-wL2ZYs-v9d1c2-tE5khh