RPhenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics

Hilmar Lapp (Duke University), Hong Xu (Duke University), Jim Balhoff (RTI, Inc.)

Evolution Meetings 2016, Austin, TX

Upload: hilmar-lapp

Post on 16-Apr-2017



RPhenoscape

• A package for accessing the Phenoscape Knowledgebase from within R programs
• Programmatic access to:
  • Evolutionary character data with computable semantics
  • Machine-reasoning with computable phenotype data

R features a rich ecosystem for comparative phylogenetics

The CRAN Task View on Phylogenetics and Comparative Methods lists 76 packages at last count.

Comparative analysis needs comparative data

Magee et al. (2014), PLOS ONE. See also Drew et al. (2013), PLOS Biology; Stoltzfus et al. (2012), BMC Research Notes.

The lack of reusable digital data is amplified for morphology

Which matrix are we criticizing?

RC07 published their character list (their appendix 2) and their matrix (appendix 3). We also have a hitherto unpublished NEXUS file (presented here as part of Data S3), most likely sent by [M.R.] to [D.G.] in late 2007 or early 2008, which purports to contain the same matrix. Surprisingly, the character list in the paper and that in the file do not agree on the identities of characters 132–134.

Marjanović and Laurin (2015) Reevaluation of the largest published morphological data matrix for phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 3:e1596v1

Morphology is more complex than discrete, disjoint, independent

We know more about morphology than authors state

Implied knowledge can be substantial

[Figure: digit presence/absence in Sarcopterygii. Legend: asserted, inferred, missing.]

kb.phenoscape.org

Make morphology computable, discoverable, & linked to genetic data

Phenoscape Knowledgebase

❖ 4,399 taxa (vertebrates)

❖ 139 publications (matrices)

❖ 19,024 character states

❖ 651,660 phenotype annotations

[Diagram labels: morphological matrices; annotation; ontologies (anatomy, quality, taxonomy); Phenex software (Balhoff et al., 2010); Phenoscape Knowledgebase; machine reasoner (OWL); KB interface for humans; KB interface for machines.]

RPhenoscape

An R package for API access to the Phenoscape Knowledgebase

• Evolutionary character data with computable semantics
• Machine-reasoning with computable phenotype data:
  • Synthetic supermatrix synthesis
  • Semantics-based character and state filtering
  • Semantic similarity-driven querying and synthesis

Use-case: Querying studies by morphology and taxonomy

> slist <- pk_get_study_list(taxon = "Ictaluridae", entity = "pectoral fin")

> slist[,"label"]

Source: local data frame [10 x 1]

   label
   <chr>
1  Bockmann, F. A. (1998)
2  Chen, X. (1994)
3  De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)
4  Fink, S. V, & Fink, W. L. (1981); Fink, S. V, & Fink, W. L. (1996)
5  Kailola, P. J. (2004)
6  Lundberg, J. G. (1992)
7  Mo, T. (1991)
8  Royero, R. (1999)
9  Vigliotta, T. R. (2008)
10 Wiley, E.O., and Johnson, G.D. (2010)
>

Use-case: Querying studies by morphology and taxonomy

> nex_list <- pk_get_study_xml(as.matrix(slist[2:3, c("id")]))
....This might take a while....
https://scholar.google.com/scholar?q=hylogenetic+studies+of+the+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42
Parse NeXML....
http://dx.doi.org/10.1111/j.1096-3642.2007.00306.x
Parse NeXML....
> nex_list[[1]]
A nexml object representing:
  0 phylogenetic tree blocks, where:
  block 1 contains NULL phylogenetic trees
  block 0 contains phylogenetic trees
  155 meta elements
  1 character matrices
  53 taxonomic units
Taxa: Pseudobagarius leucorhynchus, Liobagrus obesus, Hypsidoris farsonensis, Erethistes sp. (Chen 1994), Bunocephalus amaurus, Xyliphius sp. (Chen 1994) ...

Heavy-lifting of NeXML parsing is done by RNeXML

Use-case: Synthesize presence-absence matrix

> nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
> m <- pk_get_ontotrace(nex)
> m[1:10,]
## Source: local data frame [15 x 5]
##
##                      taxa         otu
##                     (chr)       (chr)
## 1       Ameiurus brunneus VTO_0036273
## 2          Ameiurus catus VTO_0036275
## 3          Ameiurus melas VTO_0036272
## 4        Ameiurus natalis VTO_0036274
## 5      Ameiurus nebulosus VTO_0036278
## 6  Ameiurus platycephalus VTO_0036276
## 7   Ameiurus serracanthus VTO_0036277
## 8     Ictalurus australis VTO_0061495
## 9      Ictalurus balsanus VTO_0036221
## 10      Ictalurus dugesii VTO_0061497
## Variables not shown: otus (chr), anterior dentation of pectoral fin spine
##   (int), anterior distal serration of pectoral fin spine (dbl)
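Many comparative-methods functions in R expect a plain matrix with taxon labels as row names rather than a data frame with identifier columns. The following is a minimal base-R sketch of that conversion. It assumes a stand-in data frame `m_demo` with the same column layout as the output above; the 0/1 states are placeholders for illustration only and are not values from the Knowledgebase.

```r
# Stand-in for a slice of the ontotrace data frame. Taxa and OTU ids are
# copied from the slide; the 0/1 states are PLACEHOLDERS, not KB data.
m_demo <- data.frame(
  taxa = c("Ameiurus brunneus", "Ameiurus catus", "Ictalurus balsanus"),
  otu = c("VTO_0036273", "VTO_0036275", "VTO_0036221"),
  `anterior dentation of pectoral fin spine` = c(1L, 1L, 0L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop the identifier columns and keep only the character columns.
char_cols <- setdiff(names(m_demo), c("taxa", "otu"))
pa_matrix <- as.matrix(m_demo[, char_cols, drop = FALSE])

# Use the taxon labels as row names, as downstream functions usually expect.
rownames(pa_matrix) <- m_demo$taxa
print(pa_matrix)
```

With the real `m`, the set of identifier columns to drop may differ (the output above also mentions an `otus` column), so `char_cols` would need adjusting.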

Use-case: Filter matrix using semantics

> is_desc <- pk_is_descendant('Ictalurus', m$taxa)
> is_desc
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE

# pk_is_ancestor() also available (and pk_is_extinct())
#
# This is in development for characters, too:
# pk_is_descendant('jaw skeleton', m$chars,
#                  relationships = 'part of')
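The logical vector can be used directly as a row filter on the matrix. A minimal base-R sketch, assuming a stand-in data frame `m_demo` built from the ten rows printed on the previous slide (the real `m` comes from `pk_get_ontotrace()`) and an `is_desc_demo` vector copied from the first ten values shown above:

```r
# Stand-in for the first ten rows of `m` (taxa and OTU ids copied from the slide).
m_demo <- data.frame(
  taxa = c("Ameiurus brunneus", "Ameiurus catus", "Ameiurus melas",
           "Ameiurus natalis", "Ameiurus nebulosus", "Ameiurus platycephalus",
           "Ameiurus serracanthus", "Ictalurus australis",
           "Ictalurus balsanus", "Ictalurus dugesii"),
  otu = c("VTO_0036273", "VTO_0036275", "VTO_0036272", "VTO_0036274",
          "VTO_0036278", "VTO_0036276", "VTO_0036277", "VTO_0061495",
          "VTO_0036221", "VTO_0061497"),
  stringsAsFactors = FALSE
)

# First ten values of the pk_is_descendant() result shown above.
is_desc_demo <- c(rep(FALSE, 7), rep(TRUE, 3))

# Keep only the rows whose taxon is a descendant of Ictalurus.
ictalurus_rows <- m_demo[is_desc_demo, ]
print(ictalurus_rows$taxa)
```

The same indexing pattern would apply to the full 15-row `m` with the full `is_desc` vector.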

Current limitations

The package is not on CRAN (yet) and needs to be installed from GitHub:

library("devtools")
install_github("xu-hong/rphenoscape")

Data in Phenoscape are concentrated on vertebrates and on skeletal fin-limb characters.

Semantics-driven matrix synthesis is currently limited to presence-absence characters.

Summary

Phenoscape KB has an API for machine access to computable morphology data and computational semantics services.

RPhenoscape is a bridge between this API and the ecosystem of comparative phylogenetics packages in R.

It translates between the R user (who works with labels and data matrices) and the Phenoscape KB API (which works with identifiers, ontology terms, NeXML, etc.).
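To make the label-to-identifier translation concrete, here is a minimal base-R sketch. The lookup table `vto_ids` and the helpers `label_to_id()` and `id_to_iri()` are hypothetical names invented for this illustration; the package itself resolves labels by querying the Knowledgebase rather than from a local table. The identifiers are the ones shown in the ontotrace output earlier, and the IRI prefix is the standard OBO PURL form.

```r
# Hypothetical local lookup table (labels and ids copied from the slide output).
vto_ids <- c(
  "Ameiurus brunneus"  = "VTO_0036273",
  "Ameiurus catus"     = "VTO_0036275",
  "Ictalurus balsanus" = "VTO_0036221"
)

# Hypothetical helper: map user-facing labels to ontology term identifiers.
label_to_id <- function(labels, table = vto_ids) {
  ids <- unname(table[labels])
  if (anyNA(ids)) {
    warning("No identifier found for: ",
            paste(labels[is.na(ids)], collapse = ", "))
  }
  ids
}

# Hypothetical helper: expand an OBO-style identifier to its PURL form.
id_to_iri <- function(ids) {
  ifelse(is.na(ids), NA_character_,
         paste0("http://purl.obolibrary.org/obo/", ids))
}

iris <- id_to_iri(label_to_id(c("Ameiurus catus", "Ictalurus balsanus")))
print(iris)
```

The point of the sketch is the direction of translation: the user supplies human-readable labels, and identifiers are only handled behind the scenes.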

Code on Github: https://github.com/xu-hong/rphenoscape

Reproducible data integration: The rOpenSci ecosystem

Acknowledgements

U.S. National Science Foundation DBI-1062404, DBI-1062542

National Evolutionary Synthesis Center (NESCent), NSF #EF-0905606

Phenoscape contributors, Advisory Board, Data sources (see: http://phenoscape.org/wiki/Acknowledgments)

RNeXML developers (C. Boettiger, S. Chamberlain) http://github.com/rOpenSci/RNeXML

Get in touch

Repo: github.com/xu-hong/rphenoscape

Github: github.com/hlapp

Twitter: @hlapp