Franz 2016, Phenotype RCN: Representing Taxonomy and Phylogeny as Logically Tractable Variables




The evolving synthesis chain

Our human-made taxonomies and phylogenies, intended to hierarchically represent organismal lineage identities and relationships, are not constant over time. At any time [1], the prospect of a comprehensive tree of life synthesis is immensely motivating to research communities advancing comparative phenomic knowledge. But the terms "tree of life" and "synthesis" can also deceive us. They focus us psychologically on the here and now, and may therefore obscure the need to build information systems that (1) are capable of representing multiple, incremental stages along the complex path towards 'the synthetic tree of life' (→ representation), and (2) entail semantic linkages among the frequently divergent signals that each stage can emit (→ reasoning). We may attempt to look deeper into the past and future of systematic inference generation, and ask: "How well is our present 'synthesis' aligned with the one we will have 25 years from now?" This question is relevant to knowledge integration and reproducible science. If we look 25 years into the past (ca. 1990), we can gauge its significance.

Designing for the failure to refer

To put it provocatively, but also with an eye on sustainable information management design: our emerging environments are not necessarily 'of taxa' or 'of phyla'. They are more immediately of human theories about these purported evolutionary entities. Even well into the 21st century, our hierarchical theories are consistently expanded, reconfirmed, or partly rejected and revised, sometimes in dramatic fashion, to approximate the tree of life better than before. It is a personal and social challenge to counteract the allure of an 'almost-within-reach' synthesis, and instead to recognize the ephemerality of contemporary inferences in order to create sustainable environments. While we may want stable classes and identifiers 'for taxa and phyla' in our systems now, we should build for the possibility that these entities fail to refer to natural entities. We should not trust our cognitively constrained [2] 'inductive wants' vis-à-vis current systematic knowledge, but should instead design knowledge transition systems. Building such systems is a logic representation challenge and, more directly, a data service design challenge. In summary, open-ended environments for comparative phenomic data should be designed to represent taxonomy and phylogeny as logically tractable variables, whose significance for particular comparative inferences can be assessed and reassessed over time and across parallel or succeeding phenomic analyses.
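One concrete way to keep names from silently standing in for taxa is to key every taxonomic concept on its name together with the source it is used 'sec.' (according to), the convention this poster's figure captions use (e.g., Cheirogaleidae sec. Groves (2005)). The following is a minimal Python sketch under that assumption; the TaxonConcept class and its fields are hypothetical and are not an Euler/X data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """A name as used in one source ("name sec. source").

    Equality and hashing use both fields, so one name used by two
    treatments yields two distinct concepts that can later be related.
    """
    name: str
    sec: str  # the source whose usage of the name is meant

    def label(self) -> str:
        return f"{self.name} sec. {self.sec}"


older = TaxonConcept("Cheirogaleidae", "Groves (1993)")
newer = TaxonConcept("Cheirogaleidae", "Groves (2005)")
print(older == newer)   # False: same name, two concepts
print(newer.label())    # Cheirogaleidae sec. Groves (2005)
```

Relationships between such concepts, rather than the name strings alone, then become the objects a system stores and reasons over.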

Representing taxonomy and phylogeny as logically tractable variables

Nico M. Franz, School of Life Sciences, Arizona State University

URL: https://biokic.asu.edu; E-mail: [email protected]

Figs. 1 & 2. (1) Above: Toolkit workflow schema. T1, T2 = input trees, A = RCC-5 articulations, C = additional tree constraints, MIR = Maximally Informative Relations. (2) Right: Abstract toolkit example, with user-provided input and inferred output alignment visualizations. Source: [7].
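The workflow in Fig. 1 (input trees T1 and T2, articulations A, and tree constraints C yielding inferred relations) can be loosely imitated with a standard qualitative-reasoning technique: path consistency over the RCC-5 composition table. This hedged Python sketch is not the Euler/X implementation, which combines Answer Set Programming and RCC reasoners [4]. It encodes only two assumed tree constraints (a child is properly included in its parent; siblings exclude each other) and omits others a fuller system might apply, such as requiring a parent to be covered by its children. It can therefore leave disjunctions that richer constraints would resolve. The relation symbols and all node names are choices made for this illustration.

```python
from itertools import product

# Five RCC-5 base relations (symbols chosen for this sketch):
# "==" congruence, "<" proper inclusion, ">" inverse proper inclusion,
# "><" partial overlap, "!" exclusion.
ALL = frozenset({"==", "<", ">", "><", "!"})
CONVERSE = {"==": "==", "<": ">", ">": "<", "><": "><", "!": "!"}

# Composition table: COMP[(r1, r2)] = relations possible between x and z
# given x r1 y and y r2 z.
COMP = {}
for r in ALL:
    COMP[("==", r)] = {r}
    COMP[(r, "==")] = {r}
COMP.update({
    ("<", "<"): {"<"}, ("<", ">"): set(ALL),
    ("<", "><"): {"<", "><", "!"}, ("<", "!"): {"!"},
    (">", "<"): {"==", "<", ">", "><"}, (">", ">"): {">"},
    (">", "><"): {">", "><"}, (">", "!"): {">", "><", "!"},
    ("><", "<"): {"<", "><"}, ("><", ">"): {">", "><", "!"},
    ("><", "><"): set(ALL), ("><", "!"): {">", "><", "!"},
    ("!", "<"): {"<", "><", "!"}, ("!", ">"): {"!"},
    ("!", "><"): {"<", "><", "!"}, ("!", "!"): set(ALL),
})


def compose(rs1, rs2):
    """Union of compositions over all pairs of candidate relations."""
    out = set()
    for a, b in product(rs1, rs2):
        out |= COMP[(a, b)]
    return out


def tree_constraints(parent_of):
    """Child < parent; siblings assumed mutually exclusive ("!")."""
    cons, children = {}, {}
    for child, parent in parent_of.items():
        cons[(child, parent)] = {"<"}
        children.setdefault(parent, []).append(child)
    for sibs in children.values():
        for a, b in product(sibs, repeat=2):
            if a != b:
                cons[(a, b)] = {"!"}
    return cons


def path_consistency(nodes, constraints):
    """Refine every pair's candidate relations; return None if inconsistent."""
    net = {(x, y): set(ALL) for x in nodes for y in nodes if x != y}
    for (x, y), rs in constraints.items():
        net[(x, y)] &= set(rs)
        net[(y, x)] &= {CONVERSE[r] for r in rs}
    if any(not rs for rs in net.values()):
        return None
    changed = True
    while changed:
        changed = False
        for i, k, j in product(nodes, repeat=3):
            if len({i, j, k}) < 3:
                continue
            refined = net[(i, j)] & compose(net[(i, k)], net[(k, j)])
            if refined != net[(i, j)]:
                if not refined:
                    return None  # some pair has no possible relation left
                net[(i, j)] = refined
                net[(j, i)] = {CONVERSE[r] for r in refined}
                changed = True
    return net


# Hypothetical toy input: T1 = A(a1, a2), T2 = B(b1), plus two articulations.
nodes = ["A", "a1", "a2", "B", "b1"]
cons = {**tree_constraints({"a1": "A", "a2": "A"}), **tree_constraints({"b1": "B"})}
cons[("a1", "b1")] = {"=="}
cons[("a2", "B")] = {"!"}
net = path_consistency(nodes, cons)
print(net[("a1", "B")])          # {'<'}
print(sorted(net[("A", "B")]))   # ['>', '><']
```

In the toy run, a1 is inferred to be properly included in B, while A and B remain either '>' or '><' because nothing in these constraints rules out B extending beyond A. Asserting A ! B instead makes the network inconsistent, and the function returns None.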

Acknowledgments

Thanks to the Euler/X logic team: Shawn Bowers, Tuan Dang, Parisa Kianmajd, Bertram Ludäscher (PI), Timothy McPhillips & Shizhuo Yu; and the ETC team: Hong Cui (PI), James Macklin & Thomas Rodenhausen. This research is supported through the grants: NSF DEB–1155984, DBI–1342595 (Franz); and IIS–118088, DBI–1147273 (Ludäscher).

References
[1] Haeckel. 1866. Generelle Morphologie der Organismen. doi: 10.5962/bhl.title.3953
[2] Atran. 1998. Folk biology and the anthropology of science: cognitive universals and cultural particulars. doi: 10.1017/S0140525X98001277
[3] Euler/X project @ GitHub: https://github.com/EulerProject/EulerX
[4] Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. http://arxiv.org/abs/1402.1992
[5] Franz et al. 2016. Two influential primate classifications logically aligned. To appear in Systematic Biology. http://arxiv.org/abs/1412.1025
[6] Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. To appear in Semantic Web Journal. doi: 10.3233/SW-160220
[7] Franz et al. 2015. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE 10(2): e0118247. doi: 10.1371/journal.pone.0118247
[8] Jansen & Franz. 2015. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments. ZooKeys 528: 1–133. doi: 10.3897/zookeys.528.6001
[9] Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 14–23.

Aligning evolving syntheses

In the Euler/X project [3], we are developing novel logic services and use cases that demonstrate the feasibility of managing the taxonomic and phylogenetic variables in open-ended systems. The application is more fully described in [4]. We represent taxonomic concepts and leverage Region Connection Calculus (RCC-5) articulations, in combination with off-the-shelf and custom Answer Set Programming and RCC reasoners, to achieve logically consistent and well-specified alignments of semantically heterogeneous taxonomies and phylogenies (Figs. 1 & 2). The alignments are intentionally not 'objective', but instead reflect one or more systematic experts' subjective and purpose-driven, but logically explicit, perspectives on how to integrate across succeeding meaning hierarchies. The resulting alignment visualizations are taxonomic and phylogenetic meaning transition maps. One direct benefit of inferring such maps is that the reliability of taxonomic names and of phyloreferences can be quantified through the semantics of RCC-5 articulations (Table 1; Fig. 4).
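To make the five RCC-5 articulations concrete: if each taxonomic concept is read extensionally, as a finite, non-empty set of members such as specimen identifiers, the articulations reduce to set comparisons. This is a simplifying assumption for illustration only; in the toolkit workflow, articulations are provided as input by experts (A in Fig. 1) rather than computed from complete member lists. The function name rcc5, the relation symbols, and the specimen identifiers below are hypothetical.

```python
def rcc5(x: frozenset, y: frozenset) -> str:
    """Classify the RCC-5 relation of concept x to concept y.

    Assumes each concept is given extensionally as a non-empty set of members
    (e.g., specimen identifiers); real concepts are rarely fully enumerated.
    """
    if not x or not y:
        raise ValueError("RCC-5 regions must be non-empty")
    if x == y:
        return "=="  # congruence
    if x < y:
        return "<"   # x properly included in y
    if x > y:
        return ">"   # x properly includes y
    if x & y:
        return "><"  # partial overlap
    return "!"       # exclusion (no shared members)


# Hypothetical specimen sets for one name used by two treatments.
sec_1993 = frozenset({"S1", "S2", "S3"})
sec_1997 = frozenset({"S2", "S3", "S4"})
print(rcc5(sec_1993, sec_1997))  # ><
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what lets an alignment assign each concept pair a single, well-specified articulation once enough is known.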

Pathways to system implementation

We have successfully applied this approach to align multiple classifications of primates (Fig. 3) [5], alternative species-level taxonomies of grasses [6], succeeding cladistic and revisionary inferences of weevils [7,8], and competing avian order-level phylogenomic hypotheses (in prep.). While the logic optimization research remains ongoing, the RCC-5 multi-taxonomy/phylogeny alignment approach appears ready for implementation in open-ended biodiversity or evolutionary knowledge systems. Such implementations will provide needed input on conceptual and practical challenges, and on the value of the novel semantic integration services afforded by this approach (Fig. 5). If you are interested in using the Euler/X toolkit and/or in collaborating with us, please contact [email protected]

Annual Summit of the Phenotype Research Coordination Network: 'Complex Data Integration'

Biosphere 2, February 26–28, 2016


Fig. 3. Visualization of the consistent, well-specified Cheirogaleidae sec. Groves (2005) (T2) / Cheirogaleidae sec. Groves (1993) (T1) alignment. Source: [5].

Table 1. Name:meaning cardinality relations in the primate use case; only 56.4% of name pairs are 'reliable'. Source: [5].
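Table 1 summarizes how often a name and its meaning travel together across the two primate classifications. The exact categories of [5] are not reproduced here; the Python sketch below assumes a simplified scheme in which a cross-taxonomy concept pair is 'reliable' when the same name labels congruent concepts, and unreliable when a shared name labels non-congruent concepts or when congruent concepts carry different names. The function name, names, sources, and relations in the example are hypothetical, not the primate data.

```python
from typing import Dict, List, Tuple

Concept = Tuple[str, str]  # (name, sec. source)


def name_meaning_classes(mir: Dict[Tuple[Concept, Concept], str]):
    """Sort cross-taxonomy concept pairs by whether name and meaning agree.

    Assumed, simplified scheme (hypothetical):
      "reliable"                      -- same name, congruent ("==") concepts
      "same name, different meaning"  -- same name, non-congruent concepts
      "different name, same meaning"  -- different names, congruent concepts
    Pairs with different names and non-congruent concepts are skipped.
    """
    classes: Dict[str, List[Tuple[Concept, Concept]]] = {
        "reliable": [],
        "same name, different meaning": [],
        "different name, same meaning": [],
    }
    for (c1, c2), rel in mir.items():
        same_name, congruent = c1[0] == c2[0], rel == "=="
        if same_name and congruent:
            classes["reliable"].append((c1, c2))
        elif same_name:
            classes["same name, different meaning"].append((c1, c2))
        elif congruent:
            classes["different name, same meaning"].append((c1, c2))
    total = sum(len(v) for v in classes.values())
    share = len(classes["reliable"]) / total if total else 0.0
    return classes, share


# Hypothetical relations between concepts of taxonomies T1 and T2.
mir = {
    (("Alpha", "T1"), ("Alpha", "T2")): "==",
    (("Beta", "T1"), ("Beta", "T2")): ">",
    (("Gamma", "T1"), ("Delta", "T2")): "==",
    (("Epsilon", "T1"), ("Epsilon", "T2")): "==",
    (("Beta", "T1"), ("Delta", "T2")): "!",
}
classes, share = name_meaning_classes(mir)
print(f"{share:.1%} of name:meaning pairs reliable")  # 50.0% of name:meaning pairs reliable
```

The point of the design is that reliability is computed from the articulations, not from string matching: two identical names count as reliable only when the alignment says their concepts are congruent.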

Fig. 4. ProvenanceMatrix visualization of Maximally Informative Relations in the Minyomerus use case. Sources: [8,9].

Fig. 5. Semi-realistic example of using Euler/X RCC-5 alignments to represent evolving relationships of specimen, phenomic, and taxonomic concept information. Two floristic treatments (1993, 1997) have overlapping sets of examined herbarium specimens, where each specimen is variously assigned to treatment-specific phenomic traits and taxonomic concepts. These lower-level RCC-5 articulations logically 'propagate up' to integrate at higher levels.
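The 'propagate up' step in Fig. 5 can be approximated, under strong simplifying assumptions, by treating each concept's meaning as the set of specimens assigned to it or to any concept below it, then comparing those sets across the two treatments. In this Python sketch the specimen IDs, concept labels, and the choice to use only specimens examined in both treatments are hypothetical; the phenomic-trait layer shown in the figure is omitted.

```python
from collections import defaultdict
from itertools import product


def rcc5(x, y):
    """RCC-5 relation between two non-empty member sets."""
    if x == y:
        return "=="
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "><" if x & y else "!"


def extensions(assign, parent_of):
    """Concept -> set of specimens assigned to it or to any descendant."""
    ext = defaultdict(set)
    for specimen, concept in assign.items():
        while concept is not None:       # walk from lowest concept to root
            ext[concept].add(specimen)
            concept = parent_of.get(concept)
    return ext


def align(assign1, parent1, assign2, parent2):
    """Relate every concept of treatment 1 to every concept of treatment 2,
    using only specimens examined in both treatments (an assumption)."""
    shared = assign1.keys() & assign2.keys()
    e1 = extensions({s: c for s, c in assign1.items() if s in shared}, parent1)
    e2 = extensions({s: c for s, c in assign2.items() if s in shared}, parent2)
    return {(c1, c2): rcc5(x1, x2)
            for (c1, x1), (c2, x2) in product(e1.items(), e2.items())}


# Hypothetical treatments: specimen -> lowest concept, and concept hierarchies.
t1993 = {"s1": "1993:a", "s2": "1993:a", "s3": "1993:b", "s4": "1993:b"}
p1993 = {"1993:a": "1993:G", "1993:b": "1993:G"}
t1997 = {"s2": "1997:x", "s3": "1997:x", "s4": "1997:y", "s5": "1997:y"}
p1997 = {"1997:x": "1997:G", "1997:y": "1997:G"}

for pair, rel in sorted(align(t1993, p1993, t1997, p1997).items()):
    print(pair, rel)
```

With these toy assignments, the two lower-level concepts partly overlap ('1993:b' >< '1997:x'), yet the two higher-level concepts come out congruent ('1993:G' == '1997:G') over the shared specimens, which illustrates how specimen-level assignments can settle relations higher in the hierarchy.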
