dan crichton april 2010. topics introduction – who am i? architecture – what is means to me...

Dan Crichton

April 2010

Topics
• Introduction – who am I?
• Architecture – what it means to me
• Challenges in Developing Architectures
• Reference Architecture vs Domain Specific Software Architectures
• Experience in Science
• Lessons Learned
• Q&A

Who am I?
• Employed by Jet Propulsion Laboratory since 1995; prior software engineering positions at Hughes Aircraft Company and in private industry
• MS in Computer Science, USC; 20+ years of experience
• Program Manager & Principal Computer Scientist for
– Planetary Data System Engineering in Solar System Exploration Directorate
– Data Systems and Technology in Earth and Technology Directorate
• Principal Investigator for
– Informatics Center, Early Detection Research Network, National Cancer Institute
– Facilitating Integration of NASA and Earth System Grid, NASA
– Object Oriented Data Technology
• Several co-Investigator Tasks

Architecture: why do I care?
• Architecture is a game changer in our business
– Enable scientific discovery, novel engineering, etc.
– Coordination across multiple enterprises
• Data system costs per mission, project, investigation, etc. are high
• Technology infusion is limited
• Experience and knowledge reuse

But, there are challenges
• Lack of true architects
– Most think of point solutions or confuse architecture and implementation
– Abstracting is difficult
• Governance is often at a project level; little view at an enterprise level
• Limited planning and understanding of the reference requirements

Architects: what are they?

Effective Architects have…
• Years of experience
• Holistic view of domain
– Look at both aesthetics and practical details
– Variable technical depth
• Lifecycle roles
– Strong involvement up-front
– May oversee development
– Chooses stable steps in development

Effective Architects are not…
• Lone inventors or scientists
– The architect is a good communicator and politician -- architectures must be sold and explained and their integrity maintained
– Architecting is not a science, but depends on science
• Purely technologists
– Architecture is a strategy
• “Top level only” designers
– Details are often critical
• Collaborators
– A coherent vision is critical; they drive it

Architecture: what is it?
The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std. 1471-2000)

Communicating an architecture
• A good architecture is one that can be communicated to the stakeholders
• A good architecture presents viewpoints of the system that address stakeholder concerns
• A good architecture uses models and descriptions that are relevant to the stakeholders
– Different models may be used to present different viewpoints (e.g., a UML model of the system may be appropriate for some but not all stakeholders)


• A viewpoint is a template for constructing a view
– Enterprise, Functional, Informational, etc.
• A view is a description of the entire system from the perspective of a set of related concerns. A view is composed of one or more models.

• A model is an abstraction or representation of some aspect of a thing

• Examples: RM-ODP, FEAF, TOGAF, etc

The viewpoint is where you look from

The view is what you see

(Project Managers, Engineers, Scientists, Business Analysts, …)
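The relationship between viewpoints, views and models can be sketched as a small data structure. This is only an illustration: the class names, field names and sample values below are assumptions made for the example and are not taken from IEEE 1471 or from any modeling tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Viewpoint:
    """Template for constructing a view: where you look from."""
    name: str            # e.g. "Informational"
    stakeholders: tuple  # who the viewpoint is meant to serve
    concerns: tuple      # the questions it has to answer


@dataclass
class Model:
    """An abstraction or representation of some aspect of the system."""
    name: str
    notation: str        # e.g. "UML", "block diagram"


@dataclass
class View:
    """What you see: the whole system described from one viewpoint."""
    viewpoint: Viewpoint
    models: list = field(default_factory=list)

    def is_complete(self) -> bool:
        # A view is composed of one or more models.
        return len(self.models) >= 1


# Hypothetical example values.
informational = Viewpoint(
    name="Informational",
    stakeholders=("Scientists", "Engineers"),
    concerns=("What data exists?", "How is it described?"),
)
view = View(informational)
view.models.append(Model("Science product information model", "UML"))
```

Keeping the viewpoint immutable and separate from the view reflects the idea that one template can be reused to build views of many systems.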

Reference Architectures
• Show components, functions, and interfaces at a high level of abstraction
• Likewise, we consider information models to also be part of a reference architecture (at a sufficiently abstract level)
– In observing systems, the information model patterns are highly compatible as a reference information model
• Implementation neutral; architectural frameworks can be useful in defining a structure for a reference architecture
• We use Reference Architectures to give us a strategic advantage as well as improve enterprise scale software

Domain Specific Software Architectures*
• Domain model
– Leverage experts who have the “holistic” view and can drive the need for product lines
– An unambiguous view is critical (in fact, this has been a problem in science arenas)
• Reference requirements
– Drives the reference architecture
– However, it is critical to map domain models to reference requirements in order to understand the solution space
• Reference architecture
– Satisfies an abstracted set of functions from the reference requirements
– It’s engineered for the “ilities”: reusability, extensibility and configurability
– It demonstrates the separation of functional elements of the architecture

* Tracz, Will, Domain-Specific Software Architecture, ACM SIGSOFT, 1995

RAs vs DSSAs in Science
• In science data systems, construction of multiple architecture viewpoints of a system is critical
– Process/Enterprise
– Information/Data
– Technology
• We find the “viewpoints” are similar, but models can be domain specific
– This is the opportunity to develop a reusable reference architecture if the “patterns” can be extracted

Scientific data systems
• Covers a wide variety of disciplines
– Solar system exploration
– Astrophysics
– Earth science
– Biomedicine
– etc.
• Each has its own communities, standards and systems
• But, there is an underlying reference architecture and discipline software architectures in each!

The “e-science” trend
• Highly distributed, multi-organizational systems
– Systems are moving towards loosely coupled systems or federations in order to solve science problems which span center and institutional environments
• Sharing of data and services which allow for the discovery, access, and transformation of data
– Systems are moving towards publishing of services and data in order to address data and computationally-intensive problems
– Infrastructures which are being built to handle future demand
• Address complex modeling, inter-disciplinary science and decision support needs
– Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
• Changing the way in which data analysis is performed
– Moving towards analysis of distributed data to increase the study power
– Enabling greater collaboration across centers


[Figure: end-to-end space science information flow. Elements shown: Spacecraft and Scientific Instruments, Spacecraft / lander, Relay Satellite, Data Acquisition and Command, Mission Operations, Instrument / Sensor Operations, Science Data Processing, Science Data Archive, Data Analysis and Modeling, Science Team, External Science Community. Information objects shown: Primitive Information Object, Simple Information Object, Telemetry Information Package, Science Information Package, Instrument Planning Information Object, Planning Information Object, Science Products - Information Objects.]

• Common Meta Models for Describing Space Information Objects
• Common Data Dictionary end-to-end

[Figure: Infrastructure to support Analysis of Distributed Data. Elements shown: DS Mission #1 and DS Mission #2 (SMAP, DESDynI), Science Processing Center 1 and 2, Archive & Distribution (DAAC 1, DAAC 2; PO.DAAC), Other Data Sources (e.g. NOAA), Distributed Data Analysis (Subsetting, Gridding, Transformation, Modeling), Users.]

Patterns in scientific data systems
• Instrument and Spacecraft Commands
• Instruments that capture observations
• Generation of Engineering and Science Data Products
• Data Processing
• Data Management
• Data Distribution
• Distributed Facilities
• Data Movement

• Simple SOA-style pattern

• Data/Information Architecture

• Components, middleware, and communication

• NOTE: Process is implicit here

[Figure: layered stack drawn for each of two communicating systems. Layers shown: Domain Model, Metamodel, Information Object, Information Components, Middleware and Messaging, Comm Layer. Groupings shown: Communications, Software/Application, Data Architecture/Content.]

What the two systems share at each layer:
• Common Protocols - TCP/IP, ...
• Common Messaging - SOAP, JMS, ...
• Common Functions - Registry, Repository, ...
• Common or Mediated Metamodel - DEDSL, ISO 11179, UML
• Common or Mediated Domain Models - Planetary Data Systems, EOSDIS, ...
• Information Exchange - Science, Mission, etc., Data Products, Observations, SLE Objects, ...


Usability

Diversity within the domain

Scalability

Reliability

Portability

NOTE: Our reference architecture must address these ilities long term

[Figure: Cumulative Volume of L2+ Products at All DAACs. X axis: Fiscal Year, FY00 to FY14. Y axis: Cumulative Volume (TB), 0 to 4,000.]

Specialization within domains
• Domain information models
– Planetary Science Ontology
– Cancer Biomarker Ontology
– Etc.
• Specific services and domain implementations are derived from the reference architecture
– Reference Architecture -> Domain Specific Software Architecture -> Domain Implementations
• In these science domains, the architectures need to be long-lived (20+ years)

Software product lines
• This is about strategy more than technology
• Goal is a software product line that
– Implements our reference architecture
– Allows for construction of core software components that can be reused across projects and science disciplines
– Can demonstrate sufficient cost and schedule benefits without sacrificing flexibility in meeting requirements and adapting to technology change
• Extensions can be applied at the discipline level

Object Oriented Data Technology
• Represents both a reference architecture AND a software product line for science data systems
• Exploits common patterns
• Delivers reusable software components as building blocks for construction of higher order data systems
• Applied to multiple science disciplines
• Funded originally back in 1998; runner up for NASA Software of the Year in 2003
• Heavily used by NASA and NIH projects

[Figure: Object Oriented Data Technology Framework. Elements shown: OODT/Science Web Tools, Archive Client, Navigation Service, Query Service, Product Service, Profile Service, Archive Service, Other Service 1, Other Service 2, Profile XML Data, Data System 1, Data System 2, Bridge to External Services.]


Architectural principles*
• Separate the technology and the information architecture
• Encapsulate the messaging layer to support different messaging implementations
• Encapsulate individual data systems to hide uniqueness
• Provide data system location independence
• Require that communication between distributed systems use metadata
• Define a model for describing systems and their resources
• Provide scalability in linking both number of nodes and size of data sets
• Allow systems using different data dictionaries and metadata implementations to be integrated
• Leverage existing software, where possible (e.g., open source, etc.)


* Crichton, D, Hughes, J. S, Hyon, J, Kelly, S. “Science Search and Retrieval using XML”, Proceedings of the 2nd National Conference on Scientific and Technical Data, National Academy of Science, Washington DC, 2000.
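Three of these principles (an encapsulated messaging layer, encapsulated data systems, and location independence) can be illustrated together in a short sketch. Everything named here is hypothetical: `Transport`, `InProcessTransport`, `NameRegistry`, `Client`, the `local://` addresses and the sample catalog are invented for the example and are not OODT classes or APIs.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Messaging layer hidden behind one interface (hypothetical)."""

    @abstractmethod
    def send(self, address: str, request: dict) -> dict:
        ...


class InProcessTransport(Transport):
    """Stand-in transport that calls handlers directly.

    A real deployment could put HTTP, RMI or CORBA behind the same method.
    """

    def __init__(self):
        self._handlers = {}

    def bind(self, address, handler):
        self._handlers[address] = handler

    def send(self, address, request):
        return self._handlers[address](request)


class NameRegistry:
    """Maps logical system names to addresses (location independence)."""

    def __init__(self):
        self._addresses = {}

    def register(self, name, address):
        self._addresses[name] = address

    def lookup(self, name):
        return self._addresses[name]


class Client:
    """Reaches data systems by logical name, sending metadata-only requests."""

    def __init__(self, registry: NameRegistry, transport: Transport):
        self._registry = registry
        self._transport = transport

    def query(self, system_name, **metadata):
        address = self._registry.lookup(system_name)
        return self._transport.send(address, {"metadata": metadata})


def imaging_node(request):
    """A wrapped data system: its internal layout stays behind the handler."""
    catalog = [{"target": "Mars", "file": "img_001"},
               {"target": "Europa", "file": "img_002"}]
    wanted = request["metadata"]
    hits = [row for row in catalog
            if all(row.get(k) == v for k, v in wanted.items())]
    return {"results": hits}


transport = InProcessTransport()
transport.bind("local://imaging", imaging_node)
registry = NameRegistry()
registry.register("imaging", "local://imaging")
client = Client(registry, transport)
response = client.query("imaging", target="Mars")
```

In this sketch the client never changes when the transport is swapped or when a data system moves; only the object passed in, or the registry entry, does.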

Architectural focus
• Consistent distributed capabilities
– Resource discovery (data, metadata, services, etc.), “grid-ing” loosely coupled science system, workflow management
– On-demand, shared services (e.g., processing, translation, etc.)
– Deploy high throughput data movement mechanisms
• End-to-end capabilities across the science environment
– Reduce local software solutions that do not scale
– Increasing importance in developing an “enterprise” approach with common services
• Build value-added services and capabilities on top of the infrastructure


Exploiting common patterns
• How data is managed (registry/repository, information objects themselves)…
• How data is generated, captured, etc. (e.g., workflow and data processing)…
• How data is accessed (metadata, data)…
• How information is discovered…
• How data is distributed (e.g., transformed)…
• How data is visualized…

What does OODT do?
• Tie together loosely coupled distributed heterogeneous data systems into a virtual data grid
• Support critical functions
– Data Production and workflow
– Data Distribution
– Data Discovery (including query optimization across highly distributed systems)
– Data Access
• An architectural approach first, an implementation second
– Adapt to different distributed computing deployments
– Promotes a REST-style architectural pattern for search and retrieval
• Scalability in linking together large, distributed data sets

OODT data architecture focus
• On types of and relationships among a software system’s data
• Decomposition of data within a software system to its logical components and interactions
– Components: Data Elements, Data Dictionary, Data Models of individual data sources
– Interactions: Mappings between Data Dictionary and Data Models, Data Element structural comparison
• Some standards currently exist for data architecture
– ISO: ISO-11179 Standardization and Specification of Data Elements
– Dublin Core Metadata Initiative: Dublin Core Data Elements to describe any electronic resource
• Specifications for the Data Architecture
– Common XML schema for managing information about data resources
– Common XML schema for messaging between distributed services
– Methods for integrating existing domain models within architecture
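The "mappings between Data Dictionary and Data Models" interaction can be sketched as a translation from each source's local field names to shared data elements. This is a minimal illustration under stated assumptions: the dictionary entries, the source names `source_a` and `source_b`, and their field names are all invented for the example.

```python
# Common data dictionary: element name -> definition (illustrative entries).
DATA_DICTIONARY = {
    "target_name": "Name of the observed body",
    "start_time": "Observation start, ISO 8601",
}

# Mappings from each source's local field names to dictionary elements.
# Source names and field names are hypothetical.
MAPPINGS = {
    "source_a": {"TARGET": "target_name", "START": "start_time"},
    "source_b": {"body": "target_name", "t0": "start_time"},
}


def to_common(source: str, record: dict) -> dict:
    """Translate a source-specific record into common data elements.

    Local fields with no mapping are dropped. A mapping that points at an
    element missing from the data dictionary raises KeyError.
    """
    mapping = MAPPINGS[source]
    common = {}
    for local_name, value in record.items():
        element = mapping.get(local_name)
        if element is None:
            continue  # not part of the shared vocabulary
        if element not in DATA_DICTIONARY:
            raise KeyError(f"{element} is not in the data dictionary")
        common[element] = value
    return common


a = to_common("source_a", {"TARGET": "Mars", "START": "2010-04-01T00:00:00Z"})
b = to_common("source_b", {"body": "Mars", "t0": "2010-04-01T00:00:00Z", "x": 1})
```

Once both records are expressed in dictionary elements they can be compared or queried together, even though the two sources never agreed on field names.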

Resource Metadata Model

[Class diagram, reconstructed as a listing. Diagram annotations: “Based on ISO/IEC 11179”, “Based on Dublin Core”.]

Profile
– profileAttributes: ProfileAttributes (1)
– resourceAttributes: ResourceAttributes (1)
– elements: Map of ProfileElement (*); keys are Strings, equal to elements’ names

ProfileAttributes
– id: String
– version: String
– statusID: String
– securityType: String
– parent: String
– children: List
– regAuthority: String
– revisionNotes: List
– dataDictID: String

ResourceAttributes
– identifier: String
– title: String
– formats: List
– description: String
– creators: List
– subjects: List
– publishers: List
– contributors: List
– dates: List
– sources: List
– languages: List
– coverages: List
– rights: List
– contexts: List
– aggregation: String
– clazz: String
– locations: List

ProfileElement
– name: String
– id: String
– desc: String
– type: String
– unit: String
– synonyms: List
– obligation: boolean
– maxOccurrence: int
– comments: String

EnumeratedProfileElement (specializes ProfileElement)
– values: List

RangedProfileElement (specializes ProfileElement)
– min: double
– max: double

UnspecifiedProfileElement (specializes ProfileElement)

Request/Response Model

[Class diagram, package nasa.pds.xmlquery, reconstructed as a listing. Association roles shown: queryHeader, result, selectSet, fromSet, whereSet; multiplicities shown as 1.]

XMLQuery
– resultModeId: String
– propogationType: String
– propogationLevels: String
– maxResults: int
– kwqString: String
– numResults: int
– mimeAccept: List

QueryHeader
– id: String
– title: String
– description: String
– type: String
– statusID: String
– securityType: String
– revisionNote: String
– dataDictID: String

QueryResult
– list: List

QueryElement
– role: String
– value: String
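A rough sketch of how the profile element types could support discovery: a profile describes what values a resource may hold, so a registry can rule a resource in or out without touching the data. The class names and a few field names mirror the diagram, but the `may_contain` and `matches` methods, and the matching rule itself, are assumptions made for illustration and are not the OODT implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileElement:
    """Carries a subset of the ProfileElement attributes from the diagram."""
    name: str
    type: str = "string"
    unit: str = ""

    def may_contain(self, value) -> bool:
        # An unspecified element says nothing about values: never rule out.
        return True


@dataclass
class RangedProfileElement(ProfileElement):
    min: float = 0.0
    max: float = 0.0

    def may_contain(self, value) -> bool:
        return self.min <= value <= self.max


@dataclass
class EnumeratedProfileElement(ProfileElement):
    values: list = field(default_factory=list)

    def may_contain(self, value) -> bool:
        return value in self.values


@dataclass
class Profile:
    identifier: str                                # as in ResourceAttributes
    elements: dict = field(default_factory=dict)   # keyed by element name

    def add(self, element: ProfileElement) -> None:
        self.elements[element.name] = element

    def matches(self, constraints: dict) -> bool:
        """True if every constrained element exists and may hold the value."""
        for name, value in constraints.items():
            element = self.elements.get(name)
            if element is None or not element.may_contain(value):
                return False
        return True


# Hypothetical profile for a collection of Mars images.
mars_images = Profile("urn:example:mars-images")
mars_images.add(EnumeratedProfileElement("target", values=["Mars", "Phobos"]))
mars_images.add(RangedProfileElement("latitude", unit="deg", min=-90.0, max=90.0))
```

A query such as `mars_images.matches({"target": "Mars", "latitude": 10.0})` then answers "could this resource satisfy the request?" from metadata alone.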

OODT software components
• Profile Service – A server-based registry that is able to either serve local XML profiles or plug into an existing catalog. This component provides resource discovery.
• Product Service – A server component that plugs into existing repositories and serves products. This includes translation services, etc.
• Catalog and Archive Service – Transaction-based server that catalogs and archives products, providing profile and product servers for discovery and distribution.
• Query Service – Provides query management across distributed services to enable discovery.
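The division of labor among these components can be sketched as follows: profile servers answer "what matches and where is it?", a query service fans the question out across nodes, and product servers return the data itself. The classes, method names and sample records below are hypothetical stand-ins written for this example, not the OODT interfaces.

```python
class ProfileServer:
    """Registry: answers which resources match, with pointers, not data."""

    def __init__(self, profiles):
        self._profiles = profiles  # list of dicts: metadata plus a location

    def find(self, **criteria):
        return [p for p in self._profiles
                if all(p.get(k) == v for k, v in criteria.items())]


class ProductServer:
    """Repository front end: returns the product stored under an id."""

    def __init__(self, products):
        self._products = products

    def get(self, product_id):
        return self._products[product_id]


class QueryService:
    """Fans a query out to every known profile server and merges the hits."""

    def __init__(self, profile_servers):
        self._profile_servers = profile_servers

    def discover(self, **criteria):
        hits = []
        for server in self._profile_servers:
            hits.extend(server.find(**criteria))
        return hits


# Two hypothetical nodes, each with its own registry and repository.
node_a = ProfileServer([{"id": "a1", "target": "Mars", "server": "repo_a"}])
node_b = ProfileServer([{"id": "b1", "target": "Mars", "server": "repo_b"},
                        {"id": "b2", "target": "Titan", "server": "repo_b"}])
repositories = {
    "repo_a": ProductServer({"a1": b"mars image bytes"}),
    "repo_b": ProductServer({"b1": b"mars spectrum bytes",
                             "b2": b"titan bytes"}),
}

query = QueryService([node_a, node_b])
found = query.discover(target="Mars")
products = [repositories[hit["server"]].get(hit["id"]) for hit in found]
```

Discovery and retrieval stay separate steps here, so a node can expose its catalog without exposing its storage layout.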


[Figure: OODT Reusable Data Grid Framework]

1. Science data tools and applications (Visualization Tools, Analysis Tools, Web Search Tools) use “APIs” to connect to a virtual data repository
2. Middleware creates the data grid infrastructure connecting distributed heterogeneous systems and data
3. Repositories for storing and retrieving many types of data (Mission Data Repositories, Biomedical Data Repositories, Engineering Data Repositories), each behind an OODT API

• Common Meta Models for Describing Space Information Objects
• Common Data Dictionary end-to-end

[Figure: Query Integration. Elements shown: Web I/F, Desktop I/F, Name Server / Service Registry (WSDL), Node 1 Profile Server, Registry Server, Repository Product Server, Repository/Archive Server, Product Catalogs, Science Products. Messages shown: XML Request, Information Object.]

OODT software implementation
• OODT is Open Source
– Developed using open source software (i.e. Java/J2EE and XML)
• Implemented reusable, extensible Java-based software components
– Core software for building and connecting data management systems
• Provided messaging as a “plug-in” component that can be replaced independent of the other core components. Messaging components include:
– CORBA, Java RMI, JXTA, Web Services, etc.
– REST seems to have prevailed
• Provided client APIs in Java, C++, HTTP, Python, IDL
• Simple installation on a variety of platforms (Windows, Unix, Mac OS X, etc.)
• Used international data architecture standards
– ISO/IEC 11179 – Specification and Standardization of Data Elements
– Dublin Core Metadata Initiative
– W3C’s Resource Description Framework (RDF) from Semantic Web Community


EDRN Knowledge Environment
• EDRN has been a pioneer in the use of informatics technologies to support biomarker research
• EDRN has developed a comprehensive infrastructure to support biomarker data management across EDRN’s distributed cancer centers
– Twelve institutions are sharing data
– Same architectural framework as planetary science
• It supports capture and access to a diverse set of information and results
– Biomarkers
– Proteomics
– Biospecimens
– Various technologies and data products (image, micro-satellite, …)
– Study Management


• Often unique, one of a kind missions
– Can drive technological changes
• Instruments are competed and developed by academic, industry and industrial partners
– Highly distributed acquisition and processing across partner organizations
– Highly diverse data sets given heterogeneity of the instruments and the targets (i.e. solar system)
• Missions are required to share science data results with the research community requiring:
– Common domain information model used to drive system implementations
– Expert scientific help to the user community on using the data
– Peer-review of data results to ensure quality
– Distribution of data to the community
• Planetary science data from NASA (and some international) missions is deposited into the Planetary Data System

Planetary Data System
Distributed Planetary Science Archive
• Small Bodies Node: University of Maryland, College Park, MD
• Planetary Plasma Interactions Node: University of California Los Angeles, Los Angeles, CA
• Geosciences Node: Washington University, St. Louis, MO
• Imaging Node: JPL and USGS, Pasadena, CA and Flagstaff, AZ
• THEMIS Data Node: Arizona State University, Tempe, AZ
• Central Node: Jet Propulsion Laboratory, Pasadena, CA
• Navigation Ancillary Information Node: Jet Propulsion Laboratory, Pasadena, CA
• Rings Node: Ames Research Center, Moffett Field, CA
• Atmospheres Node: New Mexico State University, Las Cruces, NM

[Figure: distributed data system elements. Shown: Airborne Instruments, Surface Instruments, Data Acquisition/Ingestion, Data Production/Processing, Special Product Processing Environment / Computational Infra, Data Integration, Local Storage (Models, Data, etc), Catalogs, Multi-mission Policies & Rules, Distributed Data Analysis, Modeling and Visualization Facility, Web Portal, Other Data Systems (Testbed and Operational Deployed Environments).]

Application to Climate Research
• Highly distributed modeling and observational systems
– Heterogeneous implementations
– Different purposes
• But, brought together as a virtual system, provides new science discovery opportunities

[Figure labels: Observations, Models]

Lessons Learned
• A reference architecture is critical for driving a strategy and supporting large-scale/enterprise systems
– However, limited experience in organizations to build reference architectures
• Useful ways to represent the architecture can be tough!
• How detailed to make the reference architecture is an art! (Don’t let the implementation drive the RA)
• Product lines are useful for providing reusable components based on the reference architecture

More Lessons Learned…
• Distributed service architectures
– Not anything new (my experience with them goes back to the early 1990s)
– But, often, newer technologies and approaches are seen as a panacea
• Technology is not a replacement for a conceptual architecture
– My experience is that definition of the architecture independent of technology is critical
– The goal should be stability in the architecture model; the selection of appropriate technology will change over time
– This is why an architect is much more of a strategist than a technologist

Final Thoughts
• Software architecture in science is critical to
– Reducing cost of building science data systems
– Building virtual organizations
– Constructing software product lines
– Driving standards
– Supporting new paradigms in mission operations and scientific research
• Science is still learning how to best leverage technology in a collaborative discovery environment, but significant progress is being made!

Resources
(1) Tracz, Will. Domain-Specific Software Architecture. ACM SIGSOFT, 1995.

(2) D. Crichton, S. Kelly, C. Mattmann, Q. Xiao, J. S. Hughes, J. Oh, M. Thornquist, D. Johnsey, S. Srivastava, L. Esserman, and B. Bigbee. A Distributed Information Services Architecture to Support Biomarker Discovery in Early Detection of Cancer. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing, pp. 44, Amsterdam, the Netherlands, December 4th-6th, 2006.

(3) C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes. A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications. In Proceedings of the 28th International Conference on Software Engineering (ICSE06), pp. 721-730, Shanghai, China, May 20th-28th, 2006.

EDRN’s Ontology Model
• EDRN has developed a high-level ontology model for biomarker research which provides standards for the capture of biomarker information across the enterprise
• Specific models are derived from this high-level model
– Model of biospecimens
– Model for each class of science data
• EDRN is specifically focusing on a granular model for annotating biomarkers, studies and scientific results
• EDRN has a set of EDRN Common Data Elements which is used to provide standard data elements and values for the capture and exchange of data

[Figure: EDRN Biomarker Ontology Model]

EDRN CDE Tools
