aip-developer-intro_pag2015

Extending the Arabidopsis Information Portal: A Developer’s Perspective. Matt Vaughn, Director, Life Sciences Computing, Texas Advanced Computing Center (araport.org)

Upload: matthew-vaughn

Post on 18-Jul-2015

TRANSCRIPT

Page 1: aip-developer-intro_pag2015

araport.org

Extending the Arabidopsis

Information Portal: A Developer’s

Perspective

Matt Vaughn

Director, Life Sciences Computing

Texas Advanced Computing Center

Page 2: aip-developer-intro_pag2015

Web APIs: Problem Statement

• Lack of web services for legacy data

– There are a lot of web SITES

• Existing web services don’t share information architecture

– Negatively impacts interoperability, discoverability, & usability

• Browser security models are punitively complex

– Hard to build apps integrating multiple sources

Page 3: aip-developer-intro_pag2015

Gold standard Data APIs

• Implement REST-like interfaces

• Served over HTTPS (with valid SSL certificate)

• Support Cross-Origin Resource Sharing (CORS)

• Require authentication

– Understand and respond to client demographics

– Meter access to services

• Simple controlled vocabulary + metadata for query parameters

• Responses conform to accepted JSON schemas*

• Support future AIP deep caching & mining efforts**

* Except where it makes sense not to

** Based on tech like ElasticSearch or neo4j

Page 4: aip-developer-intro_pag2015

Araport Service Architecture

RESTful API @ https://api.araport.org/

[Architecture diagram; recoverable labels follow]

• Consumers: CLI clients, scripts, 3rd party applications

• Agave Core: apps, meta, files, profile, jobs, systems

• ADAMA: manage, enroll; Data APIs a, b, c, d, e, f

• API Types: Query, Map*, Generic, Pass-through

• Features: Single sign-on, Metering, Unified logging, API versioning, Automatic HTTPS + CORS

• AIP + 3rd party data providers: REST*, CGI, SOAP, New Web Services, InterMine, Chado & Tripal

• Physical resources: Computing, Storage, Database

Page 5: aip-developer-intro_pag2015

Araport Service Architecture

RESTful API @ https://api.araport.org/

[Architecture diagram; recoverable labels follow]

• Consumers: CLI clients, scripts, 3rd party applications

• Agave Core: apps, meta, files, profile, jobs, systems

• ADAMA: manage, enroll; Data APIs a, b, c, d, e, f

• API Types: Query, Map*, Generic, Pass-through

• Features: Single sign-on, Throttling, Unified logging, API versioning, Automatic HTTPS

• AIP + 3rd party data providers: REST*, CGI, SOAP, New Web Services, InterMine, Chado & Tripal

• Physical resources: Computing, Storage, Database

Page 6: aip-developer-intro_pag2015

Data API Types

Type | Inputs | Outputs | Notes
query | AIP parameters mandatory | AIP-aligned JSON | Gold standard data APIs
map | AIP parameters preferred | Transformed JSON | Ideal for implementing namespace transformations or filters
generic | AIP parameters preferred | Specified within code but can be any valid Content-type | Implement return of non-JSON data
passthrough | Specified by remote service | Specified by remote service | Allows existing services to be discoverable from AIP data store

Page 7: aip-developer-intro_pag2015

Data API Reserved Parameters

Name | Description | Validator (case-insensitive)
locus | AGI Gene Locus Identifiers | AT[1-5GM][0-5]{5,5}$
transcript | AGI Transcript Identifiers | AT[1-5GM][0-9]{5,5}.[0-9]{1,3}$
identifier | Another string plausibly expected to identify a gene or transcript | Valid alphanumeric string. No whitespace.
chromosome | A. thaliana Col-0 chromosome identifiers | CHR[1-5MC]$
start/end | Coordinates within Col-0 assembly | Numeric. Should be range-checked.
strand | Defines genomic strand | [\+\-\.]{1,1}
accession | Ecotypes or natural accessions | Not validated at present
term | Generic search term | Valid text string. Useful for implementing full-text search
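
The reserved parameters above lend themselves to a small validation helper. The Python sketch below is illustrative only: `validate_param` and `validate_range` are hypothetical names, and the locus and transcript patterns are an assumed reading of the AGI format (`AT`, a chromosome character, `G`, five digits), because the validators as transcribed from the slide do not appear to accept an identifier such as AT2G26230. The `identifier` pattern also allows underscores, an assumption made so that a value like URIC_ARATH passes.

```python
import re

# Assumed patterns, not the portal's actual validators. Matching is
# case-insensitive, as the table header on the slide specifies.
PATTERNS = {
    "locus": re.compile(r"AT[1-5CM]G[0-9]{5}", re.IGNORECASE),
    "transcript": re.compile(r"AT[1-5CM]G[0-9]{5}\.[0-9]{1,3}", re.IGNORECASE),
    "chromosome": re.compile(r"CHR[1-5MC]", re.IGNORECASE),
    "strand": re.compile(r"[+\-.]"),
    # Letters, digits and underscore; no whitespace.
    "identifier": re.compile(r"[A-Za-z0-9_]+"),
}


def validate_param(name, value):
    """Return True if value is acceptable for the reserved parameter name."""
    pattern = PATTERNS.get(name)
    if pattern is None:
        # Parameters without a pattern (e.g. accession) are not validated.
        return True
    # fullmatch requires the whole string to match the pattern.
    return pattern.fullmatch(value) is not None


def validate_range(start, end, chromosome_length):
    """Range-check 1-based coordinates against a caller-supplied length."""
    return 1 <= start <= end <= chromosome_length
```

Using `fullmatch` rather than anchoring with `$` avoids accepting a value that ends in a stray newline.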

Page 8: aip-developer-intro_pag2015

Rationalized Responses via lightweight JSON schemas

• Facilitate creation of mash-up client applications

• Enable extraction and mining of the Arabidopsis deep web

• Facilitate future interoperability with semantic web technology without forcing its adoption

Minimal, machine-validated rules for what AIP responses should look like
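
As one way to picture "minimal, machine-validated rules", the Python sketch below checks a response envelope by hand. The required keys (`result` as an array, `status` as a string) are an assumption drawn from the example response on Page 9 of this deck, not a published AIP schema, and `check_envelope` is a hypothetical helper.

```python
import json


def check_envelope(text):
    """Return a list of problems found in a JSON response body (empty if none)."""
    try:
        doc = json.loads(text)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    # Assumed rule: records are always delivered in a 'result' array.
    if not isinstance(doc.get("result"), list):
        problems.append("'result' must be an array")
    # Assumed rule: every response carries a textual 'status'.
    if not isinstance(doc.get("status"), str):
        problems.append("'status' must be a string")
    return problems
```

A real deployment would more likely express such rules in a JSON Schema document and run a schema validator; the hand-written checks here only keep the example dependency-free.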

Page 9: aip-developer-intro_pag2015

Example Araport JSON (1)

curl -skL -XGET -H "Authorization: Bearer 624513772fbc2caf662b9accbf10380" \
  "https://api.araport.org/community/v0.3/aip/resolver_fetch_locus_by_synonym_v0.2/search?identifier=URIC_ARATH"

{"result": [
  {"relationships": [
    {"direction": "undirected",
     "type": "synonymous_with",
     "scores": [{"confidence": 1}]}],
   "related_entity": "URIC_ARATH",
   "class": "locus_id_mapping",
   "locus": "AT2G26230",
   "related_entity_kind": "UniProtKB-ID"}],
 "metadata": {"time_in_main": 0.020552873611450195},
 "status": "success"}
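
The same request can be sketched in Python using only the standard library. The endpoint, service name, and `identifier` parameter are taken from the curl example; the function names are hypothetical, the token is whatever bearer token the caller holds, and nothing here assumes the service is still reachable. Unlike `curl -k`, this sketch leaves TLS certificate verification switched on.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.araport.org/community/v0.3"
SERVICE = "aip/resolver_fetch_locus_by_synonym_v0.2"


def build_request(token, identifier):
    """Build (but do not send) the GET request shown in the curl example."""
    query = urllib.parse.urlencode({"identifier": identifier})
    url = "%s/%s/search?%s" % (BASE_URL, SERVICE, query)
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def loci_from_response(body):
    """Extract the 'locus' field from each record of a parsed response."""
    return [record["locus"] for record in body.get("result", [])]


def fetch_loci(token, identifier):
    """Send the request and return the loci it reports (needs network access)."""
    with urllib.request.urlopen(build_request(token, identifier)) as response:
        return loci_from_response(json.load(response))
```

Keeping request construction and response parsing in separate functions lets both be checked without any network traffic.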

Page 10: aip-developer-intro_pag2015

Interacting with Araport APIs (1)

Araport web services publish live, interactive documentation

Page 11: aip-developer-intro_pag2015

Interacting with Araport APIs (2)

Araport web services are available in every Javascript console

Data API namespace

Individual Data API

> Agave.api.adama.getNamespaces()

Page 12: aip-developer-intro_pag2015

Interacting with Araport APIs (3)

Araport web services power Science Apps!

Page 13: aip-developer-intro_pag2015

Creating an Araport Data API (1)

• Decide on a type of Data API to build

• Initialize a local Git repository

• Author a main function (Python only for now)

• Test that it works in your local Python interpreter

• Write a metadata.yml file describing the service

• Push the local repository up to GitHub*

• Perform an authenticated HTTP POST to the ADAMA service with a link to your repo

• Verify that the service was created successfully

• Test it out via HTTP request

* Or any public git server

Principles

A. All development is done on a local system

B. Almost no software dependencies beyond standard system contents

C. Source code is always public

D. Testing via same routes as usage

E. Easy to iterate if things go awry

1. Write code

2. Publish code

3. Register repository

4. Code deployed

5. Use web service
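
To make the "author a main function" and "test it locally" steps concrete, here is a minimal Python sketch of what such a module might look like. The calling convention is an assumption: it supposes the platform hands a dict of query parameters to a function named `search` and collects a list of records back; the conventions ADAMA actually expects should be taken from its documentation. The in-memory `SYNONYMS` table is a hypothetical stand-in for a real data source, seeded with the one mapping shown in this deck.

```python
import json

# Hypothetical in-memory data standing in for a real data source.
SYNONYMS = {"URIC_ARATH": "AT2G26230"}


def search(args):
    """Answer a query given a dict of parameters; returns a list of records."""
    identifier = args.get("identifier", "")
    locus = SYNONYMS.get(identifier.upper())
    if locus is None:
        return []
    return [
        {
            "class": "locus_id_mapping",
            "locus": locus,
            "related_entity": identifier,
        }
    ]


if __name__ == "__main__":
    # Local check in the Python interpreter, before anything is published.
    print(json.dumps(search({"identifier": "URIC_ARATH"}), indent=2))
```

Because the function takes and returns plain Python data, it can be exercised locally with no web server, which is the point of principles A and B.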

Page 14: aip-developer-intro_pag2015

Science Apps: Problem Statement

• Technical hurdles for developing web applications

– Technology selection

– Development and testing environment setup

• The small number of applications that get built are often not reusable

Page 15: aip-developer-intro_pag2015

Apps Infrastructure

Page 16: aip-developer-intro_pag2015

Apps Development

• Industry-standard, open-source tooling

– Node.js

– Yeoman

– Grunt

– Bower

• Application generator for quickly bootstrapping application development

$ yo aip-science-app

$ grunt

Page 17: aip-developer-intro_pag2015

Page 18: aip-developer-intro_pag2015

Page 19: aip-developer-intro_pag2015

App Security

• Apps deployed to AIP are sandboxed

– Only the user who created the app can access or use it

– Publication workflow: AIP staff review code and functionality before an app is made public

• App code is partitioned

– Kept separate from the rest of AIP Portal code

– Only executes in user’s browser, not on server

• App artifact hosting is limited

• App on AIP have an open-source

Page 20: aip-developer-intro_pag2015

Apps Workspace

• Drupal module

• Apps upload/ingest from public git repositories

• User-created “workspaces”

• Private, shared*, public apps

Page 21: aip-developer-intro_pag2015

Apps Workspace (2)

Page 22: aip-developer-intro_pag2015

Page 23: aip-developer-intro_pag2015

Page 24: aip-developer-intro_pag2015

Apps Examples

• Query app (ATTED-II)

• Visualization app (EBI Interaction Viewer)

• Computational app (BLAST)

• Other types (Notebook)

Page 25: aip-developer-intro_pag2015

Page 26: aip-developer-intro_pag2015

Page 27: aip-developer-intro_pag2015

Page 28: aip-developer-intro_pag2015

Page 29: aip-developer-intro_pag2015

Page 30: aip-developer-intro_pag2015

Developer Support

Online Tutorial Topic | Link
Getting started | http://bit.ly/aip-get-started
Technical overview | http://bit.ly/aip-overview
Your first AIP app | http://bit.ly/aip-first-app
Araport APIs and authentication | http://bit.ly/aip-agave-auth
Creating a data-driven application | http://bit.ly/aip-build-app
Deploying your app to Araport | http://bit.ly/aip-deploy
Creating web services for Araport | http://bit.ly/aip-websvcs
Linking to Araport content | http://bit.ly/aip-link

• Bookmark araport.org/devzone

• Follow @araport on Twitter

• Join araport-developers Google Group

• Follow Arabidopsis-Information-Portal GitHub

Page 31: aip-developer-intro_pag2015

Chris Town, PI
Lisa McDonald, Education and Outreach Coordinator
Chris Nelson, Project Manager
Jason Miller, Co-PI, JCVI Technical Lead
Erik Ferlanti, Software Engineer
Vivek Krishnakumar, Bioinf. Engineer
Svetlana Karamycheva, Bioinf. Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Matt Vaughn, co-PI
Steve Mock, Portal Engineer
Rion Dooley, API Engineer
Matt Hanlon, Portal Engineer
Maria Kim, Bioinf. Engineer
Ben Rosen, Bioinf. Analyst
Joe Stubbs, API Engineer
Walter Moreira, API Engineer

Page 32: aip-developer-intro_pag2015

Page 33: aip-developer-intro_pag2015

Araport Service Architecture

RESTful API @ https://api.araport.org/

[Architecture diagram; recoverable labels follow]

• Consumers: CLI clients, scripts, 3rd party applications

• Agave Core: apps, meta, files, profile, jobs, systems

• ADAMA: manage, enroll; Data APIs a, b, c, d, e, f

• API Types: Query, Map*, Generic, Pass-through

• Features: Single sign-on, Throttling, Unified logging, API versioning, Automatic HTTPS

• AIP + 3rd party data providers: REST*, CGI, SOAP, New Web Services, InterMine, Chado & Tripal

• Physical resources: Computing, Storage, Database

Page 34: aip-developer-intro_pag2015

ADAMA Road Map

• Automatic live documentation including params

• Parameter validation at query time

• Response validation via JSON schema

• Automated provenance and attribution

• Language support (Java, Javascript, Perl)

• Full command line interface

• Status monitoring and notification

• Better “Data API Store”

• Per-namespace and per-service Access Control Lists

Page 35: aip-developer-intro_pag2015

Community Engagement

• Existing APIs + source turned over to the community for additional development

• Community request for comment (RFC)

– Parameter metadata

– JSON Response schemas

– Provenance and attribution features

• Developing documentation, examples and tutorial material

– Complete the entire API publication and usage lifecycle without direct AIP intervention or personal support

• Assisting community in their development efforts

Page 36: aip-developer-intro_pag2015

Code Examples

• https://github.com/Arabidopsis-Information-Portal/jcvi-rtpcr-demos

• https://github.com/Arabidopsis-Information-Portal/aip_thalemine_webservices

• https://github.com/Arabidopsis-Information-Portal/atted_webservices

• https://github.com/Arabidopsis-Information-Portal/bar_webservices_demos

In addition to our tutorial code, these are good, illustrative examples of ADAMA web services.

Page 37: aip-developer-intro_pag2015

ADAMA: Araport DAta Mediator API

[Diagram labels: AGAVE, API MANAGER, NoSQL intermediary]

Endpoint: https://api.araport.org/community/v0.3/

Live Docs: https://adama-dev.tacc.utexas.edu/api/adama.html

Page 38: aip-developer-intro_pag2015

Araport architecture (2)

API Manager + Enterprise Service Bus

[Architecture diagram; recoverable labels follow]

• Consumer Applications: secure, rationalized REST services

• Mediators: Simple Proxy, Cache, XML-to-JSON, SOAP-to-REST, CGI-to-REST, Throttle

• Sources: Legacy API A, Legacy API B, REST API C; ThaleMine, data integration, other services

• Single sign-on

• Throttling

• Unified logging

• API versioning

• Mediation and translation

• Dev-friendly interfaces

• Rationalized REST for consumer apps

Page 39: aip-developer-intro_pag2015

Science Objectives

• Make more, varied data available to the Arabidopsis (and other) communities within a unified user experience

• Enhance the innate value of data by offering enhanced search, retrieval, and display capabilities

• Facilitate analysis of user data

• Enable community participation in functional annotation

Page 40: aip-developer-intro_pag2015

Technical Objectives

• Deploy a responsive, flexible community-extensible system

• Provide APIs everywhere!

• Promote and facilitate data integration

• Enable language- and region-specific presentation of scientific content

• Meet mobile computing on its own terms

Page 41: aip-developer-intro_pag2015

Local vs. Data-driven Apps

Local apps (e.g. Photoshop Express): Resources are local and inherently offline. Operating on local data using local computing.

Data-driven apps (e.g. KAYAK Pro): Resources are cloud-based and inherently online. Multiple data streams integrated, queried, presented in context of broader objective.

Page 42: aip-developer-intro_pag2015

Araport Bill of Materials

• Araport is currently built using

– Drupal 7.25

  • Developer-oriented content management system

– Bootstrap.js and some other JavaScript toolkits

– InterMine (with modifications)

– Bioinformatics infrastructure + misc. other bits

– Agave 2.0 Software as a Service platform

  • Developed by iPlant Collaborative project

  • Bulk data, metadata, authentication, HPC app and job management, notifications & events, and more

  • OAuth2 out of the box

  • Enterprise service bus (ESB) architecture

  • http://agaveapi.co/

Page 43: aip-developer-intro_pag2015

Araport APIM Architecture (1)

[Architecture diagram; recoverable labels follow]

• Araport API Manager: JSON Query, JSON Response; Enroll, Manage

• Agave wso2 interface

• Mediator stages (drawn once per service): Input Key Map, Input Transform, Output Key Map, Output Transform; Listen/Respond, Send/Listen

• Services exposed: SNP by Locus REST, Indel by Position REST

• Remote Services: POLYMORPH CGI Form (CSV)

• Cache (Technology TBD)

• ElasticSearch

Page 44: aip-developer-intro_pag2015

Araport Architecture: Use Cases (1)

• 1001 Genomes POLYMORPH tools

– Provides variation data via locus or positional search

– Total of seven variant types available for search

– Search parameterization depends a lot on variant type

– Example of a plain-text CGI service

– Returns results as CSV with named columns

• Objective: Transform into a RESTful API that expects and returns rationalized JSON

http://polymorph.weigelworld.org
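
The CSV-to-JSON step of this use case can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the column names in `OUTPUT_KEY_MAP` are hypothetical and not the actual POLYMORPH headers, and the `result`/`status` envelope simply mirrors the example response on Page 9.

```python
import csv
import io
import json

# Hypothetical mapping from remote CSV column names to response keys.
OUTPUT_KEY_MAP = {
    "Chromosome": "chromosome",
    "Position": "start",
    "Type": "variant_type",
}


def csv_to_records(csv_text, key_map=OUTPUT_KEY_MAP):
    """Convert CSV text with a header row into a list of dicts with renamed keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Columns missing from the key map keep their original names.
        records.append({key_map.get(col, col): value for col, value in row.items()})
    return records


def to_response(csv_text):
    """Wrap converted records in a JSON envelope with 'result' and 'status'."""
    return json.dumps({"result": csv_to_records(csv_text), "status": "success"})
```

Values stay as strings in this sketch; a fuller transform would also convert coordinates to numbers and validate them.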

Page 45: aip-developer-intro_pag2015

Araport Architecture: Use Cases (2)

• ThaleMine

– Has native REST interface for general queries

– Has templates which can form basis of specific services

• Objective: Offer both InterMine-native and AIP-conformant interfaces as Data APIs

• Current path

– Enroll native services in our APIM

– Develop template-based AIP-conformant services

http://polymorph.weigelworld.org

Page 46: aip-developer-intro_pag2015

Data APIs: Getting Started

Service | Queries | Notes
BAR eFP | Locus |
BAR Expressologs | Locus |
BAR Interactions | Locus |
COGe | Position | Special case – output transform only
NASC $SERVICE | Locus | SOAP based but may be offline permanently
OrthologFinder | Locus | Based on a ThaleMine template
POLYMORPH | Locus, Position | Actually seven CGI services
SUBA3 | Locus |

Compiling example queries, parameter mapping and description, and ideal results for use in implementing the system

Page 47: aip-developer-intro_pag2015

Developing a Data API

• In order of preference, we would like you to have ready:

– Well-documented REST

– Moderately well-documented REST

– SOAP services (plus WSDL or WADL)

– Plain Old XML

– Plaintext CGI

– HTML CGI

– No web services at all

• Work with us to enroll your services as a data source. This will involve a minor amount of coding.

Page 48: aip-developer-intro_pag2015

Computational App Model (1)

[Diagram; recoverable labels follow]

• Host OS: Docker.io

• Container: CentOS 6.4, custom-repo; /scratch, /database

• Host file systems: Host FS (250 GB), TACC Corral (PB+)

• sftp

• Agave apps, data, jobs: REST API x JSON objects

Page 49: aip-developer-intro_pag2015

Science Apps: Grid View

• Current Scheme

– 2-3 column view with draggable apps

– Apps are normal, full-size, or collapsed

– Single app screen

• Later in 2014

– N x X grid scheme implementing resizable app “tiles” like one sees in Android or Win8.x

– App SDK libraries will have “help” for enabling resizable design

– Multiple app screens

Page 50: aip-developer-intro_pag2015

Data API Details (2)

• For service-specific parameters

– Provide human-readable names mapped to original parameter names

– Offer minimal descriptive text

– Specify validation

  • Cardinality

  • Pattern validator (regex)

  • Type (number, string, etc.)

– Indicate whether required

– Indicate whether they should be visible in a UI

– Specify reasonable default values

• Seem familiar?

– This approach is used to abstract command line apps

– Allows automatic generation of minimally functional UI
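
The parameter metadata listed above can be pictured as a small record per parameter plus a function that applies it. This Python sketch is illustrative only: `ParamSpec`, its field names, and `to_remote_query` are hypothetical, and it covers name mapping, pattern validation, required flags, and defaults while leaving out cardinality and type checks.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParamSpec:
    name: str                      # human-readable name exposed to clients
    remote_name: str               # original parameter name on the remote service
    description: str = ""          # minimal descriptive text
    pattern: Optional[str] = None  # regex validator, applied to the whole value
    required: bool = False
    visible: bool = True           # whether a generated UI should show it
    default: Optional[str] = None


def to_remote_query(specs, supplied):
    """Validate supplied values and return a dict keyed by remote names."""
    remote = {}
    for spec in specs:
        value = supplied.get(spec.name, spec.default)
        if value is None:
            if spec.required:
                raise ValueError("missing required parameter: " + spec.name)
            continue
        if spec.pattern and not re.fullmatch(spec.pattern, value):
            raise ValueError("invalid value for " + spec.name)
        remote[spec.remote_name] = value
    return remote
```

The same list of specs could drive a generated form: iterate over the visible entries and render one input per parameter, which is the "minimally functional UI" idea from the slide.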

Page 51: aip-developer-intro_pag2015

Data APIs: Response types (1)

• locus_relationship – pairwise relationship between A and B

– Directionality

– Type

– Array of scores (weights, etc.)

• sequence_feature – positional attribute

– Extension of GFF model plus

– Build

– Attributes array
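
A pairwise relationship record with directionality, type, and an array of scores might be assembled as in the Python sketch below. The field names follow the example response on Page 9 of this deck; the `class` value, the helper name `make_locus_relationship`, and the accepted direction values other than "undirected" are assumptions rather than part of any published schema.

```python
# Assumed set of direction values; only "undirected" appears in the deck.
ALLOWED_DIRECTIONS = ("undirected", "directed")


def make_locus_relationship(locus, related_entity, related_entity_kind,
                            rel_type, direction="undirected", scores=None):
    """Build one pairwise-relationship record shaped like the deck's example."""
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError("unexpected direction: " + direction)
    return {
        "class": "locus_relationship",
        "locus": locus,
        "related_entity": related_entity,
        "related_entity_kind": related_entity_kind,
        "relationships": [
            {
                "type": rel_type,
                "direction": direction,
                # Each score is a small dict such as {"confidence": 1}.
                "scores": list(scores or []),
            }
        ],
    }
```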

Page 52: aip-developer-intro_pag2015

Data APIs: Response types (2)

• locus_feature – key-value attributes per locus

– Optional controlled vocabulary* for keys

– Support for both slots and arrays

• raw – for returning images or other binary formats

– Source and other metadata carried in X-headers instead of JSON result

– Outbound transformation still supported

– Not a preferred response mode

• text – returning either native service response or a non-conformant JSON document

– Source and other metadata carried in X-headers instead of JSON result

– Not a preferred response mode

Page 53: aip-developer-intro_pag2015

Data API Details (6)

• Transparent caching will compensate for transient remote service failures

• Automatic indexing of certain response types via ElasticSearch, allowing for sophisticated global search

– ElasticSearch allows us to index everything we “know about” and return it quickly

– iPlant uses it to live-index >700 TB user data
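
One way a cache can compensate for transient remote failures is to keep serving the last good answer when a refresh fails. The Python sketch below shows that idea in isolation; it is not the portal's implementation (the slides list the cache technology as TBD), and `StaleOnErrorCache` and its parameters are hypothetical.

```python
import time


class StaleOnErrorCache:
    """Cache that serves the last good value when the remote call fails."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # callable taking a key; may raise
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._store = {}         # key -> (timestamp, value)

    def get(self, key):
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh enough: skip the remote call
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]  # remote failed: fall back to stale value
            raise                 # nothing cached: surface the failure
        self._store[key] = (now, value)
        return value
```

Callers see an error only when the remote service fails and no earlier answer exists for that key; a production cache would also bound its size and record how stale a served value is.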

Page 54: aip-developer-intro_pag2015

Developing an app

• Understand and document the user stories you’re addressing with your app

• Identify all requisite data sources AND

• Help us prepare them as Data APIs

– This may involve coding

• Understand the data integration or aggregation needs of your app

– This may involve coding

• Develop the user interface(s) for your app using our tool kits and suggested practices

– This will involve coding.

– But you will learn tools like jQuery, Bootstrap, & D3 and will thus be eminently employable!