scripting without scripts: a user-friendly integration of ......scripting without scripts: a...

Post on 29-May-2020

39 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Scripting without Scripts: A User-Friendly Integration of R, Python,

Matlab and Groovy into KNIMEFelix Meyenhofer

Technology Development Studio

3. March 20114th KNIME Users Group Meeting and Workshop

1Friday, March 4, 2011

Outline

• Motivation

• Architecture and design

• Scripting languages

• Generic R

• Prototyping

• Perspectives

2

2Friday, March 4, 2011

Outline

2

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

2Friday, March 4, 2011

Our situation

3

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

3Friday, March 4, 2011

The actors

4

I’m a data analyst.I can script in R, Python, Matlab and a little Java .I hate user interfaces!

I’m a scientist.I use Excel because I rely on a proper UI, but it does not scale, does not work well for all types of experiments, etc. .So I often need to go to see my data analyst

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

4Friday, March 4, 2011

Are data analysts lazy people?

A (typical) research project

• Experiment

• Lots of data

• Fancy data mining scripts

• A little Excel

• Great results

• Paper

5

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

5Friday, March 4, 2011

Are data analysts lazy people?

A (typical) research project

• Experiment

• Lots of data

• Fancy data mining scripts

• A little Excel

• Great results

• Paper

5

Answer:No, but staff-ratio = 8:1

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

5Friday, March 4, 2011

Can fix the problem?

+ It is able to handle big data sets+ Has ready to use data interfaces (txt, xls, sql...)+ It’s modular architecture allows customization

- Native KNIME visualisation are insufficient (for our purpose)- It lacks specific methods and tool- node developement requires java-geeks and takes time, where the data analysis requirements evolve quickly along with the experiments

In contrast:

• Scripting solutions can be quickly developed but are not end-user ready

• Scripting allows rapid prototyping of methods and tools

6

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

6Friday, March 4, 2011

Idea

What if data analysts could write their scripts as usual, AND users could somehow access these within a well designed framework using a graphical interface?

7

=+

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

7Friday, March 4, 2011

The basic components

8

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Server+

API

Module

KNIME

Java

Scripting Framework

8Friday, March 4, 2011

9

Script template system

7

XML template

User interface Script

Visne et al., RGG: A general GUI Framework for R scripts, 2009, Bioinformatics, doi:10.1186/1471-2105-10-74

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

9Friday, March 4, 2011

Scripting node state diagram

X

Unlink

Script

ConfigureTemplate

Execute

Edit Template

Reattach/Convert to template

Empty Script Scripting

Notrgg

Create Node

CreateUI Discards all script

customizations

on execute

RebuildUI

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

10Friday, March 4, 2011

Reduction to the necessairy

10

internal internal

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

11Friday, March 4, 2011

Template management

• Local or remote template repositorie• Plain-text template definition files• Hierarchically organized• Previews for visualization templates

11

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

12Friday, March 4, 2011

The (new) situation

• Facilitated knowledge propagation within and among research groups

• Facilitated knowledge (work) preservation with the template system

• Applicable form different levels of expertise

12

Users• Use Knime and templates• Be happy!

Advanced Users• Customize templates in-place • Use flow variables and loops

Data Analysts• Write scripts to fill Knime-gaps• Evolve scripts into new templates

Developers• Provide Knime-nodes for most popular templates

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

13Friday, March 4, 2011

The (new) situation

• Facilitated knowledge propagation within and among research groups

• Facilitated knowledge (work) preservation with the template system

• Applicable form different levels of expertise

12

Users• Use Knime and templates• Be happy!

Advanced Users• Customize templates in-place • Use flow variables and loops

Data Analysts• Write scripts to fill Knime-gaps• Evolve scripts into new templates

Developers• Provide Knime-nodes for most popular templates

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

13Friday, March 4, 2011

Scripting nodes vs. conventional nodes

• Less integrated than native nodes

• ‘Real’ nodes are likely to be (much) faster

• ‘Real’ nodes scale better/work with huge data-sets

• ‘Real’ nodes can be updated by updating Knime

• ‘Real’ node-development requires 10x more resources

13

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

14Friday, March 4, 2011

From templates to nodes

• Templates become updateable

• Rapid R prototyping

• Trivial deployment (1node/min) into end-user ready nodes

14

+ = Node

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

15Friday, March 4, 2011

From templates to nodes

• Templates become updateable

• Rapid R prototyping

• Trivial deployment (1node/min) into end-user ready nodes

14

+ = Node

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

15Friday, March 4, 2011

Scripting language support

15

R

Rserve as backend

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

16Friday, March 4, 2011

Scripting language support

15

R MATLAB

Rserve as backend

mpicbg-matlab web-client(needs a floating license!)

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

16Friday, March 4, 2011

Scripting language support

15

R MATLAB Python

Rserve as backend

mpicbg-matlab web-client(needs a floating license!)

mpicbg-pythonbackend

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

16Friday, March 4, 2011

Scripting language support

15

R MATLAB Python Groovy

Rserve as backend

mpicbg-matlab web-client(needs a floating license!)

mpicbg-pythonbackend

Using Knime API but in a dynamically complied

environment

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

16Friday, March 4, 2011

Scripting language support

15

R MATLAB Python Groovy

Rserve as backend

mpicbg-matlab web-client(needs a floating license!)

mpicbg-pythonbackend

Using Knime API but in a dynamically complied

environment

server infrastructure makes it easier for maintenance.(power users can use a local instance)

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

16Friday, March 4, 2011

Scripting language support

15

R MATLAB Python Groovy

Rserve as backend

mpicbg-matlab web-client(needs a floating license!)

mpicbg-pythonbackend

Using Knime API but in a dynamically complied

environment

What else?

All that is necessary is (1) a table conversion mechanism, (2) the possibility to invoke a script with the converted table as argument, and (3) a way to convert the results back into a table

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

16Friday, March 4, 2011

Feature overview

16

R MATLAB Python Groovy

• STATISTICS

• Visualization

• Handy data-types

Exponential growth in packages

• Statistics

• IMAGE PROCESSING

• Visualization

• Easy Java interface

• Documentation

Many people know it (#29) *

• BIOINFORMATICS

• Statistics

• (Image Processing)

• Visualization (Matplotlib)

#4 progamming language *

• Java prototyping

• regexp

• ...

memory management

rather slow without grid computing toolbox

interface syntax (just joking)

feat

ures

)-:

* http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

17Friday, March 4, 2011

Generic R

• Knime is focused on table transformations

• Some problems do not fit into this scheme

• Solution: Generic R nodes

17

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

18Friday, March 4, 2011

The MATLAB integration trio

18

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

19Friday, March 4, 2011

The MATLAB integration trio

18

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

19Friday, March 4, 2011

The MATLAB integration trio

18

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

19Friday, March 4, 2011

The MATLAB integration trio

18

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

19Friday, March 4, 2011

The MATLAB integration trio

18

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

19Friday, March 4, 2011

The MATLAB integration trio

18

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

19Friday, March 4, 2011

The R integration trio

19

20Friday, March 4, 2011

The R integration trio

19

20Friday, March 4, 2011

The R integration trio

19

20Friday, March 4, 2011

The R integration trio

19

20Friday, March 4, 2011

The R integration trio

19

20Friday, March 4, 2011

The R integration trio

19

20Friday, March 4, 2011

The Python integration trio

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

21Friday, March 4, 2011

The Python integration trio

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

20

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

21Friday, March 4, 2011

21

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

22Friday, March 4, 2011

21

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

22Friday, March 4, 2011

21

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

22Friday, March 4, 2011

21

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

22Friday, March 4, 2011

21

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

22Friday, March 4, 2011

21

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

Creating a Python plot template

22Friday, March 4, 2011

Question to the audience:

Is this KNIME or R?

22

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

23Friday, March 4, 2011

Question to the audience:

Is this KNIME or R?

22

95%

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

23Friday, March 4, 2011

Our experience so far:

23

I love templates, because they can be used to quickly make custom solutions accessible for my scientists.

I love templates, because they provide me the exact tools I need. And my data analyst can create them almost instantaneously.

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

24Friday, March 4, 2011

Aggregation

Additional features of our scripting integrations:

• Better editor (undo, redo)

• Better attribute name insertion

• Preserve column names

• UI templates + centralized user template repositories

• Worker engine can be run locally or on a centralized server

• Dynamic repainting of plots

• OpenIn* nodes for rapid prototyping

• Faster table conversion (~2x for R)

• Flow-variable support in scripts

• Generic R nodes to process arbitrary data structures

Additional scripting languages:

• Python

• Matlab

24

MotivationArchitecture and design

Scripting languagesGeneric R

PrototypingPerspectives

25Friday, March 4, 2011

Acknowledgements

Holger Brandl (Software developer for Bioinformatics)

Tom Haux (Software developer SWENG)

Martin Stöter (Scientist TDS)

Michael Berthold and the entire KNIME team

25

26Friday, March 4, 2011

top related