Data Science Challenges in Personal Program Analysis Bas van Schaik New York R Conference (April 2016)

Upload: work-bench

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Data Science Challenges in Personal Program Analysis

Data Science Challenges in Personal Program Analysis

Bas van Schaik

New York R Conference (April 2016)

Page 2: Data Science Challenges in Personal Program Analysis

- Cloud service for personal program analysis

- Free for OSS projects

- Currently in private beta, release imminent

Page 3: Data Science Challenges in Personal Program Analysis

Personal Program Analysis: why?

We are passionate about code.

We wish everyone would write better code.

We help people build better software better.

Page 4: Data Science Challenges in Personal Program Analysis

Ehm… Program analysis?

Compiler

Page 5: Data Science Challenges in Personal Program Analysis

What’s an ‘Alert’?

Short answer: a bug or a violation of good coding practice

Example: define the same key twice in a Python dict

E.g. in OpenStack Designate:

self.target = objects.PoolTarget.from_dict({
    'type': 'powerdns',
    'options': [{
        'key': 'connection', 'value': 'memory://',
        'key': 'host', 'value': '127.0.0.1',
        'key': 'port', 'value': 53}],
})

My guess of what was intended:

self.target = objects.PoolTarget.from_dict({
    'type': 'powerdns',
    'options': [
        {'key': 'connection', 'value': 'memory://'},
        {'key': 'host', 'value': '127.0.0.1'},
        {'key': 'port', 'value': 53}],
})
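Python accepts a dict literal with repeated keys and silently keeps only the last value for each key, which is why the Designate snippet above runs without complaint. As a rough illustration of how such an alert could be detected, here is a minimal sketch built on the standard `ast` module. The function name `find_duplicate_dict_keys` and the `snippet` variable are made up for this example, and this is not the query the service itself runs.

```python
import ast


def find_duplicate_dict_keys(source: str):
    """Return (line, key) pairs for constant keys repeated within one dict literal."""
    alerts = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for `**mapping` unpacking; non-constant keys are skipped.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    alerts.append((key.lineno, key.value))
                seen.add(key.value)
    return alerts


# Hypothetical input modelled on the 'options' entry from the example above.
snippet = """
options = [{
    'key': 'connection', 'value': 'memory://',
    'key': 'host', 'value': '127.0.0.1',
    'key': 'port', 'value': 53}]
"""

for line, key in find_duplicate_dict_keys(snippet):
    print(f"line {line}: duplicate key {key!r}")
```

On this input the sketch flags the repeated 'key' and 'value' entries on lines 4 and 5 of the snippet. It only looks at literal keys, so keys built from variables or expressions go unnoticed.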

Page 6: Data Science Challenges in Personal Program Analysis

What’s an ‘Alert’?

Alerts are found by queries:

● The source code is our database

● Every query result is an alert.

Support for 10 different programming languages (and counting), with more than 1,000 queries and metrics in total.

Page 7: Data Science Challenges in Personal Program Analysis

What does a query look like?

from Method m
where m.hasName("hashcode") and m.hasNoParameters()
select m, "Should this method be called 'hashCode' rather than 'hashcode'?"
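To make the "source code is our database" idea concrete, the query above can be read as a filter over a table of methods. The sketch below mimics that reading in plain Python. The `Method` record, its fields, and the sample rows are all hypothetical stand-ins, and nothing here reflects how the real query engine is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    # Hypothetical record standing in for one row of the "code database".
    name: str
    parameters: tuple
    location: str

    def has_name(self, name: str) -> bool:
        return self.name == name

    def has_no_parameters(self) -> bool:
        return len(self.parameters) == 0


def hashcode_query(methods):
    """Mirror of: from Method m where ... select m, message."""
    message = "Should this method be called 'hashCode' rather than 'hashcode'?"
    return [
        (m, message)
        for m in methods
        if m.has_name("hashcode") and m.has_no_parameters()
    ]


# Made-up rows: only the second one matches both conditions.
methods = [
    Method("hashCode", (), "Point.java:12"),
    Method("hashcode", (), "Line.java:30"),
    Method("hashcode", ("int",), "Util.java:7"),
]

alerts = hashcode_query(methods)
for method, text in alerts:
    print(f"{method.location}: {text}")
```

Each tuple the function returns corresponds to one alert: the matched element plus the message shown to the developer.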

Page 8: Data Science Challenges in Personal Program Analysis

Making it interesting: project over time

[Figure: OpenStack Nova (Python). Labels: net alerts, activity, composition, net LOC.]

Page 9: Data Science Challenges in Personal Program Analysis

Or: compare different projects

[Figure: alerts and LOC for Cinder, Nova, Neutron, Horizon, Heat, Swift, Sahara, Glance, Designate, Keystone, Fuel, and Ironic.]

Page 10: Data Science Challenges in Personal Program Analysis

Even more interesting: make it personal

[Figure: net alerts and net LOC contributed (all OpenStack modules). Labelled points: A, X, B.]
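A per-contributor view like this one needs alerts and line counts attributed to individual commits and then summed per author. The sketch below shows one way that aggregation might look. It assumes "net" means added minus removed for lines and introduced minus fixed for alerts, which the slides do not spell out. The `Commit` record and the sample numbers are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Commit(NamedTuple):
    # Hypothetical per-commit record; the field names are assumptions.
    author: str
    lines_added: int
    lines_removed: int
    alerts_introduced: int
    alerts_fixed: int


def per_contributor_totals(commits):
    """Sum net LOC and net alerts per author (assumed definitions of 'net')."""
    totals = defaultdict(lambda: {"net_loc": 0, "net_alerts": 0})
    for c in commits:
        totals[c.author]["net_loc"] += c.lines_added - c.lines_removed
        totals[c.author]["net_alerts"] += c.alerts_introduced - c.alerts_fixed
    return dict(totals)


# Made-up commit history for two contributors.
commits = [
    Commit("A", 500, 100, 6, 1),
    Commit("B", 300, 50, 0, 4),
    Commit("A", 200, 0, 2, 0),
]

totals = per_contributor_totals(commits)
for author, t in sorted(totals.items()):
    print(f"{author}: net LOC {t['net_loc']}, net alerts {t['net_alerts']}")
```

With these numbers, contributor A ends up with a positive net alert count and B with a negative one, meaning B fixed more alerts than they introduced.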

Page 11: Data Science Challenges in Personal Program Analysis

Data Science for PPA: finding fun facts

[Figure: Who's doing what in OpenStack? Categories: Trailblazer, Bug squasher, Refactorer, None. Labels: Major release, Total contributors, % contributors.]

Page 12: Data Science Challenges in Personal Program Analysis

Data science for PPA: cleaning

PostgreSQL (net churn and net alerts - before cleaning)

PostgreSQL: after cleaning

Page 13: Data Science Challenges in Personal Program Analysis

Warning:

DEMO of beta software

Page 14: Data Science Challenges in Personal Program Analysis

But… why make it personal?

Some developers not so happy:

“are you questioning my ability to write code?”

No. We're helping you to improve.

Page 15: Data Science Challenges in Personal Program Analysis

But… why make it personal?

By making it personal, we make people care.

When people care, they improve.

When developers improve, the code improves.

Page 16: Data Science Challenges in Personal Program Analysis

But… why make it personal?

When developers improve, the code improves.

● Automated code review on GitHub pull requests

● “On 12/11/2015 you introduced X, fancy fixing that?”

● “You recently fixed alert A in file B. Based on your expertise, you might also be interested in fixing alert X in file Y?”

● “Compared to developers like you, you rank 20 out of 100”

● “… and by fixing these 5 alerts, you'll be in the top 10!”

● Found a bug in your project? Write a query for it, share it!

Page 17: Data Science Challenges in Personal Program Analysis

Not rocket science… Or is it?

Page 18: Data Science Challenges in Personal Program Analysis

DEMO (continued)

Page 19: Data Science Challenges in Personal Program Analysis

Interested in…

Early access to CodingStars?

Having your OSS project analysed?

Working for us in New York, San Francisco, Oxford (UK), or Copenhagen (Denmark)?

Talk to us! (in person, or [email protected])