owf14 - plenary session : ori pekelman, founder, constellation matrix

Ori Pekelman @ Open World Forum 2014: Combining Big Data & Open Source strategy

Upload: open-world-forum

Post on 20-Jun-2015


Category:

Data & Analytics



DESCRIPTION

Open World Forum 2014

TRANSCRIPT

Page 1: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Ori Pekelman @ Open World Forum 2014

Combining Big Data & Open Source strategy

Page 2: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Me

I am an entrepreneur and a consultant. Check out http://platform.sh, which I have been working on a lot lately.

I am the originator and co-organizer of a bunch of meetups, such as the Functional Languages User Group (btw, happening right now…) and the big informal data group we call ParisDataGeeks (with people like Olivier Grisel and Sam Bessalah… and btw, this one happens all day tomorrow!)

On Social Media I am @OriPekelman

OWF 2014 2

Page 3: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Big Data Small Talk

This is a short talk. There won’t be anything overly technical here.

I don’t remember how this got to be the title of the talk..

If you come tomorrow you will get an incredible bird's-eye view of current trends in real-time, big, machine-learning-y data applications.


Page 4: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Data this Data that

Data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data … data

Everybody loves the data.


Page 5: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Data this Data that


Page 6: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Data this Data that

Well, are contractions so hard that with 100 petabytes we can’t do some simple Markov chains in the 24th century?

We say “Big Data” so often these days it has become an extremely vague term.

And when we say “Open Data” we get the same form of vagueness. Let’s try to frame this.

Page 7: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

What applications of big data are we talking about here?

The machine learning kind: Everything else is mostly trivial or just a bit of engineering away.

When we say Machine Learning it basically means:

Ridiculous amount of data: 100% proprietary, mostly about intimate human interactions

Software: Mostly Open Source

Model: Mostly opaque. Mostly closed. Some with APIs

Robust predictions: Mostly about human behaviour

Page 8: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

The ingredients

Data sources: Proprietary and closed (property of whom?) / Proprietary with some APIs / Open with an open license

Software: Proprietary / Free

Stuff to run the software on: Proprietary

Models: Proprietary and closed / Proprietary with some APIs / ?


Page 9: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Data property

Us as individuals to a very faint degree

Governments

Google

Apple

Credit Card Companies and banks

People we haven’t heard about


Page 10: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

On the software ingredient

Big Data is predominantly an Open Source game

How much big data software is not prefixed by « Apache »?


Page 11: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Laws of data. I like laws.

« Data expands to fill the space available for storage. »

Parkinson’s law applied to data

« Free disk space is always pronounced in percentage, and the percentage is always a single digit »

My father


Page 12: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Parkinson’s law

Cloud technologies represent an ultimate phase in the commoditization of computing storage and calculation power.

Capacity becomes limited only by cost (well, at least in theory).

So if we take Parkinson’s law to the letter, data will expand until we have spent humanity’s last dime.


Page 13: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

The Cloud, Data and Free Software

The cloud is, at the very least, orthogonal to the basic idea of free software (the libre variety).

That is because what makes free software economically possible is that the marginal cost of duplicating code tends to zero.

The marginal cost of duplicating data grows linearly at best and, because of Parkinson's law, probably faster than that.

This means that in the list of ingredients we noted before, “data” will by nature be mostly proprietary, because its cost is directly linked to that of machines and because Moore’s law is of no help.

Page 14: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Models

Models are better than data

They are less sparse, more dense

They are data reduced

They always give an answer

They are immediately useful

It's like the thing with Data -> Information -> Knowledge (+ Wisdom?)

As we noted before, the models we are talking about are mostly opaque; they do not generate wisdom.


Page 15: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Laws of data. I like laws.

« Information wants to be free »

Stewart Brand

Well, this one is less of a law in the sense of a physical one, and more of a moral one. We will get back to this at the end.


Page 16: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Laws..

“Hybrid data makes all your data big”

I think that's me.. But you know, zeitgeist

Hybrid data denotes “Data Applications” where the data comes from your own internal data sources and either open or proprietary external sources.

Often enough, mixing data sources has a combinatorial effect. Data locality becomes really important.

Using Predictive APIs means building a Hybrid Data application where you only have access to the resulting model.
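To make the combinatorial effect of mixing data sources a bit more concrete, here is a minimal sketch in plain Python. Everything in it is a made-up illustration and not something from the talk: the `join_on_key` helper, the `internal_orders` and `external_weather` records, and their field names are all hypothetical. It only shows how joining a small internal table with a small external one on a shared key multiplies the number of rows.

```python
from itertools import product


def join_on_key(left, right, key):
    """Naive equi-join: pair every left row with every right row sharing `key`."""
    return [
        {**l_row, **r_row}
        for l_row, r_row in product(left, right)
        if l_row[key] == r_row[key]
    ]


# Hypothetical internal data: 3 orders, all from the same city.
internal_orders = [
    {"city": "Paris", "order_id": 1},
    {"city": "Paris", "order_id": 2},
    {"city": "Paris", "order_id": 3},
]

# Hypothetical external data (open or proprietary): 4 readings for that city.
external_weather = [
    {"city": "Paris", "hour": hour, "temp_c": temp}
    for hour, temp in [(0, 11), (6, 9), (12, 15), (18, 13)]
]

hybrid = join_on_key(internal_orders, external_weather, "city")

# 3 internal rows x 4 external rows on the same key give 12 hybrid rows.
print(len(internal_orders), len(external_weather), len(hybrid))
```

With three rows on one side and four on the other, the hybrid result already has twelve rows. The same multiplication at real volumes is one way to read "Hybrid data makes all your data big", and it is also why where the data physically lives starts to matter.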

Page 17: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Watson in the mix

ML requires data. The more of it you have, the more robust your models will be.

Open Source mostly commoditizes the algorithmic and software layer; there is not a lot of secret sauce there.

Players with the most data will probably be able to build more robust models.

And as basically all “Data Applications” will be Hybrid ones, we will see more and more applications dependent on external, derived, opaque models.

Page 18: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Predictive APIs

The "As a Service" crowd is becoming the most potent rival to Free software.

While most of them will run Open Source solutions in any case, most of the value will remain proprietary, and these robust models are going to be at least as important as the software.

As a company, blindly going into this means you might very well find yourself extremely dependent on others for some of your core operations.

Free software alone will not defend you

Page 19: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

2014: this is happening already


Page 20: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

It’s a social issue too

There is a strong ethical reason we want to fight not only for open source but also for open data.

The advent of opaque systems with smart algorithms and an extreme amount of data on us (the proprietary data + as a service model) is not only going to be bad for our privacy, it's going to have tangible effects on our livelihoods and on our place in society, as it can introduce an extreme form of information asymmetry at a scale not seen before.

In this domain more than in others, the actors of Free Software need to be more vigilant and, by working with the other actors of freedom, make sure we are not constructing the tools of our demise.

Page 21: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

Information wants to be free

Well, if you are stuck in the 2000s and do nightly batches, you are probably not managing your own internal data wealth well. So get on it.

Learn about what we can currently do in Machine Learning. Start having a plan.

Don’t hoard the data. Open it at least to some extent.

Collaborate on the economic and social framework for open data and open models.

Either because you are a government and you have a moral obligation to defend your citizens.

Or because if you become a consumer only you will not be able to manage your dependency on external opaque sources.


Page 22: OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

#ParisDataGeeks

… and come tomorrow starting at 9am for talks such as:

Algebird: algebra for efficient big data processing. Abstract algebra for data mining, by Sam Bessalah (Software Engineer, Independent)

Context Awareness: From NEST to Google Now and IFTTT, in this talk we will go through some of the most successful use cases of context awareness, and explain some of the technology behind the pocket brain we are currently building at Snips, by Dr. Rand Hindi

Apache Kafka: distributed publish-subscribe messaging system, by Charly Clairmont (CTO, Altic)

Data encoding and Metadata for Streams, by Jonathan Winandy (Founder, Primatice)

Next Open Source Big Data Suite: A new low level approach for BigData, by Emmanuel Keller (CEO/CTO, OpenSearchServer)

State Of the Art in Machine Learning, by Olivier Grisel (Software Engineer, Inria)

Take back control of your web tracking: Go further by doing it yourself, by Clément Stenac (CTO, Dataiku)

Real time energy data analysis with Apache Storm, by Simon Maby (Software Architect, Octo Technology)
