Open Data (Open Source Conference, May 2015)

Uploaded by CrowdFlower on 28 July 2015

Why We Need More Data

Lots of Data

CrowdFlower, Inc. – Proprietary and Confidential

The Effect of Better Algorithms

[Chart: classifier error rate (y-axis, 0% to 25%) for Naïve Bayes, Maximum Entropy, and SVM]

Real-World Data

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi, ACL ’10)

The Effect of Better Features

[Chart: classifier error rate (y-axis, 0% to 30%) using unigram, bigram, and unigram+bigram features]

Real-World Data

The Effect of More Data

[Chart: classifier error rate (y-axis, 0% to 14%) at training-set sizes N, 2N, and 4N]

Real-World Data

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi, ACL ’10)

The Effect of Cleaner Data

[Chart: classifier error rate (y-axis, 0% to 14%) when trained on 90%, 95%, and 100% accurate data]

Where Do Data Scientists Spend Their Time?

The Power of Open Data

CrowdFlower Data Enrichment Platform

Color Data

Fleshmap

Drug Side Effects

Apple Watch

Data for Everyone

Collecting the Same Data Over and Over

Open Data

The "Make Your Data Public" Setting

Data for Everyone

Data for Everyone Library

Data for Everyone

Categorize URLs

URL Categorization

Open Data API

Record Data

Extracting Names and Titles

Summarization

Is an Image Funny?

Classifying Medical Images

Attributes of People

396 Scripts