Apache MADlib (Incubating)
Oct 2016 User Survey Results

Received ~40 responses from 27 different companies

Summary (1)

• ~50% of respondents have 1 year or less of MADlib use
• Fraud detection is the most common use case
• Regression (various), clustering and random forest are the most commonly used MADlib algorithms
• Gradient boosting is the most commonly requested new algorithm

Summary (2)

• Users prefer new algorithms over improvements to existing algorithms by a 2:1 margin
• Improved documentation/examples and better performance are the biggest concerns
• The most common other tools used by respondents are R, Spark and Python (and associated libraries)

Q1

Q2

Q3

Q4 - Top Use Cases

Q4 - Other Use Cases

Q4 - Use Cases

Stemmed, stop words removed

Q5 - Frequently Used Algorithms

Q6 - Top Requested Features

*Note that there is an R interface called PivotalR: https://cran.r-project.org/web/packages/PivotalR/

Q6 - Other Requested Features

Q6 - Requested Features

All responses, stemmed, stop words removed

Q7 - Main Concerns

Q7 - Main Concerns

All responses, stemmed, stop words removed

Q8 - Other Tools Used

+Several others...