
State of Spark, and where it is going
Reynold Xin @rxin
Strata Singapore, Dec 3rd, 2015



[Figure: Spark stack diagram. SQL, Streaming, MLlib and GraphX on top of Spark Core (RDD)]


A Great Year for Spark

Most active open source project in big data

New language: R

Widespread industry support & adoption


Community Growth (2014 → 2015)

• Summit Attendees: 1100 → 3900
• Meetup Members: 12K → 50K
• Developers Contributing: 500 → 1000


[Figure: Meetup Groups, December 2014 (source: meetup.com)]


[Figure: Meetup Groups, December 2015 (source: meetup.com)]


Users: 1000+ companies

Distributors + Apps: 50+ companies


Diverse Runtime Environments

How respondents are running Spark: 51% on a public cloud

Most common Spark deployment environments (cluster managers):
• Standalone mode: 48%
• YARN: 40%
• Mesos: 11%
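The three cluster managers above differ mainly in how an application is pointed at them: the master URL passed to `spark-submit --master` or `SparkConf.setMaster`. The sketch below is plain Scala with no Spark dependency; the `ClusterManager` type, the `masterUrl` helper and the host names are hypothetical, and the ports are the usual defaults (7077 for a standalone master, 5050 for a Mesos master).

```scala
object MasterUrls {
  // Hypothetical model of where an application runs: the three cluster managers
  // from the survey, plus local mode for single-machine testing.
  sealed trait ClusterManager
  final case class Standalone(host: String, port: Int = 7077) extends ClusterManager
  case object Yarn extends ClusterManager
  final case class Mesos(host: String, port: Int = 5050) extends ClusterManager
  final case class Local(threads: Option[Int] = None) extends ClusterManager

  /** Builds the master URL string accepted by `spark-submit --master` or `SparkConf.setMaster`. */
  def masterUrl(cm: ClusterManager): String = cm match {
    case Standalone(host, port) => s"spark://$host:$port"
    // YARN finds the ResourceManager via HADOOP_CONF_DIR / YARN_CONF_DIR, not the URL.
    case Yarn                   => "yarn"
    case Mesos(host, port)      => s"mesos://$host:$port"
    case Local(None)            => "local[*]" // one worker thread per logical core
    case Local(Some(n))         => s"local[$n]"
  }
}
```

For example, `masterUrl(Standalone("master-host"))` yields `spark://master-host:7077`. Keeping the choice in one sealed type means the same application code can move between standalone, YARN and Mesos by changing a single value.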


Industries Using Spark

• Other: 29.4%
• Software (SaaS, Web, Mobile): 17.7%
• Consulting (IT): 14.0%
• Retail, e-Commerce: 9.6%
• Advertising, Marketing, PR: 6.7%
• Banking, Finance: 6.5%
• Health, Medical, Pharmacy, Biotech: 4.4%
• Carriers, Telecommunications: 4.4%
• Education: 3.9%
• Computers, Hardware: 3.5%


Top Applications

• Fraud Detection / Security: 29%
• User-Facing Services: 36%
• Log Processing: 40%
• Recommendation: 44%
• Data Warehousing: 52%
• Business Intelligence: 68%


Largest Cluster & Daily Intake

• 800 million+ active users
• 8000+ nodes
• 150 PB+
• 1 PB+/day


Alibaba Taobao

• clustering (community detection)
• belief propagation (influence & credibility)
• collaborative filtering (recommendation)

* Spark Summit San Francisco 2014
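Community detection can be done with several algorithms, and the slide does not say which one Taobao used. As an illustration only, here is a minimal single-machine sketch of label propagation, one common choice (GraphX ships a distributed version as `org.apache.spark.graphx.lib.LabelPropagation`). The adjacency-map graph format, the smallest-label tie-break and the `maxSteps` cap are assumptions of this sketch.

```scala
object LabelPropagationSketch {
  // Undirected graph as an adjacency map: vertex id -> neighbour ids.
  // Assumes every neighbour also appears as a key (hypothetical toy format).
  type Graph = Map[Long, Set[Long]]

  /** Synchronous label propagation: each vertex starts labelled with its own id and,
    * on every step, adopts the most frequent label among its neighbours. Ties go to
    * the smallest label (an assumption that keeps the result deterministic).
    * Stops when no label changes or after maxSteps steps. */
  def run(graph: Graph, maxSteps: Int): Map[Long, Long] = {
    var labels: Map[Long, Long] = graph.map { case (v, _) => v -> v }
    var step = 0
    var changed = true
    while (changed && step < maxSteps) {
      val current = labels
      val next: Map[Long, Long] = graph.map { case (v, neighbours) =>
        if (neighbours.isEmpty) v -> current(v) // isolated vertex keeps its label
        else {
          // Count how many neighbours carry each label.
          val counts = neighbours.toSeq.groupBy(current).map { case (label, vs) => label -> vs.size }
          // Highest count wins; among equal counts, the smallest label wins.
          v -> counts.maxBy { case (label, count) => (count, -label) }._1
        }
      }
      changed = next != current
      labels = next
      step += 1
    }
    labels
  }
}
```

On two disconnected triangles {1, 2, 3} and {4, 5, 6}, `run(graph, 10)` converges within three steps, labelling the first triangle 1 and the second 4. Synchronous updates can oscillate on some graphs (bipartite ones, for instance), which is why the number of steps is capped.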


Top Retail Bank & Huawei

Applications: Possible Assets, Targeted Marketing, Financial Networking, ……

From DB/DW to Huawei FusionInsight Spark:
• Credit proof: about 2 weeks (15 days) → 2-5 seconds
• History query: offline, 1 year → online, 7 years+
• Data: structured → structured, semi-structured, unstructured
• Micro-loan conversion rate: 40X higher


Are We Done?

No! Development is faster than ever. Expect Spark 2.0 in 2016.

Biggest technical change in 2015 was DataFrames
• Moves many computations onto the relational Spark SQL optimizer
• Enables both new APIs and more optimization, which is now happening through Project Tungsten
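Why moving computations onto an optimizer matters: a DataFrame query is represented as a tree of expressions that rules can inspect and rewrite, whereas a Scala closure passed to an RDD operation is opaque. The toy sketch below mimics that idea in plain Scala with a hypothetical four-node expression type and one constant-folding rule applied bottom-up. It is not Spark SQL's actual (Catalyst) code, although Catalyst's rules are likewise written as pattern matches over trees.

```scala
object ToyOptimizer {
  // Hypothetical expression tree; Catalyst's real classes are far richer.
  sealed trait Expr {
    /** Apply `rule` bottom-up: rewrite the children first, then this node if the rule matches. */
    def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
      val withNewChildren = this match {
        case Add(l, r)         => Add(l.transformUp(rule), r.transformUp(rule))
        case GreaterThan(l, r) => GreaterThan(l.transformUp(rule), r.transformUp(rule))
        case leaf              => leaf
      }
      rule.applyOrElse(withNewChildren, identity[Expr])
    }
  }
  final case class Literal(value: Int) extends Expr
  final case class Attribute(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Constant folding: arithmetic on two literals is evaluated once at planning time
  // instead of once per row.
  val constantFolding: PartialFunction[Expr, Expr] = {
    case Add(Literal(a), Literal(b)) => Literal(a + b)
  }
}
```

Applied to `GreaterThan(Attribute("age"), Add(Literal(18), Add(Literal(1), Literal(2))))`, `transformUp(constantFolding)` returns `GreaterThan(Attribute("age"), Literal(21))`: the inner sum is folded first, which lets the outer one fold too. A predicate passed as a Scala closure to `rdd.filter` offers no such tree to rewrite.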


Coming in Spark 1.6

Dataset API: typed interface over DataFrames / Tungsten
• Common ask from developers who saw DataFrames

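// Spark 2.x+ API. JSON integers are inferred as bigint, so age is declared as Long
import org.apache.spark.sql.{Dataset, SparkSession}
case class Person(name: String, age: Long)

val spark = SparkSession.builder().appName("people").getOrCreate()
import spark.implicits._ // provides the Encoder[Person] used by .as[Person]
val dataframe = spark.read.json("people.json")
val ds: Dataset[Person] = dataframe.as[Person]
ds.filter(p => p.name.startsWith("M")).groupBy("name").avg("age").show() // show() runs the query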


Other Upcoming Features

DataFrame integration with GraphX and Streaming

More Tungsten features: faster in-memory cache, SSD storage, better code generation

Data sources for Streaming


Thank you. @rxin