Spark Streaming


Upload: edureka

Post on 02-Aug-2015

Category: Technology


TRANSCRIPT

Page 1: Spark Streaming

www.edureka.co/apache-spark-scala-training

Spark Streaming

Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions

Page 2: Spark Streaming

Objectives of this Session

What is Big Data?

What is Spark?

Why Spark?

Spark Ecosystem

Spark Features

Scala Overview

Spark Streaming Demo

For queries during the session and the class recording: post on Twitter @edurekaIN with #askEdureka, or on Facebook /edurekaIN

Page 3: Spark Streaming

What is Big Data

Page 4: Spark Streaming

What is Big Data

Page 5: Spark Streaming

Big Data

Lots of Data (Terabytes or Petabytes)

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

[Figure: "Big Data" word cloud with the terms cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile]

Page 6: Spark Streaming

What is Big Data

Page 7: Spark Streaming

What is Big Data

Page 8: Spark Streaming

What is Spark?

Apache Spark is a general-purpose, in-memory cluster computing system.

It provides high-level APIs in Java, Scala, and Python, and an optimized engine that supports general execution graphs.

It also provides high-level tools such as Spark SQL for structured data processing, MLlib for machine learning, and more.

Page 9: Spark Streaming

Why Spark?

Deployment

The Spark framework can be deployed through Apache Mesos, on Apache Hadoop via YARN, or with Spark's own cluster manager.

Page 10: Spark Streaming

Why Spark?

Polyglot

The Spark framework is polyglot: it can be programmed in several languages (currently Scala, Java, and Python are supported).

Page 11: Spark Streaming

Why Spark?

Provides powerful caching and disk persistence capabilities

Interactive Data Analysis

Faster Batch

Iterative Algorithms

Real-Time Stream Processing

Faster Decision-Making

Page 12: Spark Streaming

Spark Community is Super Active!

Page 13: Spark Streaming

Spark Ecosystem

Spark Core Engine

Alpha/Pre-alpha

Shark (SQL)

Spark Streaming (Streaming)

MLlib (Machine Learning)

GraphX (Graph Computation)

SparkR (R on Spark)

BlinkDB (Approximate SQL)

Page 14: Spark Streaming

Spark Ecosystem (Contd.)

Spark Core Engine

Shark (SQL): Used for structured data. Can run unmodified Hive queries on an existing Hadoop deployment.

Spark Streaming (Streaming): Enables analytical and interactive apps for live streaming data.

MLlib (Machine Learning): Machine learning library being built on top of Spark. Provides support for many machine learning algorithms, with speeds up to 100 times faster than MapReduce.

GraphX (Graph Computation): Graph computation engine (similar to Giraph).

SparkR (R on Spark): Package for the R language that lets R users leverage Spark from the R shell.

BlinkDB (Approximate SQL): An approximate query engine that runs over the Spark Core Engine.

Alpha/Pre-alpha

Page 15: Spark Streaming

Spark Ecosystem (Contd.)

Page 16: Spark Streaming

Spark Functional Features

Page 17: Spark Streaming

Spark Non-functional Features

Page 18: Spark Streaming

Page 19: Spark Streaming

Introduction to Scala

Page 20: Spark Streaming

Introduction to Scala

Page 21: Spark Streaming

Scala frameworks

Page 22: Spark Streaming

Why Scala?

Page 23: Spark Streaming

Demo: Spark Streaming

Write a Spark Streaming program that counts the number of lines containing the word “FATAL” and keeps reporting the count on the console.
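The slides do not include the demo code, so the following is only a minimal sketch of the counting logic in plain Python, not the Spark Streaming API. It assumes the stream arrives as a sequence of small batches of log lines, and it reports one count per batch. The function names and the sample log lines are hypothetical. In an actual Spark Streaming job, the same idea would roughly correspond to filtering each batch of the input stream and counting what remains.

```python
from typing import Iterable, Iterator, List

KEYWORD = "FATAL"  # the word named in the demo statement


def count_matching(batch: List[str], keyword: str = KEYWORD) -> int:
    """Count the lines in one batch that contain the keyword."""
    return sum(1 for line in batch if keyword in line)


def report_counts(batches: Iterable[List[str]], keyword: str = KEYWORD) -> Iterator[str]:
    """Yield one console-style report per batch, in arrival order."""
    for batch_no, batch in enumerate(batches, start=1):
        matches = count_matching(batch, keyword)
        yield f"batch {batch_no}: {matches} line(s) containing {keyword}"


if __name__ == "__main__":
    # Hypothetical log lines standing in for a live stream (assumption).
    sample_batches = [
        ["INFO service started", "FATAL disk full"],
        ["WARN slow response"],
        ["FATAL out of memory", "FATAL node lost", "INFO retrying"],
    ]
    for report in report_counts(sample_batches):
        print(report)
```

Because `report_counts` is a generator, it keeps emitting a report for as long as batches keep arriving, which mirrors the "keeps reporting" part of the exercise. The match is a case-sensitive substring test, so a line containing "fatal" in lowercase would not be counted.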

Page 24: Spark Streaming

Questions?

Buy the Spark course at: www.edureka.co

Page 25: Spark Streaming