Spark Streaming

Post on 02-Aug-2015

Category: Technology

TRANSCRIPT

1. Spark Streaming
   www.edureka.co/apache-spark-scala-training
   Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions

2. Objectives of this Session
   - What is Big Data?
   - What is Spark?
   - Why Spark?
   - Spark Ecosystem
   - Spark Features
   - Scala overview
   - Spark Streaming Demo
   For queries during the session and for the class recording:
   - Post on Twitter @edurekaIN: #askEdureka
   - Post on Facebook /edurekaIN

3. What is Big Data

4. What is Big Data

5. Big Data
   - Lots of data (terabytes or petabytes).
   - Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
   - The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
   [Word cloud: cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile, Big Data]

6. What is Big Data

7. What is Big Data

8. What is Spark?
   - Apache Spark is a general-purpose, in-memory cluster computing system.
   - Provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs.
   - Provides various high-level tools such as Spark SQL for structured data processing, MLlib for machine learning, and more.
   [Diagram labels: High Level APIs, High Level Tools, More]

9. Why Spark? Cluster Manager: Deployment via YARN
   - The Spark framework can be deployed through Apache Mesos, Apache Hadoop via YARN, or Spark's own cluster manager.

10. Why Spark? Polyglot
   - The Spark framework is polyglot: it can be programmed in several languages (currently Scala, Java and Python are supported).

11. Why Spark?
   - Provides powerful caching and disk persistence capabilities
   - Interactive data analysis
   - Faster batch
   - Iterative algorithms
   - Real-time stream processing
   - Faster decision-making

12. Spark Community is Super Active!

13. Spark Ecosystem
   - Spark Core Engine
   - Shark (SQL)
   - Spark Streaming (Streaming)
   - MLlib (Machine learning)
   - GraphX (Graph computation)
   - SparkR (R on Spark)
   - BlinkDB (Approximate SQL)
   [Legend: Alpha/Pre-alpha]

14. Spark Ecosystem (Contd.)
   - Spark Core Engine
   - Shark (SQL): used for structured data. Can run unmodified Hive queries on an existing Hadoop deployment.
   - Spark Streaming (Streaming): enables analytical and interactive apps for live streaming data.
   - MLlib (Machine learning): machine learning library being built on top of Spark. Provides support for many machine learning algorithms, with speeds up to 100 times faster than MapReduce.
   - GraphX (Graph computation): graph computation engine (similar to Giraph).
   - SparkR (R on Spark): package for the R language that lets R users leverage Spark from the R shell.
   - BlinkDB (Approximate SQL): an approximate query engine that runs over the Spark core engine.
   [Legend: Alpha/Pre-alpha]

15. Spark Ecosystem (Contd.)

16. Spark Functional Features

17. Spark Non-functional Features

18. [No text on slide]

19. Introduction to Scala

20. Introduction to Scala

21. Scala frameworks

22. Why Scala?

23. Demo: Spark Streaming
   Write a Spark Streaming program that counts the number of lines containing the word FATAL and keeps reporting the count on the console.

24. Questions?
   Buy Spark Course at: www.edureka.co
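
The demo on slide 23 asks for a Spark Streaming program that counts lines containing the word FATAL. The transcript does not include the demo code, so what follows is only a minimal sketch using the DStream API, not the code shown in the session. The socket source on localhost:9999, the 5-second batch interval, the local[2] master, and the object name FatalLineCounter are all assumptions chosen for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical object name; not taken from the slides.
object FatalLineCounter {
  def main(args: Array[String]): Unit = {
    // Assumption: local run with two threads, one for the socket
    // receiver and at least one for processing the batches.
    val conf = new SparkConf()
      .setAppName("FatalLineCounter")
      .setMaster("local[2]")

    // Assumption: 5-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumption: log lines arrive as text over a TCP socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Keep only lines that mention FATAL, then count them per batch.
    val fatalCounts = lines.filter(_.contains("FATAL")).count()

    // Report each batch's count on the console.
    fatalCounts.print()

    ssc.start()            // begin receiving and processing data
    ssc.awaitTermination() // block until the job is stopped
  }
}
```

One way to try the sketch is to start a text server with `nc -lk 9999` before launching the job and type log lines into it. The count printed is per batch interval; a running total across batches would need a stateful operation such as `updateStateByKey` together with a checkpoint directory.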