![Page 1: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/1.jpg)
Introduction
Amir H. [email protected]
Amirkabir University of Technology(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 1 / 49
![Page 2: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/2.jpg)
Course Information
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 2 / 49
![Page 3: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/3.jpg)
Course Objective
I Introduction to main concepts and principles of cloud computingand data intensive computing.
I How to read, review and present a scientific paper.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 3 / 49
![Page 4: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/4.jpg)
Course Objective
I Introduction to main concepts and principles of cloud computingand data intensive computing.
I How to read, review and present a scientific paper.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 3 / 49
![Page 5: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/5.jpg)
Topics of Study
I Topics we will cover include:• Cloud platforms• Cloud storage, NoSQL and NewSQL databases• Cloud resource management• Batch processing frameworks• Stream processing frameworks• Graph processing frameworks
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 4 / 49
![Page 6: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/6.jpg)
Course Material
I Mainly based on research papers.
I You will find all the material on the course web page:http://www.sics.se/∼amir/cloud14.htm
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 5 / 49
![Page 7: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/7.jpg)
Course Examination
I Mid term exam: 20%
I Final exam: 20%
I Reading assignments: 27%
I Final presentation: 23%
I Final project: 10%
I Lab assignments: 0% :)
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 6 / 49
![Page 8: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/8.jpg)
Reading Assignments
I Nine reading assignments.
I You should write a review for each paper (at most two pages).
I For each paper you should:• Identify and motivate the problem.• Pinpoint the main contributions.• Identify positive/negative aspects of the solution/paper.
I For each paper, you might be given some questions to answer.
I Students will work in groups of two.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49
![Page 9: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/9.jpg)
Reading Assignments
I Nine reading assignments.
I You should write a review for each paper (at most two pages).
I For each paper you should:• Identify and motivate the problem.• Pinpoint the main contributions.• Identify positive/negative aspects of the solution/paper.
I For each paper, you might be given some questions to answer.
I Students will work in groups of two.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49
![Page 10: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/10.jpg)
Reading Assignments
I Nine reading assignments.
I You should write a review for each paper (at most two pages).
I For each paper you should:• Identify and motivate the problem.• Pinpoint the main contributions.• Identify positive/negative aspects of the solution/paper.
I For each paper, you might be given some questions to answer.
I Students will work in groups of two.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49
![Page 11: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/11.jpg)
Reading Assignments
I Nine reading assignments.
I You should write a review for each paper (at most two pages).
I For each paper you should:• Identify and motivate the problem.• Pinpoint the main contributions.• Identify positive/negative aspects of the solution/paper.
I For each paper, you might be given some questions to answer.
I Students will work in groups of two.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49
![Page 12: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/12.jpg)
Presentation
I Each group give a 30 minutes talk on a scientific paper.
I The list of papers will be available in the course web page.
I You are also free to choose any other paper, but it should be con-firmed.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 8 / 49
![Page 13: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/13.jpg)
Lab Assignments and the Project
I Implement simple applications on different frameworks through thecourse.
I The solution of each lab assignment will be uploaded on the coursepage, one week after their start dates.
I The final project on top of Spark.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 9 / 49
![Page 14: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/14.jpg)
Discussion Forum
I Use the course discussion forum if you have any questions:http://www.sics.se/∼amir/cloud14.htm
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 10 / 49
![Page 15: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/15.jpg)
Course Overview
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 11 / 49
![Page 16: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/16.jpg)
Data is not information, information is not knowledge, knowledge isnot understanding, understanding is not wisdom.
- Clifford Stoll
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 12 / 49
![Page 17: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/17.jpg)
Big Data
small data big data
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 13 / 49
![Page 18: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/18.jpg)
I Big Data refers to datasets and flows largeenough that has outpaced our capability tostore, process, analyze, and understand.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 14 / 49
![Page 19: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/19.jpg)
The Four Dimensions of Big Data
I Volume: data size
I Velocity: data generation rate
I Variety: data heterogeneity
I This 4th V is for Vacillation:Veracity/Variability/Value
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 15 / 49
![Page 20: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/20.jpg)
Where DoesBig Data Come From?
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 16 / 49
![Page 21: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/21.jpg)
Big Data Market Driving Factors
The number of web pages indexed by Google, which were aroundone million in 1998, have exceeded one trillion in 2008, and itsexpansion is accelerated by appearance of the social networks.∗
[∗Wei Fan et al., Mining big data: current status, and forecast to the future, 2013]
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 17 / 49
![Page 22: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/22.jpg)
Big Data Market Driving Factors
The amount of mobile data traffic is expected to grow to 10.8Exabyte per month by 2016.∗
[∗Dan Vesset et al., Worldwide Big Data Technology and Services 2012-2015 Forecast, 2013]
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 18 / 49
![Page 23: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/23.jpg)
Big Data Market Driving Factors
More than 65 billion devices were connected to the Internet by2010, and this number will go up to 230 billion by 2020.∗
[∗John Mahoney et al., The Internet of Things Is Coming, 2013]
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 19 / 49
![Page 24: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/24.jpg)
Big Data Market Driving Factors
Open source communities
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 20 / 49
![Page 25: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/25.jpg)
Big Data Market Driving Factors
Many companies are moving towards using Cloud services toaccess Big Data analytical tools.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 21 / 49
![Page 26: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/26.jpg)
History of Data
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 22 / 49
![Page 27: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/27.jpg)
4000 B.C
I Manual recording
I From tablets to papyrus, to parchment, and then to paper
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 23 / 49
![Page 28: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/28.jpg)
1450
I Gutenberg’s printing press
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 24 / 49
![Page 29: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/29.jpg)
1800’s - 1940’s
I Punched cards (no fault-tolerance)
I Binary data
I 1890: US census
I 1911: IBM appeared
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 25 / 49
![Page 30: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/30.jpg)
1940’s - 1950’s
I Magnetic tapes
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 26 / 49
![Page 31: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/31.jpg)
1950’s - 1960’s
I Large-scale mainframe computers
I Batch transaction processing
I File-oriented record processing model (e.g., COBOL)
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 27 / 49
![Page 32: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/32.jpg)
1960’s - 1970’s
I Hierarchical DBMS (one-to-many)
I Network DBMS (many-to-many)
I VM OS by IBM → multiple VMs on a single physical node.
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 28 / 49
![Page 33: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/33.jpg)
1970’s - 1980’s
I Relational DBMS (tables) and SQL
I ACID
I Client-server computing
I Parallel processing
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 29 / 49
![Page 34: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/34.jpg)
1990’s - 2000’s
I Virtualized Private Network connections (VPN)
I The Internet...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 30 / 49
![Page 35: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/35.jpg)
2000’s - Now
I Cloud computing
I NoSQL: BASE instead of ACID
I Big Data
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 31 / 49
![Page 36: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/36.jpg)
Cloud and Big Data
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 32 / 49
![Page 37: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/37.jpg)
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 33 / 49
![Page 38: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/38.jpg)
Big Data Analytics Stack
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 34 / 49
![Page 39: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/39.jpg)
Hadoop Big Data Analytics Stack
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 35 / 49
![Page 40: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/40.jpg)
Spark Big Data Analytics Stack
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 36 / 49
![Page 41: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/41.jpg)
Big Data - File systems
I Traditional file-systems are not well-designed for large-scale dataprocessing systems.
I Efficiency has a higher priority than other features, e.g., directoryservice.
I Massive size of data tends to store it across multiple machines in adistributed way.
I HDFS/GFS, Amazon S3, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 37 / 49
![Page 42: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/42.jpg)
Big Data - File systems
I Traditional file-systems are not well-designed for large-scale dataprocessing systems.
I Efficiency has a higher priority than other features, e.g., directoryservice.
I Massive size of data tends to store it across multiple machines in adistributed way.
I HDFS/GFS, Amazon S3, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 37 / 49
![Page 43: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/43.jpg)
Big Data - File systems
I Traditional file-systems are not well-designed for large-scale dataprocessing systems.
I Efficiency has a higher priority than other features, e.g., directoryservice.
I Massive size of data tends to store it across multiple machines in adistributed way.
I HDFS/GFS, Amazon S3, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 37 / 49
![Page 44: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/44.jpg)
Big Data - File systems
I Traditional file-systems are not well-designed for large-scale dataprocessing systems.
I Efficiency has a higher priority than other features, e.g., directoryservice.
I Massive size of data tends to store it across multiple machines in adistributed way.
I HDFS/GFS, Amazon S3, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 37 / 49
![Page 45: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/45.jpg)
Big Data - Database
I Relational Databases Management Systems (RDMS) were not de-signed to be distributed.
I NoSQL databases relax one or more of the ACID properties: BASE
I Different data models: key/value, column-family, graph, document.
I Hbase/BigTable, Dynamo, Scalaris, Cassandra, MongoDB, Volde-mort, Riak, Neo4J, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 38 / 49
![Page 46: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/46.jpg)
Big Data - Database
I Relational Databases Management Systems (RDMS) were not de-signed to be distributed.
I NoSQL databases relax one or more of the ACID properties: BASE
I Different data models: key/value, column-family, graph, document.
I Hbase/BigTable, Dynamo, Scalaris, Cassandra, MongoDB, Volde-mort, Riak, Neo4J, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 38 / 49
![Page 47: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/47.jpg)
Big Data - Database
I Relational Databases Management Systems (RDMS) were not de-signed to be distributed.
I NoSQL databases relax one or more of the ACID properties: BASE
I Different data models: key/value, column-family, graph, document.
I Hbase/BigTable, Dynamo, Scalaris, Cassandra, MongoDB, Volde-mort, Riak, Neo4J, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 38 / 49
![Page 48: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/48.jpg)
Big Data - Database
I Relational Databases Management Systems (RDMS) were not de-signed to be distributed.
I NoSQL databases relax one or more of the ACID properties: BASE
I Different data models: key/value, column-family, graph, document.
I Hbase/BigTable, Dynamo, Scalaris, Cassandra, MongoDB, Volde-mort, Riak, Neo4J, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 38 / 49
![Page 49: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/49.jpg)
Big Data - Resource Management
I Different frameworks require different computing resources.
I Large organizations need the ability to share data and resourcesbetween multiple frameworks.
I Resource management share resources in a cluster between multipleframeworks while providing resource isolation.
I Mesos, YARN, Quincy, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 39 / 49
![Page 50: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/50.jpg)
Big Data - Resource Management
I Different frameworks require different computing resources.
I Large organizations need the ability to share data and resourcesbetween multiple frameworks.
I Resource management share resources in a cluster between multipleframeworks while providing resource isolation.
I Mesos, YARN, Quincy, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 39 / 49
![Page 51: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/51.jpg)
Big Data - Resource Management
I Different frameworks require different computing resources.
I Large organizations need the ability to share data and resourcesbetween multiple frameworks.
I Resource management share resources in a cluster between multipleframeworks while providing resource isolation.
I Mesos, YARN, Quincy, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 39 / 49
![Page 52: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/52.jpg)
Big Data - Resource Management
I Different frameworks require different computing resources.
I Large organizations need the ability to share data and resourcesbetween multiple frameworks.
I Resource management share resources in a cluster between multipleframeworks while providing resource isolation.
I Mesos, YARN, Quincy, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 39 / 49
![Page 53: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/53.jpg)
Big Data - Execution Engine
I Scalable and fault tolerance parallel data processing on clusters ofunreliable machines.
I Data-parallel programming model for clusters of commodity ma-chines.
I MapReduce, Spark, Stratosphere, Dryad, Hyracks, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 40 / 49
![Page 54: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/54.jpg)
Big Data - Execution Engine
I Scalable and fault tolerance parallel data processing on clusters ofunreliable machines.
I Data-parallel programming model for clusters of commodity ma-chines.
I MapReduce, Spark, Stratosphere, Dryad, Hyracks, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 40 / 49
![Page 55: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/55.jpg)
Big Data - Execution Engine
I Scalable and fault tolerance parallel data processing on clusters ofunreliable machines.
I Data-parallel programming model for clusters of commodity ma-chines.
I MapReduce, Spark, Stratosphere, Dryad, Hyracks, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 40 / 49
![Page 56: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/56.jpg)
Big Data - Query/Scripting Language
I Low-level programming of execution engines, e.g., MapReduce, isnot easy for end users.
I Need high-level language to improve the query capabilities of exe-cution engines.
I It translates user-defined functions to low-level API of the executionengines.
I Pig, Hive, Shark, Meteor, DryadLINQ, SCOPE, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 41 / 49
![Page 57: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/57.jpg)
Big Data - Query/Scripting Language
I Low-level programming of execution engines, e.g., MapReduce, isnot easy for end users.
I Need high-level language to improve the query capabilities of exe-cution engines.
I It translates user-defined functions to low-level API of the executionengines.
I Pig, Hive, Shark, Meteor, DryadLINQ, SCOPE, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 41 / 49
![Page 58: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/58.jpg)
Big Data - Query/Scripting Language
I Low-level programming of execution engines, e.g., MapReduce, isnot easy for end users.
I Need high-level language to improve the query capabilities of exe-cution engines.
I It translates user-defined functions to low-level API of the executionengines.
I Pig, Hive, Shark, Meteor, DryadLINQ, SCOPE, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 41 / 49
![Page 59: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/59.jpg)
Big Data - Query/Scripting Language
I Low-level programming of execution engines, e.g., MapReduce, isnot easy for end users.
I Need high-level language to improve the query capabilities of exe-cution engines.
I It translates user-defined functions to low-level API of the executionengines.
I Pig, Hive, Shark, Meteor, DryadLINQ, SCOPE, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 41 / 49
![Page 60: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/60.jpg)
Big Data - Stream Processing
I Providing users with fresh and low latency results.
I Database Management Systems (DBMS) vs. Data Stream Man-agement Systems (DSMS)
I Storm, S4, SEEP, D-Stream, Naiad, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 42 / 49
![Page 61: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/61.jpg)
Big Data - Stream Processing
I Providing users with fresh and low latency results.
I Database Management Systems (DBMS) vs. Data Stream Man-agement Systems (DSMS)
I Storm, S4, SEEP, D-Stream, Naiad, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 42 / 49
![Page 62: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/62.jpg)
Big Data - Stream Processing
I Providing users with fresh and low latency results.
I Database Management Systems (DBMS) vs. Data Stream Man-agement Systems (DSMS)
I Storm, S4, SEEP, D-Stream, Naiad, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 42 / 49
![Page 63: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/63.jpg)
Big Data - Graph Processing
I Many problems are expressed using graphs: sparse computationaldependencies, and multiple iterations to converge.
I Data-parallel frameworks, such as MapReduce, are not ideal forthese problems: slow
I Graph processing frameworks are optimized for graph-based prob-lems.
I Pregel, Giraph, GraphX, GraphLab, PowerGraph, GraphChi, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 43 / 49
![Page 64: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/64.jpg)
Big Data - Graph Processing
I Many problems are expressed using graphs: sparse computationaldependencies, and multiple iterations to converge.
I Data-parallel frameworks, such as MapReduce, are not ideal forthese problems: slow
I Graph processing frameworks are optimized for graph-based prob-lems.
I Pregel, Giraph, GraphX, GraphLab, PowerGraph, GraphChi, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 43 / 49
![Page 65: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/65.jpg)
Big Data - Graph Processing
I Many problems are expressed using graphs: sparse computationaldependencies, and multiple iterations to converge.
I Data-parallel frameworks, such as MapReduce, are not ideal forthese problems: slow
I Graph processing frameworks are optimized for graph-based prob-lems.
I Pregel, Giraph, GraphX, GraphLab, PowerGraph, GraphChi, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 43 / 49
![Page 66: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/66.jpg)
Big Data - Graph Processing
I Many problems are expressed using graphs: sparse computationaldependencies, and multiple iterations to converge.
I Data-parallel frameworks, such as MapReduce, are not ideal forthese problems: slow
I Graph processing frameworks are optimized for graph-based prob-lems.
I Pregel, Giraph, GraphX, GraphLab, PowerGraph, GraphChi, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 43 / 49
![Page 67: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/67.jpg)
Big Data - Machine Learning
I Implementing and consuming machine learning techniques at scaleare difficult tasks for developers and end users.
I There exist platforms that address it by providing scalable machine-learning and data mining libraries.
I Mahout, MLBase, SystemML, Ricardo, Presto, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 44 / 49
![Page 68: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/68.jpg)
Big Data - Machine Learning
I Implementing and consuming machine learning techniques at scaleare difficult tasks for developers and end users.
I There exist platforms that address it by providing scalable machine-learning and data mining libraries.
I Mahout, MLBase, SystemML, Ricardo, Presto, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 44 / 49
![Page 69: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/69.jpg)
Big Data - Machine Learning
I Implementing and consuming machine learning techniques at scaleare difficult tasks for developers and end users.
I There exist platforms that address it by providing scalable machine-learning and data mining libraries.
I Mahout, MLBase, SystemML, Ricardo, Presto, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 44 / 49
![Page 70: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/70.jpg)
Big Data - Configuration and Synchronization Service
I A means to synchronize distributed applications accesses to sharedresources.
I Allows distributed processes to coordinate with each other.
I Zookeeper, Chubby, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 45 / 49
![Page 71: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/71.jpg)
Big Data - Configuration and Synchronization Service
I A means to synchronize distributed applications accesses to sharedresources.
I Allows distributed processes to coordinate with each other.
I Zookeeper, Chubby, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 45 / 49
![Page 72: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/72.jpg)
Big Data - Configuration and Synchronization Service
I A means to synchronize distributed applications accesses to sharedresources.
I Allows distributed processes to coordinate with each other.
I Zookeeper, Chubby, ...
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 45 / 49
![Page 73: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/73.jpg)
Summary
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 46 / 49
![Page 74: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/74.jpg)
Summary
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 47 / 49
![Page 75: Amir H. Payberah amir@sics · Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 7 / 49. Presentation I Each group give a30 minutestalk on a scienti c paper. I The list](https://reader030.vdocuments.net/reader030/viewer/2022041004/5ea7e89d7122b651cf32a6aa/html5/thumbnails/75.jpg)
Questions?
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 48 / 49