![Page 1: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/1.jpg)
Advances and Challenges of Big Data Computing Platforms
Liqiang Wang, Associate Professor
Department of Computer Science, UCF
![Page 2: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/2.jpg)
THE EVOLUTION OF BUSINESS INTELLIGENCE
1990s: BI Reporting, OLAP & Data Warehouse (Business Objects, SAS, Informatica, Cognos, other SQL Reporting Tools)
2000s: Interactive Business Intelligence & In-memory RDBMS (Tableau, HANA)
2010s: Big Data: Batch Processing & Distributed Data Store (Hadoop/Spark; HBase/Cassandra)
Ongoing: Big Data: More Intelligent and Real Time
![Page 3: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/3.jpg)
Source: Dion Hinchcliffe, “The enterprise opportunity of Big Data: Closing the ‘clue gap,’”
![Page 4: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/4.jpg)
Essential Training at UCF (Pending)
Training Concepts: Fundamentals of Cyberinfrastructure; Programming Models and Languages; Data Exploration and Visualization; Big Data Computing; Data Mining & Machine Learning; Data Analytics Case Studies
Enhancement Methods: Adaptive Learning; Virtualization-based Lab Training
Training Aims: Sustainability; Effectiveness; Scalability
![Page 5: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/5.jpg)
Hadoop Architecture
Hadoop consists of:
Hadoop 1.0: HDFS and MapReduce
Hadoop 2.0: HDFS, YARN, and MapReduce
![Page 6: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/6.jpg)
Hadoop 1 vs 2
| | Hadoop 1 | Hadoop 2 |
|---|---|---|
| Components | HDFS, MapReduce | HDFS, YARN, MapReduce, other modules |
| Scalability | Less | More |
| Name Node | Single | Multiple |
| Resource Management | Slot | Container |
| Job Type | MapReduce | MapReduce, MPI, Spark |
| Reliability | Worse | Better |
| JVM re-use | Yes | No |
![Page 7: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/7.jpg)
YARN & HDFS
![Page 8: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/8.jpg)
Input: (k1, v1) (k2, v2) (k3, v3) (k4, v4) (k5, v5) (k6, v6)
map (four tasks) output: [a 1, b 2] [c 3, c 6] [a 5, c 2] [b 7, c 8]
combine (one per map task) output: [a 1, b 2] [c 9] [a 5, c 2] [b 7, c 8]
partition (one per map task)
Shuffle and Sort: aggregate values by keys: a: [1, 5]; b: [2, 7]; c: [2, 9, 8]
reduce (three tasks) output: (r1, s1) (r2, s2) (r3, s3)
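The dataflow on this slide (map, combine, partition, shuffle and sort, reduce) can be traced with a small single-process simulation. This is only a sketch, not Hadoop code: the mapper outputs are copied from the slide, while the summing combiner and reducer and the `partition` function are assumptions made for illustration (the slide's `c 3, c 6` becoming `c 9` suggests a sum, but it does not name the reduce function).

```python
from collections import defaultdict

# Mapper outputs as shown on the slide, one list per map task.
MAP_OUTPUTS = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]


def combine(pairs):
    """Pre-aggregate one mapper's output locally (assumed operation: sum)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key, num_reducers):
    """Hypothetical partitioner: a deterministic key-to-reducer mapping."""
    return (ord(key) - ord("a")) % num_reducers


def shuffle_and_sort(combined_outputs, num_reducers):
    """Route each pair to its reducer and group the values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in combined_outputs:
        for key, value in pairs:
            partitions[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(p.items())) for p in partitions]


def reduce_partition(grouped):
    """Collapse each key's value list to one result (assumed operation: sum)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(map_outputs, num_reducers=3):
    combined = [combine(pairs) for pairs in map_outputs]
    grouped = shuffle_and_sort(combined, num_reducers)
    results = [reduce_partition(g) for g in grouped]
    return combined, grouped, results


combined, grouped, results = run_job(MAP_OUTPUTS)
print(combined)  # per-mapper output after the combiner
print(grouped)   # per-reducer input after shuffle and sort
print(results)   # per-reducer output
```

The combiner matters because it shrinks what crosses the network: the second map task sends one `c` record instead of two. The order of values within a key after the shuffle depends on arrival order, so it may differ from the order drawn on the slide.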
![Page 9: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/9.jpg)
Why Use MapReduce Instead of Classical Supercomputing?
![Page 10: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/10.jpg)
Comparison
| | MPI | Hadoop/Spark |
|---|---|---|
| Node Communication | Supports more frequent node communication (tightly coupled) | Usually nodes do not communicate directly (loosely coupled) |
| Disk I/O | Usually loads data once | Every node reads/writes its own data |
| Fault tolerance | No | Yes |
| Auto-Scaling | No | Yes |
| Applications | CPU-Intensive Scientific Computing | Data-Intensive Analytics |

Challenging Research Issues
Scalability
Resilience (including checkpointing)
Energy-efficiency
Performance Tuning
Integration with Edge Computing & IoT
![Page 11: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/11.jpg)
Hadoop is Slow in Machine Learning!
Logistic regression in Hadoop and Spark
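One commonly cited reason for this gap is that logistic regression is iterative: every gradient step makes a full pass over the same training data. The toy sketch below only counts how often the data set is loaded when each iteration re-reads it (as a chain of independent MapReduce jobs would) versus when it is loaded once and kept in memory (as a cached Spark dataset would be). It does not run Hadoop or Spark and measures no real timings; `load_points`, the four data points, and the learning rate are hypothetical.

```python
import math


def load_points():
    """Stand-in for reading the training set from storage (hypothetical data)."""
    load_points.calls += 1
    # Each point is ((bias, feature), label).
    return [((1.0, 0.5), 1), ((1.0, 2.0), 1), ((1.0, -1.0), 0), ((1.0, -2.5), 0)]


load_points.calls = 0


def gradient(weights, points):
    """Sum the logistic-loss gradient over all points (one full data pass)."""
    grad = [0.0] * len(weights)
    for features, label in points:
        z = sum(w * x for w, x in zip(weights, features))
        prediction = 1.0 / (1.0 + math.exp(-z))
        for i, x in enumerate(features):
            grad[i] += (prediction - label) * x
    return grad


def train(iterations, cache, learning_rate=0.5):
    """Gradient descent; `cache` decides whether the data is loaded once or per pass."""
    weights = [0.0, 0.0]
    cached_points = load_points() if cache else None
    for _ in range(iterations):
        points = cached_points if cache else load_points()
        grad = gradient(weights, points)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


load_points.calls = 0
weights_reload = train(iterations=20, cache=False)
loads_without_cache = load_points.calls

load_points.calls = 0
weights_cached = train(iterations=20, cache=True)
loads_with_cache = load_points.calls

print(loads_without_cache, loads_with_cache)
print(weights_reload == weights_cached)
```

Both runs compute the same weights; only the number of loads differs, and that load is the part that touches disk in a real cluster. The more iterations the algorithm needs, the more the per-iteration reload dominates.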
![Page 12: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/12.jpg)
Spark vs Hadoop
| Spark key features | Apache Spark | Hadoop MapReduce |
|---|---|---|
| Speed | Ten to a hundred times faster than MapReduce | Slower |
| Analytics | Supports streaming, machine learning, complex analytics, etc. | Simple Map and Reduce tasks |
| Suitable for | Real-time streaming | Batch processing |
| Coding | Fewer lines of code | More lines of code |
| Processing location | In-memory | Local disk |
![Page 13: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/13.jpg)
Spark is Based on Hadoop
COSC 4010/5010 Introduction to HPC
![Page 14: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/14.jpg)
Why is Machine Learning Booming Now?
Big Data
Big Computing Power
![Page 15: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/15.jpg)
Evolution of Machine Learning
![Page 16: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/16.jpg)
Distributed Machine Learning
Examples: TensorFlow
Simple structure
Based on MPI
![Page 17: Advances and Challenges of Big Data Computing Platforms€¦ · HBase/Cassandra. BI Reporting. OLAP & Dataware house. Business Objects, SAS, Informatica, Cognos other SQL Reporting](https://reader033.vdocuments.net/reader033/viewer/2022042301/5ecc4a92605884719c086c1f/html5/thumbnails/17.jpg)
Thank you!