1
© 2015 The MathWorks, Inc.
Tackling Big Data Using MATLAB
Alka Nair
Application Engineer
2
Building Machine Learning Models with Big Data
Access
Preprocess, Exploration & Model Development
Scale up & Integrate with Production Systems
3
Case study: Predict Air Quality
• Temperature
• Pressure
• Relative Humidity
• Dew Point
• Wind speed
• Wind direction
• Ozone
• CO
• NO2
• SO2
Factors Affecting Air Quality
My Weather Page
www.myweather.com/stats.html
4
5
Building Machine Learning Models with Big Data
Access
Preprocess, Exploration & Model Development
Scale up & Integrate with Production Systems
6
Challenges in Modeling and Deploying Big Data Applications
▪ Distributed Data Storage
▪ Different Data Sources & Types
▪ Preprocessing and Visualizing Big Data
▪ Parallelizing Jobs and Scaling up Computations to Cluster
▪ Enterprise level deployment
Access
▪ Managing Different APIs for Data Sources and Data Formats
Preprocess, Exploration & Model Development
▪ Rewriting Algorithms to Use Big Data Platforms
▪ Parallelizing Code to Scale up to Use Cluster and Cloud Compute
Scale up & Integrate with Production Systems
▪ Overhead in Moving the Algorithm to Production
7
Wouldn’t it be nice if you could:
▪ Easily access data however it is stored
▪ Prototype algorithms quickly using small data sets
▪ Scale up to big data sets running on large clusters
▪ Using the same intuitive MATLAB syntax you are used to
8
Building machine learning models with big data
Access
Preprocess, Exploration & Model Development
Scale up & Integrate with Production Systems
9
Access and Manage Big Data
Datastores
Different Data Types
▪ Text
▪ Images
▪ Spreadsheet
▪ Custom File Formats
Different Data Sources
• Hadoop Distributed File System (HDFS)
• Amazon S3
• Windows Azure Blob Storage
• Relational Database
• HDFS on Hortonworks or Cloudera
Different Applications
• MapReduce
• Image Segmentation
• Image Classification
• Denoising Images
• Predictive Maintenance
10
Datastore
[Diagram labels: One or more files; Process; Single Machine Memory; Cluster of Machines Memory]
11
Air Quality Data on Local Folder
12
Accessing and Processing Different Types of Data
TabularTextDatastore: Text files containing column-oriented data, including CSV files
ImageDatastore: Image files, including formats that are supported by imread such as JPEG and PNG
SpreadsheetDatastore: Spreadsheet files with a supported Excel® format such as .xlsx
MDFDatastore: Datastore for collection of MDF files
Custom Datastore: Datastore for custom or proprietary format
[Figure labels: Image Collection; MDF Files]
13
You have 1 TB of data you’ve never seen before. How do you access this data?
14
Historical files are on HDFS and real-time data are available through an API
• Temperature
• Pressure
• Relative Humidity
• Dew Point
• Wind Speed
• Wind Direction
• Ozone
• CO
• NO2
• SO2
16
Preview the data and adjust properties to best represent the data of interest
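A minimal sketch of the preview-and-adjust step, assuming a small CSV file written on the fly so the example is self-contained. The file name, column names, and the `NA` missing-value marker are illustrative assumptions, not taken from the slides.

```matlab
% Write a small CSV file so the example is self-contained.
% File name, column names, and values are hypothetical.
csvFile = fullfile(tempdir, 'airquality_sample.csv');
fid = fopen(csvFile, 'w');
fprintf(fid, 'Timestamp,Temperature,Ozone,SiteNotes\n');
fprintf(fid, '2015-01-01 00:00:00,1.5,0.021,ok\n');
fprintf(fid, '2015-01-01 01:00:00,NA,0.019,ok\n');
fprintf(fid, '2015-01-01 02:00:00,0.7,0.025,sensor reset\n');
fclose(fid);

% Create the datastore; no data is read at this point.
ds = tabularTextDatastore(csvFile, 'TreatAsMissing', 'NA');

% Look at the first rows without loading the whole file.
disp(preview(ds))

% Adjust properties: keep only the variables of interest.
ds.SelectedVariableNames = {'Timestamp', 'Temperature', 'Ozone'};

% Read the data in chunks of at most two rows.
ds.ReadSize = 2;
reset(ds);
while hasdata(ds)
    chunk = read(ds);
    fprintf('Read %d rows\n', height(chunk));
end
```

The same property-based pattern should carry over when the location is a folder or a remote path; only the location argument changes.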
18
Datastores enable big data workflows
Deep Learning
19
Datastores enable big data workflows
Predictive Maintenance
20
Datastores enable big data workflows
Fleet Analytics
21
Datastores: Access Big Data with Minimal Changes
✓ Different Data Types
▪ Text
▪ Images
▪ Spreadsheet
▪ Custom File Formats
✓ Different Data Sources
• Hadoop Distributed File System (HDFS)
• Amazon S3
• Windows Azure Blob Storage
• Relational Database
• HDFS on Hortonworks or Cloudera
✓ Different Applications
• MapReduce
• Image Segmentation
• Image Classification
• Denoising Images
• Predictive Maintenance
22
Building machine learning models with big data
Access
Preprocess, Exploration & Model Development
Scale up & Integrate with Production Systems
23
You have 1 TB of data you’ve never seen before. How do you visualize and process the data?
24
Use tall arrays to work with the data like any MATLAB array
25
▪ Introduction to Tall Arrays
▪ Tall Arrays for Big Data Visualization and Preprocessing
▪ Machine Learning for Big Data Using Tall Arrays
26
[Diagram labels: Cluster of Machines Memory; Single Machine Memory]
Tall arrays
▪ Data is in one or more files
▪ Files stacked vertically
▪ Typically tabular data
Challenge
▪ Data doesn’t fit into memory (even cluster memory)
▪ Takes a lot of time for even simple operations on data
27
tall array
[Diagram labels: Datastore; Process; Single Machine Memory; Cluster of Machines Memory]
Tall arrays (new R2016b)
▪ Create tall table from datastore
▪ Operate on whole tall table just like ordinary table
ds = datastore('*.csv')
tt = tall(ds)
summary(tt)
max(tt.EndTime - tt.StartTime)
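The datastore-to-tall-table pattern on this slide can be tried end to end with the sketch below. It assumes two small CSV files generated in a temporary folder; the folder name, file names, and numeric `StartTime`/`EndTime` columns are made up for illustration.

```matlab
% Self-contained sketch: write two small CSV files (hypothetical names and
% columns), then treat them as one tall table.
folder = fullfile(tempdir, 'tall_demo');
if ~exist(folder, 'dir')
    mkdir(folder);
end
for k = 1:2
    StartTime = (1:5)' + 10*k;
    EndTime   = StartTime + k;          % duration is k in file k
    writetable(table(StartTime, EndTime), ...
        fullfile(folder, sprintf('part%d.csv', k)));
end

mapreducer(0);                          % evaluate in the local MATLAB session
ds = datastore(fullfile(folder, '*.csv'));
tt = tall(ds);                          % tall table backed by both files

longest = max(tt.EndTime - tt.StartTime);   % not evaluated yet
longest = gather(longest);                  % one pass over the files
disp(longest)
```

The files are never loaded as a whole; `gather` streams through them chunk by chunk and returns only the final scalar.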
28
tall array
[Diagram labels: Cluster of Machines Memory; Single Machine Memory; six Single Machine Memory / Process blocks]
tall arrays
▪ With Parallel Computing Toolbox, process several “chunks” at once
▪ Can scale up to clusters with MATLAB Distributed Computing Server
29
Use a Spark-enabled Hadoop cluster and MATLAB
Support for many other platforms through reference architectures
30
It’s easy to run MATLAB code on Spark + Hadoop
Spark Connection
Cluster Config for Spark
Hadoop Access
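A configuration sketch of the three steps named on this slide, assuming Parallel Computing Toolbox and a reachable Spark-enabled Hadoop cluster. All paths, the executor count, and the HDFS location are placeholders, not values from the presentation, and the fragment will not run without such a cluster.

```matlab
% Hadoop access: point MATLAB at the Hadoop and Spark installations.
% Both paths are placeholders.
setenv('HADOOP_HOME', '/path/to/hadoop/install');
setenv('SPARK_HOME',  '/path/to/spark/install');

% Cluster config for Spark: the executor count is an arbitrary example value.
cluster = parallel.cluster.Hadoop;
cluster.SparkProperties('spark.executor.instances') = '16';

% Spark connection: make the cluster the execution environment for tall arrays.
mapreducer(cluster);

% From here on, tall arrays are evaluated on the cluster.
% The HDFS location is hypothetical.
ds = datastore('hdfs:///data/airquality/*.csv');
tt = tall(ds);
```

Switching back to local execution should only require calling `mapreducer(0)` or `mapreducer(gcp)`; the tall array code itself stays the same.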
31
MATLAB Documentation for
32
Summary for tall arrays
▪ Process out-of-memory data on your desktop to explore, analyze, gain insights, and develop analytics
  (Local disk, shared folders, databases)
▪ Use Parallel Computing Toolbox for increased performance
▪ Run on compute clusters or Spark + Hadoop (HDFS) for large-scale analysis
  (MATLAB Distributed Computing Server, Spark + Hadoop)
▪ Develop your code locally using tall arrays or MapReduce only once
▪ Use the same code to scale up to a cluster
33
Create a tall array for each datastore
ozone
34
Execution model makes operations more efficient on big data
▪ Deferred evaluation
– Commands are not executed right away
– Operations are added to a queue
▪ Execution triggers include:
– gather function
– summary function
– Machine learning models
– Plotting
tt : tall array
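Deferred evaluation can be observed with a tiny tall array. The sketch below builds the tall array from in-memory data purely for illustration (real use would start from a datastore) and computes a sample variance by hand.

```matlab
mapreducer(0);                  % evaluate in the local MATLAB session
x = tall((1:10)');              % tall array from in-memory data, illustration only

% These lines only queue operations; nothing is computed yet.
y = (x - mean(x)).^2;
v = sum(y) / (numel(x) - 1);    % sample variance, still unevaluated
disp(istall(v))                 % v is still a tall (deferred) value

% gather triggers execution and returns an ordinary in-memory result.
v = gather(v);
disp(v)

% Requesting several results in one gather call lets MATLAB combine the passes.
[lo, hi] = gather(min(x), max(x));
```

Grouping the outputs into a single `gather` call is the usual way to avoid reading the underlying data more often than necessary.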
35
Execution model makes operations more efficient on big data
Unnecessary results are not computed
36
✓ Introduction to Tall Arrays
▪ Tall Arrays for Big Data Visualization and Preprocessing
▪ Machine Learning for Big Data Using Tall Arrays
37
Explore Big Data with Tall Visualizations
plot
scatter
binscatter
histogram
histogram2
ksdensity
38
Explore Big Data with Tall Visualizations
39
Get a summary of the data
tt – tall table
40
Use data types to best represent the data
41
Managing Big and Messy Time-stamped Data
42
Use the results of explorations to help make decisions
- Synchronize to daily data
- By location
43
Synchronize all data to daily times
44
Clean messy data using common preprocessing functions
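As one possible illustration of cleaning with common preprocessing functions, the sketch below applies `standardizeMissing` and `rmmissing` to a small tall table. The variable names and the `-999` sentinel value are assumptions made for the example, not details from the case study.

```matlab
mapreducer(0);                       % evaluate in the local MATLAB session

% Hypothetical measurements; -999 stands in for a sensor error code.
Temperature = [21.5; -999; 23.1; NaN; 22.4];
Ozone       = [0.021; 0.019; NaN; 0.025; 0.023];
tt = tall(table(Temperature, Ozone));

tt = standardizeMissing(tt, -999);   % the sentinel value becomes NaN
tt = rmmissing(tt);                  % drop rows that contain any missing value

clean = gather(tt);                  % evaluation happens here
disp(clean)
```

Both calls use the same syntax as for in-memory tables, which is the point of the slide: the cleaning code does not need to be rewritten for tall data.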
45
Use familiar MATLAB functions on tall arrays
Functions Supported with Tall Arrays
46
You don’t need to leave MATLAB to monitor large jobs
48
✓ Introduction to Tall Arrays
✓ Tall Arrays for Big Data Visualization and Preprocessing
▪ Machine Learning for Big Data Using Tall Arrays
49
Predict air quality
Air Quality Index: Regression
Air Quality Label: Classification
50
How do you know which model to use?
▪ Try them all ☺
51
Use apps for model exploration on a subset of data
Air Quality Index
Regression Learner
Air Quality Label
Classification Learner
52
Validate and Compare Machine Learning Models
53
Validate and Compare Machine Learning Models
54
Validate and Compare Machine Learning Models
55
Validate and Compare Machine Learning Models
56
Scale up with tall machine learning models
▪ Linear Regression (fitlm)
▪ Logistic & Generalized Linear Regression (fitglm)
▪ Discriminant Analysis Classification (fitcdiscr)
▪ K-means Clustering (kmeans)
▪ Principal Component Analysis (pca)
▪ Partition for Cross Validation (cvpartition)
▪ Linear Support Vector Machine (SVM) Classification (fitclinear)
▪ Naïve Bayes Classification (fitcnb)
▪ Random Forest Ensemble Classification (TreeBagger)
▪ Lasso Linear Regression (lasso)
▪ Linear Support Vector Machine (SVM) Regression (fitrlinear)
▪ Single Classification Decision Tree (fitctree)
▪ Linear SVM Classification with Random Kernel Expansion (fitckernel)
▪ Gaussian Kernel Regression (fitrkernel)
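To show what fitting one of these models on tall data can look like, here is a sketch using `fitlm` (Statistics and Machine Learning Toolbox) on synthetic data. The predictor names, coefficients, and noise level are invented for the example; the presentation's actual air quality model is not shown in the transcript.

```matlab
mapreducer(0);                        % evaluate in the local MATLAB session
rng(0);                               % reproducible synthetic data

% Hypothetical predictors and a response with known coefficients.
n = 1000;
Temperature = 20 + 5*randn(n, 1);
WindSpeed   = 3 + randn(n, 1);
AQI = 50 + 2*Temperature - 4*WindSpeed + 0.1*randn(n, 1);
tt = tall(table(Temperature, WindSpeed, AQI));

% Fitting a model is an execution trigger: the data is processed here.
X = [tt.Temperature, tt.WindSpeed];
mdl = fitlm(X, tt.AQI);

% Intercept and slopes; these should come out close to 50, 2 and -4.
coeffs = mdl.Coefficients.Estimate;
disp(coeffs')
```

The call has the same form as for in-memory arrays, so a model prototyped on a subset can in principle be refit on the full tall data without code changes.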
57
Training Machine Learning Model against Spark for Air Quality Classification
58
Train and validate with tall data for Air Quality Index Prediction
59
Select the most important features
61
✓ Introduction to Tall Arrays
✓ Tall Arrays for Big Data Visualization and Preprocessing
✓ Machine Learning for Big Data Using Tall Arrays
62
Building machine learning models with big data
Access
Preprocess, Exploration & Model Development
Scale up & Integrate with Production Systems
63
64
Predict air quality for given location
Use MATLAB model running on Spark in Python web framework
[Figure: two "My Weather Page" browser mock-ups (www.myweather.com/stats.html), each labeled MATLAB Runtime]
Your Weather Conditions
Get weather conditions for your area.
Location: 01760
Temperature: 32F
Humidity: 76%
Wind: SSW 13 mph
Current Weather
65
Integrate analytics with systems
Enterprise Systems (MATLAB Runtime)
▪ Standalone Application
▪ Excel Add-in
▪ Hadoop/Spark
▪ MATLAB Production Server
▪ C/C++, Java, .NET, Python
Embedded Hardware
▪ C, C++
▪ HDL
▪ PLC
▪ GPU
66
Package and test MATLAB code
67
68
Package and test MATLAB code
69
Call MATLAB in production environment
AirQual.ctf
70
MATLAB Production Server
▪ Server software
– Manages packaged MATLAB programs and worker pool
▪ MATLAB Runtime libraries
– Single server can use runtimes from different releases
▪ RESTful JSON interface
▪ Lightweight client libraries
– C/C++, .NET, Python, and Java
[Diagram labels: Enterprise Application; MPS Client Library; Applications/Database Servers; RESTful JSON; MATLAB Production Server (Request Broker & Program Manager; MATLAB Runtime)]
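A rough sketch of what a call through the RESTful JSON interface might look like from a MATLAB client. The host, port, function name `predictAirQuality`, and input values are hypothetical; only the archive name follows `AirQual.ctf` from the earlier slide. The request body uses the `nargout`/`rhs` layout of the MATLAB Production Server JSON format, and the network call is left commented out so the example runs without a server.

```matlab
% Hypothetical deployment details.
baseUrl  = 'http://localhost:9910';
archive  = 'AirQual';               % archive name, as in AirQual.ctf
funcName = 'predictAirQuality';     % hypothetical function inside the archive

% Request body: number of requested outputs and the input arguments.
% The inputs (temperature, humidity, wind speed) are made-up example values.
request = struct('nargout', 1, 'rhs', {{32, 76, 13}});
payload = jsonencode(request);
disp(payload)

% Endpoint has the form <server>/<archive>/<function>.
url = sprintf('%s/%s/%s', baseUrl, archive, funcName);
disp(url)

% With a running server, the request could be sent like this:
% opts = weboptions('MediaType', 'application/json', 'RequestMethod', 'post');
% response = webwrite(url, payload, opts);   % outputs are returned under "lhs"
```

Any language with an HTTP client can issue the same request, which is what makes the JSON interface a lightweight alternative to the client libraries.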
71
MATLAB for Modeling and Deploying Big Data Applications
▪ Distributed Data Storage
▪ Different Data Sources & Types
▪ Preprocessing and Visualizing Big Data
▪ Parallelizing Jobs and Scaling up Computations to Cluster
▪ Enterprise level deployment
Access
▪ Easily access data however/wherever it is stored using Datastore
Preprocess, Exploration & Model Development
▪ Prototype and easily scale up algorithms to Big Data platforms using the familiar MATLAB syntax with Tall Arrays
Scale up & Integrate with Production Systems
▪ Seamless integration with enterprise-level systems using MATLAB Production Server
72
Other Resources
▪ Try Tall Array Based Processing on Your Own Set of Big Data
▪ Refer to the example mentioned below to get started:
https://in.mathworks.com/help/matlab/examples/analyze-big-data-in-matlab-using-tall-arrays.html
How do you get started?
mathworks.com/big-data
mathworks.com/machine-learning eBook
73
MathWorks Training Offerings
http://www.mathworks.com/services/training/
74
• Share your experience with MATLAB & Simulink on Social Media
▪ Use #MATLABEXPO
• Share your session feedback: Please fill in your feedback for this session in the feedback form
Speaker Details
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/alka-nair-1820501a/
Contact MathWorks India
Products/Training Enquiry Booth
Call: 080-6632-6000
Email: [email protected]