access and organize data with matlab
TRANSCRIPT
-
8/13/2019 Access and Organize Data With MATLAB
1/24
Access and Organize Data with MATLAB
Graham Dudgeon, MEng, PhD
Industry Manager
MathWorks
-
From Data to Decisions & Design
Physical Sensors -> Data -> Information -> Knowledge -> Action
Observation and Access: Sensing, Collecting, Health Status, Data Acquisition
Organization: Filtering, Signal Analysis, Data Reduction, Plotting
Understanding: Analytics, Frequency & Time-domain, Predictive Analytics, Extrapolation
Decisions & Design: Reporting & Apps, Scalable Deployment, Design Optimization
-
From Data to Decisions & Design
File I/O: Text, Spreadsheet, XML, CDF/HDF, Image, Audio, Video, Geospatial, Web content
Hardware: Data acq, Image ca, GPU, Lab inst
Communication Protocols: CAN (Controller Area Network), DDS (Data Distribution Service), OPC (OLE for Process Control), XCP (Universal Measurement and Calibration Protocol)
Database Acc, Financial Data: ODBC, JDBC, HDFS (Hadoop
-
From Data to Decisions & Design
Data Processing: Convert, Sync, Clean, Reduce
-
From Data to Decisions & Design
Visualization
-
From Data to Decisions & Design
Exploratory Analysis: Derived metrics, events, conditions
[Figure: plot matrix with axes MPG, Acceleration, Displacement, Weight, Horsepower]
-
Reading in Multiple Files
Files with equivalent formats and well ordered file names can be read in using a for-loop. Speed advantages can be gained by using parfor.
parfor l = 1:no_files
    fid = fopen(['data',num2str(l),'.txt']); % data1.txt, data2.txt, ...
    ww = textscan(fid,'%f %f');              % two numeric columns per file
    fclose(fid);
    time(:,l) = ww{1};                       % first column of file l
    data(:,l) = ww{2};                       % second column of file l
end
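A self-contained variant of the loop above may help when trying the idea out. This is only a sketch: the temporary folder, the file contents and the sizes are assumptions made for illustration, and the read loop is written with for so that it runs without Parallel Computing Toolbox.

```matlab
% Write three small two-column text files (illustrative data only).
folder = tempname;
mkdir(folder);
no_files = 3;
n_rows = 5;
for l = 1:no_files
    fid = fopen(fullfile(folder, sprintf('data%d.txt', l)), 'w');
    % Each output line holds one time value and one data value.
    fprintf(fid, '%f %f\n', [(0:n_rows-1); l*(0:n_rows-1)]);
    fclose(fid);
end

% Preallocate so that each iteration fills exactly one column.
time = zeros(n_rows, no_files);
data = zeros(n_rows, no_files);
for l = 1:no_files   % "parfor" can be used here if it is available
    fid = fopen(fullfile(folder, sprintf('data%d.txt', l)));
    ww = textscan(fid, '%f %f');
    fclose(fid);
    time(:,l) = ww{1};
    data(:,l) = ww{2};
end
```

Preallocating time and data keeps the per-column assignment cheap and makes the expected file length explicit.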
-
Classification
Taking an example of dynamic responses, classify the responses automatically into an appropriate category: stable, unstable, or neutral.
Create a categorical array to allow logical indexing on a categorical basis.
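One way the classification step could look is sketched below. The slide does not state the rule it uses, so the rule here (compare the peak amplitude late in the response with the peak amplitude early on) is an assumption, as are the synthetic signals, the tolerance tol and all variable names.

```matlab
% Synthetic responses: one column per response (assumed test data).
t = (0:0.01:10)';
sigma = [-0.5 0.3 0 -0.1];                 % assumed growth/decay rates
y = exp(t*sigma) .* sin(2*pi*t);           % implicit expansion, R2016b or later

% Assumed rule: ratio of late peak amplitude to early peak amplitude.
early = max(abs(y(t <= 2, :)), [], 1);
late  = max(abs(y(t >= 8, :)), [], 1);
ratio = late ./ early;
tol = 0.05;                                % assumed tolerance around 1

labels = repmat({'neutral'}, 1, numel(ratio));
labels(ratio < 1 - tol) = {'stable'};
labels(ratio > 1 + tol) = {'unstable'};

% Categorical array with a fixed set of categories.
response_class = categorical(labels, {'stable','unstable','neutral'});

% Logical indexing on a categorical basis.
stable_responses = y(:, response_class == 'stable');
```

Once the labels are categorical, selections such as response_class == 'unstable' can be used directly as column masks.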
-
Working With Data Too Large To Fit Into System Memory
Use memory mapping to point to the data, and format the data such that sections of interest are easily indexed.
Text File
h1 = memmapfile('large_file.txt','Format', {'uint8',[14015 100010e
qq = textscan(char(h1.Data.x(:,:,1)),'%f');
ww = reshape(qq{:},1001,1000);
Binary File
h2 = memmapfile('travelTime.dat','Format',{'double',[1911 1201 10
ww = h2.Data.x(:,:,1);
Slide callouts: section size in bytes, section number
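The binary-file case can be tried on a small scale as follows. This is a sketch under stated assumptions: the file is a temporary one written by the example itself, and the 4-by-3-by-2 shape stands in for the much larger dimensions on the slide.

```matlab
% Write a small binary file of doubles arranged as 2 sections of 4x3.
fname = [tempname '.dat'];
A = reshape(1:24, [4 3 2]);
fid = fopen(fname, 'w');
fwrite(fid, A, 'double');
fclose(fid);

% Map the file; the third dimension indexes the section.
h = memmapfile(fname, 'Format', {'double', [4 3 2], 'x'});

% Only the requested section is pulled into the workspace.
section2 = h.Data.x(:,:,2);
```

Because the format describes the whole file as one array, a section is selected with ordinary indexing rather than with explicit byte offsets.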
-
High Level Format of a Text File
Header Section 1
Header Section 2
Header Section n
Data Section 1
Data Section 2
Data Section n
Separated by row delimiters (may change for each section)
-
Working with Variable Column Lengths (1)
Where a file format supports variable column lengths for a data section, an approach to read the data in is as follows:
Read in the column headers using fgetl and textscan
>> line = fgetl(fid)
line =
~A DEPTH ILM ILD
-
Working with Variable Column Lengths (2)
>> col_heads = textscan(line,'%s');
>> col_heads{:}
ans =
'~A'
'DEPTH'
'ILM'
'ILD'
>> col_heads = col_heads{1}(2:end); % strip off the '~A'
-
Working with Variable Column Lengths (3)
Use repmat to create a format string to read in the data using textscan
>> num_cols = numel(col_heads); % number of columns
>> format = repmat('%f',1,num_cols)
format =
%f%f%f
>> data = textscan(fid,format); % retrieve all the measured data
-
Working with Variable Column Lengths (4)
Use the column headers to create data structure field names using dynamic field name expressions, and use a loop to place each data column under its column header.
for l = 1:num_cols
    data1.(col_heads{l}) = data{l}; % field name is taken from the header
end
data1.(col_heads{1}) -> data1.DEPTH
data1.(col_heads{2}) -> data1.ILM
data1.(col_heads{3}) -> data1.ILD
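The four steps above can be run end to end on a small file. The sketch below assumes a temporary file with a single "~A" header line and three numeric columns; the sample values and the names header_line, format_spec and log_data are illustrative choices, not taken from the slides.

```matlab
% Create a small sample file with a '~A' header line (illustrative data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '~A DEPTH ILM ILD\n');
fprintf(fid, '%.1f %.2f %.2f\n', [100.0 1.25 2.50; 100.5 1.30 2.45; 101.0 1.28 2.60]');
fclose(fid);

% Step 1: read the header line and split it into column names.
fid = fopen(fname);
header_line = fgetl(fid);
col_heads = textscan(header_line, '%s');
col_heads = col_heads{1}(2:end);          % strip off the '~A'

% Step 2: build a format string with one %f per column.
num_cols = numel(col_heads);
format_spec = repmat('%f', 1, num_cols);

% Step 3: read the remaining numeric data.
data = textscan(fid, format_spec);
fclose(fid);

% Step 4: place each column under a field named after its header.
log_data = struct();
for l = 1:num_cols
    log_data.(col_heads{l}) = data{l};
end
```

The same code reads a file with a different number of columns without modification, since both the format string and the field names are derived from the header line.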
-
Condition a Line of Text that Contains Different Delimiters and Different Substring Identifiers (1)
Files may contain combinations of delimiters that serve the same purpose, such as whitespace, tab or comma to separate column entries. There may also be substrings that are enclosed by unique substring identifiers.
9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS
10 ,1 80.000 , NAME10' 0.000,, 1 ' BUS10
9 1 10.000 NAME9 0.000 0.000 1 BUS09
10 1 80.000 NAME10 0.000 0.000 1 BUS10
-
Condition a Line of Text that Contains Different Delimiters and Different Substring Identifiers (2)
Use regular expression replacement to identify and replace delimiter characters as appropriate.
>> str1 = regexprep(str1,',\s*,',', 0.000 ,');
9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS
10 ,1 80.000 , NAME10' 0.000,, 1 ' BUS10
9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS
10 ,1 80.000 , NAME10' 0.000, 0.000 , 1 ' BUS10
-
Condition a Line of Text that Contains Different Delimiters and Different Substring Identifiers (3)
Use regular expressions to identify substrings and sprintf to replace each substring with a conditioned version.
>> [start_idx,end_idx] = regexp(str2,'"\s*\w*\s*"');
9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS
10 ,1 80.000 , NAME10' 0.000, 0.000 , 1 ' BUS10
9 1, 10.000 NAME9 0.000, 0.000 1 ' BUS
10 ,1 80.000 , NAME10' 0.000, 0.000 , 1 ' BUS10
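The conditioning steps can be chained on a single line of text, as sketched below. The input line is an invented example in the spirit of the slide data, not a copy of it. The first replacement is the regexprep call from the slide; the substring step uses strrep in place of the sprintf approach the slide mentions, and replacing inner spaces with underscores is an assumed convention.

```matlab
% Illustrative input: mixed delimiters, an empty field, quoted names.
str = '10 ,1 80.000 , "NAME 10" 0.000,, 1 ''BUS 10''';

% Step 1 (from the slide): an empty field between commas becomes 0.000.
str = regexprep(str, ',\s*,', ', 0.000 ,');

% Step 2: find substrings enclosed in double or single quotes.
[tok, matches] = regexp(str, '["'']\s*([^"'']*?)\s*["'']', 'tokens', 'match');
for k = 1:numel(matches)
    % Drop the quotes and protect inner spaces with underscores.
    name = strrep(tok{k}{1}, ' ', '_');
    str = strrep(str, matches{k}, name);
end

% Step 3: treat commas as whitespace, then split on whitespace.
str = strrep(str, ',', ' ');
fields = strsplit(strtrim(str));
```

After conditioning, every line has the same number of whitespace-separated fields, so it can be handed to textscan or split directly.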
-
Synchronize Data to a Common Axis
Merge tables together
Popular Joins:
Inner
Full Outer
Left Outer
Right Outer
[Figure: diagrams of Inner Join, Full Outer Join, and Left Outer Join]
-
Full Outer Join
First Data Set (key: X)
X    Y    Z
1    0.1  0.2
3    0.3  0.4
5    0.5  0.6
7    0.7  0.8
Second Data Set (key: A)
A    B
1    1.1
4    1.4
7    1.7
9    1.9
Joined Data Set
Key  B    Y    Z
1    1.1  0.1  0.2
3    NaN  0.3  0.4
4    1.4  NaN  NaN
5    NaN  0.5  0.6
7    1.7  0.7  0.8
9    1.9  NaN  NaN
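The joined data set above can be reproduced with MATLAB tables and outerjoin. In this sketch the key column is named Key in both tables, which is an assumption made to keep the call short (the slide labels the key columns X and A).

```matlab
% The two data sets from the slide, with the key column named 'Key'.
T1 = table([1;3;5;7], [0.1;0.3;0.5;0.7], [0.2;0.4;0.6;0.8], ...
    'VariableNames', {'Key','Y','Z'});
T2 = table([1;4;7;9], [1.1;1.4;1.7;1.9], ...
    'VariableNames', {'Key','B'});

% Full outer join: keep every key that appears in either table.
joined = outerjoin(T1, T2, 'Keys', 'Key', 'MergeKeys', true);

% Reorder the columns to match the slide (Key, B, Y, Z).
joined = joined(:, {'Key','B','Y','Z'});
```

Rows whose key exists in only one table are filled with NaN in the columns that come from the other table.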
-
Techniques to Handle Missing Data
List-wise deletion: unbiased estimates, reduces sample size
Implementation options: built in to many MATLAB functions, manual filtering
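Both implementation options can be shown on a small table. This is a sketch with invented values; rmmissing and the 'omitnan' flag are used as examples of built-in handling, and the logical mask shows the manual route.

```matlab
% Small table with missing values (illustrative data).
T = table([1;2;3;4], [0.1;NaN;0.3;0.4], [10;20;NaN;40], ...
    'VariableNames', {'Key','Y','Z'});

% Built-in: remove every row that contains a missing value.
complete = rmmissing(T);

% Manual filtering: build the row mask explicitly.
keep = ~any(ismissing(T), 2);
manual = T(keep, :);

% Built-in option on a single function: ignore NaN in the mean.
mean_y = mean(T.Y, 'omitnan');
```

List-wise deletion keeps only rows 1 and 4 here, which illustrates how quickly the sample size can shrink.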
-
Techniques to Handle Missing Data
Substitution: replace missing data points with a reasonable approximation
Easy to model
Too important to exclude
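Substitution can be sketched with fillmissing. The data below are invented, and the two methods shown (linear interpolation and previous-value hold) are just two of the available choices; which approximation is reasonable depends on the signal.

```matlab
% Signal with two missing samples (illustrative data).
t = (1:6)';
y = [1; 2; NaN; 4; NaN; 6];

% Substitute by linear interpolation over the sample points.
y_linear = fillmissing(y, 'linear', 'SamplePoints', t);

% Substitute by holding the previous valid value.
y_prev = fillmissing(y, 'previous');
```

Linear interpolation suits smoothly varying signals, while holding the previous value is closer to how a sampled sensor reading behaves between updates.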
-
Summary
Challenges and Solutions
Access data from multiple sources including SQL databases, data historians, instrumentation, and files
Examples: SQL database connection, URL file reading and reading multiple files
Work with data too large to fit into system memory
Binary and text files
Write and test functions that read in industry specific text file formats
Examples: Log ASCII files (.LAS) and Power system formats (IEEE CDF and o
Organize multiple data sets into single data containers using Data Tables
Visualize and interact with data
Automate the detection and classification of events
Threshold detection and stability classification
-
Find Out More
Get answers to your questions
E-mail the presenters at [email protected]
Include the webinar title and date in your e-mail
View recorded webinars www.mathworks.com/recordedwebinars
Visit MATLAB Central
www.mathworks.com/matlabcentral
Contact a MathWorks sales representative
In North America, call 508-647-7000
In other locations, visit www.mathworks.com/webcontact for contact information