Access and Organize Data with MATLAB

Upload: guillermo-huerta

Post on 04-Jun-2018


  • 8/13/2019 Access and Organize Data With MATLAB


    Access and Organize Data with MATLAB

    Graham Dudgeon, MEng, PhD

    Industry Manager

    MathWorks

    [email protected]


    From Data to Decisions & Design

    Physical Sensors → Data → Information → Knowledge → Action

    Observation and Access: Sensing, Collecting, Health Status, Data Acquisition

    Organization: Filtering, Signal Analysis, Data Reduction, Plotting

    Understanding: Analytics, Frequency & Time-domain, Predictive Analytics, Extrapolation

    Decisions & Design: Reporting & Apps, Scalable Deployment, Design Optimization

    From Data to Decisions & Design

    File I/O: Text, Spreadsheet, XML, CDF/HDF, Image, Audio, Video, Geospatial, Web content

    Hardware: Data acq, Image ca, GPU, Lab inst

    Communication Protocols: CAN (Controller Area Network), DDS (Data Distribution Service), OPC (OLE for Process Control), XCP (eXplicit Control Protocol)

    Database Acc: Financial Data, ODBC, JDBC, HDFS (Hadoop

    From Data to Decisions & Design

    Data Processing: Convert, Sync, Clean, Reduce

    From Data to Decisions & Design

    Visualization

    From Data to Decisions & Design

    Exploratory Analysis: Derived metrics, events, conditions

    [Figure: plot matrix with axes MPG, Acceleration, Displacement, Weight, and Horsepower]


    Reading in Multiple Files

    Files with equivalent formats and well-ordered file names can be read in using a for-loop. Speed advantages can be gained by using parfor.

    parfor l = 1:no_files
        fid = fopen(['data',num2str(l),'.txt']);   % file names: data1.txt, data2.txt, ...
        ww = textscan(fid,'%f %f');                % two numeric columns per file
        fclose(fid);
        time(:,l) = ww{:,1};                       % first column of file l
        data(:,l) = ww{:,2};                       % second column of file l
    end
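    The loop above can be tried end to end with a minimal sketch such as the following. It assumes three small, equally sized files that it first writes to a temporary folder; the folder, the file contents, and the loop variable k are illustrative and not from the slides. A plain for-loop is used so that the sketch runs without parallel workers.

```matlab
% Create three small hypothetical files data1.txt ... data3.txt in a temp folder
folder = tempname;
mkdir(folder);
no_files = 3;
n = 5;                                     % rows per file (assumed equal)
for k = 1:no_files
    fid = fopen(fullfile(folder, sprintf('data%d.txt', k)), 'w');
    fprintf(fid, '%f %f\n', [(0:n-1); k*(0:n-1)]);   % columns: time, value
    fclose(fid);
end

% Preallocate, then read each file into one column of time and data
time = zeros(n, no_files);
data = zeros(n, no_files);
for k = 1:no_files                         % parfor could replace for here
    fid = fopen(fullfile(folder, sprintf('data%d.txt', k)), 'r');
    ww = textscan(fid, '%f %f');
    fclose(fid);
    time(:,k) = ww{1};
    data(:,k) = ww{2};
end

rmdir(folder, 's');                        % remove the temporary files
```

    Preallocating time and data fixes the array sizes before the loop, which also keeps the indexed assignments compatible with parfor slicing.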

    Classification

    Categories: stable, unstable, neutral

    Taking an example of dynamic responses, classify the responses automatically into an appropriate category.

    Create a categorical array to allow logical indexing on a categorical basis.
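    One way this could look is sketched below. The three test signals, the classification rule (comparing peak amplitude in the second half of the record with the first half), and the tolerance are all assumptions made for illustration; the slides do not state which rule was used.

```matlab
% Hypothetical responses: decaying, growing, and sustained oscillations
t = (0:0.01:10)';
responses = [exp(-0.5*t).*sin(2*pi*t), ...
             exp( 0.2*t).*sin(2*pi*t), ...
             sin(2*pi*t)];

% Assumed rule: ratio of late peak amplitude to early peak amplitude
half = floor(numel(t)/2);
ratio = max(abs(responses(half+1:end,:))) ./ max(abs(responses(1:half,:)));
tol = 0.05;                                   % assumed tolerance for "neutral"

labels = repmat({'neutral'}, 1, size(responses,2));
labels(ratio < 1 - tol) = {'stable'};
labels(ratio > 1 + tol) = {'unstable'};

% Categorical array: logical indexing on a categorical basis
respClass = categorical(labels, {'stable','unstable','neutral'});
stableResponses = responses(:, respClass == 'stable');
```

    The comparison respClass == 'stable' returns a logical row vector, so it can select columns of responses directly.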

    Working With Data Too Large To Fit Into System Memory

    Use memory mapping to point to the data, and format the data such that sections of interest are easily indexed.

    Text File

    h1 = memmapfile('large_file.txt','Format', {'uint8',[14015 100010e

    qq = textscan(char(h1.Data.x(:,:,1)),'%f');

    ww = reshape(qq{:},1001,1000);

    Binary File

    h2 = memmapfile('travelTime.dat','Format',{'double',[1911 1201 10

    ww = h2.Data.x(:,:,1);

    Callouts on the slide: "nsection size in bytes", "section number"
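    The binary-file case can be illustrated with a small self-contained sketch. The file written here, its dimensions, and the field name x are assumptions for illustration; they stand in for the large file on the slide, whose full Format argument is cut off in the transcript.

```matlab
% Write a small binary file of doubles (hypothetical stand-in for a large file)
fname = [tempname '.dat'];
nRows = 4; nCols = 3; nSections = 5;
A = reshape(1:nRows*nCols*nSections, nRows, nCols, nSections);
fid = fopen(fname, 'w');
fwrite(fid, A, 'double');
fclose(fid);

% Map the file so that the third dimension indexes the sections
m = memmapfile(fname, 'Format', {'double', [nRows nCols nSections], 'x'});

% Index one section through the map instead of loading the whole file
section2 = m.Data.x(:,:,2);

clear m                                    % release the mapping before deleting
delete(fname);
```

    Laying the sections out along the last dimension means one index selects a whole section, which matches the h2.Data.x(:,:,1) access pattern shown on the slide.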


    High Level Format of a Text File

    Header Section 1

    Header Section 2

    Header Section n

    Data Section 1

    Data Section 2

    Data Section n

    Separated by row delimiters

    may change for each section


    Working with Variable Column Lengths (1)

    Where a file format supports variable column lengths for a data section, one approach to read the data in is as follows.

    Read in the column headers using fgetl and textscan

    >> line = fgetl(fid)

    line =

    ~A DEPTH ILM ILD


    Working with Variable Column Lengths (2)

    >> col_heads = textscan(line,'%s');

    >> col_heads{:}

    ans =

    '~A'

    'DEPTH'

    'ILM'

    'ILD'

    >> col_heads = col_heads{:}(2:end); % strip off the '~A';


    Working with Variable Column Lengths (3)

    Use repmat to create a format string to read in the data using textscan.

    >> num_cols = numel(col_heads); % number of columns

    >> format = repmat('%f',1,num_cols)

    format =

    %f%f%f

    >> data = textscan(fid,format); % retrieve all the measured data


    Working with Variable Column Lengths (4)

    Use the column headers to create data structure field names using dynamic field name expressions, and use a loop to place the data columns under the corresponding field names.

    for l = 1:num_cols

    data1.(col_heads{l}) = data{:,l};

    end

    data1.(col_heads{1}) → data1.DEPTH

    data1.(col_heads{2}) → data1.ILM

    data1.(col_heads{3}) → data1.ILD
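    Steps (1) to (4) can be combined into one runnable sketch. The temporary file and its two data rows are made up for illustration; only the '~A DEPTH ILM ILD' header line comes from the slides. The variable names header_line and k are also illustrative choices.

```matlab
% Write a small hypothetical file with a '~A' header line and three columns
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '~A DEPTH ILM ILD\n');
fprintf(fid, '%g %g %g\n', [100 1.5 2.5; 101 1.6 2.6]');
fclose(fid);

fid = fopen(fname, 'r');
header_line = fgetl(fid);                         % (1) read the header line
col_heads = textscan(header_line, '%s');          % (2) split into column names
col_heads = col_heads{1}(2:end);                  %     strip off the '~A'
num_cols = numel(col_heads);
data = textscan(fid, repmat('%f', 1, num_cols));  % (3) one %f per column
fclose(fid);
delete(fname);

% (4) dynamic field names: one struct field per column header
for k = 1:num_cols
    data1.(col_heads{k}) = data{k};
end
```

    Because the format string is built from the number of headers, the same code reads a section with a different number of columns without modification.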


    Condition a Line of Text that Contains Different Delimiters and Different Substring Identifiers (1)

    Files may contain combinations of delimiters that serve the same purpose, such as whitespace, tab or comma to separate column entries. There may also be substrings that are enclosed by unique substring identifiers.

    Raw lines:

    9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS

    10 ,1 80.000 , NAME10' 0.000,, 1 ' BUS10

    Conditioned lines:

    9 1 10.000 NAME9 0.000 0.000 1 BUS09

    10 1 80.000 NAME10 0.000 0.000 1 BUS10


    Condition a Line of Text that Contains Different Delimiters and Different Substring Identifiers (2)

    Use regular expression replacement to identify and replace delimiter characters as appropriate.

    >> str1 = regexprep(str1,',\s*,',', 0.000 ,');

    9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS

    10 ,1 80.000 , NAME10' 0.000,, 1 ' BUS10

    9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS

    10 ,1 80.000 , NAME10' 0.000, 0.000 , 1 ' BUS10
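    A minimal sketch of this replacement on a single line follows. The input string is a simplified, hypothetical version of the second example line. The final step, which collapses the remaining commas and whitespace into single spaces, is an assumed follow-up and is not shown on the slide.

```matlab
% Hypothetical input line with an empty field between two commas
str1 = '10 ,1 80.000 , NAME10 0.000,, 1';

% Fill the empty field with a default value, as on the slide
str1 = regexprep(str1, ',\s*,', ', 0.000 ,');

% Assumed follow-up: collapse commas and whitespace into single spaces
str1 = strtrim(regexprep(str1, '[,\s]+', ' '));

% Split the conditioned line into its column entries
fields = strsplit(str1, ' ');
```

    After conditioning, every column entry is separated by exactly one space, so the line can be passed to strsplit or textscan with a single delimiter.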


    Condition a Line of Text that Contains Different Delimiters and Different Substring Identifiers (3)

    Use regular expressions to identify substrings and sprintf to replace the substring with a conditioned version.

    >> [start_idx,end_idx] = regexp(str2,'"\s*\w*\s*"');

    9 1, 10.000 NAME9" 0.000, 0.000 1 ' BUS

    10 ,1 80.000 , NAME10' 0.000, 0.000 , 1 ' BUS10

    9 1, 10.000 NAME9 0.000, 0.000 1 ' BUS

    10 ,1 80.000 , NAME10' 0.000, 0.000 , 1 ' BUS10
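    The substring step might look like the sketch below. The input string, with a padded name enclosed in double quotes, is a hypothetical reconstruction, since the quote characters are partly lost in the transcript. The variable name is illustrative.

```matlab
% Hypothetical line containing one name enclosed in double quotes
str2 = '9 1, 10.000 " NAME9 " 0.000';

% Locate the quoted substring (same pattern as on the slide)
[start_idx, end_idx] = regexp(str2, '"\s*\w*\s*"');

% Strip the quotes and the padding inside them
name = strtrim(str2(start_idx+1:end_idx-1));

% Rebuild the line with the conditioned substring in place of the original
str2 = sprintf('%s%s%s', str2(1:start_idx-1), name, str2(end_idx+1:end));
```

    This sketch assumes exactly one match per line; with several quoted substrings, start_idx and end_idx become vectors and the replacement would need a loop, working from the last match backwards so earlier indices stay valid.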

    Synchronize Data to a Common Axis

    Merge tables together

    Popular Joins: Inner, Full Outer, Left Outer, Right Outer

    Full Outer Join

    First Data Set (key: X)

    X    Y    Z
    1    0.1  0.2
    3    0.3  0.4
    5    0.5  0.6
    7    0.7  0.8

    Second Data Set (key: A)

    A    B
    1    1.1
    4    1.4
    7    1.7
    9    1.9

    Joined Data Set

    Key  B    Y    Z
    1    1.1  0.1  0.2
    3    NaN  0.3  0.4
    4    1.4  NaN  NaN
    5    NaN  0.5  0.6
    7    1.7  0.7  0.8
    9    1.9  NaN  NaN
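    The joined data set above can be reproduced with table and outerjoin, as in this sketch. It uses the values from the slide but, as a simplifying assumption, names the key column Key in both tables (the slide uses X and A) so that the merged key keeps a single, predictable name.

```matlab
% First and second data sets from the slide, with the key column named Key
T1 = table([1;3;5;7], [0.1;0.3;0.5;0.7], [0.2;0.4;0.6;0.8], ...
           'VariableNames', {'Key','Y','Z'});
T2 = table([1;4;7;9], [1.1;1.4;1.7;1.9], ...
           'VariableNames', {'Key','B'});

% Full outer join: keep every key from both tables, fill gaps with NaN
J = outerjoin(T1, T2, 'Keys', 'Key', 'MergeKeys', true);
```

    Rows whose key appears in only one table get NaN in the columns contributed by the other table, which is where the missing-data techniques on the following slides come in.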

    Techniques to Handle Missing Data

    List-wise deletion

        Unbiased estimates

        Reduces sample size

    Implementation options

        Built in to many MATLAB functions

        Manual filtering
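    Manual filtering for list-wise deletion can be sketched in a few lines. The data matrix is hypothetical; each row is treated as one observation and is dropped if any of its entries is NaN.

```matlab
% Hypothetical data with gaps: each row is one observation
A = [1   0.1;
     2   NaN;
     3   0.3;
     NaN 0.4];

% List-wise deletion: keep only rows with no missing values
keep = ~any(isnan(A), 2);
complete = A(keep, :);
```

    Two of the four observations survive here, which illustrates the reduction in sample size noted above. Newer MATLAB releases also provide rmmissing for the same purpose.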

    Techniques to Handle Missing Data

    Substitution: replace missing data points with a reasonable approximation

        Easy to model

        Too important to exclude
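    As one possible form of substitution, the sketch below fills gaps by linear interpolation between neighbouring valid samples. The choice of linear interpolation and the sample data are assumptions; the slides do not name a specific approximation.

```matlab
% Hypothetical signal with two missing samples
t = (1:6)';
y = [1.0; NaN; 3.0; 4.0; NaN; 6.0];

% Substitute each gap with a linear interpolation of the valid samples
is_gap = isnan(y);
y_filled = y;
y_filled(is_gap) = interp1(t(~is_gap), y(~is_gap), t(is_gap), 'linear');
```

    This works only for gaps that lie between valid samples; interp1 returns NaN outside the range of the valid points unless extrapolation is requested. Newer MATLAB releases offer fillmissing as a built-in alternative.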

    Summary

    Challenges and Solutions

    Access data from multiple sources including SQL databases, data historians, instrumentation, and files

        Examples: SQL database connection, URL file reading and reading multiple files

    Work with data too large to fit into system memory

        Binary and text files

    Write and test functions that read in industry-specific text file formats

        Examples: Log ASCII files (.LAS) and Power system formats (IEEE CDF and o

    Organize multiple data sets into single data containers using Data Tables

    Visualize and interact with data

    Automate the detection and classification of events

        Threshold detection and stability classification


    Find Out More

    Get answers to your questions

    E-mail the presenters at [email protected]

    Include the webinar title and date in your e-mail

    View recorded webinars www.mathworks.com/recordedwebinars

    Visit MATLAB Central

    www.mathworks.com/matlabcentral

    Contact a MathWorks sales representative

    In North America, call 508-647-7000

    In other locations, visit www.mathworks.com/webcontact for contact information
