data analyses_rahul marthe

Upload: jayprakash

Post on 14-Apr-2018


  • 7/30/2019 Data Analyses_Rahul Marthe

    1/31

    Data Analyses Skills

    (ID6020 Module)

    Rahul R. Marathe

    Department of Management Studies


    Introduction: Why?

    Numbers everywhere!

    -- Last year, ID6020 had 386 students registered. This year the number is 405.

    -- Average time required to complete a typical catalysis experiment under laboratory conditions is 34.7.

    Successful professionals are those who can make sense of these numbers.

    In today's world, it is more a case of information overload: too much data! It is our job to make this data tell us a story!

    Sort out what is important and what is not!


    Introduction: Why?

    Whether you will be audited by income tax authorities depends a lot on the sampling techniques used by the IT department, and also on your hitting certain numerical signals.

    Urban traffic planning is done using data collected from various locations in a city.

    Market research firms use statistical techniques on point-of-sale data to understand buyer behavior.

    Suitability of a drug is decided by analyzing the field data collected from trials conducted.

    That's why every professional should know these


    Introduction: Why?

    Data analysis was traditionally done through statistical techniques; in recent times, we call this Data Analytics.

    Today, data analytics encompasses areas like: Statistics (uni- and multivariate), Probability theory, Stochastic processes, Computational methods, Optimization techniques, Data mining, Artificial Intelligence, Econometrics, Numerical techniques, Simulation...

    Data analysis: Understanding the story told by the numbers!


    Introduction: Why?

    Very likely, your research will involve data collection and analysis.

    Data could be experimental (most engineering applications), or secondary data (from surveys: humanities and management).

    Data collection and analyses require a deep understanding of the theory and techniques of data analytics.

    Your research area itself could be data analytics.

    You certainly require a good understanding of theory and techniques!


    Introduction: Data

    Data: Any related observations.

    A collection of data is a data set, and a single observation is a data point.

    Data can be collected by:

    1. Observations of incidences occurring (direct recording)

    2. Surveys (and sampling)

    3. Conducting experiments, etc.

    Data collection is the most important step, because if the collected data is not correct, analyses and conclusions are incorrect and


    Data collection

    Before relying on any data, test the data by asking:

    Where did the data come from? Is the source biased?

    Do the data support or contradict other evidence we have?

    Is evidence missing that might cause us to come to a different conclusion?

    How many observations do we have? Do they represent all the groups we wish to study?

    Are the conclusions logical? Have we made conclusions that are not supported by data?


    Example of misleading data

    A trucking company advertises:

    "75% of everything you use travels by truck."

    What do you conclude?


    Before the data analyses.

    Identify: Samples and population

    A population is a collection of all the elements one wants to study and about which one is trying to draw conclusions.

    A sample is a collection of some, but not all, of the elements of a population.

    Consider a beauty soap targeted at middle-class women customers aged between 18 and 45 years.

    The population is the entire set of middle-class females aged 18 to 45. But you need to be careful about the definition of middle-class.

    Clearly, a school girl is not a member of the


    Before the data analyses.

    Identify and classify variables

    Type of scale | Data type    | Description                           | Example
    Nominal       | Qualitative  | Data arranged in unordered categories | Gender {Male, Female}; Software {Code A, Code B}
    Ordinal       | Qualitative  | Ordered categories                    | Quality of chemical {poor, average, good}
    Interval      | Quantitative | Rank and distance from arbitrary zero | Temperature (difference works, ratio doesn't!)
    Ratio         | Quantitative | Interval + ratio with a meaning       | Weight (object weighing 20 kgs is twice as heavy as
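
    A small sketch of the interval vs. ratio distinction, using only the Python standard library. The temperatures (10 and 20 degrees Celsius) and the weights (10 and 20 kg) are illustrative values chosen here, not figures from the slides. The idea: changing the unit keeps differences on an interval scale unchanged, but only a ratio scale keeps ratios meaningful.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin (shifts the arbitrary zero)."""
    return c + 273.15

# Illustrative temperatures (assumed values, interval scale in Celsius)
t1_c, t2_c = 10.0, 20.0

# Differences survive the change of zero point
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios do not: "twice as hot" in Celsius is not twice as hot in Kelvin
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Weight is a ratio scale: the ratio is the same in kg and in grams
w1_kg, w2_kg = 10.0, 20.0
ratio_kg = w2_kg / w1_kg
ratio_g = (w2_kg * 1000) / (w1_kg * 1000)

print(f"Temperature difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"Temperature ratio:      {ratio_c:.3f} (C) vs {ratio_k:.3f} (K)")
print(f"Weight ratio:           {ratio_kg:.3f} (kg) vs {ratio_g:.3f} (g)")
```

    The temperature ratio changes with the unit, which is why a ratio statement is only safe on a ratio scale.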


    Quick check

    Can variables with nominal scale be quantitative? Yes or No.

    No. Nominal scale has categories. Categories are for qualitative data.

    Can variables with ordinal scale be qualitative? Yes or No.

    Could be qualitative; could be quantitative. So yes!

    Can nominal or ordinal scale be continuous? Yes or No.

    No! Nominal or ordinal scale is for categorical data. Categorical variables are discrete.

    Can interval scale be continuous and/or discrete? Yes or No.


    Before the data analyses.

    Check and question the assumptions made:

    A. Linearity

    B. Normality

    C. Symmetry

    D. Effect of uncommon observations


    Example

    Pressure | Current
    12.1     | 4
    12.5     | 3.9
    12.9     | 4.11
    13.4     | 4.4
    14.9     | 2.01


    Example (cont.)

    Pressure | Current
    12.1     | 4
    12.5     | 3.9
    12.9     | 4.11
    13.4     | 4.4
    14.9     | 2.01
    14       | 3.7
    14.8     | 2.75
    11.8     | 3.45
    14.65    | 2.68
    14.2     | 2.9
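
    One possible way to examine the effect of an uncommon observation in this example is to compare the Pearson correlation on subsets of the readings. The sketch below uses the pressure and current values from the tables above; the choice of Pearson correlation and the helper name pearson_r are assumptions made here for illustration, not something the slides prescribe.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Readings in the order they appear in the extended table
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]

r_first4 = pearson_r(pressure[:4], current[:4])  # before the odd-looking reading
r_first5 = pearson_r(pressure[:5], current[:5])  # the five readings of the first table
r_all = pearson_r(pressure, current)             # all ten readings

print(f"first 4 readings: r = {r_first4:+.2f}")
print(f"first 5 readings: r = {r_first5:+.2f}")
print(f"all 10 readings:  r = {r_all:+.2f}")
```

    With only the first table, the reading at pressure 14.9 looks like an outlier that flips the sign of the correlation; with the additional readings it looks more like part of a pattern at higher pressures. Whether a single straight-line measure is appropriate at all is exactly the linearity assumption to question.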


    Before the data analyses.

    Understand the purpose: Data analysis is done to identify and understand patterns in data and to use this information to make better decisions.

    DATA = STRUCTURE + NON-STRUCTURE

    DATA = EXPLAINED BEHAVIOR + WHITE NOISE


    Steps in data analysis

    Once data is collected, we need to clean the data, and then summarize, interpret and make sense of it.

    Three categories:

    1. Descriptive: How can the data be summarized?

    2. Inferential: How can we draw inferences from the data?

    3. Predictive: How can we build predictive models using the data available?


    Summary of data

    Describe the data in a graphical or statistical way:

    Some commonly used graphical tools: Frequency distribution tables; Line charts; Histograms; Higher dimensional plots; Scatter plots

    Use of summary statistics

    Measures of central tendency (measures of location). Examples?

    Measures of dispersion (extent of scatter). Examples?

    Measure of symmetry (skewness)

    Etc.
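
    As a minimal sketch of these summary statistics, the code below uses Python's standard statistics module on the ten Current readings from the earlier pressure/current example. The skewness formula used here (third central moment divided by the 1.5 power of the second central moment) is one common definition among several, chosen for illustration.

```python
import statistics

# Current readings from the earlier pressure/current example
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]

# Measures of central tendency (location)
mean = statistics.mean(current)
median = statistics.median(current)

# Measures of dispersion (scatter)
stdev = statistics.stdev(current)          # sample standard deviation (n - 1)
data_range = max(current) - min(current)

# Measure of symmetry: moment-based skewness (population moments)
n = len(current)
m2 = sum((x - mean) ** 2 for x in current) / n
m3 = sum((x - mean) ** 3 for x in current) / n
skewness = m3 / m2 ** 1.5

print(f"mean = {mean:.3f}, median = {median:.3f}")
print(f"stdev = {stdev:.3f}, range = {data_range:.2f}")
print(f"skewness = {skewness:.3f}")
```

    A negative skewness together with a mean below the median suggests a longer tail toward low values, which matches the few low current readings in the data.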


    Interpretation and prediction

    Should depend on:

    Data (variable) type;

    Amount of data;

    Expected type of conclusions.

    Data type:

                             | Dependent variable Y
    Independent variable X   | Quantitative            | Qualitative
    Quantitative             | Correlation, Regression | Convert X into qualitative
    Qualitative              | ANOVA                   | Crosstabulation (e.g. Pivot)
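
    The table can be read as a simple lookup from the pair (type of X, type of Y) to a suggested technique. The sketch below encodes it that way; the function name suggest_method and the dictionary layout are hypothetical, introduced only for illustration.

```python
# Mapping taken from the table: keys are (type of X, type of Y)
METHODS = {
    ("quantitative", "quantitative"): "Correlation, Regression",
    ("quantitative", "qualitative"): "Convert X into qualitative",
    ("qualitative", "quantitative"): "ANOVA",
    ("qualitative", "qualitative"): "Crosstabulation (e.g. Pivot)",
}

def suggest_method(x_type, y_type):
    """Return the technique the table suggests for the given variable types."""
    key = (x_type.lower(), y_type.lower())
    if key not in METHODS:
        raise ValueError("types must be 'quantitative' or 'qualitative'")
    return METHODS[key]

# Example: qualitative X (e.g. material type), quantitative Y (e.g. load)
print(suggest_method("Qualitative", "Quantitative"))
```

    The lookup is only a starting point; the amount of data and the expected type of conclusions still have to be weighed.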


    Example: Bridge failure

    Material  | Design load | Corridor  | Support  | Status
    Concrete  | 100 tons    | Bangalore | Central  | Failed
    Tar       | 75 tons     | Ahmedabad | Multiple | Failed
    Tar       | 150 tons    | Mumbai    | Multiple | Still there!
    Concrete  | 125 tons    | Bareily   | Beams    | Failed
    Synthetic | 200 tons    | Gangtok   | Central  | Still there!
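
    Since material and status are both qualitative, a cross-tabulation is one natural first look at this table. The sketch below builds it with collections.Counter from the standard library; it is an illustrative first step, not the analysis the slides carry out, and five rows are far too few to support any conclusion.

```python
from collections import Counter

# Rows from the bridge table: (material, design load in tons, corridor, support, status)
bridges = [
    ("Concrete", 100, "Bangalore", "Central", "Failed"),
    ("Tar", 75, "Ahmedabad", "Multiple", "Failed"),
    ("Tar", 150, "Mumbai", "Multiple", "Still there"),
    ("Concrete", 125, "Bareily", "Beams", "Failed"),
    ("Synthetic", 200, "Gangtok", "Central", "Still there"),
]

# Count how often each (material, status) combination occurs
crosstab = Counter((material, status) for material, _, _, _, status in bridges)

materials = sorted({row[0] for row in bridges})
statuses = ["Failed", "Still there"]

# Print a small material-by-status table
print(f"{'Material':<10}" + "".join(f"{s:>13}" for s in statuses))
for m in materials:
    print(f"{m:<10}" + "".join(f"{crosstab[(m, s)]:>13}" for s in statuses))
```

    The same pattern works for support type or corridor by changing which fields are paired in the Counter.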


    Questions to ask

    Want to know: Reasons for failure

    Also: factors that may contribute to failure

    Is the data valid? Is the data sufficient?

    Can the conclusions be extrapolated?

    Possible methodology: Clustering algorithms.

    Interpretation depends on whether you look at this problem as a civil engineer, management researcher, or a computer scientist!


    Example: Chemical reaction

    Time required to complete a chemical reaction in a set of experiments:

    24.2, 20.15, 17.11, 14.83, ...

    Do you see a trend?

    Can we be more specific?

    Solution methodology: Forecasting

    What if the data has uncertainty?
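
    One simple way to be more specific about the trend is to look at successive differences and at the ratio between them. The sketch below does this for the four reaction times given above and makes a naive one-step forecast; the assumption that the differences keep shrinking by a roughly constant factor is made here for illustration and is not stated in the slides.

```python
# Reaction times from the example, in experiment order
times = [24.2, 20.15, 17.11, 14.83]

# Successive differences: how much the time changes between experiments
diffs = [b - a for a, b in zip(times, times[1:])]

# Ratio of consecutive differences: is the drop shrinking at a steady rate?
ratios = [d2 / d1 for d1, d2 in zip(diffs, diffs[1:])]
avg_ratio = sum(ratios) / len(ratios)

# Naive forecast: next drop = last drop scaled by the average ratio
next_time = times[-1] + diffs[-1] * avg_ratio

print("differences:", [round(d, 2) for d in diffs])
print("ratios:     ", [round(r, 3) for r in ratios])
print(f"naive forecast for the next experiment: {next_time:.2f}")
```

    With only four points and no measure of uncertainty, such a forecast is a rough guide at best, which is the point of the question about uncertain data.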


    Example: Regression


    Example: Nonlinear relationships


    What should you be asking?

    "Average time required to complete a typical catalysis experiment under laboratory conditions is 34.7."

    What do you mean by typical?

    What do you mean by laboratory conditions?

    What were the other sample values? Was the average value affected by extreme values?

    What are the units?
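
    To see why the question about extreme values matters, the sketch below uses five hypothetical sample values, invented here so that their mean comes out at 34.7; they are not data from the slides. Comparing the mean with the median, and with the mean after dropping the largest value, shows how much a single observation can move the average.

```python
import statistics

# Hypothetical sample values (invented for illustration, units unspecified)
times = [30.0, 31.0, 32.0, 33.0, 47.5]

mean_all = statistics.mean(times)        # pulled upward by the 47.5 reading
median_all = statistics.median(times)    # middle value, robust to the extreme

# Mean after removing the single largest value
mean_without_extreme = statistics.mean(sorted(times)[:-1])

print(f"mean of all values:        {mean_all:.1f}")
print(f"median of all values:      {median_all:.1f}")
print(f"mean without the extreme:  {mean_without_extreme:.1f}")
```

    A reported average on its own cannot distinguish this situation from one where every run took close to 34.7.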


    Courses related to data analyses

    Every department has some course(s) on analyses of data and modeling using data.

    Computational aerodynamics (AS5330)

    Analytical methods in transportation engineering (CE5390)

    Mathematical methods in thermal engg (ME6170)

    Modeling and simulation in manufacturing (ME7240)

    Mathematical methods in materials engg (MM5590)

    Probability and Statistics courses offered by


    Courses related to data analyses

    Stochastic processes (multiple courses offered by EE, Mathematics, MS)

    Multiple courses offered by CSE (on data mining, AI, Data structures, Big Data)

    Optimization courses offered by CH, Mathematics, MS etc.

    Econometrics courses offered by HS, MS.

    These courses will probably not teach you how to draw a 3D plot using the data you have, or how to interpret the same.

    But these courses will help you understand the numbers and analysis in your research!


    Tools for data analyses

    Institute license, available on the super-computing server:

    Abaqus

    Ansys

    LAMMPS

    Matlab

    Mathematica

    Many more!

    SPSS: Many departments have licenses. R is available free over the internet.

    Old friend: MS Excel


    What should you be reading?

    Start from basic Data Analysis textbooks: understand the basics first.

    Read the advanced texts and research articles: need-based learning (see what you require, understand the pre-requisites and then master the technique).

    General reading should never stop!!!

    e.g. Freakonomics: To understand what fun one can have simply by playing with data!!


    Data analyses

    Do's:

    Apply the correct analysis technique

    Understand the assumptions of the method

    Enter the data in the selected technique correctly

    Use the correct equations/software

    Be very careful about the conclusions you draw.

    Don'ts:

    Try each and every technique to decide which looks good.

    Get fooled by jazzy graphs and colors.

    Extrapolate results and conclusions.


    Final word

    Data analysis skills are extremely important and useful.

    Every researcher is going to require these skills at some point or the other.

    Equip yourself with these techniques and you are better prepared for the battle of logic.

    These weapons in your armory have to be used carefully, and after knowing their capabilities (and limitations).

    Don't make the mistake of beating everything with the same stick: different demons require different tools!


    Best wishes!!

    Questions? Comments?

    rrmarathe_at_iitm.ac.in