data analyses_rahul marthe
TRANSCRIPT
-
7/30/2019 Data Analyses_Rahul Marthe
Data Analyses Skills
(ID6020 Module)
Rahul R. Marathe
Department of Management Studies
-
Introduction: Why?
Numbers everywhere!
-- Last year, ID6020 had 386 students registered. This year the number is 405.
-- Average time required to complete a typical catalysis experiment under laboratory conditions is 34.7.
Successful professionals are those who can make sense of these numbers.
In today's world, it is more a case of information overload: too much data! It is our job to make this data tell us a story!
Sort out what is important and what is not!
-
Introduction: Why?
Whether you will be audited by the income tax authorities depends a lot on the sampling techniques used by the IT department, and also on your hitting certain numerical signals.
Urban traffic planning is done using data collected from various locations in a city.
Market research firms use statistical techniques on point-of-sale data to understand buyer behavior.
The suitability of a drug is decided by analyzing the field data collected from trials.
That's why every professional should know these
-
Introduction: Why?
Data analysis was traditionally done through statistical techniques; in recent times, we call this data analytics.
Today, data analytics encompasses areas like: statistics (uni- and multivariate), probability theory, stochastic processes, computational methods, optimization techniques, data mining, artificial intelligence, econometrics, numerical techniques, simulation...
Data analysis: understanding the story told by the numbers!
-
Introduction: Why?
Very likely, your research will involve data collection and analysis.
Data could be experimental (most engineering applications) or secondary (from surveys, as in humanities and management).
Data collection and analysis require a deep understanding of the theory and techniques of data analytics.
Your research area itself could be data analytics.
You certainly require a good understanding of theory and techniques!
-
Introduction: Data
Data: Any related observations.
A collection of data is a data set, and a single observation is a data point.
Data can be collected by:
1. Observations of incidences occurring (direct recording)
2. Surveys (and sampling)
3. Conducting experiments etc.
Data collection is the most important step, because if the collected data is not correct, the analyses and conclusions are incorrect and
-
Data collection
Before relying on any data, test the data by asking:
Where did the data come from? Is the source biased?
Do the data support or contradict other evidence we have?
Is evidence missing that might cause us to come to a different conclusion?
How many observations do we have? Do they represent all the groups we wish to study?
Are the conclusions logical? Have we drawn conclusions that are not supported by the data?
-
Example of misleading data
A trucking company advertises:
"75% of everything you use travels by truck."
What do you conclude?
-
Before the data analyses.
Identify: Samples and population
A population is a collection of all the elements one wants to study and about which one is trying to draw conclusions.
A sample is a collection of some, but not all, of the elements of a population.
Consider a beauty soap targeted at middle-class women customers aged between 18 and 45 years.
The population is the entire set of middle-class females aged between 18 and 45. But you need to be careful about the definition of "middle-class".
Clearly, a school girl is not a member of the
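As a minimal sketch of the population/sample distinction, the snippet below builds a purely synthetic list of customer records, filters it down to the population as defined in the soap example, and then draws a simple random sample from it. The field names, the 1,000 records, the age range and the `middle_class` flag are all assumptions made for illustration; they are not data from this module.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical records of women customers; ages and the middle_class flag
# are synthetic and stand in for whatever definition you settle on.
customers = [
    {"id": i, "age": random.randint(10, 70), "middle_class": random.random() < 0.5}
    for i in range(1000)
]

def in_population(person):
    """Population rule from the soap example: middle-class, aged 18 to 45."""
    return person["middle_class"] and 18 <= person["age"] <= 45

# Population: every element we want to draw conclusions about.
population = [p for p in customers if in_population(p)]

# Sample: a randomly chosen subset of the population.
sample_size = min(50, len(population))
sample = random.sample(population, k=sample_size)

print(len(customers), len(population), len(sample))
```

Changing the rule inside `in_population` (for example, how "middle-class" is decided) changes the population itself, which is why the definition deserves care before any sampling is done.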
-
Before the data analyses.
Identify and classify variables
Type of scale   Data type      Description                             Example
Nominal         Qualitative    Data arranged in unordered categories   Gender {male, female}; Software {Code A, Code B}
Ordinal         Qualitative    Ordered categories                      Quality of chemical {poor, average, good}
Interval        Quantitative   Rank and distance from arbitrary zero   Temperature (difference works, ratio doesn't!)
Ratio           Quantitative   Interval + ratio with a meaning         Weight (an object weighing 20 kg is twice as heavy as one weighing 10 kg)
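The interval versus ratio distinction can be checked with a little arithmetic. The sketch below uses two hypothetical temperatures (10 and 20 degrees Celsius, chosen only for illustration) and shows that their difference is the same in Celsius and kelvin while their ratio is not; for weight, the ratio is the same whatever the unit.

```python
import math

# Interval scale: Celsius has an arbitrary zero.
t1_c, t2_c = 10.0, 20.0                      # hypothetical readings
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15    # same temperatures in kelvin

diff_c = t2_c - t1_c
diff_k = t2_k - t1_k
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

# Ratio scale: weight has a true zero, so ratios survive a change of unit.
w1_kg, w2_kg = 10.0, 20.0
ratio_kg = w2_kg / w1_kg
ratio_g = (w2_kg * 1000) / (w1_kg * 1000)

print("difference:", diff_c, diff_k)     # agree up to rounding
print("temperature ratio:", ratio_c, ratio_k)  # disagree
print("weight ratio:", ratio_kg, ratio_g)      # agree
```

Saying that 20 degrees Celsius is "twice as hot" as 10 degrees Celsius is therefore not meaningful, whereas "10 degrees warmer" is.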
-
Quick check
Can variables with nominal scale be quantitative? Yes or No.
No. A nominal scale has categories, and categories are for qualitative data.
Can variables with ordinal scale be qualitative? Yes or No.
Could be qualitative; could be quantitative. So yes!
Can nominal or ordinal scale be continuous? Yes or No.
No! Nominal or ordinal scale is for categorical data. Categorical variables are discrete.
Can interval scale be continuous and/or discrete? Yes or No.
-
Before the data analyses.
Check and question the assumptions made:
A. Linearity
B. Normality
C. Symmetry
D. Effect of uncommon observation
-
Example
Pressure   Current
12.1       4
12.5       3.9
12.9       4.11
13.4       4.4
14.9       2.01
-
Example (cont.)
Pressure   Current
12.1       4
12.5       3.9
12.9       4.11
13.4       4.4
14.9       2.01
14         3.7
14.8       2.75
11.8       3.45
14.65      2.68
14.2       2.9
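One way to see the effect of an uncommon observation in this example is to compute a correlation coefficient on growing subsets of the readings. The sketch below uses the Pearson coefficient, which is my choice for illustration and is not named on the slides, and compares the first four readings, the first five, and all ten.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Pressure/Current readings from the example tables.
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]

r_first4 = pearson(pressure[:4], current[:4])   # before the odd-looking reading
r_first5 = pearson(pressure[:5], current[:5])   # with (14.9, 2.01) included
r_all10 = pearson(pressure, current)            # with the additional readings

print(f"first 4: {r_first4:+.2f}")
print(f"first 5: {r_first5:+.2f}")
print(f"all 10:  {r_all10:+.2f}")
```

The sign of the coefficient flips once the fifth reading is included, and it stays negative with all ten readings. A point that looks like an outlier in a small table may therefore be part of the real pattern, so it is worth collecting more data before discarding it.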
-
Before the data analyses.
Understand the purpose: data analysis is done to identify and understand patterns in data and to use this information to make better decisions.
DATA = STRUCTURE + NON-STRUCTURE
DATA = EXPLAINED BEHAVIOR + WHITE NOISE
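A minimal sketch of this decomposition, assuming the "structure" is a straight line fitted by ordinary least squares (the slides do not say which model is intended), applied to the Pressure/Current readings from the earlier example:

```python
# Pressure/Current readings from the earlier example.
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]

n = len(pressure)
mean_x = sum(pressure) / n
mean_y = sum(current) / n

# Ordinary least-squares slope and intercept for current = a + b * pressure.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(pressure, current)) / sum(
    (x - mean_x) ** 2 for x in pressure
)
intercept = mean_y - slope * mean_x

# STRUCTURE: the part of each reading explained by the fitted line.
structure = [intercept + slope * x for x in pressure]
# NON-STRUCTURE: what is left over (the residuals).
noise = [y - s for y, s in zip(current, structure)]

for y, s, e in zip(current, structure, noise):
    print(f"data {y:5.2f} = structure {s:5.2f} + noise {e:+5.2f}")
```

Whether the leftover part really behaves like white noise is an assumption to be checked, for example by looking for a pattern in the residuals; a visible pattern suggests the chosen structure is too simple.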
-
Steps in data analysis
Once data is collected, we need to clean the data, and then summarize, interpret and make sense of it.
Three categories:
1. Descriptive: How can the data be summarized?
2. Inferential: How can we draw inferences from the data?
3. Predictive: How can we build predictive models using the data available?
-
Summary of data
Describe the data in graphical or statistical way:
Some commonly used graphical tools: frequency distribution tables; line charts; histograms; higher-dimensional plots; scatter plots.
Use of summary statistics:
Measures of central tendency (measures of location). Examples?
Measures of dispersion (extent of scatter). Examples?
Measure of symmetry (skewness)
Etc.
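As one possible set of answers to the "Examples?" prompts, the sketch below computes a few summary statistics for the ten Current readings from the earlier Pressure/Current example. The choice of statistics (mean, median, range, sample standard deviation, moment-based skewness) is mine and is not prescribed by the slides.

```python
import statistics

# Current readings from the earlier Pressure/Current example.
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]

# Measures of central tendency (location).
mean = statistics.mean(current)
median = statistics.median(current)

# Measures of dispersion (extent of scatter).
value_range = max(current) - min(current)
std_dev = statistics.stdev(current)  # sample standard deviation (n - 1)

# Measure of symmetry: moment-based skewness,
# third central moment divided by (second central moment) ** 1.5.
n = len(current)
m2 = sum((y - mean) ** 2 for y in current) / n
m3 = sum((y - mean) ** 3 for y in current) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean:.3f} median={median:.3f}")
print(f"range={value_range:.2f} std_dev={std_dev:.3f}")
print(f"skewness={skewness:.3f}")
```

Here the mean sits below the median and the skewness is negative, which points to a few low readings pulling the average down; this is the sort of thing to check before quoting a single "average" value.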
-
Interpretation and prediction
Should depend on:
Data (variable) type;
Amount of data;
Expected type of conclusions.
Data type:
                          Dependent variable Y
Independent variable X    Quantitative              Qualitative
Quantitative              Correlation, regression   Convert X into qualitative
Qualitative               ANOVA                     Cross-tabulation (e.g. pivot)
-
Example: Bridge failure
Material    Design load   Corridor    Support    Status
Concrete    100 tons      Bangalore   Central    Failed
Tar         75 tons       Ahmedabad   Multiple   Failed
Tar         150 tons      Mumbai      Multiple   Still there!
Concrete    125 tons      Bareily     Beams      Failed
Synthetic   200 tons      Gangtok     Central    Still there!
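Since Material and Status are both qualitative, one simple first look at this table is a cross-tabulation. The sketch below counts bridges by material and status using only the five rows shown; the dictionary keys are names I chose for illustration.

```python
from collections import Counter

# The five bridges from the table; key names are illustrative.
bridges = [
    {"material": "Concrete", "load_tons": 100, "corridor": "Bangalore", "support": "Central", "status": "Failed"},
    {"material": "Tar", "load_tons": 75, "corridor": "Ahmedabad", "support": "Multiple", "status": "Failed"},
    {"material": "Tar", "load_tons": 150, "corridor": "Mumbai", "support": "Multiple", "status": "Still there!"},
    {"material": "Concrete", "load_tons": 125, "corridor": "Bareily", "support": "Beams", "status": "Failed"},
    {"material": "Synthetic", "load_tons": 200, "corridor": "Gangtok", "support": "Central", "status": "Still there!"},
]

# Cross-tabulation: count of bridges for each (material, status) pair.
crosstab = Counter((b["material"], b["status"]) for b in bridges)

materials = sorted({b["material"] for b in bridges})
statuses = sorted({b["status"] for b in bridges})

for material in materials:
    row = {status: crosstab[(material, status)] for status in statuses}
    print(material, row)
```

With only five rows, such counts can suggest questions (for instance, about concrete bridges or low design loads) but cannot support conclusions about the reasons for failure.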
-
Questions to ask
Want to know: Reasons for failure
Also: factors that may contribute to failure
Is the data valid? Is the data sufficient?
Can the conclusions be extrapolated?
Possible methodology: Clustering algorithms.
Interpretation depends on whether you look at this problem as a civil engineer, a management researcher, or a computer scientist!
-
Example: Chemical reaction
Time required to complete a chemical reaction in a set of experiments:
24.2, 20.15, 17.11, 14.83, ...
Do you see a trend?
Can we be more specific?
Solution methodology: Forecasting
What if the data has uncertainty?
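To be more specific about the trend, one can look at successive differences and at how those differences change. The sketch below does this for the four values given and then extrapolates one step ahead under the assumption, which is mine and not stated on the slides, that each drop is a roughly constant fraction of the previous drop.

```python
# Reaction times from the example.
times = [24.2, 20.15, 17.11, 14.83]

# Successive differences: how much the time drops from one experiment to the next.
diffs = [later - earlier for earlier, later in zip(times, times[1:])]

# Ratio of each drop to the previous one; a near-constant ratio suggests
# the drops shrink geometrically.
ratios = [d2 / d1 for d1, d2 in zip(diffs, diffs[1:])]
avg_ratio = sum(ratios) / len(ratios)

# Naive one-step forecast, assuming the next drop follows the same ratio.
next_drop = diffs[-1] * avg_ratio
forecast = times[-1] + next_drop

print("differences:", [round(d, 2) for d in diffs])
print("ratios:", [round(r, 3) for r in ratios])
print("forecast for the next experiment:", round(forecast, 2))
```

This is only a sketch of the forecasting idea: four points cannot confirm the pattern, and if the measurements carry uncertainty, the forecast should come with an error band rather than a single number.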
-
Example: Regression
-
Example: Nonlinear relationships
-
What should you be asking?
"Average time required to complete a typical catalysis experiment under laboratory conditions is 34.7."
What do you mean by "typical"?
What do you mean by "laboratory conditions"?
What were the other sample values? Was the average value affected by extreme values?
What are the units?
-
Courses related to data analyses
Every department has some course(s) on the analysis of data and modeling using data.
Computational aerodynamics (AS5330)
Analytical methods in transportation engineering (CE5390)
Mathematical methods in thermal engg (ME6170)
Modeling and simulation in manufacturing (ME7240)
Mathematical methods in materials engg (MM5590)
Probability and Statistics courses offered by
-
Courses related to data analyses
Stochastic processes (multiple courses offered by EE, Mathematics, MS)
Multiple courses offered by CSE (on data mining, AI, data structures, Big Data)
Optimization courses offered by CH, Mathematics, MS etc.
Econometrics courses offered by HS, MS.
These courses will probably not teach you how to draw a 3D plot using the data you have, or how to interpret the same.
But these courses will help you understand the numbers and analysis in your research!
-
Tools for data analyses
Institute license, available on the super-computing server:
Abaqus
Ansys
LAMMPS
Matlab
Mathematica
Many more!
SPSS: many departments have licenses. R is available free over the internet.
Old friend: MS Excel
-
What should you be reading?
Start from basic data analysis textbooks: understand the basics first.
Read the advanced texts and research articles: need-based learning (see what you require, understand the prerequisites, and then master the technique).
General reading should never stop!!!
e.g. Freakonomics: to understand what fun one can have simply by playing with data!!
-
Data analyses
Do's:
Apply the correct analysis technique
Understand the assumptions of the method
Enter the data in the selected technique correctly
Use the correct equations/software
Be very careful about the conclusions you draw.
Don'ts:
Try each and every technique to decide which looks good.
Get fooled by jazzy graphs and colors.
Extrapolate results and conclusions.
-
Final word
Data analysis skills are extremely important and useful.
Every researcher is going to require these skills at some point or the other.
Equip yourself with these techniques and you are better prepared for the battle of logic.
These weapons in your armory have to be used carefully, and after knowing their capabilities (and limitations).
Don't make the mistake of beating everything with the same stick: different demons require different tools!
-
Best wishes!!
Questions? Comments?
rrmarathe_at_iitm.ac.in