lesson 1 analytics

Copyright 2014, Simplilearn, All rights reserved. Lesson 1 Introduction to Analytics

Upload: pragativbora

Post on 24-Dec-2015


Page 1: Lesson 1 Analytics


Lesson 1

Introduction to Analytics

Page 2: Lesson 1 Analytics

Objectives

After completing this course, you will be able to:

● Understand what analytics is and how it differs from analysis

● Know the popular tools used in analytics

● Understand the role of a data scientist

● Know the processes involved in analytics

● Define a problem statement

● Collect and summarize data

● Detect and treat outliers in the data

Page 3: Lesson 1 Analytics

Analytics versus Analysis

Analytics

Analytics is the science of analysis, in which statistics, data mining, computer technology, and similar tools are used to carry out the analysis.

Analysis

Analysis is the process of breaking down a complex object into simpler parts.

Page 4: Lesson 1 Analytics

What is Analytics?

• It is the science of obtaining meaningful results from given data using various methods and technologies.

• It aims to discover patterns of variation in the given data.

• It helps to anticipate the future from past data and to understand the uncertainty related to the business.

• It is a sophisticated process that uses statistical, mathematical, and economic models to predict the future and prescribe strategies.

How analytics works

Gather Data → Organize Data → Analyze Data

Page 5: Lesson 1 Analytics

Analytics Stages

Descriptive → Diagnostic → Predictive → Prescriptive

Information → Insights → Decision

● Descriptive: How many students dropped out last year?

● Diagnostic: Why has the drop-out rate increased in the last year?

● Predictive: Which students are most likely to drop out?

● Prescriptive: Which students should I target to keep from dropping out?
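As a rough illustration of the first two stages, the sketch below answers a descriptive question (how many students dropped out in a given year) and takes a first step toward a diagnostic one (how the drop-out rate differs between years). The records, years, and function names are hypothetical and are not taken from the lesson.

```python
from collections import Counter

# Hypothetical enrollment records: (student_id, year, dropped_out)
records = [
    (1, 2013, False), (2, 2013, True), (3, 2013, False), (4, 2013, False),
    (5, 2014, True), (6, 2014, True), (7, 2014, False), (8, 2014, False),
]


def dropout_counts(rows):
    """Descriptive: number of drop-outs per year."""
    return Counter(year for _, year, dropped in rows if dropped)


def dropout_rates(rows):
    """Step toward diagnostic: drop-out rate per year, so years can be compared."""
    totals = Counter(year for _, year, _ in rows)
    drops = dropout_counts(rows)
    return {year: drops[year] / totals[year] for year in totals}


counts = dropout_counts(records)
rates = dropout_rates(records)
print(counts[2014])
print(rates)
```

The predictive and prescriptive stages would require a model fitted to such data, which is beyond this sketch.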

Page 6: Lesson 1 Analytics


Popular Tools

R

Revolution R

RStudio

Tableau

SAP HANA

Weka

KXEN

SAS

Page 7: Lesson 1 Analytics

Role of a Data Scientist

• Is inquisitive; can study data and spot trends.

• Uncovers stories hidden in the data that yield useful insights and help solve business problems.

• Works closely with application developers to get relevant data for analysis.

• Plans the analysis so that the results satisfy the business needs.

• Comes up with an effective data mining architecture and prepares suitable models.

• Responds to and resolves data mining performance issues.

• Generates reports that are affordable from a business perspective.

Page 8: Lesson 1 Analytics

Data Analytics Methodology

DISCOVERY → DATA PREPARING → MODEL PLANNING → MODEL BUILDING → DELIVER RESULTS → PUT INTO USE

Page 9: Lesson 1 Analytics

Problem Definition

WHAT IS THE PROBLEM?

WHAT IS IT NOT?

WE HAVE THIS PROBLEM BECAUSE?

WE DON’T HAVE A SOLUTION BECAUSE?

Page 10: Lesson 1 Analytics

Techniques involved in defining a problem

• State the problem in a general way

• Understand the nature of the problem

• Survey the available literature

• Hold discussions to develop ideas

• Rephrase the research problem into a working proposition

Page 11: Lesson 1 Analytics

Types of Data

● Data can be of two types: qualitative and quantitative

Qualitative Data

• Data expressed as groups or categories

• Descriptive data

• E.g., dividing a population into high, medium, and low height groups

Quantitative Data

• Data expressed as numbers

• Definitive data

• E.g., the height of a person
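The height example can be sketched in code: the measured height is quantitative, and grouping it into low, medium, and high produces qualitative data. The cutoff values and the sample measurements below are illustrative assumptions, not values from the lesson.

```python
def height_group(height_cm, low_cutoff=160.0, high_cutoff=175.0):
    """Map a quantitative height to a qualitative group.

    The cutoffs are illustrative assumptions, not values from the lesson.
    """
    if height_cm < low_cutoff:
        return "low"
    if height_cm < high_cutoff:
        return "medium"
    return "high"


heights_cm = [152.0, 168.5, 181.0, 174.9, 160.0]  # hypothetical measurements
groups = [height_group(h) for h in heights_cm]
print(groups)
```

Note that the grouping discards information: two people in the "medium" group can differ in height by several centimeters.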

Page 12: Lesson 1 Analytics

Summarizing Data

● Summarizing is the process of converting huge amounts of raw data into a format that can be easily analyzed.

● Summaries differ by the type of data and can be descriptive or graphical.

Marital Status    Frequency
Single                  203
Married               2,580
Widowed                 334
Divorced                367
Separated                46
Total                 3,530
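A frequency table like the marital status one can be extended with relative frequencies, which are often easier to compare than raw counts. The sketch below reuses the counts from the table; the variable names are illustrative.

```python
# Frequencies copied from the marital status table.
frequency = {
    "Single": 203,
    "Married": 2580,
    "Widowed": 334,
    "Divorced": 367,
    "Separated": 46,
}

total = sum(frequency.values())

# Relative frequency: each category's share of all observations, in percent.
relative = {status: 100 * count / total for status, count in frequency.items()}

for status, count in frequency.items():
    print(f"{status:<10} {count:>6,} {relative[status]:>6.1f}%")
print(f"{'Total':<10} {total:>6,}")
```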

Page 13: Lesson 1 Analytics

Summarizing Data

Numeric - Descriptive

• Mean

• Median

• Mode

Categorical - Descriptive

• Frequency distribution tables

Numeric - Graphical

• Box plot

• Histograms

Categorical - Graphical

• Bar charts
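The numeric descriptive summaries named on this slide (mean, median, mode) can be computed with Python's standard `statistics` module. This is a minimal sketch; the study times are hypothetical and not data from the lesson.

```python
import statistics

# Hypothetical study times in minutes; not data from the lesson.
study_minutes = [30, 45, 45, 60, 90, 120]

mean_value = statistics.mean(study_minutes)      # arithmetic average
median_value = statistics.median(study_minutes)  # middle of the sorted values
mode_value = statistics.mode(study_minutes)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the mean (65) is larger than the median (52.5) because the two longest study times pull the average upward, which is one reason to report more than one summary.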

Page 14: Lesson 1 Analytics

Data Collection

● Data collection is the process of gathering relevant data that helps address the problem statement.

● The data collection process needs to be well defined and systematic.

● Observations need to be recorded and organized for optimal usefulness.

Collect Relevant Data → Categorize the Data → Organize the Data

Page 15: Lesson 1 Analytics

Data Collection Methods

● Data collection methods fall broadly into two categories: primary and secondary.

● Primary methods gather data directly by investigating, experimenting on, or observing various entities.

● Secondary methods use data that was gathered before the study and is available as published facts and reports.

[Diagram: Observation, Experiment, Census, Questionnaire, Survey, Reporting, Registration, Data Sources]

Page 16: Lesson 1 Analytics

Data Dictionary

● A data dictionary is a file that describes the structure of the database itself.

● It includes details such as:

  ● Number of records

  ● Name of each field

  ● Characteristic of each field

  ● Description of each field

  ● Relationships between different fields

● It helps in analyzing the data variables and their relationships with each other.
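To make the idea concrete, here is a minimal sketch that derives a small data dictionary from a list of records. The records, field names, descriptions, and the `build_data_dictionary` helper are all hypothetical, and the layout is only one of many possible ones.

```python
# Hypothetical records; the field names and values are illustrative only.
records = [
    {"student_id": 1, "study_minutes": 45, "mark_percent": 62.5},
    {"student_id": 2, "study_minutes": 90, "mark_percent": 81.0},
    {"student_id": 3, "study_minutes": 30, "mark_percent": 55.0},
]

# Descriptions are written by hand; they cannot be inferred from the data.
descriptions = {
    "student_id": "Unique identifier of the student",
    "study_minutes": "Study time in minutes",
    "mark_percent": "Mark obtained, as a percentage",
}


def build_data_dictionary(rows, field_descriptions):
    """Collect metadata about the records, not the records themselves."""
    fields = []
    for name in rows[0]:
        fields.append({
            "name": name,
            "type": type(rows[0][name]).__name__,  # characteristic of the field
            "description": field_descriptions.get(name, ""),
        })
    return {"number_of_records": len(rows), "fields": fields}


data_dictionary = build_data_dictionary(records, descriptions)
print(data_dictionary["number_of_records"])
for field in data_dictionary["fields"]:
    print(field)
```

The result holds only metadata (record count, field names, types, descriptions). Relationships between fields are left out of this sketch and would have to be documented separately.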

Page 17: Lesson 1 Analytics

Outlier Treatment

● An outlier is a point or observation that deviates significantly from the other observations.

● Outliers arise from experimental errors or “special circumstances”.

● Outlier detection tests are used to check for outliers.

● Outlier treatment:

  ● Retention

  ● Exclusion

  ● Other treatment methods

[Figure: plot of Mark (Percentage) against Study time (Minutes), with one point labeled “Outlier!”]
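The slide does not say which detection test to use. One common choice, assumed here purely for illustration, is the interquartile range (IQR) rule, which flags values more than 1.5 × IQR below the first quartile or above the third quartile. The marks below are hypothetical, and the sketch shows the exclusion treatment.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


marks_percent = [52, 55, 58, 60, 61, 63, 65, 67, 70, 98]  # hypothetical marks
outliers = iqr_outliers(marks_percent)

# Exclusion: drop the flagged values. Retention would keep marks_percent as is.
retained = [m for m in marks_percent if m not in outliers]
print(outliers)
print(retained)
```

Whether to exclude or retain a flagged value depends on its cause: a recording error is a candidate for exclusion, while a genuine "special circumstance" may be worth keeping and studying.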

Page 18: Lesson 1 Analytics

Summary

Here is a quick recap of what we have learned in this lesson:

● What analytics and analysis are, and how they differ

● Popular tools used in analytics

● What a data scientist does

● The processes involved in the analytics life cycle

● How to formally define a problem statement

● Methods of collecting and summarizing data for analytics

● The data dictionary and its contents

● What outliers are and how to detect and treat them

Page 19: Lesson 1 Analytics


Quiz

Page 20: Lesson 1 Analytics

QUIZ

1. Which of the following is a secondary data collection method?

● Interviews

● Data Sources

● Experiments

● Surveys

Page 21: Lesson 1 Analytics

QUIZ

1. Which of the following is a secondary data collection method?

Answer: Data Sources

Explanation: Surveys, interviews, and experiments are conducted personally by the researchers and hence are primary data collection methods. Data sources already exist before the study and thus belong to the secondary methods.

Page 22: Lesson 1 Analytics

QUIZ

2. Which of the following is NOT a part of a data dictionary?

● Characteristic of fields

● Type of fields

● Actual records

● Number of records

Page 23: Lesson 1 Analytics

QUIZ

2. Which of the following is NOT a part of a data dictionary?

Answer: Actual records

Explanation: A data dictionary holds metadata, i.e., it defines the attributes of the data. It does not contain the actual data.

Page 24: Lesson 1 Analytics

QUIZ

3. Which of the following is a way of summarizing categorical data?

● Frequency distribution

● Median

● Mode

● Mean

Page 25: Lesson 1 Analytics

QUIZ

3. Which of the following is a way of summarizing categorical data?

Answer: Frequency distribution

Explanation: Mean, median, and mode are mathematical summaries of numeric or quantitative data. A frequency distribution is used to summarize categorical or qualitative data.

Page 26: Lesson 1 Analytics

QUIZ

4. Which one of the following is NOT a step in the data analytics methodology?

● Deliver results

● Model building

● Re-checking

● Discovery

Page 27: Lesson 1 Analytics

QUIZ

4. Which one of the following is NOT a step in the data analytics methodology?

Answer: Re-checking

Explanation: Re-checking is not a step in the data analytics methodology.

Page 28: Lesson 1 Analytics

QUIZ

5. What are the two categories of data collection methods?

● Interval and ratio

● Nominal and ordinal

● Random and selective

● Primary and secondary

Page 29: Lesson 1 Analytics

QUIZ

5. What are the two categories of data collection methods?

Answer: Primary and secondary

Explanation: Data collection methods are classified into primary and secondary.

Page 30: Lesson 1 Analytics

QUIZ

6. Which of the following is NOT a step in analytics?

● Predictive

● Descriptive

● Productive

● Prescriptive

Page 31: Lesson 1 Analytics

QUIZ

6. Which of the following is NOT a step in analytics?

Answer: Productive

Explanation: Productive is not a step in analytics.

Page 32: Lesson 1 Analytics

QUIZ

7. Which of the following is FALSE with reference to the role of a data scientist?

● Work with application developers to extract data relevant for analysis.

● There is no need to consider how statistical algorithms work.

● Contribute to data mining architectures, modeling standards, reporting, and data analysis methodologies.

● Develop and plan required analytic projects in response to business needs.

Page 33: Lesson 1 Analytics

QUIZ

7. Which of the following is FALSE with reference to the role of a data scientist?

Answer: There is no need to consider how statistical algorithms work.

Explanation: A data scientist needs to consider how statistical algorithms work.

Page 34: Lesson 1 Analytics


Thank You
