Post on 21-Dec-2015
Part 0: Introduction0-1/19

Statistics and Data Analysis

Professor William Greene

Stern School of Business

IOMS Department

Department of Economics

Part 0: Introduction0-2/19

Statistics and Data Analysis

Part 0 - Introduction

Professor William Greene

Economics and IOMS Departments
Office: KMEC, 7-90 (Economics Department)
Office phone: 212-998-0876
Email: [email protected]
URL: http://people.stern.nyu.edu/wgreene

http://people.stern.nyu.edu/wgreene/Statistics/Outline.htm

Course Objectives

Basic Understanding
  Understand random outcomes and random information
  Understand statistical information as the measured outcomes of random processes

Technical Know How
  Learn how to analyze statistical information
    Statistical analysis
    Model building
  Learn how to present statistical information

What Does it Mean?

Slightly more than one-third of Americans have a favorable opinion of the Democratic-led Congress, a poll said Wednesday.

The Pew Research Center for the People & the Press said the 37% expressing a positive opinion represents a decline of 13 points since April.

The favorable percentage is one of the lowest in more than two decades of Pew surveys – if not the lowest, the poll said. The previous low was 40% in January, but the result is not statistically significant because of the margin of error.

(USA Today, 9/3/09, page 4)
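One way to read the "margin of error" remark is to compute it. The sketch below is only illustrative: it assumes simple random sampling, the usual 95% normal approximation, and a hypothetical sample size of 1,000 for each survey (the article does not report one).

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def diff_margin(p1, n1, p2, n2, z=1.96):
    """Approximate 95% margin of error for the difference of two independent proportions."""
    return z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

n = 1000  # hypothetical sample size; not reported in the article
moe_37 = margin_of_error(0.37, n)
moe_gap = diff_margin(0.40, n, 0.37, n)

print(f"37% favorable, margin of error about {moe_37:.3f}")
print(f"gap between 40% and 37% is 0.030, margin on the gap about {moe_gap:.3f}")
```

Under these assumptions the 3-point gap between 40% and 37% is smaller than the margin on the difference, which is what "not statistically significant" is getting at. A different sample size would change the numbers.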

2014 Update

Really? To Get Rid of Hiccups, Have Someone Startle You.

The truth is: Most home remedies, like holding your breath or drinking from a glass of water backward, haven't been medically proven to be effective, says Pollack. However, you can try this trick dating back to 1971, when it was published in The New England Journal of Medicine: Swallow one teaspoon of white granulated sugar. According to the study, this tactic resulted in the cessation of hiccups in 19 out of 20 afflicted patients.

Posted August 31, 2010, cnn.com
http://www.cnn.com/2010/HEALTH/08/31/rs.12.health.myths/index.html?iref=allsearch
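The "19 out of 20" figure raises the question of how precisely 20 patients pin down a success rate. A minimal sketch, assuming the 20 patients can be treated as independent trials, computes a 95% Wilson score interval for the proportion. The function name is illustrative, and the calculation says nothing about whether the sugar caused the hiccups to stop, since the passage mentions no comparison group.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(19, 20)
print(f"observed rate: {19 / 20:.2f}")
print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```

With only 20 patients the interval is wide, so the observed 95% is consistent with a range of underlying success rates.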

Part 0: Introduction0-9/19

Heard on the Street?

Dear Professor Greene,  The WSN is trying to poll people on the Park51 Mosque debate. I saw that you were an [sic] statistics/data analysis professor and I was wondering if you could explain how we should go about conducting this poll. For example, approximatley [sic] how many people would we need to poll for the data to be completley [sic] unbaised [sic]?

Email received September 5, 2010
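The question in the email mixes up two ideas: sample size controls precision (the margin of error), not bias, which depends on how respondents are selected. As a rough sketch of the precision side only, assuming simple random sampling and a 95% confidence level, the standard formula n = z² p(1 − p) / E² gives the sample size needed for a margin of error E. The function name and the target margins are illustrative.

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n giving the requested margin of error for a proportion.

    p = 0.5 is the conservative (worst-case) choice when the true share is unknown.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n_three_points = required_sample_size(0.03)
n_five_points = required_sample_size(0.05)

print(f"+/- 3 points: {n_three_points} respondents")
print(f"+/- 5 points: {n_five_points} respondents")
```

No sample size, however large, removes bias from a poll that reaches only a self-selected group.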

Part 0: Introduction0-10/19

The following was taken from http://www.msnbc.msn.com/id/27339545/

An msnbc.com guide to presidential polls
Why results, samples and methodology vary from survey to survey

WASHINGTON - A poll is a small sample of some larger number, an estimate of something about that larger number. For instance, what percentage of people reports that they will cast their ballots for a particular candidate in an election? A sample reflects the larger number from which it is drawn. Let’s say you had a perfectly mixed barrel of 1,000 tennis balls, of which 700 are white and 300 orange. You do your sample by scooping up just 50 of those tennis balls. If your barrel was perfectly mixed, you wouldn’t need to count all 1,000 tennis balls — your sample would tell you that 30 percent of the balls were orange.
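The barrel example can be checked by simulation. The sketch below (variable and function names are illustrative) scoops 50 balls without replacement many times and looks at how the orange share varies from scoop to scoop. It suggests that a single sample lands near 30 percent rather than exactly on it.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

barrel = ["white"] * 700 + ["orange"] * 300

def orange_share(sample_size=50):
    """Scoop sample_size balls without replacement and return the orange fraction."""
    scoop = random.sample(barrel, sample_size)
    return scoop.count("orange") / sample_size

shares = [orange_share() for _ in range(5000)]
mean_share = sum(shares) / len(shares)

# Theoretical standard error of the sample share, with the
# finite-population correction for drawing 50 out of 1,000.
p, n, N = 0.30, 50, 1000
std_error = math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

print(f"average orange share over 5000 scoops: {mean_share:.3f}")
print(f"smallest and largest single scoop: {min(shares):.2f}, {max(shares):.2f}")
print(f"theoretical standard error: {std_error:.3f}")
```

The average over many scoops sits close to 0.30, but individual scoops typically miss by several percentage points, which is the source of a poll's margin of error.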

Part 0: Introduction0-11/19

Your Technical Help Wanted

Our firm is looking for a [Ph.D.-level] statistician to assist us in analyzing a simple database of compensation levels. Our database includes 93 unique records for different institutions. We expect to analyze two dependent variables against 13 independent variables.

We need to perform multivariate regression analysis to determine which of the variables are statistically significant. We also need to calculate the t-statistics for each of the independent variables and adjusted r-squared values for the multivariate regression model developed. We expect that some of the variables may need to be transformed prior to creating the regression analysis. Additional statistical approaches and techniques may be required as appropriate. Subsequent to the analysis of each of the variables, we will require a brief write-up detailing any relationships (or lack thereof) uncovered through the analysis. We anticipate that this write-up will be approximately 2-3 pages in length, excluding any supporting appendices. This write up should describe, in plain English, all relevant details regarding the analysis.
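To give a feel for the numbers in this request, the sketch below works out the residual degrees of freedom left by 93 records and 13 independent variables (plus an intercept, which is an assumption), and shows how adjusted R-squared penalizes the number of regressors. The R-squared value of 0.50 is purely hypothetical; the request reports no results.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with n observations and
    k independent variables plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n, k = 93, 13              # counts taken from the request
residual_df = n - k - 1    # degrees of freedom behind each t-statistic

r2 = 0.50                  # hypothetical fit, for illustration only
adj = adjusted_r2(r2, n, k)

print(f"residual degrees of freedom: {residual_df}")
print(f"R-squared {r2:.2f} becomes adjusted R-squared {adj:.3f}")
```

With two dependent variables, a calculation of this kind would apply to each regression separately.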

Part 0: Introduction0-12/19

Course Prerequisites

Basic algebra. (Especially summation)
Geometry (straight lines)
Logs and exponents
  NOTE: I (you) will use only base e (natural) logs, not base 10 (common) logs in this course.
A smattering of simple calculus. (I may use two or three derivatives during the entire semester.)

Part 0: Introduction0-13/19

Course Materials

Notes: Distributed in first class
Text: Stine and Foster. Statistics for Business: Decision Making and Analysis
On the course website:
  Miscellaneous notes and materials
  Class slide presentations
  Problem sets

http://people.stern.nyu.edu/wgreene/Statistics/Outline.htm

Part 0: Introduction0-14/19

Course Software: Minitab

The Current Version: Minitab 17
Buy: Professional Bookstore
Rent: www.onthehub.com $29.99 to rent for 6 months, $99.99 to own
Search: www.onthehub.com/minitab

Part 0: Introduction0-15/19

Course Outline and Overview

1. Presenting Data
  Data Types
    Information content
  Data Description
    Graphical devices: Plots, histograms
    Statistical: Summary statistics

Part 0: Introduction0-16/19

Data: House Price Listings and Per Capita Income by State

How to describe/summarize them.

How to explain the variation across states

How to determine if there is any connection between the two variables.

Part 0: Introduction0-17/19

Course Outline and Overview

2. Explaining How Random Data Arise
  Probability: Understanding unpredictable outcomes
    Precise mathematical principles of random outcomes, e.g., gambling and games of chance
    Models = descriptions of random outcomes that don’t have fixed mathematical laws
  The Normal distribution
    THE fundamental model for outcomes involving behavior
    Model building for random outcomes using the normal distribution
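As a small illustration of what using the normal distribution as a model involves, the sketch below uses Python's standard-library `statistics.NormalDist` to compute the probability that a normally distributed outcome falls within 1, 2, or 3 standard deviations of its mean. The helper name `prob_within` is illustrative and not part of the course materials.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normal outcome lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```

These three probabilities are the familiar 68-95-99.7 rule of thumb.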

Part 0: Introduction0-18/19

Course Outline and Overview

3. Learning from Data
  Statistical inference
  Hypothesis testing: (Is the correlation large? Can we be confident that it is not actually zero?)
  Hypothesis tests for specific applications
    Mean of a population: Is it a specific value?
    Applications in regression: Are the variables in the model really related?
    An application in marketing: Did the sales promotion work? How would you find out?

Part 0: Introduction0-19/19

Course Outline and Overview

4. Modeling Relationships Between Outcomes
  What is correlation?
  Simple linear regression: Connecting one variable with another
  Multiple regression
    Model building
    Understanding covariation of more than one variable.

[Figure: Scatterplot of Listing vs IncomePC. Vertical axis: Listing_1, 400000 to 900000. Horizontal axis: IncomePC_1, 15000 to 32500.]

Correlation = 0.428. Is this large?

Hawaii. Outlier?
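One conventional way to approach "Is this large?" is to test whether the correlation could plausibly be zero. The sketch below applies the standard t-statistic for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). The sample size of 51 (50 states plus the District of Columbia) is an assumption, since the slide does not state how many observations are plotted, and 2.01 is the approximate two-sided 5% critical value for 49 degrees of freedom.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for testing that the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

r = 0.428            # correlation reported on the slide
n = 51               # assumed: 50 states plus DC; not stated on the slide
critical_value = 2.01  # approximate two-sided 5% t critical value, 49 df

t_stat = correlation_t_stat(r, n)
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
print("reject zero correlation at 5%:", abs(t_stat) > critical_value)
```

Under that assumed sample size the correlation would be statistically distinguishable from zero, though that is a separate question from whether 0.428 is "large" in practical terms, or from how much a single unusual point such as Hawaii influences it.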