Business Statistics - QBM117 Scatter diagrams and measures of association

Upload: gabriel-barry

Post on 31-Dec-2015

Page 1: Business Statistics - QBM117

Business Statistics - QBM117

Scatter diagrams and measures of association

Page 2: Business Statistics - QBM117

Objectives

To briefly introduce the topic of regression and correlation.

To explore relationships between two variables using the graphical technique of scatter diagrams.

To introduce two measures of association, which can be used to quantify the strength of the association between two variables.

Page 3: Business Statistics - QBM117

Regression and correlation: measuring and predicting relationships

Regression and correlation show us how to summarise the relationship between two factors, based on a bivariate (two-variable) set of data.

Correlation is a measure of the strength of the relationship between the two variables;

Regression helps us to predict one variable from the other.

In earlier modules we learnt to look at data, compute and interpret probabilities, draw random samples and perform statistical inference. Now we apply these concepts to explore relationships between several variables.
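As a rough sketch of what "predicting one variable from the other" can look like, the code below fits a straight line by ordinary least squares and uses it to make a prediction. The fitting method is an assumption (this slide does not say how the line is obtained), and the advertising and sales figures are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) and sales (y).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(advertising, sales)

# Predict sales for an advertising spend of 6.0 (just beyond the observed range).
predicted_sales = intercept + slope * 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_sales:.2f}")
```

The variable being predicted (sales) plays the role of y, and the variable used to predict it (advertising) plays the role of x.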

Page 4: Business Statistics - QBM117

In our earlier studies we learnt to summarise univariate (single-variable) data using statistical summaries such as the mean, to describe the centre, and the standard deviation, to describe the variability.

With bivariate data we could use these same statistics to summarise each variable separately; however, the payoff comes from studying the two variables together, to explore the relationship between them.

Page 5: Business Statistics - QBM117

Exploring relationships using scatterplots

Economists and business operators are often interested in relationships between two quantitative variables.

For example:

How does advertising affect sales in my business?

If I increase the price of this product, what effect will this have on demand?

What effect are inflation rates having on unemployment rates, on the price of petrol, on the price of new homes, etc.?

Page 6: Business Statistics - QBM117

Exploring relationships using scatterplots and correlations

Scatterplots provide useful insights into the structure of the data, such as:

Is the relationship between the two variables linear or nonlinear?

Are there any outliers in the data?

What is the strength of the relationship between the two variables?

Page 7: Business Statistics - QBM117

Correlation is a summary measure of the strength of the relationship. It is both helpful and limited.

If the scatterplot shows either a well behaved linear relationship or no relationship at all, then the correlation provides an excellent summary of the relationship;

If, however, there are problems with the data, such as a nonlinear relationship or outliers, the correlation can be misleading.

Therefore, correlation on its own has limited use, as its interpretation depends on the type of relationship in the data.

Page 8: Business Statistics - QBM117

The Scatterplot

The scatterplot is simply a plot of all the data.

If one variable is seen as causing, affecting, or influencing the other, then it is plotted on the x (horizontal) axis. This variable is referred to as the independent variable. The variable that is affected or influenced by the other is plotted on the y (vertical) axis. This variable is referred to as the dependent variable.

If neither causes, affects or influences the other, it does not matter which one is plotted where.

Page 9: Business Statistics - QBM117

Correlation measures the strength of the relationship between the two variables

Correlation, denoted ρ (rho) for a population and r for a sample, varies from –1 to +1, summarising the strength of the relationship in the data.

A correlation of 1 indicates a perfect straight-line relationship, with higher values of one variable associated with perfectly predictable higher values of the other variable.

A correlation of –1 indicates a perfect inverse straight-line relationship, with one variable decreasing as the other increases.

For correlations between –1 and 1, the size of the correlation indicates the strength of the relationship while the sign (+ or -) indicates the direction (increasing or decreasing).
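The range and sign of r described above can be checked numerically. The sketch below assumes r is the usual Pearson product-moment coefficient (the slides do not spell out the formula), and the small data sets are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]

r_up = pearson_r(x, [3, 5, 7, 9, 11])      # points exactly on y = 1 + 2x
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # points exactly on y = 12 - 2x
r_scatter = pearson_r(x, [2, 1, 4, 3, 5])  # increasing trend with scatter

print(r_up, r_down, r_scatter)
```

The two exact lines give correlations of +1 and –1, while the scattered but increasing data give a positive value strictly between 0 and 1.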

Page 10: Business Statistics - QBM117

A correlation of 0 generally indicates no linear relationship, just randomness.

Correlations must be interpreted with caution as nonlinear structures and outliers can distort the usual interpretation.

Correlation measures how close the data points are to being exactly on a tilted straight line. It has nothing to do with the steepness (slope) of the line.
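To illustrate that correlation is unrelated to steepness, the sketch below multiplies every y value by 10. The fitted slope becomes ten times larger, but r does not change. The helper functions assume the Pearson coefficient and a least-squares line, neither of which is defined on these slides, and the data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ls_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
y_steep = [10 * v for v in y]  # same pattern of scatter, ten times steeper

r_original = pearson_r(x, y)
r_steep = pearson_r(x, y_steep)
slope_original = ls_slope(x, y)
slope_steep = ls_slope(x, y_steep)

print(r_original, r_steep)          # the two correlations agree
print(slope_original, slope_steep)  # the slopes differ by a factor of 10
```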

Page 11: Business Statistics - QBM117

Interpreting Correlation

r = 1
• A perfect straight line tilting up to the right

r = 0
• No overall tilt
• No relationship?

r = –1
• A perfect straight line tilting down to the right

[Figure: six example scatterplots, each with X on the horizontal axis and Y on the vertical axis]

Page 12: Business Statistics - QBM117

Various types of relationships

A linear relationship is observed when the scatterplot shows points bunched randomly around a straight line.

The points could be tightly bunched, falling almost exactly on a line, or, more likely, they will be well scattered, forming a ‘cloud’ of points.

Page 13: Business Statistics - QBM117

Example: Exploring TV Ratings

People Meters vs. Nielsen Index
• Two measures of the market share of 10 TV shows
• Correlation is r = 0.974
  • Very strong positive association (since r is close to 1)
• Linear relationship
  • Straight line with scatter
• Increasing relationship
  • Tilts up and to the right

[Figure: scatterplot of People Meters (vertical axis, 10 to 30) against Nielsen Index (horizontal axis, 10 to 30)]

Page 14: Business Statistics - QBM117

Example: Merger Deals

Dollars vs. Deals
• For mergers and acquisitions by investment bankers
  • 134 deals worth $63 billion by Goldman Sachs
• Correlation is r = 0.790
  • Strong positive association
• Linear relationship
  • Straight line with scatter
• Increasing relationship
  • Tilts up and to the right

[Figure: scatterplot of Dollars in billions (vertical axis, 0 to 80) against Deals (horizontal axis, 0 to 200)]

Page 15: Business Statistics - QBM117

Example: Mortgage Rates & Fees

Interest Rate vs. Loan Fee
• For mortgages
  • If the interest rate is lower, does the bank make it up with a higher loan fee?
• Correlation is r = –0.654
  • Negative association
• Linear relationship
  • Straight line with scatter
• Decreasing relationship
  • Tilts down and to the right

[Figure: scatterplot of Interest rate (vertical axis, 7% to 8%) against Loan fee (horizontal axis, 0% to 3%)]

Page 16: Business Statistics - QBM117

Various types of relationships

No relationship is observed when the scatterplot shows a random scatter of points with no tilt either upward or downward.

The points could look like a ‘cloud’ of points that is either circular or oval shaped.

The oval could be either up and down or left and right but it is not tilted (as you move from left to right).

Page 17: Business Statistics - QBM117

Example: The Stock Market

Today’s vs. Yesterday’s Percent Change
• Is there momentum?
  • If the market was up yesterday, is it more likely to be up today? Or is each day’s performance independent?
• Correlation is r = 0.11
  • A weak relationship?
• No relationship?
  • Tilt is neither up nor down

[Figure: scatterplot of Today's change (vertical axis, -3% to 3%) against Yesterday's change (horizontal axis, -3% to 3%)]

Page 18: Business Statistics - QBM117

Various types of relationships

A nonlinear relationship is observed when the scatterplot shows points bunched around a curve, rather than a straight line.

Correlation and regression analysis must be used with care on nonlinear data sets.

For most problems we first transform one or both of the variables, to obtain a linear relationship, then we fit a regression.
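As a minimal sketch of "transform, then fit", the code below uses made-up data that grow exponentially and takes the natural logarithm of y before fitting a line. The choice of a log transform is an assumption for this example only; the slides do not say which transformation to use, and the right one depends on the shape of the curve.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical curved data: y doubles each time x increases by 1.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 16, 32, 64]

r_raw = pearson_r(x, y)            # below 1: the points follow a curve
log_y = [math.log(v) for v in y]   # transform y only
r_log = pearson_r(x, log_y)        # essentially 1: log(y) is linear in x

slope, intercept = fit_line(x, log_y)  # regression on the transformed scale
print(round(r_raw, 3), round(r_log, 3), round(slope, 3))
```

Predictions from the fitted line are on the log scale, so they need to be converted back with `math.exp` to be read in the original units.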

Page 19: Business Statistics - QBM117

Example: Stock Options

Call Price vs. Strike Price
• For stock options
  • “Call Price” is the price of the option contract to buy stock at the “Strike Price”
  • The right to buy at a lower strike price has more value
• A nonlinear relationship
  • Not a straight line: a curved relationship
• Correlation r = –0.895
  • A negative relationship: higher strike price goes with lower call price

[Figure: scatterplot of Call Price (vertical axis, $0 to $100) against Strike Price (horizontal axis, $450 to $650)]

Page 20: Business Statistics - QBM117

Example: Maximizing Yield

Output Yield vs. Temperature
• For an industrial process
  • With a “best” optimal temperature setting
• A nonlinear relationship
  • Not a straight line: a curved relationship
• Correlation r = –0.0155
  • r suggests no relationship
• But relationship is strong
  • It tilts neither up nor down

[Figure: scatterplot of Yield of process (vertical axis, 120 to 160) against Temperature (horizontal axis, 500 to 900)]
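The yield example can be mimicked with a small invented data set in which yield is completely determined by temperature, yet the correlation is zero because the curve rises and then falls symmetrically. The numbers below are hypothetical and are not the data behind the slide; the Pearson formula is assumed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical process: yield peaks at 160 when the temperature is 700
# and falls off symmetrically on either side.
temperature = [500, 600, 700, 800, 900]
process_yield = [160 - (t - 700) ** 2 // 1000 for t in temperature]

r = pearson_r(temperature, process_yield)
print(process_yield, r)
```

Here r is zero even though knowing the temperature tells you the yield exactly, which is why the scatterplot should be inspected before the correlation is trusted.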

Page 21: Business Statistics - QBM117

Outliers

A data point is an outlier if it does not fit the relationship of the rest of the data.

It can distort statistical summaries and make them very misleading.

Watch out for outliers by looking at the scatterplot. If you can justify removing an outlier (by finding that it should not have been there), then do so.

If you have to leave it in, be aware of the problems it can cause and consider reporting statistical summaries (e.g. the correlation coefficient) both with and without it.
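Reporting the correlation both with and without a suspect point might look like the sketch below. The cost and quantity figures, including the single high-cost, low-quantity point, are invented for illustration, and the Pearson formula is assumed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical production data: cost generally rises with the number produced.
number_produced = [20, 25, 30, 35, 40, 45, 50]
cost = [3000, 3300, 3500, 3900, 4100, 4500, 4800]

# One hypothetical outlier: very high cost although few units were produced.
outlier_quantity, outlier_cost = 10, 10000

r_without = pearson_r(number_produced, cost)
r_with = pearson_r(number_produced + [outlier_quantity], cost + [outlier_cost])

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

A single point is enough to flip the sign of the correlation in this made-up example, so both values are worth reporting alongside the scatterplot.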

Page 22: Business Statistics - QBM117

Example: Cost and Quantity

Cost vs. Number Produced
• For a production facility
  • It usually costs more to produce more
• An outlier is visible
  • A disaster (a fire at the factory)
  • High cost, but few produced

[Figure: two scatterplots of Cost (vertical axis) against Number produced (horizontal axis). With the outlier (Cost 0 to 10,000, Number produced 0 to 60): r = –0.623. Outlier removed, showing more detail (Cost 3,000 to 5,000, Number produced 20 to 50): r = 0.869.]

Page 23: Business Statistics - QBM117

Reading for next lecture

Read Chapter 18 Sections 18.1 - 18.3

(Chapter 11 Sections 11.1 – 11.3 abridged)