Reading Studies & Statistics

It’s your best protection, because “THEY” are trying to rule the world.

“The world is a mess and I just need to rule it” (Dr. Horrible, 2008).

Ten Questions to Ask of Information

1. Is it original research, or a summary?
2. Where did the data come from?
3. Have the data been peer-reviewed?
4. How were the data collected?
5. Are the comparisons appropriate?
6. Are the numbers in context?
7. Are the definitions valid?
8. Has there been a change in circumstance?
9. Are the questions neutral?
10. Is it observation or modelling?

1: Is it original research or a summary?

• A summary is not the same as original research.

• If it is a summary, does it cite the original study so you can find it? Best of all is when a web site links directly to the original research.

• A study may say something very different from what the report on the study is claiming.

• This discrepancy is often the result of conscious manipulation, but it is also commonly the result of poor and lazy reporting.

1: Is it original research or a summary?

• In 2010, media reported the results of a study under headlines such as “Sorry state of affairs? Men less likely to apologize than women: Study” (Vancouver Sun) and “Women apologise more easily than men” (The MedGuru).

• But what exactly did the study show?

• Often it is not possible to see the original research without paying money, but if the summary or report is half-way adequate, you can generally glean relevant information.


1: Is it original research or a summary?

• The University of Waterloo asked 33 men and 33 women to keep a diary of:
– how often they felt offended
– how often they had given offence
– how often they apologised for giving offence.

• The results showed that women apologised about 35% more often than men.


1: Is it original research or a summary?

• However, according to the study, women also reported feeling offended up to 50% more than men did.

• Nevertheless, for identified offences, both sexes apologised at the same rate: 80% of the time.

• So how else could this study have been reported?
– Women take offence more easily than men.
– Men and women apologise equally for perceived offences.
– Men more easy-going than women, but just as willing to offer apologies.

1: Is it original research or a summary?

• Another problem with summaries or reports of reports is illustrated by a graduate student who was doing his 1995 thesis on gun violence.

• When his research revealed that the number of children killed by gunfire had doubled each year since 1950, it caught the attention of the media.

• This statistic was picked up and re-printed in various publications.

• Following is a chart showing what these figures would mean.

Number of children killed by firearms if the figure had doubled each year since 1950

1950  1951  1952  1953  1954  1955  1956  1957  1958  1959
2     4     8     16    32    64    128   256   512   1,024

1960   1961   1962   1963    1964    1965    1966     1967     1968     1969
2,048  4,096  8,192  16,384  32,768  65,536  131,072  262,144  524,288  1.049 mil

1970       1971     1972     1973      1974      1975      1976       1977       1978           1979
2.097 mil  4.2 mil  8.4 mil  16.8 mil  33.6 mil  67.1 mil  134.2 mil  268.4 mil  536.9 mil [1]  1.1 bil

1980     1981     1982     1983          1984      1985      1986       1987       1988       1989
2.1 bil  4.3 bil  8.6 bil  17.2 bil [2]  34.4 bil  68.7 bil  137.4 bil  274.9 bil  549.8 bil  1.1 tril

1990      1991      1992      1993       1994       1995
2.2 tril  4.4 tril  8.8 tril  17.6 tril  35.2 tril  70.4 tril

[1] Population of the United States in 1978: 222.6 million.
[2] Population of the world in 1983: 4.690 billion.
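The doubling is easy to check in a few lines of code. This is only a sketch: the function name is made up for illustration, and it assumes, as the chart does, a starting figure of 2 in 1950. The two population figures are the ones quoted with the chart.

```python
# Sketch: what "doubled each year since 1950" would imply.
# Assumption (taken from the chart): the series starts at 2 in 1950.

US_POPULATION_1978 = 222.6e6     # figure quoted with the chart
WORLD_POPULATION_1983 = 4.690e9  # figure quoted with the chart


def deaths_if_doubling_yearly(year, start_year=1950, start_value=2):
    """Return the implied count for `year` if the figure doubled every year."""
    return start_value * 2 ** (year - start_year)


# Print a few implied counts, then compare them with the populations.
for year in (1959, 1978, 1983, 1995):
    print(year, f"{deaths_if_doubling_yearly(year):,}")

print("More than the whole US population in 1978:",
      deaths_if_doubling_yearly(1978) > US_POPULATION_1978)
print("More than the whole world population in 1983:",
      deaths_if_doubling_yearly(1983) > WORLD_POPULATION_1983)
```

Any claim of sustained yearly doubling can be tested the same way: run it forward and see whether the result is physically possible.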


1: Is it original research or a summary?

• The student had misread a 1994 report by the Children's Defense Fund that found the number of American children killed each year by guns had doubled since 1950 – not doubled every year since 1950. In other words, it had increased 100%, not by a factor in the trillions.

• Even this statistic isn’t as alarming as it might appear at first, since the population increased 73% over the same 44 years. Dividing the two growth factors (2.00 / 1.73 ≈ 1.16) gives an increase of roughly 16% in the per-capita rate of children killed by guns – not 100%.
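Adjusting a raw increase for population growth means dividing the two growth factors, not subtracting one percentage from the other. A minimal sketch using the two figures on this slide (deaths up 100%, population up 73%); the function name is illustrative only.

```python
def per_capita_change(count_growth_pct, population_growth_pct):
    """Percent change in the per-person rate, given the percent growth in the
    raw count and in the population over the same period."""
    count_factor = 1 + count_growth_pct / 100
    population_factor = 1 + population_growth_pct / 100
    return (count_factor / population_factor - 1) * 100


# Deaths doubled (+100%) while the population grew 73%, as stated on the slide.
change = per_capita_change(100, 73)
print(f"Per-capita change: {change:.1f}%")
```

If the count and the population grow by the same percentage, the function returns zero, which is a quick sanity check on the method.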

2: Where Did the Data Come From?

• Some sources are good, some bad.

• If the person or publication can’t (or won’t) tell you where the data come from, that should be your first hint that you need to be very skeptical about what you are being told.

• Even if your data have an identifiable source, remember that many organizations produce or manipulate data in order to promote their own agendas. You’ll want to know if this possibility exists in order to watch for it.

Statistics Every Writer Should Know, by Robert Niles: http://www.robertniles.com/stats/

See also: The Good, the Bad, and the Ugly of Public Opinion Polls, by Russell D. Renka, Professor of Political Science, Southeast Missouri State University: http://cstl-cla.semo.edu/renka/Renka_papers/polls.htm

2: Where Did the Data Come From?

• Just because a report comes from a group with a vested interest in its results doesn't guarantee the report is a sham. Sometimes, because such groups have more expertise on the subject, they can bring a more sophisticated understanding to the research.

• But you should always be extra skeptical when looking at research generated by people with a political agenda. At the least, they have plenty of incentive NOT to tell you about data they found that contradict their organization's position.

2: Where Did the Data Come From?

• Some statistics are nothing more than pure guesses.

• Mitch Snyder, an activist for the homeless, was asked by the government to give a figure for homelessness in the United States.

• Mitch called others working in the field, and together they guessed that there were between 2 and 3 million homeless people.

• Such guesses are problematic both because activists tend to guess high and because, once reported, the numbers take on a life of their own.

• People lose track of the estimate's original source, but assume the number must be correct because it appears everywhere (Damned Lies and Statistics, by Joel Best).

3: Have the Data Been Peer-Reviewed?

• Have the data been peer-reviewed?

• This doesn’t mean the data are necessarily without errors, merely that they have no flaws obvious to the author’s peers.

• WARNING: Sadly, in the past few years, new information has been coming out undermining the integrity of peer-reviewed literature:

• “Kevin and I will keep them out somehow – even if we have to redefine what the peer-review literature is!” (Phil Jones – Climatic Research Unit – leaked e-mails). (See “Redefining ‘peer-review’” at the end.)



3: Have the Data Been Peer-Reviewed?

• Always check to see if research was formally peer reviewed. If it was, you know that the data you'll be looking at are at least minimally reliable.

• And if it wasn't peer-reviewed, ask why. It may be that the research just wasn't interesting to enough people to warrant peer review. Or it could mean that the authors of the research knew it couldn’t stand up to such scrutiny.

4: How Were the Data Collected?

• If the data come from a survey, were the respondents randomly selected according to proper procedures?

• Random sampling is an important element of polls, and is what made Gallup famous.

• A classic example is the 1936 United States presidential election.



4: How Were the Data Collected?

• Literary Digest, 1936: sent questionnaires to 10 million people, all chosen from their subscription list, asking who they planned to vote for: the incumbent, Franklin Delano Roosevelt, or his challenger, Alf Landon.

• Gallup queried 50,000 people.

• Literary Digest predicted a Landon victory at 3-to-2. Gallup predicted Roosevelt would be the winner, and also predicted the Literary Digest would get it wrong, even before the Digest had conducted its poll.

4: How Were the Data Collected?

• The Literary Digest received 2.3 million responses, or only 23%.

• “Those who felt strongly about the outcome of the election were most likely to respond. That included a majority of those who wanted a change: the Landon supporters. Those who were happy with the incumbent [Roosevelt] were less likely to bother to respond” (Mind on Statistics, by Jessica M. Utts, Robert F. Heckard).

• Gallup also selected 3,000 people from the same list used by the Digest, mailed them a postcard asking how they planned to vote, and from that predicted the results of the Literary Digest poll to within 1%.
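The effect of uneven response rates can be sketched with simple arithmetic. Everything numeric here is hypothetical (a made-up 60% true share and made-up response rates, not the real election figures), and the function name is invented for illustration.

```python
def share_among_respondents(true_share, response_rate_supporters,
                            response_rate_opponents):
    """Share of returned questionnaires favouring a candidate when the
    candidate's supporters and opponents respond at different rates."""
    supporters = true_share * response_rate_supporters
    opponents = (1 - true_share) * response_rate_opponents
    return supporters / (supporters + opponents)


# Hypothetical: 60% of those polled favour the incumbent, but only 15% of
# them mail the form back, while 35% of the challenger's supporters do.
observed = share_among_respondents(0.60, 0.15, 0.35)
print(f"True share: 60.0%, share among respondents: {observed:.1%}")
```

With equal response rates the function returns the true share, so the distortion comes entirely from who chooses to answer, not from the size of the mailing.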

4: How Were the Data Collected?

• Another problem is "cherry-picking."

• For example, looking at illnesses in areas surrounding power lines, it is all too easy for a lazy (or dishonest) researcher to draw the boundaries to include several extra cases of the illness in question and exclude many healthy individuals in the same area.

• When in doubt, plot the subjects of a study on a map and look for yourself to see if the boundaries make sense.

4: How Were the Data Collected?

Zbigniew Jaworowski, M.D., Ph.D., D.Sc. (Chairman, Scientific Council of Central Laboratory for Radiological Protection), March 2007

Actual data points collected, showing distribution of a particular substance over time in parts per million.

http://www.warwickhughes.com/icecore/



4: How Were the Data Collected?

Circled areas show data points chosen to support the researchers’ pre-set conclusion.


5: Is the Correlation Relevant?

• Researchers like to do something called a "regression," a process that compares one thing to another to see if they are statistically related. They will call such a relationship a "correlation." Always remember that a correlation DOES NOT NECESSARILY mean causation.




5: Is the Correlation Relevant?

• A study might find that an increase in the local birth rate was correlated with the annual migration of storks over the town. This does not mean that the storks brought the babies. Or that the babies brought the storks.

• Statisticians call this sort of thing a "spurious correlation," which is a fancy term for "total coincidence."


5: Is the Correlation Relevant?

• Professor Helen M. Walker measured the angle of women's feet in walking and found the angle was greater among older women.

• Conclusion: Women toe out more as they grow older.

• Reality: The older women grew up when a young lady was taught to toe out. Members of the younger group weren’t.

How to Lie with Statistics by Darrell Huff (pp. 96 - 97)

By 100 their toes are under their shoulders


5: Is the Correlation Relevant?

• Based on centuries of observation, people in the New Hebrides noticed that people in good health usually had lice and sick people very often did not.

• Conclusion: Lice make you healthy.

• Reality: Almost everybody had lice most of the time. A fever, however, drives lice away.

• How to Lie with Statistics By Darrell Huff (pp. 98 - 99)

So stop scratching – you’ll scare them away


5: Is the Correlation Relevant?

• A study shows that babies who get bottled water are healthier.

• Conclusion: Bottled water is better for babies than tap water.

• Reality: Parents who give their babies bottled water are more affluent. They have the stability and wherewithal to offer good food, clothing, shelter, and amenities.

• Statistics, Second Edition, by David Freedman, Robert Pisani, Roger Purves, and Ani Adhikari (p. 137)

Have Jeeves give the baby his bottle, dear

Nourish Bottled water for babies.
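A hidden third factor of this kind can be illustrated with a toy dataset. The records below are entirely made up for illustration (they are not from the cited textbook), and the 0-10 "health score" is a hypothetical measure.

```python
from statistics import mean

# Hypothetical records: (affluent household?, bottled water?, health score)
babies = [
    (True, True, 8), (True, True, 9), (True, True, 8), (True, True, 9),
    (True, False, 8), (True, False, 9),
    (False, True, 5), (False, True, 6),
    (False, False, 5), (False, False, 6), (False, False, 5), (False, False, 6),
]


def health_gap(records):
    """Mean health of bottled-water babies minus mean health of tap-water babies."""
    bottled = [health for _, is_bottled, health in records if is_bottled]
    tap = [health for _, is_bottled, health in records if not is_bottled]
    return mean(bottled) - mean(tap)


overall_gap = health_gap(babies)
gap_affluent = health_gap([r for r in babies if r[0]])
gap_other = health_gap([r for r in babies if not r[0]])

print("Gap across all babies:", overall_gap)
print("Gap within affluent households:", gap_affluent)
print("Gap within other households:", gap_other)
```

In this toy data the bottled-water babies look healthier overall, yet the gap disappears once affluent and non-affluent households are compared separately. That is the pattern to look for when a correlation may be driven by something else.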

6: Are the data in context?

• Be wary of data taken out of context.

• While working on a Milwaukee paper, Eric Meyer, a professional reporter, would call the sheriff’s department whenever it snowed heavily and ask how many fender-benders there had been.

• He would then write a story with an opening such as: “A fierce winter storm dumped 8 inches of snow on Milwaukee, snarled rush-hour traffic and caused 28 fender-benders on county freeways.”


http://newslink.org/meyer



6: Are the data in context?

• One sunny day Eric called to ask how many fender-benders had occurred.

• The answer was 48.

• Eric comments: “[It] made me wonder whether in the future we'd run stories saying, ‘A fierce winter snowstorm prevented 20 fender-benders on county freeways today.’ There may or may not have been more accidents per mile traveled in the snow, but clearly there were fewer accidents when it snowed than when it did not.”

6: Are the data in context?

[Figure: image labelled “You Are Here”]

7: Are the definitions valid?

• Homeless statistics are notoriously difficult for several reasons.

• It is intrinsically difficult to count people with no fixed address.

• There is no set definition of “homeless.”

• Another problem arises when definitions change. A definition doesn’t have to be perfect to give us meaningful statistics, but it must be consistent from one study to the next.

8: Has there been a change in circumstance?

• In the late 1970s in Toronto there was a significant increase in prostitution arrests. This wasn’t the result of an influx of prostitutes, but of a change in the law concerning prostitution.

• The ability and willingness to report various crimes can result in an artificial increase in statistics. A dramatic increase in child-abuse statistics between 1950 and 2007 doesn’t necessarily mean there’s more child-abuse – merely that more children (and peripheral adults) are willing to report it.

9: Are the questions neutral?

• Wording is of great importance.

• Significantly more people will support “social assistance” than will support “welfare,” even when the two terms are being used in identical ways.

• “Do you support the attempt by the United States to bring freedom and democracy to other places in the world?”

• “Do you support the unprovoked military action by the United States?”

Bad Statistics: USA Today, by John M. Grohol, Psy.D. March 16, 2006

http://psychcentral.com/blog/archives/2006/03/16/bad-statistics-usa-today/

9: Are the questions neutral?

• Another way to do this is to precede the question by information that supports the "desired" answer.

• "Given the increasing burden of taxes on middle-class families, do you support cuts in income tax?"

• "Considering the rising federal budget deficit and the desperate need for more revenue, do you support cuts in income tax?"

10: Is it observation or computer modelling?

• The enormous power of computers has led to the relatively new and rapidly growing field of computer modelling.

• By inputting all the relevant information, it is possible for a computer to run simulations that, for all intents and purposes, duplicate real-world experiments.

• This can save money, time, and even lives.

10: Is it observation or computer modelling?

• There are two problems:
– Lack of programming skills,
– Lack of relevant information.

• A programmer can easily neglect a variable, or screw up an algorithm.

• Likewise, if all the variables are not known, or not well understood, then the results can be useless.


10: Is it observation or computer modelling?

• Reported in the peer-reviewed science journal Remote Sensing

• Dr. Roy Spencer, a principal research scientist at the University of Alabama in Huntsville, compared NASA satellite data measuring the amount of heat escaping the Earth to the amount of heat loss predicted by the IPCC computer models.

• Spencer found that the NASA satellite data reveal more heat is escaping into space than IPCC computer models have predicted.

NASA Data Pit Scientific Method Against Climate Astrology

http://news.yahoo.com/nasa-data-pit-scientific-method-against-climate-astrology-165012546.html


10: Is it observation or computer modelling?

• The result, of course, was that the IPCC immediately re-calibrated its models to reflect the real-world data.

• Ha ha! Just kidding.

• What actually happened was that IPCC champions chastised the media for reporting Dr. Spencer’s study, and then called into question the satellite data (data upon which the IPCC relies heavily for other purposes).

Five Statistical Concepts

1. The Mean
2. The Median
3. Percent Change
4. Per Capita Rate
5. Margin of Error

1: The Mean

• This is one of the more common statistics you will see.

• To compute a mean, add up all the values in a set of data and then divide that sum by the number of values in the dataset.

• The chart shows how it works. But notice, only three of the nine workers at WWW Co. make the mean or more, while the other six workers don't make even half of it.

• Another way to look at average salaries is the Median.

Employee          Wage
CEO               $100,000
Manager           $50,000
Manager           $50,000
Factory Worker    $15,000
Factory Worker    $15,000
Factory Worker    $15,000
Factory Worker    $15,000
Trainee           $9,000
Trainee           $9,000
Total             $278,000
Mean              $30,889



2: The Median

• The median is the exact middle. You basically line the numbers up by value, and then find the middle number.

• In this case, that would be one of the factory workers, with $15,000.

Employee          Wage
CEO               $100,000
Manager           $50,000
Manager           $50,000
Factory Worker    $15,000
Factory Worker    $15,000
Factory Worker    $15,000
Factory Worker    $15,000
Trainee           $9,000
Trainee           $9,000
Total             $278,000
Median            $15,000
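Both figures for WWW Co. can be reproduced with Python's standard `statistics` module. The wage list is an assumption in one respect: it uses four factory workers at $15,000 each, which is inferred from the nine-worker head count and the $278,000 total rather than read directly off the table.

```python
from statistics import mean, median

# Wages at WWW Co. Assumption: four factory workers at $15,000 each,
# inferred from the stated head count (nine) and total ($278,000).
wages = [100_000, 50_000, 50_000,
         15_000, 15_000, 15_000, 15_000,
         9_000, 9_000]

mean_wage = mean(wages)
median_wage = median(wages)
at_or_above_mean = sum(1 for wage in wages if wage >= mean_wage)

print(f"Total:  ${sum(wages):,}")
print(f"Mean:   ${mean_wage:,.0f}")
print(f"Median: ${median_wage:,}")
print(f"Workers at or above the mean: {at_or_above_mean} of {len(wages)}")
```

The gap between the two numbers is the point: one large salary pulls the mean well above what most of the workers actually earn, while the median stays with the typical worker.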



3: Per Cent Change

• Percent changes are useful to help people understand changes in a value over time.

• Simply subtract the old value from the new value, then divide by the old value. Multiply the result by 100 and slap a % sign on it. That's your percent change.

• If a class had 20 students last year, and has 25 students this year, by what percent has the class grown?

• 25 - 20 = 5

• 5 / 20 = 0.25

• 0.25 X 100 = 25%
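The same recipe as a small function, offered as a sketch; the name `percent_change` is just a convenient label, and the zero check is an added safeguard that the slide does not discuss.

```python
def percent_change(old, new):
    """Percent change from `old` to `new`: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# The class that grew from 20 students to 25.
print(f"{percent_change(20, 25):.0f}%")
```

A shrinking value gives a negative result, so the same function covers both increases and decreases.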



3: Per Cent Change

• Springfield had 50 murders last year, as did Capital City. On the face of it, the crime rate is the same for both cities

• Five years ago, Springfield had 29 murders, Capital City had 42 murders.

• Capital City: (50-42)/42 X 100 = 19% increase

• Springfield: (50-29)/29 X 100 = 72.4% increase

• So Springfield has obviously had a far greater increase in crime.

• Or has it? There’s also another concept to consider: the per capita rate.

4: Per Capita Rate

• Five murders in one place can be far worse than 100 murders in another. The difference has to do with the rate of murder per capita.

• Generally, per capita rates are expressed per 100,000 people.

• Divide the number of murders by the population, then multiply by 100,000.

• Both Springfield and Capital City had 50 murders, but as we saw, there was a vast increase in Springfield’s percentage of murders.

• But how does the number of murders compare with their respective populations?



4: Per Capita Rate

• Springfield has 800,000 people; five years ago it had 450,000

• Capital City has 600,000 people; five years ago it had 550,000.

• Springfield today: (50/800,000)X100,000 = 6.25 murders per 100,000.

• Springfield 5 years ago: (29/450,000)X100,000 = 6.44 murders per 100,000.

• Capital City today: (50/600,000)X100,000 = 8.33 murders per 100,000

• Capital City 5 years ago: (42/550,000)X100,000 = 7.63 murders per 100,000

4: Per Capita Rate

• Percent Change: Subtract old value from new value, divide by the old value, and multiply by 100.

• Springfield: (6.25 - 6.44)/6.44 X 100 = (2.9)% [Brackets indicate a negative number.]

• Capital City: (8.33 – 7.63)/7.63 X 100 = 9.17%

• So Springfield has had a decline in its murder rate of 2.9%, while Capital City has had an increase of 9.17%.
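The two steps (per capita rate, then percent change of the rate) can be chained in a short sketch. The function names are illustrative, and the murder counts and populations are the ones given on the slides. Because the code works from unrounded rates, its last digit can differ slightly from the hand-rounded figures above.

```python
def per_capita_rate(events, population, per=100_000):
    """Events per `per` people."""
    return events / population * per


def percent_change(old, new):
    """Percent change from `old` to `new`."""
    return (new - old) / old * 100


# (murders, population) five years ago and today, from the slides.
cities = {
    "Springfield": {"then": (29, 450_000), "now": (50, 800_000)},
    "Capital City": {"then": (42, 550_000), "now": (50, 600_000)},
}

changes = {}
for name, data in cities.items():
    rate_then = per_capita_rate(*data["then"])
    rate_now = per_capita_rate(*data["now"])
    changes[name] = percent_change(rate_then, rate_now)
    print(f"{name}: {rate_then:.2f} -> {rate_now:.2f} per 100,000 "
          f"({changes[name]:+.1f}%)")
```

The signs tell the story: Springfield's rate comes out negative and Capital City's positive, the reverse of what the raw murder counts suggested.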

5: Margin of Error

• You’ll notice that polls done by reputable agencies come with a disclaimer that generally reads something like: “Margin of error plus or minus 4 percentage points, 95 times out of 100.”

• This means that if you repeated this poll 100 times, then about 95 times the result would come out within plus or minus 4 percentage points of the true figure.

• This is a very important concept to remember during election polls, and one which the newspapers most commonly ignore.



5: Margin of Error

• Consider the pre-election polls for the incumbent, Mayor Quimby, and his opponent, Sideshow Bob:

• A poll on March 22 shows Quimby with 54% and Sideshow Bob with 46%.

• A poll on April 2 shows Quimby with 50% and Sideshow Bob with 50%.

• Although most newspapers would report that Quimby’s support has slipped, this is not really the case – the difference between the two polls is within the margin of error.

• A poll conducted the next day may show Quimby with 58% and Sideshow Bob with 42%, and it would still not mean that there had been any change.

5: Margin of Error

• And don’t forget that whole “95 times out of 100” thing. This means that, on average, about one poll in every 20 will land outside the margin of error through chance alone.
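Where does a figure like "plus or minus 4 points" come from? A common textbook approximation for a simple random sample is z × sqrt(p(1 − p) / n), with z = 1.96 for "95 times out of 100". The sketch below uses that approximation; the sample size of 600 is an assumption chosen because it gives roughly 4 points, and real polling firms may compute their margins differently. The second function simply encodes the rule of thumb used in the Quimby example.

```python
from math import sqrt


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error (as a fraction) for a simple random
    sample, using the normal approximation."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)


def change_exceeds_margin(old_pct, new_pct, margin_pct):
    """Rule of thumb from the slides: treat a shift between two polls as
    real only if it is larger than the stated margin of error."""
    return abs(new_pct - old_pct) > margin_pct


# Assumed sample size of 600 respondents.
print(f"Margin of error: +/- {margin_of_error(600) * 100:.1f} points")

# Quimby at 54% on March 22 and 50% on April 2, margin of 4 points.
print("Real change?", change_exceeds_margin(54, 50, 4))
```

Note that the margin shrinks only with the square root of the sample size, so halving it takes roughly four times as many respondents.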

That’s all folks!

And remember – mad scientists need love too.

Redefining “peer-review”

• Eric Steig was a member of Michael Mann’s “Hockey Team.”

• In 2009, Steig et al. published a paper in Nature. The paper purported to counter the claim by sceptics that Antarctica is as cold now as it was 30 years ago. Steig’s paper showed that, contrary to earlier claims, Antarctica was in fact warming too.

• Jeff Id and Ryan O’Donnell pointed out that the statistical methods used to show this alleged warming were based on highly dubious extrapolations of data taken from a small number of stations on the Antarctic Peninsula and coastline.

• Steig suggested that O’Donnell, Id et al. should publish a paper under peer review.

• They tried.

• One of the people selected to review their paper was known as "Reviewer A."

• “Reviewer A” tried to thwart the paper’s progression to publication with 88 pages of comments and obfuscation ten times longer than the original paper.

• Who was "Reviewer A"? Eric Steig.

• Ryan guessed that Reviewer A was Steig early on, but remained patient and good-natured.

• Reviewer A (Steig) suggested that O'Donnell and Id use an alternative statistical technique, which they did. Then Reviewer A (Steig) criticised the paper on the statistical technique he had suggested.

• That was when Ryan decided to bring all this out in the open.

• This is the kind of episode critics have in mind when they quote Jones's earlier remark, “Kevin [Trenberth]* and I will keep them out somehow – even if we have to redefine what the peer-review literature is!”

• * Trenberth has an honourable mention in the CRU emails for his famous statement regarding their inability to explain why the world wasn't heating up the way they'd predicted: "The fact is that we can’t account for the lack of warming at the moment and it is a travesty that we can’t."
