Experiments & Statistics

Upload: kelly-powers

Post on 28-Dec-2015


TRANSCRIPT

Page 1: Experiments & Statistics. Experiment Design Playtesting Experiments don’t have to be “big”--many game design experiments take only 30 minutes to design

Experiments & Statistics


Experiment Design

•Playtesting

• Experiments don’t have to be “big”: many game design experiments take only 30 minutes to design and conduct, and the results are obvious

•Two approaches:

• Measure a Quantity

• Test a Hypothesis

• (Can do both in the same experiment)

•Experiments are much weaker than proofs!


Control Group

•Establish a baseline

•Detect any outside factors that might influence the experiment

•e.g., location, testing process itself, temperature, day of week, recent events


Countering Bias

•Your bias: Predict and then test against new data; don’t just fit a theory to existing data

•Sample bias:

• Did you select playtesters who actually represent your target market?

• Is your experiment designed to reveal their true preferences? (beware of incenting them to “make you happy” or to seek outcomes that they don’t actually desire)

• Did you prevent them from “cheating”?

•Community bias: anonymous (blind) reviews


Measurement (and Statistics)


Example: Measuring Time

•Play N turns of a game, measuring the time per turn

•We can now predict how long the game will run without further testing, even after we change the rules.

•(How large should N be?)


Accuracy vs. Precision

•Experiments estimate values; they are never exact

•Accuracy is how close your measurement is to the true value (significant digits)

•Precision is the number of decimal places in your measurement


Population vs. Sample

•Population statistics (truth):

•μ = Mean (“average” or “expected value”)

•σ = Standard deviation

•Sample statistics (measured):

•N = Number of samples

•m = Mean

•s = Sample deviation: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - m)^2}$

Note the N-1 where you expected to see N
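A minimal Python sketch of the sample statistics, assuming the usual definition of the sample deviation with the N-1 divisor. The helper name `sample_stats` and the five turn times are made up for illustration.

```python
import math
import statistics


def sample_stats(samples):
    """Return (N, m, s), where s uses the N - 1 divisor."""
    n = len(samples)
    m = sum(samples) / n
    s = math.sqrt(sum((x - m) ** 2 for x in samples) / (n - 1))
    return n, m, s


# Hypothetical turn times in seconds, invented for this example.
times = [18, 22, 19, 21, 20]
n, m, s = sample_stats(times)
print(n, m, round(s, 3))

# The standard library's stdev uses the same N - 1 divisor;
# statistics.pstdev divides by N instead (the population formula).
print(math.isclose(s, statistics.stdev(times)))
```

Dividing by N instead would give a slightly smaller value, which matters most for the small N typical of a quick playtest.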


Is the Mean Accurate?

t distribution: critical value t for sample size N (N - 1 degrees of freedom)

N      95%      99%
3      4.303    9.925
4      3.182    5.841
5      2.776    4.604
10     2.262    3.250
20     2.093    2.861
50     2.010    2.680
100    1.984    2.626

Let N = sample size
Let m = sample average
Let s² = sample variance
Assume normal distribution

For N = 10, the true population mean is on the interval:

m ± 3.250 · s/√N

with 99% confidence. (The s/√N factor is the standard error of the mean; the table entry multiplies it.)

http://onlinestatbook.com/chapter8/mean.html
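The table also gives one rough answer to "How large should N be?". The sketch below assumes the interval for the mean has half-width t · s/√N and copies the 95% column of the table. The names `half_width` and `smallest_tabulated_n` are hypothetical, and s must come from an earlier pilot sample.

```python
import math

# 95% t critical values copied from the table, keyed by sample size N.
T_95 = {3: 4.303, 4: 3.182, 5: 2.776, 10: 2.262,
        20: 2.093, 50: 2.010, 100: 1.984}


def half_width(n, s, t_table=T_95):
    """Half-width of the interval for the mean: t * s / sqrt(N)."""
    return t_table[n] * s / math.sqrt(n)


def smallest_tabulated_n(s, target, t_table=T_95):
    """Smallest tabulated N whose half-width is at most target, else None."""
    for n in sorted(t_table):
        if half_width(n, s, t_table) <= target:
            return n
    return None


# Half-width per unit of s: it shrinks quickly at first, then slowly.
for n in sorted(T_95):
    print(n, round(half_width(n, s=1.0), 3))
```

For example, `smallest_tabulated_n(1.9, 1.0)` asks how many turns are needed to pin the mean down to within one second when s is about 1.9 seconds.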


Exercise

•Experimental Results*:

•Played N = 20 turns of Carcassonne

•Average turn time was m = 20 seconds

•Sample deviation was s = 1.9

•What range are you 95% confident contains the true mean?

•95% Confidence Interval: m ± 2.093 · s/√N = 20 ± 2.093 · 1.9/√20 ≈ 20 ± 0.9 seconds

Conclusion: 95% confident that the true average turn time is between about 19.1 and 20.9 seconds

Sample Times (seconds): 18, 19, 20, 20, 21, 18, 21, 20, 23, 25, 19, 18, 21, 18, 17, 21, 22, 19, 19, 21

*Artificial Results to make computation easier
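The exercise can be checked in a few lines of Python. This is a sketch under stated assumptions: the interval for the mean is computed as m ± t · s/√N (standard error s/√N), the critical value 2.093 is the N = 20 entry of the 95% column of the t table, and the function name `mean_confidence_interval` is made up here. Computed with the N-1 divisor, s for these times comes out near 1.95 rather than the rounded 1.9 on the slide.

```python
import math
import statistics

# Sample times (seconds) from the exercise.
turn_times = [18, 19, 20, 20, 21, 18, 21, 20, 23, 25,
              19, 18, 21, 18, 17, 21, 22, 19, 19, 21]

T_95_N20 = 2.093  # 95% t critical value for N = 20, from the t table


def mean_confidence_interval(samples, t_crit):
    """Return (low, high) for the mean: m +/- t * s / sqrt(N)."""
    n = len(samples)
    m = statistics.mean(samples)
    s = statistics.stdev(samples)  # N - 1 divisor
    half = t_crit * s / math.sqrt(n)
    return m - half, m + half


low, high = mean_confidence_interval(turn_times, T_95_N20)
print(f"m = {statistics.mean(turn_times)}, s = {statistics.stdev(turn_times):.2f}")
print(f"95% interval for the mean: [{low:.1f}, {high:.1f}] seconds")
```

The interval describes the average turn time, not individual turns; single turns vary by roughly ±2s around the mean.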


Extrapolation

• We usually want to measure a relatively small fraction of the population and then generalize, e.g., political polling data.

• Any Distribution: At least (1 - 1/k²) × 100% of the values are within μ ± kσ. (Chebyshev’s Inequality)

• Normal Distribution: See table.

Percent within μ ± kσ

k     Normal (=)      Any Distribution (≥)
1     68%             0%
2     95%             75%
3     99.7%           89%
4     99.99%          94%
6     99.999999%      97%
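Both columns of the table can be reproduced with the standard library. In this minimal sketch the normal column uses the identity P(|X - μ| ≤ kσ) = erf(k/√2), and the other column is the Chebyshev bound 1 - 1/k². The function names are illustrative only.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k):
    """Guaranteed minimum fraction within mu +/- k*sigma for any distribution."""
    return max(0.0, 1.0 - 1.0 / k ** 2)


for k in (1, 2, 3, 4, 6):
    print(k, f"{normal_coverage(k):.8%}", f"{chebyshev_lower_bound(k):.0%}")
```

The gap between the two columns is the price of not knowing the distribution's shape.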


Is the Variance Accurate?

•The previous slide assumed that we knew the population parameters μ and σ!

•We know how to tell if m is accurate...

•But is s accurate?

•Good question. In this class, we’ll just assume that it is...


Exercise

• We estimated that for Carcassonne, the turn time was m = 20 seconds with s = 1.9 seconds.

• There are 71 turns in the game.

• Assume turn times are normally distributed. How many turns per game do you expect to take more than 22 seconds?

• What is the range of total play times you expect for 99.9% of all games?

•68% within [18, 22]

•32% outside [18, 22]

•Half of the 32% are on the high side

•16% chance of one turn running long

•Conclusion: 71 turns * 16% ≈ 11 turns

•m_game = 71 * m = 71 * 20 seconds = 1,420 seconds ≈ 23.7 minutes

•s_game² = 71 * s² = 71 * 1.9² ≈ 256; s_game ≈ 16 seconds (variances of independent turns add)

•Normal distribution, so 99.7% within 3 standard deviations (48 seconds)

•Conclusion: About 99.7% of games (close to the 99.9% asked for) run between 1,372 and 1,468 seconds, roughly 23 to 24.5 minutes.
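The same exercise can be worked without rounding to whole standard deviations. This is a sketch, assuming independent, normally distributed turn times with m = 20 and s = 1.9 treated as the true parameters, so that a 71-turn game has mean 71 · 20 = 1,420 seconds and deviation √71 · 1.9 seconds. Variable names are illustrative.

```python
import math
from statistics import NormalDist

TURNS = 71
M, S = 20.0, 1.9  # per-turn mean and deviation in seconds (from the exercise)

# Part 1: expected number of turns longer than 22 seconds.
turn = NormalDist(mu=M, sigma=S)
p_long = 1 - turn.cdf(22)
expected_long_turns = TURNS * p_long

# Part 2: total game time. Means add, and so do variances of independent turns.
game = NormalDist(mu=TURNS * M, sigma=math.sqrt(TURNS) * S)
z = NormalDist().inv_cdf(0.9995)  # two-sided 99.9% leaves 0.05% in each tail
low = game.mean - z * game.stdev
high = game.mean + z * game.stdev

print(f"P(turn > 22 s) = {p_long:.3f}, expected long turns = {expected_long_turns:.1f}")
print(f"99.9% of games: {low / 60:.1f} to {high / 60:.1f} minutes")
```

The exact tail probability is a little under the 16% obtained by rounding 22 seconds to one standard deviation above the mean, so the expected count lands near 10 rather than 11.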


Hypothesis Testing

1. Form a hypothesis

2. Design an experiment to test

• Analyze the statistical validity of the test

3. Run the experiment

4. Evaluate results

5. (often... go back to step 1)


Objective and Quantitative

•Bad: “People played our game and said that it was fun, therefore it was engaging.”

•Better: “On average, our game was 2nd in a ranking from ‘most fun’ to ‘least fun’ of ten other commercial games in a survey of 100 players. 20% of subjects rated our game #1.”

•Good: “100 subjects were randomly assigned to play our game or a hand-made version of Pit. They then decided individually which game to play again. 82% of respondents chose to play our game, so we conclude that it is about 4 times more engaging than Pit.”
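A quantitative claim like the 82% figure also has sampling error. The sketch below puts a rough 95% interval around it using the normal approximation to the binomial. Treating the 100 choices as independent yes/no outcomes is an assumption, and `proportion_interval` is a hypothetical helper, not something from the slides.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


# 82 of 100 subjects chose to play our game again.
low, high = proportion_interval(82, 100)
print(f"95% interval for the preference rate: [{low:.3f}, {high:.3f}]")
```

The whole interval sits well above 50%, which supports "players prefer our game". The "4 times" ratio is less stable, because small changes in the proportion move the ratio p/(1 - p) a lot.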


Exercises

Design experiments to test the following hypotheses:

• “Our new rules increased engagement in the game.”

• “The chance of drawing an unplayable tile in Carcassonne is less than 0.1%.”

• “Experienced players usually choose the highest resource intersection first and then maximize resource distribution second in Settlers of Catan.”

• “In Guitar Hero, the intro for More Than a Feeling is harder than the chorus for most players.”
