chapter 2 data collectionchapter 2 data collectionmduan/stat3411/ch2.pdf · stratified random...

Post on 03-Apr-2020

Chapter 2 Data Collection

Before any data are collected, you need to carefully define the question and develop operational definitions!

Explicitly define the scope of the inferences, including limitations.

Stopping distances of bikes ‐ smooth vs. treaded tires

– On asphalt? Dry? Wet?

– Which brands of smooth tires? Which type of brake?

• There is a tradeoff between more precise answers to narrower questions and less precise answers to more general questions.

2.1 General Principles in the Collection of Engineering Data

2.1.1 Measurement

"An engineer planning a study ought to ensure that data on relevant variables will be collected by well‐trained people using measurement equipment of known and adequate quality."

"Training technicians has to be taken seriously."

• Biases, intentional or unintentional, are to be avoided.

• Measurements can be made blind, without personnel knowing what condition is being tested.

– Medical experiments often have patients and doctors blind to the medication given.

• Other techniques for ensuring fair play (such as randomization, blocking) are discussed later.

2.1.3 Recording

Develop

• Documented protocols

• Recording forms

Include documentation explicitly on the recording forms

‐ Ambient temperature, unusual events

‐ Put documentation into a permanent computer database as 'meta‐data'

2.2 Sampling in Enumerative Studies

Simple random sample

Put all part numbers in a box.

In Excel, set up two columns: Part # and Random #.

• Sort by random #.

• Pick first n rows for sampled parts.
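The sort-by-random-number recipe above can be sketched outside Excel as well. The following is a minimal Python illustration, not part of the course materials; the function name `simple_random_sample`, the part labels, and the seed are all hypothetical.

```python
import random

def simple_random_sample(part_numbers, n, seed=None):
    """Attach a random number to each part, sort by it, keep the first n rows."""
    rng = random.Random(seed)
    # Column 1: part number, column 2: random number (mirrors the spreadsheet layout).
    keyed = [(rng.random(), part) for part in part_numbers]
    keyed.sort(key=lambda pair: pair[0])  # sort by the random column only
    return [part for _, part in keyed[:n]]

# Hypothetical inventory of 100 part numbers.
parts = [f"P{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(parts, n=10, seed=1)
print(sample)
```

Fixing the seed only makes the draw reproducible for documentation; in practice the draw would normally be made once and recorded.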

Stratified random sample

Split parts to inventory into strata

– Big expense parts

– Small expense parts

Stratification assures adequate sampling of subcategories and potentially more precise estimates.
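As a rough sketch of the idea, a stratified sample can be drawn by taking a separate simple random sample inside each stratum. The strata names, part labels, and per-stratum sample sizes below are made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        # random.Random.sample draws without replacement.
        sample[name] = rng.sample(units, n_per_stratum[name])
    return sample

# Hypothetical inventory: few big expense parts, many small expense parts.
inventory = {
    "big_expense": [f"B{i:02d}" for i in range(1, 21)],
    "small_expense": [f"S{i:03d}" for i in range(1, 481)],
}
picked = stratified_sample(
    inventory, {"big_expense": 10, "small_expense": 20}, seed=2
)
print(picked["big_expense"])
```

Sampling the small "big expense" stratum at a higher rate is one possible choice here; it guarantees that subcategory is represented, which a single simple random sample of 30 parts would not.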

Advantages of random sampling

• Assures objectivity

• Insurance against biases, intentional or unintentional

• Allows quantification of potential error via probability

2.3 Principles for Effective Experimentation

2.3.1 Taxonomy of Variables

Response variable ‐ System output of interest

  ‐ Compression strength of taconite

  ‐ Strength of glued boards

Managed variable ‐ Set by experimenters

  ‐ Experimental variable ‐ Set at different levels

    ‐ Three levels of temperature for gluing

  ‐ Controlled variable

    ‐ Use 3 glues but all at the same temperature

Freezing effect on glue bond

  ‐ Experimental variable ‐ freezing temperature

  ‐ Controlled variables ‐ drying time, wood type, drying temperature

2.3.2 Handling Extraneous Variables

An extraneous variable is one that can influence the response but is not of primary interest.

– Stopping times of bicycles with treaded and smooth tires. The particular rider affects stopping times.

– Strength of glued wood. The moisture content of the wood can affect the strength.

Sometimes the extraneous variable is observed, like rider, and sometimes it's unobserved, like moisture. Sometimes the extraneous variable is even unanticipated.

Inattention to extraneous variables can add noise to the comparisons or confuse (confound) the experimental results.

– We are interested in comparing types of golf clubs. If we use golf balls in various conditions, the variability due to golf ball condition makes it harder to measure effects precisely; it adds noise to the system. Other extraneous variables include golfer, temperature, wind speed, golfer fatigue, etc.

– If glue 1 is set on a humid day and glue 2 is set on a dry day, observed differences could be due to glue type or humidity effects. Here glue and humidity effects are completely confounded, that is, confused with each other.

Strategies for reducing effects of extraneous variables

– Controlling variables

– Blocking

– Randomization

Controlling a variable means keeping it at the same level.

– Glue all boards at a nearly fixed temperature.

– Have one rider for all runs of smooth and treaded tires.

– Use new golf balls of the same type.

A block of experimental units, experimental times, experimental conditions, etc. is a homogeneous group of experimental units within which different levels of primary experimental variables can be applied and compared in a relatively uniform environment.

Blocking is a very important concept. There will be exam questions about this concept.

– Have each rider use a treaded and a smooth tire bike. Each rider is a 'block'. A block with 2 treatment levels is a paired design.

– For comparing 3 glues, take 10 boards and cut each board into thirds. Use each glue on one part of each board. The boards are blocks.

– Most often each treatment is replicated once in each block.
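To see why treating each rider as a block can help, here is a small simulation sketch. Everything in it is an assumption made for illustration (the rider-to-rider spread, the run-to-run noise, the true tire difference, and the function name `simulate`); it is not data from the course. It compares how much the estimated tire difference varies when each rider tries both tires (paired) versus when different riders are used for each tire type.

```python
import random
import statistics

def simulate(n_riders=10, rider_sd=2.0, noise_sd=0.2, true_diff=0.5,
             reps=2000, seed=0):
    """Return the spread (SD) of the estimated tire effect under two designs."""
    rng = random.Random(seed)
    paired, unpaired = [], []
    for _ in range(reps):
        # Paired design: each rider (block) rides both tire types,
        # so the rider effect cancels in the within-rider difference.
        riders = [rng.gauss(0, rider_sd) for _ in range(n_riders)]
        diffs = [
            (r + true_diff + rng.gauss(0, noise_sd)) - (r + rng.gauss(0, noise_sd))
            for r in riders
        ]
        paired.append(statistics.mean(diffs))
        # Unblocked design: separate riders for smooth and treaded runs,
        # so rider-to-rider variation stays in the comparison.
        smooth = [rng.gauss(0, rider_sd) + true_diff + rng.gauss(0, noise_sd)
                  for _ in range(n_riders)]
        treaded = [rng.gauss(0, rider_sd) + rng.gauss(0, noise_sd)
                   for _ in range(n_riders)]
        unpaired.append(statistics.mean(smooth) - statistics.mean(treaded))
    return statistics.stdev(paired), statistics.stdev(unpaired)

paired_sd, unpaired_sd = simulate()
print(f"SD of estimate, paired by rider: {paired_sd:.3f}")
print(f"SD of estimate, unblocked:       {unpaired_sd:.3f}")
```

Under these assumed settings the paired estimate should vary far less, because the rider effect is removed from each comparison. How large the gain is in a real study depends on how much riders actually differ.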

• Randomization is insurance against biases that might otherwise occur.

– Each rider will ride bikes twice, once with treaded tires and once with smooth tires. We don't want all smooth‐tire runs done first. We could randomize (flip a coin) to decide the order.

– If we have 30 small boards for gluing

• We could randomly assign 10 boards to each glue. A completely randomized design.

• If there are some obvious differences between the boards, it may help to divide the boards into 10 blocks of 3 boards. Within each block assign one board to each glue. A randomized block design.

– The order of gluing the 30 boards would also be randomized (and possibly blocked) to guard against having one glue done earlier in the day.

• Blocking often provides better insurance.

• Unblocked randomizing can end up with more of one glue earlier in the day.
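Both kinds of order randomization discussed above can be sketched in a few lines. This is only an illustration: the rider names, the idea of splitting the day into 10 "rounds" of 3 runs, and the function names are assumptions, not something prescribed in the notes.

```python
import random

def rider_run_orders(riders, seed=None):
    """For each rider, randomly decide which tire type is ridden first."""
    rng = random.Random(seed)
    orders = {}
    for rider in riders:
        order = ["treaded", "smooth"]
        rng.shuffle(order)  # with two items this is equivalent to a fair coin flip
        orders[rider] = order
    return orders

def blocked_glue_order(glues, n_rounds, seed=None):
    """Blocked run order: every round (time block) uses each glue once, in random order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_rounds):
        round_order = list(glues)
        rng.shuffle(round_order)
        schedule.extend(round_order)
    return schedule

orders = rider_run_orders(["rider_A", "rider_B", "rider_C"], seed=3)
schedule = blocked_glue_order(["glue 1", "glue 2", "glue 3"], n_rounds=10, seed=3)
print(orders)
print(schedule[:6])
```

With the blocked schedule no glue can pile up early in the day, since each consecutive group of three runs contains all three glues. A single shuffle of all 30 runs gives no such guarantee.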

! Blocks are set up Before units are assigned to treatments.

• If we hit 10 golf balls with a titanium driver, these 10 balls are not a block.

• This is a common mistake by students on exams.

Both randomization and blocking are like insurance policies.

• In some cases not having the insurance won't hurt.

• Other times not having the insurance can hurt big time.

• The cost and hassle of randomization and potentially blocking isn't very big.

– Usually randomization is worth the cost.

– Infrequently randomization is not worth the cost.

• But think carefully about whether there are potential pitfalls to not randomizing.

• Bouncing balls on wood and cement surfaces.

2.3.3 Comparative Study

A comparative study compares treatments, for example comparing 2 glues.

• Even when investigating a particular new treatment, it's best to do a comparative study with the old glue.

• If we only use the new glue on a batch of boards and compare the strengths to historical board strengths, it could be that the new boards are different from the historical boards. Any observed difference could be due to glue effects or due to changes in the boards.

In medical studies it's standard to include some patients who receive the old drug or no drug for a head‐to‐head comparison with the new drug. The patients getting no drug are a 'control' group. This is another use of the term 'control'.

2.3.4 Replication

• Replication means carrying through the whole process of adjusting values for the supervised variables, making an experimental 'run', and observing the results of that run, more than once.

• "Simply re‐measuring an experimental unit does not amount to real replication." Likewise, not resetting the entire process means not having true replicates. See example 9, page 45.

• Example 10: Making one of each of 2 designs of paper planes and retesting the 2 planes does not accomplish independent replications of the designs. If we only make 2 planes, we don't know if the two planes are more different than we would find by making 2 planes from the same design.

2.4 Some Common Experimental Plans

2.4.1 Completely Randomized Designs

• In a completely randomized design all units or runs are put into a simple hat and randomly assigned to each treatment.

• Number the 30 boards. Pick 10 numbers (boards) for each glue.

– Put the board numbers 1‐30 in column 1 of Excel.

– Put random numbers into column 2.

– Sort by the random column 2.

– Assign the board numbers in

• rows 1‐10 to glue 1

• rows 11‐20 to glue 2

• rows 21‐30 to glue 3
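The spreadsheet steps above amount to shuffling the board numbers and cutting the shuffled list into equal groups. A minimal Python sketch follows; the function name `completely_randomized_design` and the seed are hypothetical, and equal group sizes are assumed.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle all units, then give consecutive equal-sized groups to each treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("this sketch assumes equal group sizes")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # plays the role of sorting by the random column
    per_group = len(units) // len(treatments)
    return {
        treatment: sorted(shuffled[i * per_group:(i + 1) * per_group])
        for i, treatment in enumerate(treatments)
    }

boards = list(range(1, 31))  # board numbers 1-30
assignment = completely_randomized_design(
    boards, ["glue 1", "glue 2", "glue 3"], seed=4
)
for glue, group in assignment.items():
    print(glue, group)
```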

2.4.2. Randomized Complete Block Design

• Units are broken into hopefully homogeneous blocks, and treatments are randomized to units within each block.

– Form 10 sets of 3 similar boards in each set (block).

– Within each set (block) assign 1 board randomly to each glue.

• Most commonly each treatment is replicated once in each block. Example 12 is unusual in this regard.
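The within-block randomization can be sketched the same way: shuffle the treatments separately inside every block. The board labels, the function name `randomized_complete_block_design`, and the seed below are hypothetical, and the sketch assumes each block holds exactly one unit per treatment.

```python
import random

def randomized_complete_block_design(blocks, treatments, seed=None):
    """Within each block, randomly assign one unit to each treatment."""
    rng = random.Random(seed)
    design = {}
    for block_id, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # a fresh, independent shuffle for every block
        design[block_id] = dict(zip(units, order))
    return design

# Hypothetical blocks: 10 sets of 3 similar boards.
blocks = {b: [f"board {b}-{k}" for k in (1, 2, 3)] for b in range(1, 11)}
design = randomized_complete_block_design(
    blocks, ["glue 1", "glue 2", "glue 3"], seed=5
)
for block_id, mapping in design.items():
    print(block_id, mapping)
```

Unlike the completely randomized design, every block here is guaranteed to see all three glues, so glue comparisons are made among similar boards.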

2.5 Preparing to Collect Engineering Data

Read the book.

Problem Definition

• Step 1: Identify the problem.

• Step 2: Understand the context of the problem.

• Step 3: State in precise terms the objective and scope of the study.

Study Definition

• Step 4: Identify the response variable(s) and appropriate instrumentation.

• Step 5: Identify possible factors influencing responses.

• Step 6: Decide whether (and if so how) to manage factors likely to affect the responses.

• Step 7: Develop a detailed data collection protocol and timetable for the first phase.

Physical Preparation

• Step 8: Assign responsibility for careful supervision.

• Step 9: Identify technicians and provide necessary instruction in objectives and methods.

• Step 10: Prepare data collection forms and/or equipment.

• Step 11: Do a dry run of analysis on fictitious data.

• Step 12: Write up a 'best guess' prediction of results.

See the text for more details.

Some Study Questions

• What advantages does an experimental study have compared to an observational study?

• What is the difference between a population and a sample?

• Give an example of multivariate data.

• Managed variables are either experimental or controlled variables. What is a controlled variable?

• What is an extraneous variable?

• What are the 3 strategies for reducing effects of extraneous variables?

• What is a “block”?

• Blocks are set up B_____ units are assigned to treatments. Fill in the blank.

• What is the potential advantage to the randomized block design versus a completely randomized design?

• Give an example where 2 measurements are not separate, independent replicates.