Sampling and Variability (Chapter 5.1 - 5.4)
Chengyuan Peng 92777A
Purpose of Sampling
• What is a Data Population
• Problems with using all of the data
– The whole data set is not available
– Too much data
– Necessary to sample the data when building models
• Capture a Sample:
– To represent only some part of the population
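Capturing a sample can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the slides: the synthetic `population` and the helper name `capture_sample` are assumptions, and simple random sampling without replacement is only one possible way to draw the sample.

```python
import random
import statistics

def capture_sample(population, sample_size, seed=0):
    """Draw a simple random sample without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical population: synthetic values standing in for a data set
# that is too large to use in full.
data_rng = random.Random(42)
population = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]

sample = capture_sample(population, 1_000)
print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```

The sample holds only 1% of the instances, yet its mean should land close to the population mean, which is the reason for working with a sample at all.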
Variability of Variables
• Main Feature of a Variable
– Takes on a variety of values
– Contains Pattern distribution
• Numerical variables
• Categorical variables
• Graphical Display of a Pattern Distribution
– Histogram, Curve
• Problems
– Convergence: True Population Distribution Pattern Unknown
– Measuring Variability: Which Distribution Curve is the Right One to Use?
Converging
• To Create a Distribution Curve for the Sample
– Select instance values, one at a time, at random
– Recalculate the curve each time a new instance value is added
• Converge
– At first: each new value causes a large change
– After a while: the curve settles down and converges to its final shape
• Summary
– What is measured is not the shape of the curve, but the variability of the sample
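The convergence behaviour can be sketched as a small simulation. This is an illustration under stated assumptions, not the method from the slides: the histogram with ten equal-width bins, the uniform synthetic `population`, and the function name `convergence_trace` are all hypothetical choices. The function adds one randomly chosen instance at a time, recalculates the relative frequencies, and records how much they moved.

```python
import random

def convergence_trace(population, steps, low, high, bins=10, seed=0):
    """Record the total change in histogram proportions after each new instance."""
    rng = random.Random(seed)
    order = rng.sample(population, steps)  # random order, no repeats
    width = (high - low) / bins
    counts = [0] * bins
    previous = None
    changes = []
    for n, value in enumerate(order, start=1):
        # Clamp the bin index so values on the edges stay inside the histogram.
        index = max(0, min(int((value - low) / width), bins - 1))
        counts[index] += 1
        current = [c / n for c in counts]
        if previous is not None:
            changes.append(sum(abs(a - b) for a, b in zip(current, previous)))
        previous = current
    return changes

# Hypothetical population: synthetic values between 0 and 100.
data_rng = random.Random(1)
population = [data_rng.uniform(0.0, 100.0) for _ in range(10_000)]

changes = convergence_trace(population, steps=2_000, low=0.0, high=100.0)
early = sum(changes[:20]) / 20
late = sum(changes[-20:]) / 20
print(f"mean change, first 20 additions: {early:.4f}")
print(f"mean change, last 20 additions:  {late:.4f}")
```

Early additions shift the proportions noticeably, while late additions barely move them, which matches the "large change at first, then settled down" pattern.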
Measuring Variability
• Requires Some Method of Measuring Variability
– Without being sensitive to column width or smoothing method
• What is Variability
– How far the individual instances are from the mean of the sample
• Standard Deviation: One Popular Measure
– One formula: $s = \sqrt{\frac{\sum (x - m)^2}{n - 1}}$, where $x$ is an instance value, $m$ is the sample mean, and $n$ is the number of instances
– Another formula (important for the data preparation process): $s = \sqrt{\frac{\sum x^2 - n m^2}{n - 1}}$
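Both forms of the standard deviation can be checked against each other in code. The sketch below is illustrative only; the names `std_two_pass` and `RunningStd` are hypothetical. One plausible reason the second form matters for data preparation, which is an assumption here rather than something the slides state, is that it needs only the running totals of $x$ and $x^2$, so it can be updated as each instance arrives without storing the sample.

```python
import math

def std_two_pass(values):
    """First form: deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

class RunningStd:
    """Second form: keeps only n, sum of x, and sum of x squared."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def value(self):
        m = self.sum_x / self.n
        # max() guards against a tiny negative result from rounding error.
        variance = max(0.0, (self.sum_x2 - self.n * m * m) / (self.n - 1))
        return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
running = RunningStd()
for x in data:
    running.add(x)

print(std_two_pass(data))
print(running.value())
```

The two results agree up to floating-point rounding. The running-sum form can lose precision when values are very large relative to their spread, so it is best treated as a convenience rather than a drop-in replacement in every setting.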
• Why Confidence
– An alternative to examining the whole population
– To establish some acceptable degree of confidence
• 95% as a satisfactory level of confidence
Variability of Numeric and Alpha Variables
• Distinction
– Alpha: for nominal / categorical; measured in nonnumeric scales
– Numeric: measured in numeric scales
– Different when measuring variability
• Measuring Variability of Numeric Variables
– Covered above
– Random sampling without introducing bias
• Measuring Variability of Alpha Variables
– Instead of standard deviation, use the Rate of Discovery (ROD):
• Measures the rate of change of the relative proportion of values discovered
• As sample size increases, the ROD of new alpha values falls
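The falling rate of discovery can be sketched with a simplified measure. The slides do not give an exact formula, so the definition used here is an assumption: for each batch of randomly ordered instances, the rate is the fraction of instances that revealed a value not seen before. The function name `rate_of_discovery`, the batch size, and the synthetic categorical data are all hypothetical.

```python
import random

def rate_of_discovery(values, batch_size, seed=0):
    """Per batch, the fraction of instances that revealed a new alpha value.

    Simplified stand-in for ROD; the exact definition is an assumption.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # sample the instances in random order
    seen = set()
    rates = []
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        new_values = 0
        for value in batch:
            if value not in seen:
                seen.add(value)
                new_values += 1
        rates.append(new_values / len(batch))
    return rates

# Hypothetical alpha variable: 20 categories, some much rarer than others.
data_rng = random.Random(7)
categories = [f"cat_{i}" for i in range(20)]
weights = [1 / (i + 1) for i in range(20)]
data = data_rng.choices(categories, weights=weights, k=5_000)

rates = rate_of_discovery(data, batch_size=500)
for batch_number, rate in enumerate(rates, start=1):
    print(batch_number, rate)
```

Most categories turn up in the first batch, and later batches discover few or none, so the rate falls as the sample grows.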