msc unit 1

Biostatistics

Text Book: An Introduction to Biostatistics by N. Gurumani
Reference Books:
(1) Biostatistical Analysis by Jerrold H. Zar
(2) Biostatistics: A Foundation for Analysis in the Health Sciences by W. W. Daniel

Chapter 1 Introduction

History
Tycho Brahe (1546-1601) - Astronomy
Kepler - Detailed study of the information collected by Brahe
John Graunt (1620-1674) - Father of vital statistics (births and deaths)
Edmund Halley - First life table
Süssmilch (1707-1767) - Ratio of births and deaths remains constant
Bernoulli (1654-1705) - Law of large numbers
Laplace (1749-1827) - Expectation
Francis Galton (1822-1911) - Study of regression analysis
Jevons (1835-1882) - Index numbers
Karl Pearson (1857-1936) - Correlation analysis
R. A. Fisher (1890-1962) - Tests of significance applied to genetics

The word STATISTICS
Status - Latin
Statista - Italian
Statistique - French
Statistik - German
Literal meaning: political state (administrative activities of the state)

Definitions
Statistics are numerical statements of facts in any department of inquiry. - BOWLEY (plural sense)
The science which deals with the collection, presentation, analysis and interpretation of numerical data. - CROXTON and COWDEN (singular sense)
Science of counting / science of averages

Example: statistical methods
Question: Find the percentage of marks obtained by Mr. Kumar in his S.S.L.C. examination.

Collection of data
Marks: 75, 60, 100, 99, 51

Organization of data

Subject          Marks
Tamil            75
English          60
Maths            100
Science          99
Social science   51

Presentation of data
[Figure: bar diagram of marks by subject (Tamil, English, Maths, Science, Social science); vertical axis from 0 to 120]

Analysis of data
Percentage of marks = (secured marks / total marks) × 100

= (385 / 500) × 100 = 77%

Interpretation of data
Mr. Kumar has secured 77% of marks in his S.S.L.C. examination.

Four functions of statistics
1. Collection of data
2. Tabulation and presentation of data
3. Analysis of data
4. Interpretation of data

Limitations of statistics
1. It can be used only to study numerically valued data, not qualitative phenomena like intelligence, poverty, honesty, etc.
2. It deals with aggregates and not with individuals.
3. Statistical data collected for a given purpose cannot be applied to any other situation.
4. It is not always possible to compare statistical data unless they are homogeneous in character.
5. It can be misused.
6. It is only one of the methods of studying a problem.

Biostatistics - Defn
The statistics used to analyze data derived from the biological sciences and medicine is called biostatistics.

Types of statistics
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
Inferential Statistics: Methods of reaching a decision, estimate, prediction, or generalization about a population (a large body of data) by examining only a sample (a small part of the data).
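The worked example of Mr. Kumar's marks (collection, organization, analysis, interpretation) can be traced in a few lines of Python. This is only a minimal sketch: the names `marks`, `MAX_MARKS_PER_SUBJECT` and `percentage` are illustrative, and it assumes each paper is marked out of 100, which is consistent with the total of 500 used in the example.

```python
# Collection and organization of data: marks from the worked example.
marks = {
    "Tamil": 75,
    "English": 60,
    "Maths": 100,
    "Science": 99,
    "Social science": 51,
}
MAX_MARKS_PER_SUBJECT = 100  # assumption: every paper is marked out of 100


def percentage(secured, total):
    """Percentage of marks = (secured marks / total marks) x 100."""
    return secured / total * 100


# Analysis of data.
secured = sum(marks.values())
total = MAX_MARKS_PER_SUBJECT * len(marks)

# Interpretation of data.
print(f"Secured {secured} out of {total}: {percentage(secured, total):.1f}%")
```

Keeping the marks in a subject-to-marks mapping mirrors the organized table, so the same data can feed both the bar diagram and the percentage calculation.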

A Taxonomy of Statistics

Chapter 2 Population and Sample

Population, sample, sampling
Population: The largest collection of values of a random variable in which we have an interest at a particular time.
Finite: If a population consists of a fixed number of values, the population is said to be finite.
Infinite: If a population consists of an endless succession of values, the population is said to be infinite.
Sample: A part, or subset, of a population.
Sampling: The process of drawing a sample from a population.

Population versus Sample A population is a collection of all possible individuals, objects, or measurements of interest. A sample is a portion, or part, of the population of interest
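The relationship between a finite population, a sample and the act of sampling can be illustrated with Python's standard `random` module. This is a sketch under stated assumptions: the population of 20 fish lengths is invented for illustration, and the seed is fixed only so the draw can be repeated.

```python
import random

# Hypothetical finite population: lengths (cm) of 20 fish in a pond.
population = [10 + 0.5 * i for i in range(20)]

# Sampling: draw 5 values at random, without replacement.
rng = random.Random(42)  # fixed seed so the same sample is drawn each run
sample = rng.sample(population, k=5)

print("Population size:", len(population))
print("Sample:", sample)
```

Because `sample` draws without replacement, every value in the sample is a distinct member of the population, which matches the definition of a sample as a subset.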

Chapter 3 Variables

Variables and Variate
Variable - Defn: A factor or character which can take different values is called a variable. Example: height, length, weight.
Variate - Defn: A single observation of a variable.

Types of Variables
1. Qualitative: A variable which cannot be expressed in numbers is known as a qualitative variable. (ex- sex, skin colour, smell of flowers)
2. Quantitative: A quantitative variable is one whose differing states can be expressed in numbers. (ex- length of fish, weight of frog, number of seeds in a fruit)


Types of quantitative variables
A quantitative variable can be classified as discrete or continuous.
Discrete: It can assume only certain fixed numerical values, with no intermediate values possible. (ex- number of seeds in a fruit; number of children per family)
Continuous: It can assume, at least theoretically, an infinite number of values between any two fixed points. (ex- length of fish, area, volume, percentages, etc.)

Types of Variables

3. Ranked variable: A variable which cannot be measured but can be ordered or ranked by magnitude. (ex- activity of a gland (pancreas, thyroid, blood pressure, etc.) as assessed in microscopic observations)
4. Derived variable: A variable calculated from two or more independently measured variables, to show the relationship between them. (ex- ratio, percentage, indices)

Measurement
The assignment of numbers to objects or events according to a set of rules.
(i) Ratio scale: Measurement scales having a constant interval size and a true zero point are said to be ratio scales of measurement. Ex: length, weight, volume
(ii) Interval scale: A scale on which the objects (measurements, observations) can be ordered and the distance between any two measurements is also known.
(iii) Ordinal scale: A scale on which the measures can be ordered (arranged) according to some criterion, but the differences between one category and the next are not measured.

Example: low - medium - high
(iv) Nominal scale: The lowest measurement scale is the nominal scale. It consists of naming observations or classifying them into various mutually exclusive and collectively exhaustive categories.
Example: child - adult; under 65 - 65 and over; male - female

The Characteristics for Levels of Measurement

Chapter 4. Collection of Data

Collection of data
Data: The raw material of statistics is data.
Types of Data:
(i) Primary data: Data collected directly from the individual respondents for the purpose of a certain study or information are known as primary data. (ex- survey, experiments, questionnaire, local agents)
(ii) Secondary data: Data which have already been collected by certain people or agencies, or statistically analyzed records, are known as secondary data. (ex- published reports, commercially available data banks, journals, census reports)

Various steps involved in collection of data
Statement of the hypothesis
Nature of the sample
Sample size
Enumeration and Measurement
Scientific Notation
Significant Digits

Rounding of data
Errors in Measurements
Accuracy and Precision of data
Recording of the data

Statement of the hypothesis
Situation / problem / question:
a) To study the relationship between the various factors affecting the growth of catla (fish), keeping the pH of the water constant.
b) To study the effect of pH on the growth of catla.
c) Various factors: age, sex, maturity, physico-chemical properties of the water, type and quantity of food, weight of catla, protein content of flesh.
d) Data can be drawn based on the type of experiment.

The clear statement of the problem should include:
(i) The hypothesis (a tentative statement that offers an answer or explanation for a problem)
(ii) Precise definition of the population from which the samples are to be obtained and on which inferences are to be made (it includes sex, age groups, maturity, seasonal conditions, environmental factors)
(iii) Definition of the parameters to be measured
(iv) Methodology (it includes design of experiments, measurement of the parameters, units of measurement, instrumentation to be used for measurement)

Nature of the Sample
The sample should be unbiased.
Random sample: An unbiased sample is obtained by drawing from the population at random.
Issues related to random samples: Obtaining a satisfactory random sample is not easy. Collecting samples of plants, plant parts, air or water might be easier than collecting samples of animals.

Sample size
Size affects the accuracy of the inference.
Larger samples are more useful than smaller samples.
Small unbiased samples are more useful than large biased samples.

Enumeration and Measurement
In the case of discrete variables, a variate is a value obtained by counting or enumeration.
In the case of continuous variables, a variate is a value obtained by measurement (on a scale).
Examples:
a) Discrete variates: (i) Smaller values are recorded accurately (number of girls in a family: 0, 1, 2, 3). (ii) Larger numbers are recorded approximately (number of cells, insects, bacteria, RBC in one mm3 of blood).
b) Continuous variates: Measurement variates obtained from continuous variables are recorded approximately.

Scientific Notation
Scientific notation: power of 10
When a number has many zeros before or after the decimal point, we employ scientific notation for convenience.

Example:
679000000 = 6.79 x 10^8
0.0078 = 7.8 x 10^-3
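The two conversions above can be checked with Python's built-in exponent formatting. This is a small sketch; the helper name `to_scientific` and its `digits` parameter are illustrative, not part of the text.

```python
def to_scientific(value, digits=2):
    """Format value as 'm x 10^e', keeping `digits` decimals in the mantissa."""
    # The 'e' format gives e.g. '6.79e+08'; split it into mantissa and exponent.
    mantissa, exponent = f"{value:.{digits}e}".split("e")
    return f"{mantissa} x 10^{int(exponent)}"


print(to_scientific(679000000))        # 6.79 x 10^8
print(to_scientific(0.0078, digits=1)) # 7.8 x 10^-3
```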

Significant digits
The accurate digits, apart from the zeros needed to locate the decimal point, are called the significant digits or significant figures of the number.
Example: 76.3 (3 s.d.); 8.6700 (5 s.d.); 28.65 (4 s.d.); 0.05723 (4 s.d.)

Rounding of Data
To reduce the number of significant digits while recording measurement variates, we follow the general rules for rounding. A digit to be rounded off
(i) is not changed if it is followed by a digit less than 5;
(ii) is increased by 1 if it is followed by a digit greater than 5, or by 5 followed by non-zero digits;
(iii) if it is followed by 5 standing alone or by 5 followed only by zeros, is left unchanged when it is an even number and is raised by 1 when it is an odd number.

Errors in measurement
Two types of errors: systematic and random
Systematic error: The error due to a defective instrument.
Random error: The chance difference between the observed value and the true value.

Accuracy and Precision of data
Accuracy: It is the closeness of a measured or computed value to its true value. (If the systematic error is on the higher side, then the accuracy of the method, and therefore that of the data, is low.)
Precision: It is the closeness of repeated measurements of the same quantity. (It is a measure of reproducibility.)

Recording of the data
A complete and permanent record of the data obtained should be maintained (notebook, record sheets, PC, etc.).
The record should include: a) date b) number or code c) weight / size d) location e) brief comments on observations f) units of measurement g) derived variables, if any.

Chapter 5. Classification and Tabulation of Data

Classification
Defn: It is the process of arranging the available facts into homogeneous groups or classes to bring out the resemblances, similarities and other relationships.
Objectives:
To condense the mass of data into a concise format.
To bring out the relevant points of similarity, dissimilarity and comparison.
To make the statistical treatment of the data easy.
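Returning briefly to the rounding rules given under Rounding of Data in Chapter 4: the "even digit unchanged, odd digit raised" rule is what Python's `decimal` module calls `ROUND_HALF_EVEN`. The sketch below is illustrative only; the helper name `round_half_even` is an assumption, and values are passed as strings so that the digits are taken exactly as written rather than as binary floating-point approximations.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value, places):
    """Round a decimal string to `places` decimal places with the even-digit rule."""
    quantum = Decimal(10) ** -places  # e.g. places=1 -> Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_half_even("7.62", 1))    # followed by a digit less than 5
print(round_half_even("7.68", 1))    # followed by a digit greater than 5
print(round_half_even("7.65", 1))    # even digit followed by a lone 5
print(round_half_even("7.75", 1))    # odd digit followed by a lone 5
print(round_half_even("7.6500", 1))  # even digit followed by 5 and zeros
```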

Characteristics of Classification
Unambiguity: Clear definitions of the terms used should be provided.
Stability: The classification should be consistent throughout the analysis.

Flexibility: It should be easy to adapt to new situations and circumstances (addition or deletion of a few classes without altering the basic theme).

Types of Classification
Spatial or geographical: It is based on geographical locations (different continents, countries, states, towns, ...).
Temporal or chronological: It is based on time (year, month, ...).
Qualitative: It is based on a quality or attribute (colour, behaviour, religion, ...).
Quantitative: It is based on an enumerable or measurable variable.

Tabulation
Defn: It is the presentation of classified data in a scientific manner to bring out the essential features and main characteristics.

Organisation of a Table
Table number: for reference and future identification
Title of the table: nature of the data, how they were collected and classified, other relevant details
Date
Head note (optional): information such as units of measurement and scientific notation
Captions: headings of the vertical columns
Stubs: headings of the horizontal rows
Body of the table: cells, numerical values, totals, statistical analytical values, derived values
Source
Footnote (optional): explanations of the information given in the various parts of the table

Classification vs Tabulation

1. Classification: It is the process of dividing the data into homogeneous subgroups.
   Tabulation: It is the process of arranging the classified data systematically / scientifically in the rows and columns of a table.
2. Classification: Sorting.
   Tabulation: Summarizing.
3. Classification: This condenses the mass of data and makes its nature easier to grasp.
   Tabulation: This gives the data a readily referable and almost permanent form (rows and columns).
4. Classification: This foreruns tabulation.
   Tabulation: This completes an important stage of enumeration.
5. Classification: This is a process of analysis of data.
   Tabulation: This is a process of presentation of data.
6. Classification: Careful planning for tabulation is necessary even at this stage.
   Tabulation: This is a mechanical function after classification.

General rules for the construction of a table
Give the table a number and a title.
The table should be neither too large nor too small.
State the units of measurement.
Large numbers are to be approximated (scientific notation).
Leave spaces between rows in long unbroken columns.
The values to be compared should be kept in adjacent columns / rows.
Label the columns with numbers / alphabets.
The column headings should be repeated on the continuation pages.
Items in the stub should be in a logical order (alphabetic, chronological, geographical, ...).
All less important items are placed in a separate column named Miscellaneous classes.
Rulings (border / grid lines) are important. Thick / multiple lines are used for the main classes, while thinner lines are used to separate the sub-classes.

Types of Tables
Qualitative and quantitative: A sample classified according to some qualitative / quantitative characteristic is tabulated as a qualitative / quantitative table.
Simple and complex: Based on the number of variables: (a) one way (b) two way (c) three way (d) manifold.
Primary and derivative: A primary table is prepared on the basis of the original data collected. A derivative table is prepared on the basis of statistical derivatives such as ratio, percentage, index, etc.

Chapter 6. Diagrams and Graphs

Need and Usefulness
To present dry and uninteresting statistical facts in the form of attractive and appealing pictures and graphs
To render comparisons simple
To help in forecasting
To understand the relationship between variables
To locate descriptive statistical measures (median, mode, etc.)

Guidelines for drawing diagrams and graphs
Choose the diagram or graph appropriate to the data.
Give a number and a title.
The scale should be neither too large nor too small.
Geometric instruments are required.
Software packages can be used: a) Microsoft Excel b) SPSS (Statistical Package for the Social Sciences)

Types of Diagrams
1. Bar diagrams: a) Simple bar b) Multiple bar c) Subdivided or component bar d) Percentage bar
2. Pie diagrams
3. Pictograms and cartograms

Bar Diagrams
Simple bar: It is used to represent only one variable. The bars are of the same width and only the length varies.
Multiple bar: It is used to represent more than one variable with multiple bars. The bars are of the same width and only the length varies. The bars are drawn side by side, and different colours or shades are used to distinguish them.
Subdivided bar: It is used to represent more than one variable. A bar is subdivided into parts in proportion to the values given in the data, drawn on absolute figures.
Percentage bar: It is used to represent more than one variable. A bar is subdivided into parts in proportion to the values given in the data, drawn on percentages (relative basis).
Value in percentage = (individual value of the item × 100) / total value of the items

Pie Diagrams
Pie diagram: It is a circular diagram. The circle is divided into segments which are in proportion to the size of each component. Different colours or patterns are used to differentiate the segments.
The angle corresponding to each component is obtained by applying the formula
Angle of the sector = (individual value of the item × 360°) / total value of the items

Pictograms and Cartograms
Pictograms represent the data in the form of pictures. They are attractive, easy to understand and useful for exhibitions, poster sessions of seminars, magazines and newspapers.

Cartograms are used to present statistical data in the form of maps. (ex- reports of temperature or rainfall in the different parts of a state)

Graphs
Graphs are charts consisting of points, lines and curves. They are drawn on graph sheets. Scales are to be chosen suitably on both the X and Y axes. Statistical measures like quartiles, median and mode can be found from graphs. They are also used to analyse time series, regression, forecasting and interpolation.

Graph vs Diagram

1. Graph: It consists of points, lines and curves.
   Diagram: It is a geometrical shape such as a bar, circle, etc.
2. Graph: It is drawn on a graph sheet.
   Diagram: A graph sheet is not required.
3. Graph: Numerical variation is in two directions and scales are chosen for both axes.
   Diagram: Numerical variation is in one direction and a scale is chosen for only one axis.
4. Graph: The mathematical relation between two variables is shown by regression lines.
   Diagram: Useful for visual comparisons.
5. Graph: Less attractive and requires more attention to understand.
   Diagram: More attractive and easier to understand the nature of the data.
6. Graph: Widely used in statistical analysis, presentation of data and research.
   Diagram: Used in advertisements and publicity.
7. Graph: Trends and tendencies are known.
   Diagram: Trends and tendencies of the data are not known.
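The two formulas given earlier in this chapter for percentage bars and pie diagrams (share of the total × 100, and share of the total × 360°) can be sketched as follows. The counts for the three species are invented purely for illustration, and the function names are assumptions rather than part of the text.

```python
def percentage_shares(values):
    """Value in percentage = individual value x 100 / total value."""
    total = sum(values.values())
    return {name: v * 100 / total for name, v in values.items()}


def sector_angles(values):
    """Angle of the sector = individual value x 360 / total value (degrees)."""
    total = sum(values.values())
    return {name: v * 360 / total for name, v in values.items()}


# Hypothetical counts of three species in a sample (illustrative data only).
counts = {"Species A": 30, "Species B": 50, "Species C": 20}

for name in counts:
    pct = percentage_shares(counts)[name]
    angle = sector_angles(counts)[name]
    print(f"{name}: {pct:.1f}% of the bar, {angle:.1f} degrees of the pie")
```

The percentages always add up to 100 and the angles to 360, which is a quick check before drawing either diagram.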

Chapter 7. Frequency Distribution

Frequency Distribution
It is a classification of a random variable into a number of classes or class intervals, indicating the number of times (i.e. the frequency) the different classes or representatives of the class intervals occur in the data. It is always presented in a table called a frequency table (FT).

Discrete frequency distribution
Classes: Depend upon the maximum and minimum values of the data
Tally marks: Counting process
Frequency: The number of items that fall in the class

Continuous frequency distribution
Practical class intervals: Used for counting purposes (3.3-3.5; 3.6-3.8; 3.9-4.1)
True class intervals: Used for all other purposes. They are the implied limits of the original values, and they are continuous because there is no break between the classes (3.25-3.55; 3.55-3.85; 3.85-4.15)
Class limits / boundaries: Each (practical / true) class interval has two limits, called the lower limit and the upper limit (3.3 lower, 3.5 upper; 3.25 lower, 3.55 upper)

Width / magnitude of a class interval: The difference between the true upper limit and the true lower limit
Mid-point / mid-value / class mark of a class interval: The average of the true lower and true upper limits
Class frequency: The number of items that fall in the class interval
Number of class intervals: Depends upon the total number of items

Steps in framing a frequency distribution
Classes should be clearly defined and should not lead to any ambiguity.
Each of the values should be included in one of the classes.
Classes should be mutually exclusive.
Classes should be of equal width.
Classes should be non-overlapping.
Avoid open-end classes.
The number of classes should be between 5 and 15.
Class interval = (max - min) / number of classes, where the number of classes = 1 + 3.322 log10 N and N is the total number of items.

Cumulative frequency distributions
It is a frequency distribution in which successive frequencies are added together, so that each class includes all the classes below or above it, depending upon the end from which the cumulative process begins.
Less than: The accumulation begins from the first class interval.
More than: The accumulation begins from the last class interval.

Relative frequency distribution
Divide the frequency of each class / class interval by the total number of items.
Percent relative frequency: Multiply the relative frequency by 100.

Frequency Graphs
Qualitative and discrete frequency distribution: Bar diagram
Continuous frequency distribution: Histogram, frequency polygon, frequency curve
Cumulative frequency distribution: Ogive

Bar diagram (graph)
X-axis: classes of the variable
Y-axis: frequencies
The height of each bar is proportional to the frequency of the respective class.

Histogram
X-axis: class intervals
Y-axis: frequencies
Construct adjacent rectangles, one for each class interval.
The height of each rectangle is proportional to the frequency of the respective class.

Frequency Polygon
It is constructed from the histogram by joining the midpoints of the tops of the various rectangles by straight lines, extended up to the base line. The total area of the frequency polygon = the total area of the rectangles taken together.

Frequency curve
It is constructed from the histogram by joining the midpoints of the tops of the various rectangles by a smooth curve, extended up to the base line.

Cumulative frequency Graphs / OGIVES
Construct the table of less than and more than cumulative frequencies.
Plot the less than c.f. on the y-axis against the upper class limits on the x-axis and draw a smooth line / curve through the points; this is the less than ogive.
Plot the more than c.f. on the y-axis against the lower class limits on the x-axis and draw a smooth line / curve through the points; this is the required more than ogive.
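The steps of this chapter (number of classes, class interval, class frequencies, cumulative and relative frequencies) can be put together in one short sketch. Several choices here are assumptions and not from the text: the 20 whole-number observations are invented, the number of classes from 1 + 3.322 log10 N (commonly known as Sturges' rule) is rounded to the nearest whole number, the class width is rounded up, and each class is taken as lower <= x < upper so that the classes are mutually exclusive.

```python
import math
from itertools import accumulate


def frequency_table(data):
    """Build a frequency table with cumulative and relative frequencies."""
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))  # number of classes (rounded: assumption)
    lo, hi = min(data), max(data)
    width = math.ceil((hi - lo) / k)      # class interval, rounded up (assumption)

    freqs = [0] * k
    for x in data:
        # Class index; the maximum value is kept inside the last class.
        idx = min((x - lo) // width, k - 1)
        freqs[idx] += 1

    less_than = list(accumulate(freqs))                  # from the first class
    more_than = list(accumulate(reversed(freqs)))[::-1]  # from the last class
    rows = []
    for i, f in enumerate(freqs):
        lower = lo + i * width
        rows.append({
            "class": (lower, lower + width),  # lower <= x < upper
            "frequency": f,
            "less_than_cf": less_than[i],
            "more_than_cf": more_than[i],
            "relative": f / n,
        })
    return rows


# Hypothetical whole-number measurements (illustrative data only).
data = [10, 12, 13, 15, 16, 17, 18, 19, 20, 21,
        21, 22, 23, 24, 25, 26, 27, 29, 31, 34]

for row in frequency_table(data):
    print(row)
```

The `less_than_cf` and `more_than_cf` columns are the values that would be plotted on the y-axis for the two ogives, and `frequency` gives the rectangle heights for the histogram.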

