histogram - python class room diary · a histogram is graphical representation of the distribution...


Post on 27-Jun-2020



A histogram is a graphical representation of the distribution of numerical data. It usually has bins, where every bin has a minimum and a maximum value.

It was first introduced by Karl Pearson.

Basically, histograms are used to represent data given in the form of groups. The X-axis shows the bin ranges, while the Y-axis shows the frequency (how many values fall into each bin). It is a kind of bar graph.

So, if you want to represent an age-wise population as a graph, a histogram suits well, because it tells you how many people fall into a certain group range, or bin.

HISTOGRAM

What is a bin? To build a histogram, divide the entire range of values into a series of intervals (the bins), and then count how many values fall into each interval.

The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins must be adjacent, and are often (but are not required to be) of equal size.
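A minimal sketch of the binning idea in plain Python, assuming equal-width bins that are half-open [start, end) except for the last one, which also includes the maximum (the convention numpy and matplotlib use). The function name `count_in_bins` and the sample ages are hypothetical, chosen only for illustration.

```python
def count_in_bins(values, low, high, num_bins):
    """Split [low, high] into equal-width bins and count the values in each.

    Bins are half-open [start, end), except the last bin, which also
    includes `high`.
    """
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if v < low or v > high:
            continue  # ignore values outside the overall range
        index = int((v - low) // width)
        if index == num_bins:  # v == high belongs to the last bin
            index -= 1
        counts[index] += 1
    return counts


# Hypothetical ages, grouped into 4 bins of width 10 between 0 and 40
ages = [3, 7, 12, 15, 18, 22, 25, 29, 31, 40]
print(count_in_bins(ages, 0, 40, 4))  # → [2, 3, 3, 2]
```

Each count is the height of one histogram bar; the value 40 lands in the last bin because that bin is closed on the right.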


BASIS: HISTOGRAM vs. BAR GRAPH

What is it?
  Histogram: a graphical representation that displays data by way of bars to show the frequency of numerical data.
  Bar graph: a pictorial representation of data that uses bars to compare different categories of data.

Indicates
  Histogram: distribution of non-discrete (continuous) variables.
  Bar graph: comparison of discrete variables.

Presents
  Histogram: quantitative data.
  Bar graph: categorical data.

Spaces
  Histogram: bars are next to each other (no space in between).
  Bar graph: bars do not touch each other (there is a space between bars).

Data
  Histogram: values are grouped together, so that they are considered as ranges.
  Bar graph: values are taken as individual entities.

Reordering of bars possible?
  Histogram: no.
  Bar graph: yes.

Width of bars
  Histogram: may differ.
  Bar graph: same.

This is the example data for a histogram, given as bins and the count of values in each bin:

Bin            Count
0 to 100       10
100 to 200     15
200 to 300     21
300 to 400     45
400 to 500     35
500 to 600     14
2.5 to 3.49    23
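Because this data is already grouped into bins, there are no raw values to hand to `plt.hist`. One common workaround, sketched here as an assumption rather than the deck's own code, is to pass one value per bin (its left edge) and use the counts as `weights`. Only the six contiguous rows from 0 to 600 are used; the 2.5 to 3.49 row does not fit on the same axis. The output filename is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt

# Bin edges and counts from the 0 to 600 rows of the bin/count table
edges = [0, 100, 200, 300, 400, 500, 600]
counts = [10, 15, 21, 45, 35, 14]

plt.figure()
# One "value" per bin (its left edge), weighted by that bin's count,
# so each bar's height equals the count from the table
n, bins, patches = plt.hist(edges[:-1], bins=edges, weights=counts, edgecolor="black")
plt.xlabel("Bin")
plt.ylabel("Count")
plt.savefig("binned_histogram.png")  # or plt.show() in an interactive session
```

The bars touch each other because adjacent bins share an edge, which is exactly the "no space in between" property from the comparison above.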


Step 1: Collect the data for the histogram

For example, let's say that you have the following data about the number of books written by 100 individual authors:

No. of Books Written

01,01,02,03,03,05,07,08,09,10,10,11,11,13,13,15,16,17,18,18,18,19,
20,21,21,23,24,24,25,25,25,25,26,26,26,27,27,27,27,27,29,30,30,31,
33,34,34,34,35,36,36,37,37,38,38,39,40,41,41,42,43,44,45,45,46,47,
48,48,49,50,51,52,53,54,55,55,56,57,58,60,61,63,64,65,66,68,70,71,
72,74,75,77,81,83,84,87,89,90,90,91

Step 2: Determine the number of bins

Now, determine the number of bins to be used for the histogram.

For simplicity, let’s set the number of bins to 10.

Step 3: Plot the histogram in Python using matplotlib

PLOTTING HISTOGRAM USING MATPLOTLIB
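A minimal sketch of Step 3, assuming the standard `matplotlib.pyplot.hist` call; the variable name `x` and the output filename are illustrative. `plt.hist` returns the bar heights (`n`), the bin edges (`bins`) and the drawn bars (`patches`).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

# Number of books written by each of the 100 authors (Step 1)
x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
     20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
     33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
     48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
     72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91]

plt.figure()
# bins=10 (Step 2): matplotlib splits min(x)..max(x) into 10 equal-width bins
n, bins, patches = plt.hist(x, bins=10)
plt.savefig("books_histogram.png")
```

With a plain number for `bins`, the edges run from the smallest value (1) to the largest (91), so each bin here is 9 books wide.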


OUTPUT


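Another Method to Calculate Bins:

Instead of giving a number of bins, you can choose the intervals yourself. For the books data, grouping the values into intervals of 10 starting at 0, the frequency table would look like this: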

Intervals (bins)   Frequency
0-9                9
10-19              13
20-29              19
30-39              15
40-49              13
50-59              10
60-69              7
70-79              6
80-89              5
90-99              3

Note that the starting point for the first interval is 0, which is very close to the minimum observation of 1 in our data set. (If, for example, the minimum observation were 10 in another data set, then the first interval should start at 10 rather than 0.)

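Note: when you pass bins to the Python code, you'll need to specify the interval boundaries (the start of each interval: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90) rather than a single number (such as 10, which we used before). We must also include the last value of 99, so that the last interval runs from 90 to 99.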
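A minimal sketch of passing explicit bin boundaries to `plt.hist`, assuming the books data from Step 1. Each bin is half-open [start, next start) except the last, [90, 99], which is closed, so the counts reproduce the frequency table. The output filename is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
     20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
     33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
     48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
     72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91]

# Start of each interval, plus the last value 99 as the final edge
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

plt.figure()
n, edges, patches = plt.hist(x, bins=bins, edgecolor="black")
print([int(count) for count in n])  # → [9, 13, 19, 15, 13, 10, 7, 6, 5, 3]
plt.savefig("books_histogram_custom_bins.png")
```

Leaving out 99 would drop the 90-99 interval entirely, because `bins` lists edges, and 10 bins need 11 edges.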

Another Method to Calculate Bins Using Formulas:

n = number of observations = 100

Range = max value - min value = 91 - 1 = 90

No. of intervals = √n = √100 = 10

Width of intervals = Range / (No. of intervals) = 90 / 10 = 9
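The same arithmetic can be sketched in Python, assuming the books data from Step 1. Rounding √n to the nearest whole number is an assumption for data sets where n is not a perfect square; here √100 is exactly 10. The output filename is illustrative.

```python
import math

import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
     20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
     33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
     48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
     72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91]

n_obs = len(x)                            # n = 100
value_range = max(x) - min(x)             # 91 - 1 = 90
num_intervals = round(math.sqrt(n_obs))   # sqrt(100) = 10
width = value_range / num_intervals       # 90 / 10 = 9.0

# Interval boundaries start at the minimum value: 1, 10, 19, ..., 91
bins = [min(x) + i * width for i in range(num_intervals + 1)]

plt.figure()
n, edges, patches = plt.hist(x, bins=bins, edgecolor="black")
plt.savefig("books_histogram_sqrt_rule.png")
```

These edges are the same ones `plt.hist(x, bins=10)` produces for this data, since matplotlib also splits min..max into equal-width bins, which is why the resulting histogram matches the earlier one.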


OUTPUT

Note that the histogram is similar to the one we made before.


In this example, we change the color of the histogram's bars.
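A minimal sketch using the `color` and `edgecolor` parameters of `plt.hist`, assuming the books data from Step 1. The color names "green" and "black" and the output filename are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
     20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
     33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
     48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
     72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91]

plt.figure()
# color fills every bar; edgecolor draws an outline so adjacent bars stay distinguishable
n, bins, patches = plt.hist(x, bins=10, color="green", edgecolor="black")
plt.savefig("books_histogram_green.png")
```

`color` also accepts hex strings such as "#1f77b4" or RGB tuples.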


Changing the Title, X-axis and Y-axis Labels
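A minimal sketch using `plt.title`, `plt.xlabel` and `plt.ylabel`, assuming the books data from Step 1. The title and label strings are hypothetical wording, not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
     20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
     33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
     48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
     72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91]

plt.figure()
plt.hist(x, bins=10, color="green", edgecolor="black")
plt.title("No. of Books Written by Authors")  # hypothetical title text
plt.xlabel("No. of books written")            # X-axis: the bin ranges
plt.ylabel("No. of authors")                  # Y-axis: the frequency in each bin
plt.savefig("books_histogram_labelled.png")
```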


In this example, we generate a random series to create a histogram.
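A minimal sketch, assuming the random series is drawn with NumPy from a normal distribution; the mean 50, standard deviation 10, sample size 1000, seed 42 and 20 bins are all arbitrary illustrative choices, and the slides may have used a different generator (for example a pandas Series).

```python
import numpy as np

import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)            # fixed seed so the "random" data is reproducible
data = rng.normal(loc=50, scale=10, size=1000)  # 1000 values from a normal distribution

plt.figure()
n, bins, patches = plt.hist(data, bins=20, edgecolor="black")
plt.savefig("random_histogram.png")
```

With normally distributed data the bars form the familiar bell shape, tallest near the mean.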


In the next example, we are going to create a cumulative histogram, but before that let's understand how cumulative frequency is calculated.

The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors.

For example, suppose the frequencies are given as:

1,3,6,2,8,3,5,2,2,4,5,6,8

Frequency   Cumulative frequency   How the cumulative frequency is calculated
1           1                      0+1=1
3           4                      1+3=4
6           10                     4+6=10
2           12                     10+2=12
8           20                     12+8=20
3           23                     20+3=23
5           28                     23+5=28
2           30                     28+2=30
2           32                     30+2=32
4           36                     32+4=36
5           41                     36+5=41
6           47                     41+6=47
8           55                     47+8=55
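The running total in the cumulative-frequency table can be sketched with `itertools.accumulate` from the Python standard library, which adds each element to the sum of everything before it.

```python
from itertools import accumulate

frequencies = [1, 3, 6, 2, 8, 3, 5, 2, 2, 4, 5, 6, 8]

# Each cumulative value = this frequency + the sum of all earlier frequencies
cumulative = list(accumulate(frequencies))
print(cumulative)  # → [1, 4, 10, 12, 20, 23, 28, 30, 32, 36, 41, 47, 55]
```

The last cumulative value (55) is always the total of all frequencies.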


Let's create two histograms from the same data: (i) with normal frequency and (ii) with cumulative frequency.

1. With Normal frequency
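A minimal sketch of the normal-frequency version, using a small hypothetical data set (the list `data` is illustrative, not from the slides) and 5 bins. The output filename is also illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

# Hypothetical sample data, for illustration only
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9]

plt.figure()
# Normal (non-cumulative) histogram: each bar counts only the values in its own bin
n, bins, patches = plt.hist(data, bins=5, edgecolor="black")
plt.title("Normal frequency")
plt.savefig("normal_histogram.png")
```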


2. With Cumulative frequency
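A minimal sketch of the cumulative version, using the same hypothetical `data` list and 5 bins as the normal-frequency sketch; `cumulative=True` is a real `plt.hist` parameter that makes each bar show its own count plus the counts of all bins to its left.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in an interactive session use plt.show()
import matplotlib.pyplot as plt

# Hypothetical sample data, for illustration only
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9]

plt.figure()
# cumulative=True: bar height = frequency of this bin + sum of all earlier bins
n, bins, patches = plt.hist(data, bins=5, cumulative=True, edgecolor="black")
plt.title("Cumulative frequency")
plt.savefig("cumulative_histogram.png")
```

The bars can only grow from left to right, and the last bar always equals the total number of values.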