Histogram - Python Class Room Diary
A histogram is a graphical representation of the distribution of numerical data. It usually has bins, where every bin has a minimum and a maximum value.
It was first introduced by Karl Pearson.
Basically, histograms are used to represent data given in the form of groups. The X-axis shows the bin ranges, while the Y-axis shows the frequency. It is a kind of bar graph.
So, if you want to represent the age-wise population as a graph, a histogram suits well, because it tells you how many people fall in a certain group range (called a bin in the context of histograms).
HISTOGRAM
What is a bin? To build a histogram, divide the entire range of values into a series of intervals (the bins), and then
count how many values fall into each interval.
The bins are usually specified as consecutive, non-overlapping intervals of a
variable. The bins (intervals) must be adjacent, and are often (but are not required
to be) of equal size.
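The binning idea described above can be sketched in plain Python. The sample values and bin edges below are hypothetical and chosen only for illustration; the convention that the last bin also includes its upper edge is an assumption that mirrors what NumPy and matplotlib do.

```python
# Hypothetical sample values, used only to illustrate binning.
values = [3, 7, 12, 18, 21, 25, 25, 33, 38, 40]

# Consecutive, adjacent, non-overlapping intervals:
# [0, 10), [10, 20), [20, 30), [30, 40]
# Assumption: the last interval also includes its upper edge.
edges = [0, 10, 20, 30, 40]

counts = [0] * (len(edges) - 1)
for v in values:
    for i in range(len(counts)):
        lower, upper = edges[i], edges[i + 1]
        is_last_bin = i == len(counts) - 1
        if lower <= v < upper or (is_last_bin and v == upper):
            counts[i] += 1
            break

# Print a small "Bin / Count" table.
for i, c in enumerate(counts):
    print(f"{edges[i]} to {edges[i + 1]}: {c}")
```

Every value lands in exactly one bin, so the counts add up to the number of values.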
This is the data for the histogram to the right, using 500 items:
HISTOGRAM vs BAR GRAPH (basis of comparison)
What is …?
  Histogram: It refers to a graphical representation that displays data by way of bars to show the ...
  Bar graph: A bar graph is a pictorial representation of data that uses bars to compare different ...
Indicates
  Histogram: Distribution of non-discrete variables
  Bar graph: Comparison of discrete variables
Presents
  Histogram: Quantitative data
  Bar graph: Categorical data
Spaces
  Histogram: Bars are close to each other (no space in between).
  Bar graph: Bars do not touch each other (there is a space between bars).
Data
  Histogram: Data (values) are grouped together, so that they are considered as ranges.
  Bar graph: Data (values) are taken as individual entities.
Reordering of bars possible?
  Histogram: No
  Bar graph: Yes
Width of bars
  Histogram: May differ
  Bar graph: Same
Bin Count
0 to 100 10
100 to 200 15
200 to 300 21
300 to 400 45
400 to 500 35
500 to 600 14
2.5 to 3.49 23
Step 1: Collect the data for the histogram
For example, let’s say that you have the following data about the number of books written by 100 individual authors:
Number of Books Written
01,01,02,03,03,05,07,08,09,10,10,11,11,13,13,15,16,17,18,18,18,19,
20,21,21,23,24,24,25,25,25,25,26,26,26,27,27,27,27,27,29,30,30,31,
33,34,34,34,35,36,36,37,37,38,38,39,40,41,41,42,43,44,45,45,46,47,
48,48,49,50,51,52,53,54,55,55,56,57,58,60,61,63,64,65,66,68,70,71,
72,74,75,77, 81,83,84,87,89, 90,90,91
Step 2: Determine the number of bins
Now, determine the number of bins to be used for the histogram.
For simplicity, let’s set the number of bins to 10.
Step 3: Plot the histogram in Python using matplotlib
PLOTTING HISTOGRAM
USING MATPLOTLIB
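The code for this step did not survive in the transcript, so what follows is only a minimal sketch of what Step 3 could look like, using the data from Step 1 and 10 bins from Step 2. The variable names, the output file name, and the choice of the non-interactive "Agg" backend are assumptions. Leading zeros (01, 02, ...) are dropped because Python 3 does not accept them in integer literals.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Number of books written by 100 authors (data from Step 1).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
    33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
    48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
    72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91,
]

fig, ax = plt.subplots()

# bins=10 asks matplotlib for 10 equal-width bins between min and max.
counts, edges, patches = ax.hist(books, bins=10)

# Hypothetical file name; in an interactive session plt.show() can be used instead.
fig.savefig("books_histogram.png")
```

`hist` returns the count per bin, the bin edges, and the drawn bars, which is handy for checking that all 100 observations were counted.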
OUTPUT
Another Method to Calculate Bins:
Based on this information, the frequency table would look like this:
Intervals (bins) Frequency
0-9 9
10-19 13
20-29 19
30-39 15
40-49 13
50-59 10
60-69 7
70-79 6
80-89 5
90-99 3
Note that the starting point for the first interval is 0, which is very close to the
minimum observation of 1 in our data set. (If, for example, the minimum observation was 10 in another data set, then the starting point for the first interval should be 10, rather than 0.)
Note:
For the bins in the Python code below, you’ll need to specify the starting values of the intervals (0, 10, 20, ..., 90), rather than a particular number of bins (such as 10, which we used before). We must also include the last value of 99.
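As a sketch of the note above, the interval boundaries can be passed to matplotlib as a list instead of a bin count. The variable names and the output file name are assumptions; the data is the Step 1 data with leading zeros dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Number of books written by 100 authors (data from Step 1).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
    33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
    48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
    72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91,
]

# Start of every interval, plus the last value 99 to close the final bin.
bin_edges = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(books, bins=bin_edges)

# Print the frequency table so it can be compared with the one above.
for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{int(lower)} to {int(upper)}: {int(count)}")

fig.savefig("books_histogram_custom_bins.png")  # hypothetical file name
```

Each bin includes its lower edge and excludes its upper edge, except the last one, which includes both. That is why a value such as 10 is counted in the 10-19 interval.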
Another method to calculate bins, using formulas:
n = number of observations = 100
Range = max value - min value = 91 - 1 = 90
No. of intervals = √n = √100 = 10
Width of intervals = Range / (No. of intervals) = 90/10 = 9
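The arithmetic above can be reproduced in a few lines of plain Python. This is only a sketch: the variable names are assumptions, and rounding the square root to the nearest whole number is a choice made here for data sets where n is not a perfect square.

```python
import math

# Number of books written by 100 authors (data from Step 1).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
    33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
    48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
    72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91,
]

n = len(books)                            # number of observations
value_range = max(books) - min(books)     # max value - min value
num_intervals = round(math.sqrt(n))       # square-root rule
width = value_range / num_intervals       # width of each interval

# Bin edges starting at the minimum observation.
edges = [min(books) + i * width for i in range(num_intervals + 1)]

print(n, value_range, num_intervals, width)
print(edges)
```

The resulting `edges` list can be passed as the `bins` argument of matplotlib's `hist`. The edges run from the minimum to the maximum in equal steps, which is also how matplotlib places bins when it is given only a bin count.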
OUTPUT
Note that the histogram is similar to the one we made before.
In this example we change the color of the histogram bars.
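The original code for this example is not in the transcript, so here is a minimal sketch of one way to set bar colors. The sample ages, the color names, and the file name are hypothetical; `color` and `edgecolor` are standard keyword arguments of matplotlib's `hist`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sample data, used only for illustration.
ages = [12, 15, 21, 22, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]

fig, ax = plt.subplots()

# color fills the bars; edgecolor draws the outline so adjacent bars stay distinguishable.
counts, edges, patches = ax.hist(ages, bins=5, color="green", edgecolor="black")

fig.savefig("colored_histogram.png")  # hypothetical file name
```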
Changing Title, X-axis and Y-axis Labels
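A minimal sketch of setting the title and axis labels follows. The sample data, the label texts, and the file name are hypothetical; `set_title`, `set_xlabel`, and `set_ylabel` are the standard matplotlib Axes methods (`plt.title`, `plt.xlabel`, and `plt.ylabel` are the pyplot equivalents).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sample data, used only for illustration.
ages = [12, 15, 21, 22, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]

fig, ax = plt.subplots()
ax.hist(ages, bins=5, edgecolor="black")

# Hypothetical label texts.
ax.set_title("Age-wise population")
ax.set_xlabel("Age group (bin range)")
ax.set_ylabel("Frequency")

fig.savefig("labelled_histogram.png")  # hypothetical file name
```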
In this example we generate a random series to create a
histogram.
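One possible way to do this is sketched below with NumPy's random generator. The distribution (normal, mean 50, standard deviation 10), the sample size, the seed, and the file name are all assumptions, since the original code is not part of the transcript.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the same random series is produced on every run.
rng = np.random.default_rng(seed=42)

# Assumption: 1000 values drawn from a normal distribution (mean 50, std 10).
series = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(series, bins=20, edgecolor="black")
ax.set_title("Histogram of a random series")

fig.savefig("random_histogram.png")  # hypothetical file name
```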
In the next example we are going to create a cumulative
histogram, but before that let’s understand how
cumulative frequency is calculated.
(The cumulative frequency is calculated by adding
each frequency from a frequency distribution table to the
sum of its predecessors.)
For example, frequencies are given as:
1,3,6,2,8,3,5,2,2,4,5,6,8
Frequency   Cumulative frequency   How cumulative frequency is calculated
1           1                      0+1=1
3           4                      1+3=4
6           10                     4+6=10
2           12                     10+2=12
8           20                     12+8=20
3           23                     20+3=23
5           28                     23+5=28
2           30                     28+2=30
2           32                     30+2=32
4           36                     32+4=36
5           41                     36+5=41
6           47                     41+6=47
8           55                     47+8=55
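The running-sum rule for cumulative frequency can be checked with a short sketch that uses `itertools.accumulate` from the standard library. The variable names are assumptions; the frequencies are the ones listed in this section.

```python
from itertools import accumulate

# Frequencies from the example in this section.
frequencies = [1, 3, 6, 2, 8, 3, 5, 2, 2, 4, 5, 6, 8]

# Each cumulative value is the current frequency plus the sum of all earlier ones.
cumulative = list(accumulate(frequencies))

previous = 0
for freq, cum in zip(frequencies, cumulative):
    print(f"{freq:>2}  {cum:>2}  {previous}+{freq}={cum}")
    previous = cum
```

The last cumulative value always equals the total of all frequencies, here 55.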
Let’s create two histograms from the same data: (i) with normal frequency
and (ii) with cumulative frequency.
1. With Normal frequency
2. With Cumulative frequency
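The transcript does not show which data set these two plots used, so the sketch below assumes the books data from Step 1. It draws both variants side by side; `cumulative=True` is a standard keyword argument of matplotlib's `hist`. Variable names and the file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Assumption: reuse the Step 1 data (number of books written by 100 authors).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9, 10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31,
    33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 43, 44, 45, 45, 46, 47,
    48, 48, 49, 50, 51, 52, 53, 54, 55, 55, 56, 57, 58, 60, 61, 63, 64, 65, 66, 68, 70, 71,
    72, 74, 75, 77, 81, 83, 84, 87, 89, 90, 90, 91,
]

fig, (ax_normal, ax_cum) = plt.subplots(1, 2, figsize=(10, 4))

# 1. Normal frequency: each bar shows the count in its own bin.
counts, edges, _ = ax_normal.hist(books, bins=10, edgecolor="black")
ax_normal.set_title("Normal frequency")

# 2. Cumulative frequency: each bar shows the count in its bin plus all bins before it.
cum_counts, _, _ = ax_cum.hist(books, bins=10, cumulative=True, edgecolor="black")
ax_cum.set_title("Cumulative frequency")

fig.savefig("normal_vs_cumulative.png")  # hypothetical file name
```

In the cumulative plot the bars never decrease from left to right, and the last bar reaches the total number of observations.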