Topic:Frequency Distributions and Histograms
From SharedExperienceProject
Topic Highlights
(What you will learn)
- Representing data using frequency distributions
- Representing data using histograms
- Direct and relative distributions
- Why these are important and how to interpret them
Introduction and Motivation
(Why learn it)
In this topic, we are still firmly planted in the realm of descriptive statistics (see Introduction to Business Statistics if you need to refresh yourself about what this means). We will take our first steps toward being able to provide useful tabular and graphical descriptions of quantitative data.
A frequency distribution is a simple way to represent large amounts of data so they are more easily interpreted. A histogram is just a picture of a frequency distribution.
These are basic analysis tools, and everyone in business needs to know how to create and interpret them.
Learning Activities
(How the levels of understanding will be gained)
| Type | Name | Direction |
|---|---|---|
| Reading | | Self-directed |
| In-class worksheet | | Self-directed |
| In-class discussion | | Instructor-directed |
| Personal activities | | Self-directed |
Learning Objectives
(Levels of understanding to be gained)
| Level of Understanding | Objective(s) |
|---|---|
| Very best | |
| Highly satisfactory | |
| Satisfactory | |
| Maybe just enough to pass | |
Lecture Notes: Frequency Distributions and Histograms
Why use them?
So what's the deal? Why would I want to bother with a frequency distribution or a histogram anyway?
Let's answer that with an example. Consider the data given in the figure below, which are taken from the "Who's in the Room?" cards filled in by a recent statistics class. (Recall the topic Variables in Data Sets, where you were first introduced to those cards.)
| Raw data collected by sampling a recent statistics class |
|---|
| |
You should observe two things right away:
- There are a lot of data points there, even though only 34 cards were filled in and the data from 7 of those were not used for various reasons
- It's tough to learn much from the data just by looking at them:
- It's true that we could do some basic calculations (such as mean and standard deviation) and some basic measurements (such as the number of males and females in the class, or the proportion that has blue eyes), but it shouldn't be hard for you to imagine how much more difficult even this would become if the sample got any larger.
100 people would take a while. 1000 would be out of the question!
So, what can we do? We can start by using frequency distributions and histograms to summarize all of the data in a way that can be more easily interpreted.
If you buy this, then the challenge is to figure out what these are, how to create them, and how to interpret them. Read on...
What are they?
Introductory Example
A frequency distribution is just a table that represents the raw data by summarizing how much of it is in some pre-defined groups, or classes.
Let's consider some examples, starting with something simple:
The first figure below shows the sibling data taken from the entire "Who's in the Room" data set we looked at above[1]:
- It shows all 27 data points about respondents' siblings
- Make sure you understand from which column of the original data this came
| Raw sibling data |
|---|
| |
The second figure shows a frequency distribution table based on this data:
- It represents the 27 data points using only five classes: 0, 1, 2, 3 and 4 siblings
- It says there are 1, 9, 11, 4 and 2 data points in each of these classes, respectively:
- 1 respondent with 0 siblings
- 9 with 1 sibling
- 11 with 2 siblings
- 4 with 3 siblings
- 2 with 4 siblings
| Frequency distribution table for the sibling data |
|---|
| |
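
The tally in the table can also be reproduced with a few lines of Python. This is only a sketch: the list `siblings` is rebuilt from the class counts quoted above (1, 9, 11, 4 and 2) rather than copied from the original cards, so the order of the values is an assumption, as are the variable names.

```python
from collections import Counter

# Sibling counts rebuilt from the frequencies quoted above; the original
# card order is not known, so this ordering is an assumption.
siblings = [0] * 1 + [1] * 9 + [2] * 11 + [3] * 4 + [4] * 2

# Counter tallies how many times each value (class) occurs.
frequency = Counter(siblings)

print("Siblings  Frequency")
for value in sorted(frequency):
    print(f"{value:>8}  {frequency[value]:>9}")
print(f"{'Total':>8}  {sum(frequency.values()):>9}")
```

The total on the last line should match the number of data points (27), which is the same double-check you would do by hand.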
The third figure shows the corresponding frequency histogram:
- It shows exactly the same information as the frequency distribution table above
- However, it does so in a visually pleasing way that is much more useful for interpreting the data
| Frequency histogram for the sibling data |
|---|
| |
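
If you want a quick look at the shape of the data without drawing anything, a rough text version of the histogram can be printed, turned on its side. This sketch assumes the frequencies from the sibling table; `text_histogram` is a hypothetical helper name, not part of any library.

```python
# Frequencies from the sibling frequency distribution table.
frequency = {0: 1, 1: 9, 2: 11, 3: 4, 4: 2}

def text_histogram(freq):
    """Return one line per class, with one '#' per data point."""
    return [
        f"{value} | {'#' * count} ({count})"
        for value, count in sorted(freq.items())
    ]

for line in text_histogram(frequency):
    print(line)
```

Each row is one class and the length of the bar is its frequency, so the longest bar (2 siblings) stands out immediately.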
So, we've gone from an entire data set (in the first figure), to two representations of the sibling data from which we could make much more meaningful decisions. For example, a marketer wouldn't want to promote a product for only-children to the class - there is only one of them!
Before moving on, be aware of the following:
- For data sets with between 15 and 40 points, you will need to be able to create a frequency histogram from scratch using a pencil, paper and calculator
- The situation will get a little more complex than this simple case, e.g.:
- Classes will not usually contain just one number, as they did in this simple example where we used 0, 1, 2, 3 and 4 siblings; they will contain a range of numbers
- Continuous data need to be treated slightly differently
Let's take a look at how to make a simple frequency distribution table and histogram...
How do I create a frequency distribution table?
The following notes are meant to facilitate a discussion about the concept of a frequency distribution table.
Basic Steps
Once you have collected some raw data, the following steps are used to create a frequency distribution table by hand.
First, do some required calculations:
1. Order the raw data by writing it out again from smallest to largest
2. Select the number of classes you want to use - usually 5-10
3. Compute the width of the classes based on the largest and smallest value in your data set
Then, create the frequency distribution table by:
4. Writing the appropriate column headings
5. Writing the class numbers in the first column
6. Determining the limits for each class, and writing them in the table
7. Counting the number of data points in each class (their frequency), and writing them in the table
8. Double-checking your solution
Example 1
Let's work through an example using the continuous data given below. This is the height data that was collected using the "Who's in the Room?" cards. Be sure you can see which column this came from by looking at the first figure on this page.
In this example, your goal is to create a frequency distribution table and histogram for the height data, using 5 classes.
| Raw height data |
|---|
| |
Look at the basic steps given above. The first step is to order the raw data from smallest to largest. Do it now on a sheet of paper:
- We do this to make the following steps easier
- It is useful to count and make sure you have the same number of points as the raw data (27 in this case)
- You should end up with the following
| Ordered height data |
|---|
| |
The second step is to select the number of classes. We were already told to use 5 classes, so let's move on.
The third step is to compute the class width. This is done using the following equation:
CW = (H - L) / K
where:
- CW is the class width we are looking for
- H is the highest value in your data
- L is the lowest value in your data
- K is the number of classes, which we just said is 5
Compute CW now using your calculator. Show your work on your sheet of paper. Your response should look like the answer given in the figure below.
| Class width calculation |
|---|
| Given: H = 190.5, L = 154.9 and K = 5. Need: CW. |
| CW = (H - L) / K = (190.5 - 154.9) / 5 = 7.12 |
| After rounding, this gives a class width of CW = 7. |
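
The same calculation can be checked in Python. This is a minimal sketch using the values given above; the function name `class_width` is an assumption, and rounding to the nearest whole number mirrors the choice made in the notes rather than a fixed rule.

```python
def class_width(highest, lowest, num_classes):
    """Raw class width: CW = (H - L) / K."""
    return (highest - lowest) / num_classes

H, L, K = 190.5, 154.9, 5
raw_cw = class_width(H, L, K)

# Round to a convenient whole number, as done in the worked example.
cw = round(raw_cw)

print(f"raw CW = {raw_cw:.2f}, rounded CW = {cw}")
```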
The next two steps are to set up the table. The fourth is to write the appropriate column headings: Class Number, Class, and Frequency. The fifth is to simply write the class numbers in the first column. Do this on your sheet of paper.
The sixth step is to write the limits for each class. Do this on your sheet of paper too:
- Since the lowest data point is 154.9, a good starting point for the first class is 154
- The upper limit of the first class is 154 + 7 = 161
- Because this is continuous data, be sure to use the correct notation, i.e. "154 and under 161"
Now, your solution should look something like the one shown below. All that's missing are the frequency data!
| Empty frequency distribution table (updated) |
|---|
| |
The last step is to go back to the ordered data and count how many data points lie in each class. Do this now and put the data in your table.
Also compute the total count, allowing you to double-check whether you've made a mistake. It should correspond to the number of points: 27 in this case.
You have finished your first frequency distribution table! It should look like the one below.
| Completed frequency distribution table (updated) |
|---|
| |
Before moving on, consider the following:
- We were asked to use 5 classes, but ended up with 6. Why is this, and is it okay?
- Instead of being given K = 5, we could have been given CW directly. For example, the problem may have said to use a class width of 5. Or 10. Don't let this stump you - it actually saves you some time.
- We were very explicit here to show our work. I recommend this on a quiz or exam because you may get part marks. If you had just shown the final table but had made mistakes in calculating your class width, then it would be harder for the marker to know how and where you made the mistake. Help the marker help you.
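
On the first question above, one way to see where the sixth class comes from is to generate the class limits in code. This is a minimal sketch, assuming the starting point of 154 and the rounded class width of 7 from Example 1; the function name `class_limits` is hypothetical.

```python
def class_limits(start, width, highest):
    """List (lower, upper) limits until the highest value is covered.

    Each class is read as "lower and under upper", so a value equal to
    an upper limit belongs to the next class.
    """
    limits = []
    lower = start
    while lower <= highest:
        limits.append((lower, lower + width))
        lower += width
    return limits

limits = class_limits(start=154, width=7, highest=190.5)
for number, (lower, upper) in enumerate(limits, start=1):
    print(f"{number}  {lower} and under {upper}")
```

Rounding CW down from 7.12 to 7 means five classes only span 154 to 189, which stops short of the highest value, 190.5, so a sixth class is needed to hold it.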
How do I create the corresponding histogram?
The following notes are meant to facilitate an introductory discussion about the concepts of frequency distributions and histograms.
The histogram is just a plot of the frequency distribution table. As we've seen, it's a bar chart where the height of the bars represents the frequency of data points in each class.
Example 2
Let's expand on Example 1 by creating the histogram corresponding to our final frequency distribution table. Do this now on your sheet of paper.
Make sure to:
- Have one bar for each class
- Label both axes correctly, using the same labels you used in the table
- Use appropriate scales for both axes
- Leave no gaps between the bars
You should end up with something like the one shown in the figure below.
| Frequency histogram for the height data |
|---|
| |
Practice Problems 1
We've covered the basics; now build your skills with the following problems. Don't look at the solutions until you've worked the problem through.
Practice Problem 1
a) Repeat Example 1 using the raw height data given in the figure below for the females in the class. Use the same classes as we did in Example 1.
b) Do it again using the data for the males. Use the same classes here too.
| Raw height data for the females and males |
|---|
| |
| Solution |
|---|
| |
Practice Problem 2
a) Create the histograms for each of the tables from Problem 1
b) What do you observe? Does it make sense to you intuitively?
| Solution |
|---|
| |
Practice Problem 3
The percentage increases in the property tax of 26 randomly selected homes in a certain subdivision are as follows:
| 5.10 | 7.35 | 13.34 | 18.19 | 9.12 |
| 9.89 | 10.45 | 12.89 | 17.91 | 0.51 |
| 3.42 | 8.34 | 11.12 | 14.51 | 7.25 |
| 12.35 | 11.89 | 14.10 | 29.10 | 14.91 |
| 11.89 | 17.89 | 15.30 | 26.10 | 19.80 |
| 18.45 |
a) Construct a relative frequency distribution table with six classes
b) From the relative frequency distribution, determine what proportion of homes had an increase of more than 15% in property tax
c) What interpretation can you give to the shape of the distribution?
The solution to this problem is posted on Blackboard.
Practice Problem 4
Put the data below into a frequency distribution table. Use a class width of 2. Correctly title and label the table.
| 2.0 | 3.0 | 3.5 | 4.5 | 4.5 | 4.5 | 5.0 | 5.0 | 5.5 | 5.5 |
| 5.5 | 5.5 | 6.0 | 6.5 | 6.5 | 8.0 | 8.0 | 8.0 | 9.0 | 9.0 |
| 9.5 | 9.5 | 10 | 10.5 | 11 | 11.5 | 11.5 | 11.5 | 12.0 | 12.0 |
| Solution |
|---|
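
One possible way to work this problem in code is sketched below; it is not the official solution. The problem fixes only the class width of 2. Starting the first class at 2 and reading each class as "lower and under upper" are assumptions carried over from Example 1, and `frequency_table` is a hypothetical helper name.

```python
import math

data = [2.0, 3.0, 3.5, 4.5, 4.5, 4.5, 5.0, 5.0, 5.5, 5.5,
        5.5, 5.5, 6.0, 6.5, 6.5, 8.0, 8.0, 8.0, 9.0, 9.0,
        9.5, 9.5, 10, 10.5, 11, 11.5, 11.5, 11.5, 12.0, 12.0]

def frequency_table(values, start, width):
    """Count values into classes of the form 'lower and under upper'.

    Assumes start is no larger than the smallest value.
    """
    num_classes = math.floor((max(values) - start) / width) + 1
    counts = [0] * num_classes
    for x in values:
        # Index of the class whose lower limit is at or just below x.
        counts[math.floor((x - start) / width)] += 1
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]

table = frequency_table(data, start=2, width=2)
print("Class Number  Class  Frequency")
for number, (lower, upper, count) in enumerate(table, start=1):
    print(f"{number}  {lower} and under {upper}  {count}")
print(f"Total  {sum(count for _, _, count in table)}")
```

Under these assumptions the six classes hold 3, 9, 3, 7, 6 and 2 data points, which total 30, the number of values in the data set.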
Lecture Notes: Relative Frequencies
There is one more concept you should be aware of regarding frequency distributions: the relative frequency.
What and why?
Relative frequencies are simple. They answer the question of what proportion of the data falls into each class. This provides another useful way of looking at the data, and a quick way to compute proportions related to the data set.
How do I compute it?
Relative frequency is computed as the ratio of the frequency for that class to the total count:
Relative frequency = (frequency of the class) / (total count)
You just do this for each class in your table.
Let's look at an example.
Example 3
Compute the relative frequencies for the frequency distribution table we saw in the Introductory Example.
Your answer should look like the one in the figure below.
| Relative frequency for the sibling data |
|---|
| |
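
The relative frequency column can be checked with a short Python sketch. It assumes the frequencies from the sibling table in the Introductory Example; the variable names are illustrative only.

```python
# Frequencies from the sibling frequency distribution table.
frequency = {0: 1, 1: 9, 2: 11, 3: 4, 4: 2}
total = sum(frequency.values())

# Relative frequency = frequency of the class / total count.
relative = {value: count / total for value, count in frequency.items()}

print("Siblings  Frequency  Relative frequency")
for value, count in frequency.items():
    print(f"{value:>8}  {count:>9}  {relative[value]:>18.3f}")
print(f"{'Total':>8}  {total:>9}  {sum(relative.values()):>18.3f}")
```

The relative frequencies should add up to 1 (allowing for rounding), which gives you one more way to double-check the table.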
Example 4
What proportion of people in the class have 2 or fewer siblings?
What percentage has three siblings or more?
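
Both questions come down to adding up the frequencies of the classes that qualify and dividing by the total. A minimal sketch, assuming the sibling frequencies from the Introductory Example; `proportion` is a hypothetical helper name.

```python
# Frequencies from the sibling frequency distribution table.
frequency = {0: 1, 1: 9, 2: 11, 3: 4, 4: 2}

def proportion(freq, condition):
    """Share of data points whose class value satisfies `condition`."""
    matching = sum(count for value, count in freq.items() if condition(value))
    return matching / sum(freq.values())

two_or_fewer = proportion(frequency, lambda siblings: siblings <= 2)
three_or_more = proportion(frequency, lambda siblings: siblings >= 3)

print(f"2 or fewer siblings: {two_or_fewer:.3f}")
print(f"3 or more siblings: {three_or_more:.1%}")
```

Because every respondent falls into exactly one of the two groups, the two answers must add up to 1 (or 100%).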
Can I create a relative frequency histogram?
Yes! In exactly the same way as we did for the frequency histogram.
Example 5
Create the relative frequency histogram for the data computed in Example 3.
Your solution should look like the one below.
| Relative frequency histogram for the sibling data |
|---|
| |
Practice Problems 2
Now that you've seen the relative frequency, build your skills with the following problems. Don't look at the solutions until you've worked the problem through.
Practice Problem 4
Add the relative frequency column to the frequency distribution table you created in Example 1 for the height data.
| Solution |
|---|
| |
Practice Problem 5
a) Create the corresponding histogram.
b) How is it different from the one in Example 2?
| Solution |
|---|
| |
Practice Problem 6
a) What proportion of students is equal to or taller than 168 cm, but less than 189 cm?
b) If your assistant handed you only the relative frequency histogram, would you be able to tell what proportion of students is under 171 cm tall?
| Solution |
|---|
| a) 0.19 + 0.30 + 0.22 = 0.71 |
| b) No. You would need the original data to do this, because the histogram only reports the data in classes, and 171 is not a boundary of any class. |
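
The reasoning behind both answers can be sketched in Python. The class boundaries assume the Example 1 setup (starting at 154 with a class width of 7, six classes), and the three relative frequencies are the ones quoted in part a); the function name `answerable` is hypothetical.

```python
# Class boundaries from Example 1: 154, 161, 168, 175, 182, 189, 196.
boundaries = [154 + 7 * i for i in range(7)]

# Relative frequencies quoted in the solution to part a).
relative = {(168, 175): 0.19, (175, 182): 0.30, (182, 189): 0.22}

def answerable(cutoff):
    """A cutoff can be read off the histogram only if it is a class boundary."""
    return cutoff in boundaries

# Part a): both 168 and 189 are boundaries, so the classes can be added up.
part_a = round(sum(relative.values()), 2)
print(f"a) {part_a}")

# Part b): 171 falls inside the class "168 and under 175".
print(f"b) answerable at 171? {answerable(171)}")
```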
References
- [1] You should recognize the sibling data to be of type discrete-ratio (recall the topic Variables in Data Sets)

