Topic:Measures of Central Tendency and Variation
From SharedExperienceProject
Topic Highlights
(What you will learn / gain)
- What is meant by the term summary measure (or descriptive measure, or summary statistic)
- Which summary measures are important for business statistics
- A level of intuition about the measures of central tendency and variation
- How to compute the standard measures for central tendency and variation
Introduction and Motivation
(Why learn it)
We make use of two major tools in descriptive statistics:
- Tables and graphs
- Summary measures
Until this point, we have been learning about descriptive statistics by considering the tables and graphs that can be used to best represent our data. In this topic, we will look for the first time at how data can be represented using single numbers called summary measures.
In business, it is absolutely critical that you know how to compute and interpret the summary measures we will study here.
Learning Activities
(How the levels of understanding will be gained)
| Type | Name | Direction |
|---|---|---|
| Reading | | Self-directed |
| In-class worksheet | | |
| In-class discussion | | Instructor-directed |
| Practice problems | | Self-directed |
| Personal activities | | Self-directed |
Learning Objectives
(Levels of understanding to be gained)
| Level of Understanding | Objective(s) |
|---|---|
| Very best | |
| Highly satisfactory | |
| Satisfactory | |
| Maybe just enough to pass | |
Lecture Notes: Introduction to Summary Measures
The following notes are meant to facilitate a discussion about the concept of a summary measure.
What are summary measures?
A summary measure is a single number used to represent a data set. For example:
- The mean of the height data is 174.1 cm
- The range of the height data is 35.6 cm
- The standard deviation of the height data is 9.8 cm
Why use them?
I have the frequency distribution tables and histograms all figured out. Why would I need to use a summary measure?
In the three examples above, we were able to describe the whole height data set using single numbers, and each tells us something different about the data itself.
Summary measures can be very powerful because they are simple, because they are standard (everyone knows what you mean when you use them), and because the standard measures have been chosen to represent some useful information about the data.
As we will see later, using summary measures is also a bit dangerous if you don't know how to interpret them.
What kinds of measures are there?
There are four particularly important kinds of measures:
1. Measures of central tendency:
- Such measures give us a summary of the middle of the data
- The most intuitive of these is the mean (or average) of the data, but there are other kinds that we will see shortly
- We will study these below under Measures of Central Tendency
2. Measures of variation:
- Such measures give us a summary of the variation in our data set: How are they spread out? How volatile are they?
- The most intuitive of these is the range (maximum minus minimum), but there are other important measures of variation that we will also take a look at
- We will study these below under Measures of Variation
3. Measures of position and measures of shape
- Let's leave these for a later topic
There's no better way to really understand the concept of summary measures than to take a look at some. So, let's do it...
Lecture Notes: Measures of Central Tendency
The following notes are meant to facilitate a discussion about measures of central tendency.
Let's look first at the concept of a measure of central tendency. As mentioned above, such a measure is used to try and describe the center, or middle of your data set.
The most common measures for doing this are:
- Mean
- Median
- Midrange
- Mode
In the two sections that follow, we'll look at how to think about these intuitively, and how to compute them numerically.
Gaining some intuition
As usual, let's start with some examples.
Example 1
The figure below contains a typical data set.
| Typical data set |
|---|
| |
Would you be confident telling your manager that you can locate the "center" of the data? By only looking at the raw data, it's pretty tough to do. Further, can you imagine trying to find the center if the data set contained 1000 points?
To gain some intuition though, it might help if we look at this 10-point data set in the way that's done in the figure below.
| Data on a dot plot |
|---|
![]() |
Now where would you put the center?
| The center? |
|---|
![]() |
Now, let's see where some of the summary measures actually lie.
| Midrange, mode (most frequent), and mean |
|---|
![]() |
How well do these represent the middle? Not bad, and they're pretty close to each other.
Example 2
The measures don't always match each other and they don't always match the true center of the data. For example, let's see what happens if we take away the last data point (5.9). The dot plot now looks as shown below.
| Data on a dot plot, without point 5.9 |
|---|
![]() |
Now where would you put the center?
| The center? |
|---|
![]() |
Let's see where some of the summary measures actually lie this time. How well does the mode represent the center?
| Midrange, mode (most frequent), and mean |
|---|
![]() |
Example 3
What if we add a last point back again, but with a value of 8.4?
| Data on a dot plot, with the point 8.4 |
|---|
![]() |
See how things are changed by an outlier like 8.4? Now where would you put the center?
| The center? |
|---|
![]() |
Let's see where some of the measures lie now. How well does the midrange represent the center? You're probably thinking that it depends on how the center is defined. And you are right, which is one reason why summary measures can be tough to interpret.
You may also have reached the conclusion that it depends on the data itself, which is also correct.
With this bit of intuition in hand, we'll look at the measures of central tendency in more detail in the next section (including the median, which we haven't yet looked at).
| Midrange, mode (most frequent), and mean |
|---|
![]() |
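To see how strongly a single point can move these measures, here is a minimal Python sketch of Examples 1 to 3. It assumes the data are the nine values listed in Example 4 plus the final point named in each example (5.9 or 8.4); the function names are illustrative only. The median, which is covered in the next section, is included for comparison.

```python
from statistics import mean, median

# Nine values listed in Example 4 (the Example 2 data).
base = [1.3, 2.1, 2.6, 2.8, 3.4, 3.5, 3.5, 3.7, 4.1]


def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2


def summarize(values):
    """Return midrange, mean and median, rounded to two decimals."""
    return {
        "midrange": round(midrange(values), 2),
        "mean": round(mean(values), 2),
        "median": round(median(values), 2),
    }


# Assumed reconstruction of the three data sets discussed in Examples 1-3.
scenarios = {
    "without last point": base,
    "last point 5.9": base + [5.9],
    "last point 8.4": base + [8.4],
}

for label, values in scenarios.items():
    print(label, summarize(values))
```

Under these assumptions, the midrange moves the most when the last point changes, the mean moves less, and the median barely moves at all.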
So, how do I compute the measures of central tendency?
Midrange:
- Computing the midrange should be pretty intuitive for you
- It's done by taking the average of the lowest (L) and highest (H) values in the data set: midrange = ( L + H ) / 2
Mean:
- Computing the mean should be pretty easy for you - everyone's done this at some time
- It's just the average of the data, computed as the sum of the data points divided by the number of points: mean = ( x1 + x2 + ... + xn ) / n
Mode:
- Mode is the most common value in the data set
- Just look at the data, count how many of each value you have, and select the data point that shows up the most frequently.
- Tricky circumstances:
- If no value occurs more than once, then there is no mode
- If two values occur as frequently as each other and more frequently than any other, then there are two modes (in the same way, there could also be more than two modes)
Median:
- Median is the value in the middle of the data set, when the data points are arranged from smallest to largest
- If there is an odd number of data points, then just arrange them and look for the middle value
- Tricky circumstances:
- If there is an even number of data points, you will need to take the average of the two middle values
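The four rules above can be sketched in a few lines of Python. This is only an illustration, not a required method: the helper names `midrange` and `modes` are made up here, and the data are the nine values listed in Example 4.

```python
from collections import Counter
from statistics import mean, median


def midrange(data):
    """Average of the lowest and highest values."""
    return (min(data) + max(data)) / 2


def modes(data):
    """Return every most-frequent value, or [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once, so there is no mode
    return sorted(value for value, count in counts.items() if count == top)


data = [1.3, 2.1, 2.6, 2.8, 3.4, 3.5, 3.5, 3.7, 4.1]

summary = {
    "midrange": midrange(data),
    "mean": mean(data),
    "modes": modes(data),
    "median": median(data),  # averages the two middle values when n is even
}
for name, value in summary.items():
    print(name, value)
```

Returning a list from `modes` is one way to handle the tricky circumstances: an empty list means no mode, and two or more entries mean the data set has several modes.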
Example 4
a) Compute the mean of the data we saw earlier in Example 2:
| Mean |
|---|
|
So, mean = ( 1.3 + 2.1 + 2.6 + 2.8 + 3.4 + 3.5 + 3.5 + 3.7 + 4.1 ) / 9 = 3.0
|
b) What is the mode?
| Mode |
|---|
|
c) What is the median?
| Median |
|---|
|
d) What is the midrange?
| Midrange |
|---|
|
Example 5
Using the data from Example 1, compute the median:
| Median |
|---|
median = ( 3.4 + 3.5 ) / 2 = 3.45 |
Example 6
What is the mode of the following data?
| Mode |
|---|
|
Example 7
What is the mode of the following data?
| Mode |
|---|
|
Lecture Notes: Measures of Variation
The following notes are meant to facilitate a discussion about measures of variation.
As mentioned above, measures of variation provide us with a summary of how much the points in our data set vary, e.g. how spread out they are or how volatile they are.
The most common measures for doing this are:
- Range
- Standard deviation
- Variance (which is closely related to standard deviation)
- The Coefficient of Variation
In the following sections, we'll gain some intuition about what these measures of variation mean, and we'll look at how they are computed.
Gaining some intuition
Again, let's start with an example.
Example 8
Which of the following data sets has a larger variation?
| Samples |
|---|
|
| Solution |
|---|
|
Example 9
Which of the following data sets has a larger variation?
| Samples |
|---|
![]() |
| Solution |
|---|
|
Example 10
Which of the following data sets has a larger variation?
| Samples |
|---|
| |
| Solution |
|---|
|
Example 11
Which of the following data sets has a larger variation?
| Samples |
|---|
| |
| Solution |
|---|
|
So, how do I compute the measures of variation?
The measures of variation are computed as follows.
Range
- Range is the simplest of the summary measures of variation
- It is also the crudest and most prone to error
- It is computed as the difference between the largest value (H) and the smallest value (L) in a data set: range = H - L
Variance and standard deviation
- Variance and standard deviation are the most common of all of the measures of variation
- They describe the variation of the data around the sample mean
- A smaller value implies a smaller variation from the mean
- You are encouraged to use your calculator to obtain variance and standard deviation
- Otherwise:
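- Variance is given by the following equation, where x̄ is the sample mean and n is the number of data points: s^2 = Σ( x - x̄ )^2 / ( n - 1 )
- The following is an easier-to-use equation for variance, which we will explain in more detail later: s^2 = ( Σx^2 - (Σx)^2 / n ) / ( n - 1 )
- Standard deviation is the square root of variance: s = sqrt( s^2 )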
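As a preview of why the easier-to-use form gives the same answer, here is a short derivation sketch. It assumes the sample variance divides by n - 1 (as the worked example later in this topic does) and writes x̄ for the sample mean; the index notation is introduced here only for the derivation.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     \qquad \text{since } \sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\end{align*}

\[
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}
\]
```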
The Coefficient of Variation
- The Coefficient of Variation (CV) is a measure of variation expressed as a percentage of the sample mean: CV = ( s / x̄ ) x 100%
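The measures of variation described above can also be checked with a short Python sketch. The data set below is hypothetical and chosen only so the arithmetic is easy to follow, and the function names are illustrative. The hand-written variance follows the definition (squared deviations from the sample mean, divided by n - 1) and is compared against the standard library's `statistics.variance`.

```python
from math import sqrt
from statistics import mean, stdev, variance


def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return sqrt(sample_variance(data)) / mean(data) * 100


sample = [4, 8, 6, 2, 10]  # hypothetical data, not taken from the examples

print("range:", value_range(sample))
print("variance:", sample_variance(sample), "library:", variance(sample))
print("standard deviation:", sqrt(sample_variance(sample)), "library:", stdev(sample))
print("CV (%):", round(coefficient_of_variation(sample), 1))
```

Because the coefficient of variation divides by the mean, it is only meaningful when the mean is not zero.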
Example 12
You are given the following data: 3, 6, 9, 11
Compute the sample range
| Range computation |
|---|
|
H = 11, L = 3, so range = H - L = 11 - 3 = 8 |
Example 13
For the data given in Example 12, compute the sample variance and standard deviation using the equations given above (this is not the recommended method for an exam because it takes too long and there are too many places for you to make mistakes).
For the variance, use the following form of the equation: s^2 = ( Σx^2 - (Σx)^2 / n ) / ( n - 1 )
And carry out these steps to implement it:
- Write a column containing x
- Write a column containing x^2
- Sum both columns
- Plug the values into the above equation
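The column-by-column steps above can be mirrored in Python as a rough sketch. It uses the Example 12 data (3, 6, 9, 11, as listed in Example 15) and the sum-of-squares form of the sample variance; the variable names are illustrative only.

```python
from math import sqrt

data = [3, 6, 9, 11]  # Example 12 data, as listed in Example 15

# Step 1: a column containing x
x_column = list(data)
# Step 2: a column containing x squared
x_squared_column = [x * x for x in data]

# Step 3: sum both columns
n = len(data)
sum_x = sum(x_column)
sum_x_squared = sum(x_squared_column)

# Step 4: plug the sums into the sum-of-squares form of the sample variance
sample_variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)
sample_std_dev = sqrt(sample_variance)

print("sum of x:", sum_x)
print("sum of x^2:", sum_x_squared)
print("sample variance:", sample_variance)
print("sample standard deviation:", sample_std_dev)
```

Keeping the two sums as separate variables makes it easy to compare each intermediate value with a hand-written table.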
| Variance computation |
|---|
sample variance = s^2 = ( 247 - 29^2 / 4 ) / 3 = 12.25 |
| Standard deviation computation |
|---|
s = sqrt (12.25) = 3.5 |
Example 14
Now use your calculator to compute the mean, standard deviation and variance for the data in Example 12:
The steps are as follows using the BA II Plus:
1. Turn on the calculator by pressing [ON|OFF]
2. Open the data-entry worksheet and clear any existing data:
- Press [2ND] [DATA]
- Clear the worksheet by pressing [2ND] [CLR WORK]
3. Enter the data points:
- Key in the first data point X01, e.g. 3.
- Press [ENTER]
- Press the down arrow key twice to get to X02
- Key in the second data point, e.g. 6
- Press [ENTER]
- Keep going this way until all the points are entered
- Be careful not to make a mistake
4. Get the results:
- Press [2ND] [STAT]
- Press the down arrow key to check the right number of points was used, e.g. n = 4.00
- Press it again to get the sample mean, x bar
- Press it again to get the sample standard deviation, Sx
| Solution |
|---|
|
Example 15
Compute the coefficient of variation. Recall the following equation: CV = ( s / mean ) x 100%
| Coefficient of variation computation |
|---|
|
We compute mean = ( 3 + 6 + 9 + 11 ) / 4 = 7.25, and we know s = 3.5. Therefore, using the above equation, we get: CV = 3.5 / 7.25 x 100 = 48.3% |
Practice Problems
Practice Problem 1
In Example 4, we obtained four different measures: 3.0, 3.5, 3.4 and 2.7. If they're all measures of central tendency, how can they be different?
| Solution |
|---|
|
Practice Problem 2
If you suspected that a recording error was made in collecting the following data, would the midrange or the median be a more appropriate measure of central tendency? Why?
| Solution |
|---|
|
Practice Problem 3
The prices of two stocks vary over a 12-month period as shown below.
a) Describe the variability of the stocks.
b) Which appears to be more stable? Why?
Practice Problem 4
You are given the following stem-and-leaf diagrams:
a) Recover the raw data for each
b) Compute the mean for each
c) Compute the standard deviation for each - use your calculator (not the equations)
| Solution |
|---|