Topic:Measures of Central Tendency and Variation

From SharedExperienceProject



Topic Highlights

(What you will learn / gain)

  • What is meant by the term summary measure (or descriptive measure, or summary statistic)
  • Which summary measures are important for business statistics
  • A level of intuition about the measures of central tendency and variation
  • How to compute the standard measures for central tendency and variation

Introduction and Motivation

(Why learn it)

We make use of two major tools in descriptive statistics:

  1. Tables and graphs
  2. Summary measures

Until this point, we have been learning about descriptive statistics by considering the tables and graphs that can be used to best represent our data. In this topic, we will look for the first time at how data can be represented using single numbers called summary measures.

In business, it is absolutely critical that you know how to compute and interpret the summary measures we will study here.

Learning Activities

(How the levels of understanding will be gained)

Learning activities for this topic

Type                   Direction
Reading                Self-directed
In-class worksheet     Instructor-directed
In-class discussion    Instructor-directed
Practice problems      Self-directed
Personal activities    Self-directed

Learning Objectives

(Levels of understanding to be gained)

Learning objectives for this topic

Very best / Highly satisfactory:
  • Have I developed a strong intuition about measures of central tendency and variation from the examples given in the lecture notes below? (Practice Problem 1 and Practice Problem 2 are good tests of this.)
  • Do I know the disadvantages of using summary measures instead of histograms?

Satisfactory / Maybe just enough to pass:
  • Can I solve Example 4 that we saw in class?
  • Can I name the measures of central tendency?
  • Can I name the measures of variation?

Lecture Notes: Introduction to Summary Measures

The following notes are meant to facilitate a discussion about the concept of a summary measure.

What are summary measures?

A summary measure is a single number used to represent a data set. For example:

  • The mean of the height data is 174.1 cm
  • The range of the height data is 35.6 cm
  • The standard deviation of the height data is 9.8 cm

Why use them?

I have the frequency distribution tables and histograms all figured out. Why would I need to use a summary measure?

In the three examples above, we were able to describe the whole height data set using single numbers, and each tells us something different about the data itself.

Summary measures can be very powerful because they are simple, because they are standard (everyone knows what you mean when you use them), and because the standard measures have been chosen to represent useful information about the data.

As we will see later, using summary measures is also a bit dangerous if you don't know how to interpret them.

What kinds of measures are there?

There are four particularly important kinds of measures:

1. Measures of central tendency:

  • Such measures give us a summary of the middle of the data
  • The most intuitive of these is the mean (or average) of the data, but there are other kinds that we will see shortly
  • We will study these below under Measures of Central Tendency

2. Measures of variation:

  • Such measures give us a summary of the variation in our data set: How spread out are the data points? How volatile are they?
  • The most intuitive of these is the range (maximum minus minimum), but there are other important measures of variation that we will also take a look at
  • We will study these below under Measures of Variation

3. Measures of position

4. Measures of shape

There's no better way to really understand the concept of summary measures than to take a look at some. So, let's do it...

Lecture Notes: Measures of Central Tendency

The following notes are meant to facilitate a discussion about measures of central tendency.

Let's look first at the concept of a measure of central tendency. As mentioned above, such a measure is used to describe the center, or middle, of your data set.

The most common measures for doing this are:

  1. Mean
  2. Median
  3. Midrange
  4. Mode

In the two sections that follow, we'll look at how to think about these intuitively and how to compute them numerically.

Gaining some intuition

As usual, let's start with some examples.

Example 1

The figure below contains a typical data set.


Would you be confident telling your manager that you can locate the "center" of the data? By only looking at the raw data, it's pretty tough to do. Further, can you imagine trying to find the center if the data set contained 1000 points?

To gain some intuition, though, it might help to look at this 11-point data set as a dot plot, as in the figure below.


Now where would you put the center?


Now, let's see where some of the summary measures actually lie.


How well do these represent the middle? Not bad, and they're pretty close to each other.

Example 2

The measures don't always match each other, and they don't always match the true center of the data. For example, let's see what happens if we take away the last data point (5.9). The dot plot now looks as shown below.


Now where would you put the center?


Let's see where some of the summary measures actually lie this time. How well does mode represent the center?


Example 3

What if we add the last point back again, but with a value of 8.4?


See how things are changed by an outlier like 8.4? Now where would you put the center?


Let's see where some of the measures lie now. How well does midrange represent the center? You're probably thinking that it depends on how the center is defined. And you are right, which is one reason why summary measures can be tough to interpret.

You may also have reached the conclusion that it depends on the data itself, which is also correct.
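To see the effect from Examples 2 and 3 numerically, here is a minimal Python sketch that compares the mean, median and midrange of a small data set before and after one large value is added. The values in `base` and the outlier 20.0 are hypothetical, not the data plotted above, and `midrange` is a helper written only for this sketch.

```python
import statistics


def midrange(data):
    """Average of the smallest and largest values (helper written for this sketch)."""
    return (min(data) + max(data)) / 2


# Hypothetical values for illustration only; not the data in the figures above.
base = [1.0, 2.0, 2.0, 3.0, 4.0]
with_outlier = base + [20.0]  # the same data plus one unusually large value

for label, data in [("without outlier", base), ("with outlier", with_outlier)]:
    print(
        f"{label:16} mean={statistics.mean(data):.2f} "
        f"median={statistics.median(data):.2f} "
        f"midrange={midrange(data):.2f}"
    )
```

With these values the midrange jumps from 2.5 to 10.5 and the mean from 2.4 to about 5.33, while the median only moves from 2.0 to 2.5. The midrange depends only on the two extreme values, so it reacts most strongly to an outlier.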

With this bit of intuition in hand, we'll look at the measures of central tendency in more detail in the next section (including the median, which we haven't yet looked at).


So, how do I compute the measures of central tendency?

Midrange:

  • Computing the midrange should be pretty intuitive for you
  • It's done by taking the average of the lowest and highest values in the data set:
$$\text{Midrange} = \frac{x_{\min} + x_{\max}}{2}$$
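As a sketch, the midrange formula translates directly into a few lines of Python. The function name `midrange` and the example values are hypothetical, chosen for illustration.

```python
def midrange(data):
    """Return the average of the lowest and highest values in data."""
    if not data:
        raise ValueError("midrange needs at least one data point")
    return (min(data) + max(data)) / 2


# Hypothetical example values (not from the figures above).
print(midrange([2.0, 3.5, 4.1, 5.9]))
```

Only the two extreme values (2.0 and 5.9 here) affect the result; the points in between are ignored.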

Mean:


  • Computing the mean should be pretty easy for you - everyone's done this at some time
  • It's just the average of the data, computed as the sum of the data points divided by the number of points:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
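A minimal Python version of the mean formula is shown below; the name `mean` is just for this sketch. Python's standard library also provides `statistics.mean`, which gives the same result.

```python
def mean(data):
    """Return the sum of the data points divided by the number of points."""
    if not data:
        raise ValueError("mean needs at least one data point")
    return sum(data) / len(data)


print(mean([1, 2, 3, 4]))  # hypothetical values: (1 + 2 + 3 + 4) / 4
```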

Mode:

  • The mode is the most common value in the data set
  • Just look at the data, count how many times each value occurs, and select the value that occurs most frequently
  • Tricky circumstances:
    • If no value occurs more than once, then there is no mode
    • If two values occur equally often and more often than any other value, then there are two modes (in the same way, there could also be more than two modes)
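The counting procedure and both tricky circumstances can be sketched in Python as follows. The function name `modes` and the example lists are hypothetical, and returning an empty list to mean "no mode" is a convention chosen for this sketch.

```python
from collections import Counter


def modes(data):
    """Return all modes in ascending order; an empty list means there is no mode."""
    counts = Counter(data)  # how many times each value occurs
    if not counts:
        return []
    highest = max(counts.values())
    # If no value occurs more than once, there is no mode
    if highest == 1:
        return []
    # Every value tied for the highest count is a mode
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical values for illustration only.
print(modes([1, 2, 2, 3]))     # one mode: [2]
print(modes([1, 2, 3]))        # no mode: []
print(modes([1, 1, 2, 2, 3]))  # two modes: [1, 2]
```

Python's built-in `statistics.multimode` handles ties similarly, but it returns every value when no value repeats, so it does not follow the "no mode" rule used in these notes.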

Median:

  • Median is the value in the middle of the data set, when the data points are arranged from smallest to largest
  • If there is an odd number of data points, then just arrange them and look for the middle value
  • Tricky circumstances:
    • If there is an even number of data points, you will need to take the average of the two middle values
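The odd and even cases can be sketched in Python like this; the name `median` and the example lists are hypothetical. Python's `statistics.median` uses the same convention of averaging the two middle values.

```python
def median(data):
    """Return the middle value of the sorted data (average of the two middle values if n is even)."""
    if not data:
        raise ValueError("median needs at least one data point")
    ordered = sorted(data)  # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of points: take the single middle value
        return ordered[middle]
    # Even number of points: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 1, 3]))     # odd n: 3
print(median([4, 1, 3, 2]))  # even n: 2.5
```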

Example 4

a) Compute the mean of the data we saw earlier in Example 2:

Image:Central_tendancy_data_-_only_9.PNG


b) What is the mode?


c) What is the median?


d) What is the midrange?


Example 5

Using the data from Example 1, compute the median:

Image:Central_tendancy_data.PNG


Example 6

What is the mode of the following data?

Image:Central_tendancy_data_2.PNG


Example 7

What is the mode of the following data?

Image:Central_tendancy_data_3.PNG


Lecture Notes: Measures of Variation

The following notes are meant to facilitate a discussion about measures of variation.

As mentioned above, measures of variation provide us with a summary of how much the points in our data set vary, e.g. how spread out or how volatile they are.

The most common measures for doing this are:

  1. Range
  2. Standard deviation
  3. Variance (which is closely related to standard deviation)
  4. The Coefficient of Variation

In the following sections, we'll gain some intuition about what these measures of variation mean, and we'll look at how they are computed.

Gaining some intuition

Again, let's start with an example.

Example 8

Which of the following data sets has a larger variation?


Example 9

Which of the following data sets has a larger variation?


Example 10

Which of the following data sets has a larger variation?


Example 11

Which of the following data sets has a larger variation?


So, how do I compute the measures of variation?

The measures of variation are computed as follows.

Range

  • Range is the simplest of the summary measures of variation
  • It is also the crudest: it depends only on the two most extreme values, so a single outlier or recording error can change it a lot
  • It is computed as the difference between the largest and the smallest value in a data set:
$$\text{Range} = x_{\max} - x_{\min}$$
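A short Python sketch of the range, using hypothetical values to show how one extreme point changes it. The function is named `data_range` (a hypothetical name) so it does not shadow Python's built-in `range`.

```python
def data_range(data):
    """Return the largest value minus the smallest value."""
    if not data:
        raise ValueError("range needs at least one data point")
    return max(data) - min(data)


# Hypothetical values: adding one extreme point changes the range a lot.
print(data_range([3, 5, 6, 7]))      # 4
print(data_range([3, 5, 6, 7, 30]))  # 27
```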

Variance and standard deviation

  • Variance and standard deviation are the most common of all of the measures of variation
  • They describe the variation of the data around the sample mean
  • A smaller value implies a smaller variation from the mean
  • You are encouraged to use your calculator to obtain variance and standard deviation
  • Otherwise:
    • Variance is given by the following equation:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
    • The following is an easier-to-use equation for variance, which we will explain in more detail later:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$
    • Standard deviation is the square root of variance:
$$s = \sqrt{s^2}$$
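To check that the two variance equations agree, here is a minimal Python sketch that implements both, assuming the sample (n - 1) form used for s². The function names and the data values are hypothetical; the results are compared against Python's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics


def sample_variance_definitional(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_std_dev(data):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance_definitional(data))


# Hypothetical values for illustration only.
values = [2.0, 4.0, 4.0, 5.0, 7.0]
print(sample_variance_definitional(values))
print(sample_variance_shortcut(values))
print(sample_std_dev(values), statistics.stdev(values))
```

For these values both variance functions give approximately 3.3 (tiny floating-point differences are possible), and the standard deviation is its square root, about 1.817.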

The Coefficient of Variation

  • The Coefficient of Variation expresses the sample standard deviation as a percentage of the sample mean:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
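A minimal Python sketch of the coefficient of variation, built on the standard library's `statistics.mean` and `statistics.stdev` (sample standard deviation). The function name and the data are hypothetical; the sketch refuses a mean of zero, where the ratio is undefined.

```python
import statistics


def coefficient_of_variation(data):
    """Return the sample standard deviation as a percentage of the sample mean."""
    x_bar = statistics.mean(data)
    if x_bar == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(data) / x_bar * 100


# Hypothetical values for illustration only.
print(round(coefficient_of_variation([2.0, 4.0, 4.0, 5.0, 7.0]), 2))
```

For these values the result is about 41.29%. Note that `statistics.stdev` needs at least two data points.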

Example 12

You are given the following data:

Image:Variance_data.PNG

Compute the sample range


Example 13

For the data given in Example 12, compute the sample variance and standard deviation using the equations given above (this is not the recommended method for an exam because it takes too long and there are too many places for you to make mistakes).

For the variance, use the following form of the equation:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$

And carry out these steps to implement it:

  1. Write a column containing x
  2. Write a column containing x²
  3. Sum both columns
  4. Plug the values into the above equation
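The four steps can be mirrored in a short Python sketch that prints the x and x² columns, their sums, and then applies the shortcut equation. The function name `variance_by_columns` and the integer values are hypothetical; they are not the Example 12 data, which appear only in the image.

```python
def variance_by_columns(data):
    """Follow the hand method: list x and x^2, sum both columns, then plug in."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    # Steps 1 and 2: one row per data point, with columns x and x^2
    rows = [(x, x * x) for x in data]
    print(f"{'x':>8} {'x^2':>10}")
    for x, x_squared in rows:
        print(f"{x:>8} {x_squared:>10}")
    # Step 3: sum both columns
    sum_x = sum(x for x, _ in rows)
    sum_x2 = sum(x_squared for _, x_squared in rows)
    print(f"{sum_x:>8} {sum_x2:>10}  <- column sums")
    # Step 4: plug the sums into the shortcut equation for the sample variance
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Hypothetical integer values for illustration only (not the Example 12 data).
variance = variance_by_columns([2, 4, 4, 5, 7])
print("s^2 =", variance)
```

Here the column sums are 22 and 110, so s² = (110 - 22²/5) / 4, which is approximately 3.3.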



Example 14

Now use your calculator to compute the mean, standard deviation and variance for the data in Example 12:

Image:Variance_data.PNG

The steps are as follows using the BA II Plus:

1. Turn on the calculator by pressing [ON|OFF]
2. Open the data-entry worksheet and clear any existing data:
  • Press [2ND] [DATA]
  • Clear the worksheet by pressing [2ND] [CLR WORK]

3. Enter the data points:

  • At X01, key in the first data point, e.g. 3
  • Press [ENTER]
  • Press the down arrow key twice to get to X02
  • Key in the second data point, e.g. 6
  • Press [ENTER]
  • Keep going this way until all the points are entered
  • Be careful not to make a mistake

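4. Get the results:

  • Press [2ND] [STAT]
  • If the display does not show 1-V (one-variable statistics), press [2ND] [SET] until it does
  • Press the down arrow key to check that the right number of points was used, e.g. n = 4.00
  • Press it again to get the sample mean, x bar
  • Press it again to get the sample standard deviation, Sx
  • The sample variance is Sx squared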
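If you want to double-check calculator results on a computer, Python's standard-library `statistics` module computes the same quantities. The mapping to the calculator's n, x bar and Sx displays below is a sketch, and the data values are hypothetical placeholders, not the Example 12 data.

```python
import statistics

# Hypothetical values for illustration only; replace them with your own data.
data = [2.0, 4.0, 4.0, 5.0, 7.0]

n = len(data)                          # like "n" on the calculator
x_bar = statistics.mean(data)          # like "x bar"
s_x = statistics.stdev(data)           # like "Sx": sample standard deviation (n - 1 denominator)
s_squared = statistics.variance(data)  # sample variance, equal to Sx squared

print(f"n = {n}, x bar = {x_bar:.4f}, Sx = {s_x:.4f}, variance = {s_squared:.4f}")
```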


Example 15

Compute the coefficient of variation. Recall the following equation:

$$CV = \frac{s}{\bar{x}} \times 100\%$$


Practice Problems

Practice Problem 1

In Example 4, we obtained four different measures: 3.0, 3.5, 3.4 and 2.7. If they're all measures of central tendency, how can they be different?


Practice Problem 2

If you suspected that a recording error was made in collecting the following data, would the midrange or the median be a more appropriate measure of central tendency? Why?

Image:Central_tendancy_data_4.PNG


Practice Problem 3

The prices of two stocks vary over a 12-month period as shown below.

a) Describe the variability of the stocks.

b) Which appears to be more stable? Why?

Image:Stock_data.PNG

Practice Problem 4

You are given the following stem-and-leaf diagrams:

Image:Variation_4.PNG

a) Recover the raw data for each

b) Compute the mean for each

c) Compute the standard deviation for each - use your calculator (not the equations)

