Topic:Measures of Position and Shape

From SharedExperienceProject

Topic Highlights

(What you will learn)

  • Measures of position, including percentile, quartile and the z-score
  • How the box plot can be used to create a simple representation of the basic measures of position
  • Measures of shape, in particular skewness
  • Why these are important and how to create and interpret them

Introduction and Motivation

(Why learn it)

As you know by now, we make use of two major tools in descriptive statistics:

  1. Tables and graphs, and
  2. Summary measures

In the topic Measures of Central Tendency and Variation, we looked for the first time at item 2: how data can be represented using single numbers called summary measures. In this topic, we will explore this concept further by looking at measures that describe the position and shape of the data.

Again, it is very important that you know how to compute and interpret the summary measures we will study here.

Learning Activities

(How the levels of understanding will be gained)

Learning activities for this topic
Type Name Direction
Reading Self-directed
In-class worksheet Self-directed
In-class discussion Instructor-directed
Practice problems Self-directed
Personal activities Self-directed

Learning Objectives

(Levels of understanding to be gained)

Learning objectives for this topic
Level of Understanding Objective(s)
Very best
Highly satisfactory
Satisfactory
Maybe just enough to pass

Lecture Notes: Measures of Position and the Use of Box Plots

These notes are intended to facilitate a discussion of the concept of measures of position.

Why use them?

Simply put, measures of position allow you to determine how a given data point stacks up against the others in the data set.

To use an example that might be relevant to you, the right measure of position would allow you to answer the question: Where did I place in the class on the last quiz, relative to my fellow students? In other words: What position was I in the class for that quiz?

We will see many business examples where measures of position are used, including the more common percentile and quartile measures.

Let's jump right in.

Percentiles

A percentile helps measure the position of a data point relative to all of the other points in the data set.

We would write the following, to give two examples:

  • The 35th percentile, also written P35, is the value such that at most 35% of the data values are less than P35 and at most 65% of the values are greater than P35
  • The 50th percentile, also written P50, is the value such that at most 50% of the data values are less than P50 and at most 50% of the values are greater than P50

Example 1

The students in a small class score the following on a quiz:

Image:Percentile_data_1.PNG

What is the 50th percentile?


Example 2

Now imagine the data set is increased by the result of one more student who gets a score of 5:

Image:Percentile_data_2.PNG

What is the 50th percentile now?


If you haven't already, you should notice that the 50th percentile is exactly the same thing as the median, which we studied last time in Measures of Central Tendency and Variation.

In fact, finding percentiles is nothing more than finding the values corresponding to some portion of the data set:

  • For P50, that portion is 50%, a concept with which we are familiar because of having already learned about the median.
  • For P35 it is 35%. This means we want the value below which 35% of the other values lie.
  • If we say a student scored at P90, we are saying that they beat 90% of the class.


Steps for finding the value corresponding to a percentile

The 50th percentile is easy, but let's generalize so that we know how to find the position of the data point corresponding to any percentile we might be given.

The following steps are used:

1. First compute the approximate location of the data point, using the following equation:

Image:Percentile_location_equation.png where P is the percentile and n is the size of the data set, i.e. the number of data points.

2. Then get the value from the data set according to the following rules:

If location is not a whole number, round it up:
  • For example, if location = 20.4, then the value you seek is the 21st value in the data set (when it is ordered from lowest to highest)

If location is a whole number, the value you seek is the average of the values at location and at (location+1):
  • For example, if location = 40, then the value you seek is the average of the 40th and 41st data points

You can imagine how important these steps are for large data sets - it becomes tough enough to eyeball the center, and it is next to impossible to reliably eyeball other percentiles.
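The two steps can be sketched in Python. This is only an illustration: it assumes the location equation in the image above is location = (P / 100) × n, which is consistent with the rounding rules but is not spelled out in the text. The function name `percentile_value` and the quiz scores are hypothetical.

```python
def percentile_value(data, p):
    """Return the p-th percentile of `data` using the two-step rule in the notes.

    Assumption: location = (P / 100) * n, with p a whole number from 1 to 99.
    """
    if not data:
        raise ValueError("data must contain at least one value")
    if not 1 <= p <= 99:
        raise ValueError("p must be a whole number from 1 to 99")
    ordered = sorted(data)  # order the values from lowest to highest
    n = len(ordered)
    # divmod keeps the arithmetic exact: location = whole + remainder / 100
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # Location is not a whole number: round up to position whole + 1,
        # which is index `whole` in Python's 0-based lists.
        return ordered[whole]
    # Location is a whole number: average the values at location and location + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


quiz_scores = [7, 2, 9, 4, 8, 5]  # hypothetical data, not taken from the examples
print(percentile_value(quiz_scores, 50))  # location 3: average of 3rd and 4th values
print(percentile_value(quiz_scores, 35))  # location 2.1: rounds up to the 3rd value
```

Working with `divmod` on whole numbers avoids floating-point surprises when deciding whether the location is a whole number.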

Example 3

Use the steps above to find: a) The location of P50 for the data in Example 1 (repeated below), and b) P50 itself

Image:Percentile_data_1.PNG


Example 4

Use the steps above to find: a) the location of P50 for the data in Example 2 (repeated below), and b) P50 itself

Image:Percentile_data_2.PNG


Example 5

Just to be sure you've understood, use the same steps to find the value of P35 for the data in Example 2 (repeated below):

Image:Percentile_data_2.PNG


Quartiles

The concept of quartiles is very simple. They are the 25th, 50th and 75th percentiles, and they are typically of most interest:

  • Q1 = Quartile 1 = P25
  • Q2 = Quartile 2 = P50 = median
  • Q3 = Quartile 3 = P75

We also commonly refer to the difference between Q3 and Q1 as the interquartile range (IQR):

IQR = Q3 - Q1
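As a small sketch, the quartiles and the IQR can be computed with the same location rule used for percentiles earlier in these notes. The helper below assumes location = (P / 100) × n; the function name and the data are hypothetical.

```python
def percentile_value(ordered, p):
    """Percentile by the location rule in these notes; `ordered` must be sorted."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]  # non-whole location: round up
    return (ordered[whole - 1] + ordered[whole]) / 2  # whole location: average


values = sorted([3, 7, 8, 5, 12, 14, 21, 13, 18, 16])  # hypothetical data

q1 = percentile_value(values, 25)  # Q1 = P25
q2 = percentile_value(values, 50)  # Q2 = P50 = median
q3 = percentile_value(values, 75)  # Q3 = P75
iqr = q3 - q1                      # interquartile range

print(q1, q2, q3, iqr)
```

Note that library routines such as `statistics.quantiles` use interpolation rules that can give slightly different quartiles from the rule taught here.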

There's really not much more to it than that!

However, it is worth making sure you understand the concept.

Example 6

a) What percentage of the data lies above Q1?

b) Above Q3?

c) In the IQR?


z-Scores

The z-score is another measure of position with which you should be familiar. Simply put, it expresses the value of a data point as the number of standard deviations it lies away from the mean: positive if the point is above the mean, negative if it is below.

The following equation is used to compute the z-score:

Image:Z_score_equation.png

where:

  • x is the value of the data point of interest
  • x bar is the sample mean of the data set
  • s is the sample standard deviation of the data set

Since it is relatively straightforward to compute this (as long as you know how to get x bar and s), we won't spend any more time on it here. Instead, be sure to check out the corresponding practice problem (Practice Problem 1).
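For reference, here is a minimal sketch of the calculation, assuming the equation in the image is z = (x − x bar) / s as the description implies. The function name `z_score` and the data are hypothetical.

```python
import statistics


def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    mean = statistics.mean(data)  # sample mean, x bar
    s = statistics.stdev(data)    # sample standard deviation (divides by n - 1)
    return (x - mean) / s


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(round(z_score(9, scores), 2))  # z-score of the largest value
print(z_score(5, scores))            # a value equal to the mean has z = 0
```

`statistics.stdev` is used rather than `statistics.pstdev` because the notes define s as the sample standard deviation.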

Box Plots

A box plot is just a visual plot of the quartiles, and the high and low values in the data set.
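To make the idea concrete, the sketch below draws a rough one-line box plot as text from the five values L, Q1, Q2, Q3 and H. It is an illustration only: the function name `text_box_plot`, the drawing characters and the numbers are all hypothetical, and it assumes H is greater than L.

```python
def text_box_plot(low, q1, q2, q3, high, width=41):
    """Return a one-line box plot: '-' for whiskers, '=' for the box, '|' for the median."""
    span = high - low  # assumes high > low

    def column(value):
        # Map a data value to a character position between 0 and width - 1.
        return round((value - low) / span * (width - 1))

    line = [" "] * width
    for i in range(column(low), column(high) + 1):
        line[i] = "-"  # whiskers run from L to H
    for i in range(column(q1), column(q3) + 1):
        line[i] = "="  # the box runs from Q1 to Q3
    line[column(q2)] = "|"  # the median is marked inside the box
    return "".join(line)


# Hypothetical five-number summary: L, Q1, Q2, Q3, H
plot = text_box_plot(10, 20, 25, 40, 50)
print(plot)
```

A median mark that sits off-center in the box, as it does here, is a first visual hint that the data are not symmetric.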

Example 7

Create a box plot for the following situation:

  • L = 1
  • Q1 = 2
  • Q2 = 3.5
  • Q3 = 4
  • H = 5


Lecture Notes: Measures of Shape

The following notes are intended to facilitate a discussion about measures of shape.

What is skewness?

Skewness is a basic measure of shape that describes how far a data set departs from symmetry. It is computed based on:

  • The sample mean, x bar
  • The sample median, Md
  • The sample standard deviation, s

The following equation is used for this:

Image:Skewness_equation.png
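A minimal sketch of the calculation follows. It assumes the equation in the image is Pearson's median-based coefficient, Sk = 3 × (x bar − Md) / s, which uses exactly the three quantities listed above and falls between -3 and 3; the notes do not spell the formula out in text. The function name and the data are hypothetical.

```python
import statistics


def skewness_coefficient(data):
    """Assumed formula: Sk = 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(data)      # sample mean, x bar
    median = statistics.median(data)  # sample median, Md
    s = statistics.stdev(data)        # sample standard deviation, s
    return 3 * (mean - median) / s


long_right_tail = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical data
symmetric = [1, 2, 3, 4, 5]                  # hypothetical data

print(round(skewness_coefficient(long_right_tail), 2))  # positive: mean exceeds median
print(skewness_coefficient(symmetric))                  # zero: mean equals median
```

The single large value pulls the mean above the median, which is what makes the coefficient positive in the first data set.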

It will be assumed here that you can plug the values into this equation to get the right answer. (Try your hand at the practice problems to be sure.)

Instead, let's try to build some intuition.

Example 8

Which of the following has greater skewness?



Example 9

Would the value of Sk be greater than or less than zero for the following?



Describing the skewness

You can describe the skewness using two factors:

  1. Its direction
  2. Its magnitude

The rules for both are given below.

Direction

A data set is left skewed if Sk is negative, i.e. if Sk < 0

A data set is right skewed if Sk is positive, i.e. if Sk > 0

A data set is symmetric (or not skewed) if Sk = 0

Magnitude

The value of Sk you compute will typically vary from -3 to 3. You can qualitatively describe the level of skewness as follows for various levels of Sk:

Mild skewness if:

-1 <= Sk <= 1

Moderate skewness if:

-2 <= Sk < -1 or 1 < Sk <= 2

High skewness if:

-3 <= Sk < -2 or 2 < Sk <= 3
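The direction and magnitude rules can be combined in a short sketch. The function name `describe_skewness` is hypothetical, and labelling any value beyond the typical -3 to 3 range as high is an assumption, since the notes do not cover that case.

```python
def describe_skewness(sk):
    """Return (direction, magnitude) for a skewness coefficient Sk."""
    if sk < 0:
        direction = "left skewed"
    elif sk > 0:
        direction = "right skewed"
    else:
        direction = "symmetric"

    size = abs(sk)  # the magnitude rules are the same on both sides of zero
    if size <= 1:
        magnitude = "mild"
    elif size <= 2:
        magnitude = "moderate"
    else:
        magnitude = "high"  # assumption: values beyond 3 are also treated as high
    return direction, magnitude


print(describe_skewness(1.4))   # ('right skewed', 'moderate')
print(describe_skewness(-0.3))  # ('left skewed', 'mild')
```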

Example 10

a) Describe the skewness if you have computed Sk = 0.5


b) Describe the skewness if you have computed Sk = -2.2

Practice Problems

We've covered the basics; now build your skills with the following problems. Don't look at the solutions until you've worked the problem through.

Practice Problem 1

For the following data, how many standard deviations away from the mean is the data point with the largest value? (Use your calculator to practice computing the mean and standard deviation as efficiently as possible.)

Image:Percentile_data_2.PNG


Practice Problem 2

The following data were obtained from a survey requesting 30 different families to list their weekly expenditure on food.

Image:Data_for_practice_problem_2.PNG

a) Calculate the 20th percentile

b) Calculate the 80th percentile

c) Calculate the IQR

d) Calculate the mean, median, standard deviation and coefficient of skewness


Practice Problem 3

The manager of a small restaurant wished to determine how long the average customer had to wait to be served during the lunch hour. At the lunch hour on a particular typical day, 20 customers experienced the following waiting times (in minutes):

5.5, 10.3, 7.5, 8.1, 6.8, 11.0, 10.2, 9.0, 7.0, 5.8, 12.5, 7.5, 6.0, 13.7, 5.5, 14.0, 7.0, 6.9, 6.3, 7.4

a) Construct a box plot of the data

b) Should the manager feel comfortable advertising that if meals are not served in 10 minutes or less, the customer eats for free?

c) Are the data in this example skewed? If so, which way?


Practice Problem 4

In which percentile does a student find himself if he has scored a 1 among the following quiz scores?

Image:Percentile_data_3.PNG

(This is a tough problem.)

