Topic:A First Look at the Central Limit Theorem

From SharedExperienceProject

Topic Highlights

(What you will learn)

  • A review of random sampling
  • A review of finding the sample mean
  • What is meant by the term Central Limit Theorem
  • The distribution of the random variable of sample means
  • Why the Central Limit Theorem is important

Introduction and Motivation

(Why learn it)

Way back in the topic Introduction to Business Statistics, we looked at the decision-making process and the difference between descriptive and inferential statistics. In doing so, we also looked at the difference between populations, samples, parameters and statistics. In this section, we will dig deeper into these topics by considering the sample mean in detail, along with its distribution. This will give rise to something called the Central Limit Theorem.

Although the relevance of all this may not seem obvious at first blush, we will see in subsequent chapters that it plays a very important role in the decision-making process. So, bear down and get through these basic topics.

Learning Activities

(How the levels of understanding will be gained)

Learning activities for this topic
Type Name Direction
Reading Self-directed
Lecture and discussion Instructor-directed
In-class worksheet Self-directed
Personal activities Self-directed

Learning Objectives

(Levels of understanding to be gained)

Learning objectives for this topic
Level of Understanding Objective(s)
Very best
Highly satisfactory
  • Can I solve Practice Problem 2?
  • Am I crystal clear on the concepts summarized under the heading Summary?
  • Do I understand the Central Limit Theorem?
  • Can I explain what the random variable $\bar{X}$ is?
  • Are the equations for $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ on my equation sheet?
Satisfactory
Maybe just enough to pass


Topic Notes: Random Sampling and the Sample Mean (Again)

These notes are intended to facilitate an introductory discussion of the sampling distribution of the sample mean.

To understand the Central Limit Theorem, it helps to first review the basic concepts of random sampling and the sample mean.

Review of Random Sampling and Sample Statistics

When we first did our Introduction to Business Statistics, we looked at the difference between a population and a sample. It can be boiled down to the following figure, which shows the target population - what we would like to know about - and the sample - what we actually know about if we make some measurements.

Image:Central_Limit_Theorem_1.png

We also looked at the difference between parameters (of the population) and statistics (of the sample):

Image:Central_Limit_Theorem_3.png

Finally, recall that we spent several topics reviewing how to actually compute the various sample statistics: Measures of Central Tendency and Variation and Measures of Position and Shape. Especially with the midterm coming up, it would be wise to review these, including the calculator methods.

For the remainder of this topic, we will concentrate on the sample mean, which is computed using the following equation:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
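As a quick illustration, the sample mean is just the sum of the observations divided by the sample size. The sketch below uses a small made-up sample (it is not the Example 1 data) and checks the hand calculation against Python's built-in `statistics.mean`.

```python
from statistics import mean

# Hypothetical sample of n = 5 measurements (NOT the Example 1 data).
sample = [4.0, 7.0, 5.0, 9.0, 5.0]

# Sample mean: add up the observations and divide by how many there are.
n = len(sample)
x_bar = sum(sample) / n

print(x_bar)          # 6.0
print(mean(sample))   # 6.0, the library function gives the same value
```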

The following plot shows the concept of sample and the sample mean:

Image:Central_Limit_Theorem_2.png

Example 1

Compute the mean of the following data:

Image:Central_tendancy_data_-_only_9.PNG


Sampling More than Once

Have you ever thought about what would happen if you sampled a population more than once?

For example, sampling a population twice can be depicted as shown below:

Image:Central_Limit_Theorem_4.png

Example 2

Imagine you conduct a random poll of 100 people to ask the ages of people living in your quadrant of the city. Further imagine you get an average age (sample mean) of 46.2 years.

a) Would you get exactly the same average age if you conducted a second poll of 100 different people?

b) If you did 5 such random polls (each with 100 new people), would you expect the average age to be similar from poll to poll?
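One way to build intuition for these questions is to simulate the polls. The following is a minimal sketch, assuming a made-up population of 50,000 residents whose ages are spread evenly between 0 and 90; the population, its size, and the seed are all illustrative assumptions, not data from the example.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 residents with ages spread evenly from 0 to 90.
# (Assumed for illustration only; real age data would look different.)
population = [random.randint(0, 90) for _ in range(50_000)]

# Five independent polls of 100 people each. Nobody is polled twice within
# a poll, although the same person could turn up in two different polls.
poll_means = [mean(random.sample(population, 100)) for _ in range(5)]

for i, m in enumerate(poll_means, start=1):
    print(f"Poll {i}: average age = {m:.1f}")
```

Running it typically shows five averages that are close to each other but not identical, which is the behaviour the two questions are probing.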


Sampling multiple times and taking the sample mean each time can be thought of as follows. Notice that the data points (samples) are different each time, and so is the sample mean.


Now imagine that you plot the sample means on a third plot:


And finally, imagine sampling many times, as shown below:

Image:Central_Limit_Theorem_7.png

and computing the sample mean for each sample set. Your plot of sample means might look like the one below.

Image:Central_Limit_Theorem_8.png
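The same thought experiment can be run on a computer. The sketch below assumes a hypothetical, strongly skewed population (exponential with mean 10), draws 1,000 samples of size 40, and prints a crude text histogram of the resulting sample means. The population, sample size, and number of samples are arbitrary choices made for illustration.

```python
import random
from collections import Counter
from statistics import mean

random.seed(2)  # fixed seed so the sketch is repeatable

def draw_sample(n):
    """Draw n values from a hypothetical skewed population (exponential, mean 10)."""
    return [random.expovariate(1 / 10) for _ in range(n)]

# 1,000 samples of size n = 40, keeping one sample mean per sample.
sample_means = [mean(draw_sample(40)) for _ in range(1000)]

# Crude text histogram: round each mean to the nearest whole number and count.
counts = Counter(round(m) for m in sample_means)
for value in sorted(counts):
    print(f"{value:3d} | {'#' * (counts[value] // 10)}")
```

Even though the individual values come from a lopsided distribution, the bars for the sample means should pile up in a roughly bell-shaped mound around 10.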

This is where the Central Limit Theorem comes in...

Topic Notes: The Central Limit Theorem

The Central Limit Theorem states that:

When obtaining large samples (as a rule of thumb, n > 30) from any population with a finite standard deviation, the distribution of the corresponding sample means will be approximately normal.

In other words, the sample means themselves make up a random variable, $\bar{X}$, that has an approximately normal distribution, just like the ones we looked at in topics like Continuous and Normal Random Variables.

The concept is shown below:
Image:Central_Limit_Theorem_10.png

where $\mu_{\bar{x}}$ (the mean of the means) can be thought of as follows:

Image:Central_Limit_Theorem_9.png
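A rough numerical check of the "approximately normal" claim: for a normal distribution, about 68% of values lie within one standard deviation of the mean. The sketch below, which assumes a hypothetical exponential population with mean 10, compares that share for individual values with the share for sample means of size 40. The helper `fraction_within_one_sd` is an illustrative name introduced here, not something from the course materials.

```python
import random
from statistics import mean, pstdev

random.seed(3)  # fixed seed so the sketch is repeatable

def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their own mean."""
    m, s = mean(values), pstdev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)

# Hypothetical, strongly skewed population (exponential, mean 10).
individual_values = [random.expovariate(1 / 10) for _ in range(5000)]

# Sample means from 5,000 samples of size n = 40 drawn from that population.
sample_means = [
    mean([random.expovariate(1 / 10) for _ in range(40)]) for _ in range(5000)
]

print(f"individual values: {fraction_within_one_sd(individual_values):.3f}")
print(f"sample means:      {fraction_within_one_sd(sample_means):.3f}")
```

The individual values should come out well above 0.68 (the population is not normal), while the sample means should land close to 0.68, as a normal distribution would.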

Equations for the Mean and Standard Deviation (of the sample means)

We can take things one step further and relate the mean and standard deviation of the sample means back to the mean and standard deviation of the original population.

It turns out that the mean of our new random variable, i.e. the mean of the sample means, is equal to the mean of the original population:

$$\mu_{\bar{x}} = \mu$$

and the standard deviation of the sample means is given by the following equation:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the standard deviation of the original population and n is the sample size.
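These two relationships can be checked by simulation. The following is a minimal sketch, assuming a hypothetical normal population with mean 50 and standard deviation 12 and a sample size of 36; all three numbers are made up so that the arithmetic is easy to follow.

```python
import math
import random
from statistics import mean, pstdev

random.seed(4)  # fixed seed so the sketch is repeatable

mu, sigma, n = 50.0, 12.0, 36   # hypothetical population parameters and sample size

# Theory: mean of the sample means, and standard deviation of the sample means.
mu_xbar = mu
sigma_xbar = sigma / math.sqrt(n)   # 12 / 6 = 2.0

# Simulation: 4,000 samples of size n, keeping one sample mean per sample.
sample_means = [
    mean([random.gauss(mu, sigma) for _ in range(n)]) for _ in range(4000)
]

print("theory:    ", mu_xbar, sigma_xbar)   # 50.0 2.0
print("simulation:", round(mean(sample_means), 2), round(pstdev(sample_means), 2))
```

The simulated values will not match the theory exactly, but they should be close. Note how much smaller the spread of the means (about 2) is than the spread of the population (12).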

Summary

The whole situation is summarized by the figure below. First you have the original data and its distribution, shown here to be normal with mean $\mu$ and standard deviation $\sigma$.

Then you have a sample of the population. It has a mean $\bar{x}$ and standard deviation s, which are slightly different from $\mu$ and $\sigma$ because they are derived from a subset of the data.

Then you have the plot of the means ($\bar{x}$'s) that would be obtained by computing the mean from each of many samples. According to the Central Limit Theorem, it is approximately normally distributed. As we saw above, its mean and standard deviation, $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$, can be related to those of the original population, $\mu$ and $\sigma$.

Image:Central_Limit_Theorem_11.png

Be sure you understand the difference between these, and how they relate to each other.

Z-score Revisited

Throughout earlier topics, we used the following equation for the z-score of sample data:

$$z = \frac{x - \bar{x}}{s}$$

You should also be familiar with the following equation for z-score of a population:

$$z = \frac{x - \mu}{\sigma}$$

And given the above discussion, it should make sense to you that there is an equivalent equation for z-score of the distribution of the means:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
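The three z-scores differ only in which centre and which spread they use. The sketch below writes each one as a small function; the function names and the numbers in the usage lines are hypothetical, chosen only to show how the same distance from the mean gives a larger z-score for a sample mean than for a single value.

```python
import math

def z_sample(x, x_bar, s):
    """z-score of a value relative to a sample (mean x_bar, standard deviation s)."""
    return (x - x_bar) / s

def z_population(x, mu, sigma):
    """z-score of a value relative to a population (mean mu, standard deviation sigma)."""
    return (x - mu) / sigma

def z_of_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean, using sigma / sqrt(n) as the standard deviation."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Hypothetical numbers: population mean 100, standard deviation 15, sample size 25.
print(z_population(106, 100, 15))    # 0.4
print(z_of_mean(106, 100, 15, 25))   # 2.0
```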

We will use these in the practice problems below.

Practice Problems

Practice Problem 1

You work at a bakery and your boss has determined that the cooking time for a new kind of muffin is normally distributed with a mean of 8 minutes and a standard deviation of 2 minutes.

a) What is the probability that a muffin will take longer than 9.5 minutes to cook?


b) For a sample of 4 muffins, what is the probability that the average cooking time will exceed 9.5 minutes?
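One way to check your hand calculation is sketched below, using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) in place of a z-table. Because the cooking times themselves are stated to be normal, the sample mean is normal even for a sample as small as 4, so the n > 30 rule of thumb is not needed here.

```python
import math
from statistics import NormalDist

mu, sigma = 8.0, 2.0   # cooking time of one muffin, in minutes

# a) One muffin: X is normal with mean mu and standard deviation sigma.
p_single = 1 - NormalDist(mu, sigma).cdf(9.5)

# b) Mean of n = 4 muffins: same mean, standard deviation sigma / sqrt(n).
n = 4
sigma_xbar = sigma / math.sqrt(n)   # 2 / 2 = 1.0
p_mean = 1 - NormalDist(mu, sigma_xbar).cdf(9.5)

print(f"a) P(X > 9.5)     = {p_single:.4f}")   # about 0.2266 (z = 0.75)
print(f"b) P(x-bar > 9.5) = {p_mean:.4f}")     # about 0.0668 (z = 1.50)
```

An average of four muffins is much less likely to exceed 9.5 minutes than a single muffin, because the sample means are less spread out than the individual cooking times.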


Practice Problem 2

You just started a new company last month. The average sale of your product has already reached $285, with a standard deviation of $55.

What is the probability that for a sample of 35 sales, the average amount sold will exceed $310?


Practice Problem 3

For the same company (average sales of $285 and standard deviation of $55), what is the probability that for a sample of 35 sales, the average amount sold will be between $260 and $310?
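A sketch of how a "between" probability can be checked numerically, again using `statistics.NormalDist` as a stand-in for a z-table. It assumes the Central Limit Theorem applies (n = 35 is above the usual rule of thumb) and treats the given $285 and $55 as the population mean and standard deviation.

```python
import math
from statistics import NormalDist

mu, sigma, n = 285.0, 55.0, 35   # values given in the problem

# Distribution of the sample mean: same mean, standard deviation sigma / sqrt(n).
sigma_xbar = sigma / math.sqrt(n)
xbar_dist = NormalDist(mu, sigma_xbar)

# P(260 < x-bar < 310) = P(x-bar < 310) - P(x-bar < 260)
p_between = xbar_dist.cdf(310) - xbar_dist.cdf(260)

print(f"sigma_xbar = {sigma_xbar:.2f}")
print(f"P(260 < x-bar < 310) = {p_between:.4f}")
```

Both limits sit the same distance ($25) from the mean, so the two z-scores are equal in size and opposite in sign, which is a useful check on a hand calculation.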
