Topic:A First Look at the Central Limit Theorem
From SharedExperienceProject
Topic Highlights
(What you will learn)
- A review of random sampling
- A review of finding the sample mean
- What is meant by the term Central Limit Theorem
- The distribution of a random variable made up of sample means
- Why the Central Limit Theorem is important
Introduction and Motivation
(Why learn it)
Way back in the topic Introduction to Business Statistics, we looked at the decision-making process and the difference between descriptive and inferential statistics. In doing so, we also looked at the difference between populations, samples, parameters and statistics. In this section, we will dig deeper into these topics by considering the sample mean in detail, along with its distribution. This will give rise to something called the Central Limit Theorem.
Although the relevance of all this may not seem obvious at first blush, we will see in subsequent chapters that it plays a very important role in the decision-making process. So, bear down and get through these basic topics.
Learning Activities
(How the levels of understanding will be gained)
| Type | Name | Direction |
|---|---|---|
| Reading | | Self-directed |
| Lecture and discussion | | Instructor-directed |
| In-class worksheet | | Self-directed |
| Personal activities | | Self-directed |
Learning Objectives
(Levels of understanding to be gained)
| Level of Understanding | Objective(s) |
|---|---|
| Very best | |
| Highly satisfactory | |
| Satisfactory | |
| Maybe just enough to pass | |
Topic Notes: Random Sampling and the Sample Mean (Again)
These notes are intended to facilitate an introductory discussion of the sampling distribution of the sample mean.
For understanding the Central Limit Theorem, it is useful to review the basic concepts of random sampling and the sample mean.
Review of Random Sampling and Sample Statistics
When we first did our Introduction to Business Statistics, we looked at the difference between a population and a sample. It can be boiled down to the following figure, which shows the target population - what we would like to know about - and the sample - what we actually know about if we make some measurements.
We also looked at the difference between parameters (of the population) and statistics (of the sample):
Finally, recall that we spent several topics reviewing how to actually compute the various sample statistics: Measures of Central Tendency and Variation and Measures of Position and Shape. Especially with the midterm coming up, it would be wise to review these, including the calculator methods.
For the remainder of this topic, we will concentrate on the sample mean, which is computed using the following equation:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
The following plot shows the concept of a sample and the sample mean:
Example 1
Compute the mean of the following data: 1.3, 2.1, 2.6, 2.8, 3.4, 3.5, 3.5, 3.7, 4.1
| Mean |
|---|
| So, mean = ( 1.3 + 2.1 + 2.6 + 2.8 + 3.4 + 3.5 + 3.5 + 3.7 + 4.1 ) / 9 = 27.0 / 9 = 3.0 |
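The arithmetic in Example 1 can be checked with a few lines of Python. This is only a sketch for verification; the variable names `data` and `x_bar` are chosen here for illustration and do not come from the course materials.

```python
from statistics import mean

# The nine data values from Example 1
data = [1.3, 2.1, 2.6, 2.8, 3.4, 3.5, 3.5, 3.7, 4.1]

# Sample mean: the sum of the values divided by the number of values
x_bar = sum(data) / len(data)

print(round(x_bar, 1))        # 3.0
print(round(mean(data), 1))   # 3.0, same result from the standard library
```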
Sampling More than Once
Have you ever thought about what would happen if you sampled a population more than once?
For example, sampling a population twice can be depicted as shown below:
Example 2
Imagine you conduct a random poll of 100 people to ask the ages of people living in your quadrant of the city. Further imagine you get an average age (sample mean) of 46.2 years.
a) Would you get exactly the same average age if you conducted a second poll of 100 different people?
b) If you did 5 such random polls (each with 100 new people), would you expect the average age to be similar from poll to poll?
| Solution |
|---|
| a) No. It would be extremely unlikely that you'd get an average of exactly 46.2 years again. |
| b) Yes. As long as the polls were conducted randomly, they would probably be pretty similar. For example, they might be 46.2, 45.1, 48.2, 41.3 and 44.2, all of which are fairly close to each other considering the likely range of ages. |
Sampling multiple times and taking the sample mean each time can be thought of as follows. Notice that the data points (samples) are different each time, and so is the sample mean.
| Two samples and two sample means |
|---|
Now imagine that you plot the sample means on a third plot:
| Plotting the sample means |
|---|
And finally, imagine sampling many times, as shown below:
and computing the sample mean for each sample set. Your plot of sample means might look like the one below.
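Repeated sampling is easy to imitate on a computer. The sketch below is an illustration only: the population of ages is invented (uniform between 18 and 90, which is an assumption and not data from this topic), and the helper name `sample_means` is hypothetical. It draws several random polls of 100 people and prints the mean of each one.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so that the run is repeatable

# Hypothetical population: 10,000 ages spread uniformly between 18 and 90.
# This is an assumption for illustration; the topic does not specify one.
population = [random.uniform(18, 90) for _ in range(10_000)]

def sample_means(population, sample_size, num_samples):
    """Draw num_samples random samples and return the mean of each one."""
    return [mean(random.sample(population, sample_size))
            for _ in range(num_samples)]

# Five polls of 100 people each, as in Example 2
means = sample_means(population, sample_size=100, num_samples=5)
print([round(m, 1) for m in means])
```

Each run of `sample_means` gives a different list of means, but the values cluster together much more tightly than the individual ages do. Increasing `num_samples` to a few hundred and plotting the results gives the kind of plot of sample means described above.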
This is where the Central Limit Theorem comes in...
Topic Notes: The Central Limit Theorem
The Central Limit Theorem states that:
When obtaining large samples (usually n > 30) from any population, the distribution of the corresponding sample means will be approximately normal.
In other words, the sample means themselves make up a random variable, \(\bar{x}\), that has an approximately normal distribution just like the ones we looked at in topics like Continuous and Normal Random Variables.
Its mean, \(\mu_{\bar{x}}\) (the mean of the means), can be thought of as follows:
Equations for the Mean and Standard Deviation (of the sample means)
We can take things one step further and relate the mean and standard deviation of the sample means back to the mean and standard deviation of the original population.
It turns out that the mean of our new random variable, i.e. the mean of the sample means, is equal to the mean of the original population:
\[ \mu_{\bar{x}} = \mu \]
and the standard deviation of the sample means is given by the following equation:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
where \(\sigma\) is the standard deviation of the original population and \(n\) is the sample size.
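These two relationships can be checked empirically. The sketch below assumes a made-up, deliberately non-normal population (exponential with a mean of about 10); the population, the sample size of 36 and the number of samples are all illustrative choices, not values from this topic. It compares the mean and standard deviation of many sample means with the population mean and with the population standard deviation divided by the square root of n.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(2)  # fixed seed so that the run is repeatable

# Hypothetical, strongly skewed population (an assumption for illustration)
population = [random.expovariate(1 / 10) for _ in range(20_000)]
mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation

n = 36               # size of each sample ("large" by the n > 30 rule of thumb)
num_samples = 2_000  # how many samples to draw

# Mean of each random sample
means = [mean(random.sample(population, n)) for _ in range(num_samples)]

print("mean of the sample means:", round(mean(means), 2))
print("population mean:         ", round(mu, 2))
print("std dev of sample means: ", round(pstdev(means), 2))
print("sigma / sqrt(n):         ", round(sigma / sqrt(n), 2))
```

The paired values should agree closely but not exactly, because only a finite number of samples is drawn.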
Summary
The whole situation is summarized by the figure below. First you have the original data and its distribution, shown here to be normal with mean \(\mu\) and standard deviation \(\sigma\).
Then you have a sample of the population. It has a mean \(\bar{x}\) and standard deviation s, which are slightly different from \(\mu\) and \(\sigma\) because they are derived from a subset of the data.
Then you have the plot of the means (\(\bar{x}\)'s) that would be obtained by computing the mean from each of many samples. According to the Central Limit Theorem, it is approximately normally distributed. As we saw above, its mean and standard deviation, \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\), can be related to those of the original population, \(\mu\) and \(\sigma\).
Be sure you understand the difference between these, and how they relate to each other.
Z-score Revisited
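Throughout earlier topics, we used the following equation for the z-score of sample data:
\[ z = \frac{x - \bar{x}}{s} \]
You should also be familiar with the following equation for the z-score of a population:
\[ z = \frac{x - \mu}{\sigma} \]
And given the above discussion, it should make sense to you that there is an equivalent equation for the z-score of the distribution of the means:
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]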
We will use these in the practice problems below.
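All of the z-score forms share one pattern: subtract the center, then divide by the spread. The only thing that changes for sample means is that the spread becomes the population standard deviation divided by the square root of n. A minimal sketch, with hypothetical function names and made-up numbers:

```python
from math import sqrt

def z_score(value, center, spread):
    """Generic z-score: how many 'spreads' the value lies from the center."""
    return (value - center) / spread

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean, using sigma / sqrt(n) as the spread."""
    return z_score(x_bar, mu, sigma / sqrt(n))

# Hypothetical numbers, chosen only for illustration
print(z_score(12, 10, 4))                # a single value: 0.5
print(z_for_sample_mean(12, 10, 4, 16))  # the mean of 16 values: 2.0
```

The same distance from the center (12 versus 10) is a much larger z-score for a sample mean than for a single value, because sample means vary less than individual values do.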
Practice Problems
Practice Problem 1
You work at a bakery and your boss has determined that the cooking time for a new kind of muffin is normally distributed with a mean of 8 minutes and a standard deviation of 2 minutes.
a) What is the probability that a muffin will take longer than 9.5 minutes to cook?
b) For a sample of 4 muffins, what is the probability that the average cooking time will exceed 9.5 minutes?
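One way to check an answer to Practice Problem 1 is with Python's standard library, which can stand in for a z-table. The sketch below is not the course's calculator method; it simply applies the single-value z-score to part a) and the sample-mean z-score to part b). Because the cooking times are stated to be normally distributed, the sample mean is normally distributed even for a sample as small as 4.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 8, 2            # cooking time in minutes, from the problem statement
std_normal = NormalDist()   # standard normal distribution (mean 0, std dev 1)

# a) A single muffin: use the population standard deviation
z_single = (9.5 - mu) / sigma
p_single = 1 - std_normal.cdf(z_single)

# b) The mean of n = 4 muffins: use sigma / sqrt(n) instead
n = 4
sigma_x_bar = sigma / sqrt(n)
z_mean = (9.5 - mu) / sigma_x_bar
p_mean = 1 - std_normal.cdf(z_mean)

print(z_single, round(p_single, 4))   # 0.75 0.2266
print(z_mean, round(p_mean, 4))       # 1.5 0.0668
```

An average of 4 muffins is far less likely to exceed 9.5 minutes than a single muffin is, because averaging reduces the spread.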
Practice Problem 2
You just started a new company last month. The average sale of your product has already reached $285, with a standard deviation of $55.
What is the probability that for a sample of 35 sales, the average amount sold will exceed $310?
| Solution |
|---|
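| Here, \(\mu = 285\), \(\sigma = 55\) and \(n = 35\). |
| The distribution of the means applies again, as discussed above: \(\mu_{\bar{x}} = \mu = 285\). |
| And you can find \(\sigma_{\bar{x}} = \sigma / \sqrt{n} = 55 / \sqrt{35} \approx 9.30\). |
| And now we can solve it using the appropriate form of the z-score equation: \(z = (\bar{x} - \mu_{\bar{x}}) / \sigma_{\bar{x}} = (310 - 285) / 9.30 \approx 2.69\). |
| So, \(P(\bar{x} > 310) = P(z > 2.69) \approx 1 - 0.9964 = 0.0036\). |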
Practice Problem 3
For the same company (average sales of $285 and standard deviation of $55), what is the probability that for a sample of 35 sales, the average amount sold will be between $260 and $310?
| Solution |
|---|
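| Again the population parameters are: \(\mu = 285\) and \(\sigma = 55\), with \(n = 35\). |
| The distribution of the means applies again and this time we are looking for the area between \(\bar{x} = 260\) and \(\bar{x} = 310\). |
| As in Practice Problem 2, \(\sigma_{\bar{x}} = \sigma / \sqrt{n} = 55 / \sqrt{35} \approx 9.30\). |
| This time there are two z-scores. For the first: \(z = (260 - 285) / 9.30 \approx -2.69\). For the second: \(z = (310 - 285) / 9.30 \approx 2.69\). |
| So, \(P(260 < \bar{x} < 310) = P(-2.69 < z < 2.69) \approx 0.9964 - 0.0036 = 0.9928\). |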
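The numbers in Practice Problems 2 and 3 can be verified with the standard library in place of a z-table. This is a sketch under the assumption that a sample of 35 is large enough for the Central Limit Theorem to apply; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 285, 55, 35   # values from the problem statements

# Standard deviation of the sample means
sigma_x_bar = sigma / sqrt(n)

# Distribution of the sample means (approximately normal for large n)
sampling_dist = NormalDist(mu, sigma_x_bar)

# Practice Problem 2: P(sample mean > 310)
p_above_310 = 1 - sampling_dist.cdf(310)

# Practice Problem 3: P(260 < sample mean < 310)
p_between = sampling_dist.cdf(310) - sampling_dist.cdf(260)

print(round(sigma_x_bar, 2))
print(round(p_above_310, 4))
print(round(p_between, 4))
```

The results are a standard deviation of the means of about 9.30, a probability of about 0.0036 for Practice Problem 2 and about 0.9928 for Practice Problem 3.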