Topic:Continuous and Normal Random Variables
From SharedExperienceProject
Topic Highlights
(What you will learn)
- The concept of a continuous random variable
- The concept of a probability for the continuous case
- How these differ from their discrete counterparts (which we've been looking at for some time now)
- How probability problems are approached in the continuous case
- The normal distribution and some intuition about it
Introduction and Motivation
(Why learn it)
Remember the topic Variables in Data Sets in which we learned about different types of variables? There, we compared the concept of a discrete variable (such as the number of people in a room) to a continuous variable (such as the height of people in the room).
Well, for many topics now we've been looking at the concepts of discrete random variables and the corresponding discrete probability distributions, and in this topic we're going to shift our attention to their continuous counterparts.
Continuous random variables and continuous probability distributions are incredibly common in business statistics. In particular, normal random variables play important roles in many real-world decision-making processes. You need to be familiar with these.
Luckily, the material is not too difficult to grasp if you understand the basic concepts. Providing you with that understanding is the goal of this topic.
Learning Activities
(How the levels of understanding will be gained)
| Type | Name | Direction |
|---|---|---|
| Reading | | Self-directed |
| In-class discussion | | Self-directed |
| Practice Problems | | Peer-directed |
| Personal activities | | Self-directed |
Learning Objectives
(Levels of understanding to be gained)
| Level of Understanding | Objective(s) |
|---|---|
| Very best | |
| Highly satisfactory | |
| Satisfactory | |
| Maybe just enough to pass | |
Lecture Notes: Continuous Random Variables and Probability Distributions
These notes are written to facilitate an introductory discussion on continuous random variables and probability distributions.
Let's start with discrete
By now you should be pretty comfortable with the world of discrete random variables and discrete probability distributions. We saw them in the following topics:
Let's start our discussion about the continuous case by taking a quick look back at a discrete case we are familiar with.
Example 1 - A familiar discrete example
Do you recall the experiment in which you flip a coin 3 times?
a) Create the table of outcomes for the discrete random variable X = number of heads
b) Then create the probability distribution table (include the frequency column as we did when we first studied this)
c) Which statistics table would you use to obtain the same result? Can you do it?
| Solution to a) |
|---|
| |
| Solution to b) |
|---|
|
| Solution to c) |
|---|
|
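If you'd like to check your answers to a) and b), here is a minimal Python sketch that enumerates every outcome of 3 flips and tallies them by number of heads. It assumes a fair coin, and the names `outcomes`, `frequency` and `distribution` are illustrative only; they don't come from the course materials.

```python
from collections import Counter
from itertools import product

# Assumption: a fair coin, so all 2**3 = 8 sequences are equally likely.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# a) Table of outcomes: each sequence and its value of X = number of heads.
for sequence in outcomes:
    print(sequence, sequence.count("H"))

# b) Probability distribution: x, frequency, relative frequency P(x).
frequency = Counter(sequence.count("H") for sequence in outcomes)
distribution = {x: frequency[x] / len(outcomes) for x in sorted(frequency)}
for x, p in distribution.items():
    print(x, frequency[x], p)
```

The relative frequencies come out as 1/8, 3/8, 3/8 and 1/8, which are the binomial probabilities for n = 3 and p = 0.5 that part c) asks you to look up.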
Example 2 - Plotting the distribution
Do you notice anything about the third column of the probability distribution table we obtained above?
The third column is just a relative frequency distribution! See Frequency Distributions and Histograms if you need a refresher.
This allows us to create the following plot of x against relative frequency:
Example 3 - Another one
Just to be sure you've got it, have a look at the same thing for the case of flipping the coin 6 times.
You can look up the probabilities in the binomial probability table (Table A.1 in Kvanli et al. and Bowerman et al.):
And the following plot can be created:
So, what's the point of reviewing all this discrete stuff?
Two points:
- It's a good review
- If you can see the link between the discrete probability tables and the plots above, then you're a long way toward understanding the concept of probabilities with continuous variables. Let's dig deeper...
Example 4
What if you are asked to provide the probability of getting exactly 3 heads, P(x=3), in the 6-flip example above?
| Solution |
|---|
| P(x=3) = 0.313 |
Example 5
What if you are asked to provide the probability of getting between 2 and 5 heads (inclusive), P(2<=x<=5)?
| Solution |
|---|
| P(2<=x<=5) = P(2) + P(3) + P(4) + P(5) = 0.234 + 0.313 + 0.234 + 0.094 = 0.875 |
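The table lookups in Examples 4 and 5 can also be reproduced directly from the binomial formula. The sketch below assumes a fair coin (p = 0.5) and 6 flips; `binomial_pmf` is a helper name made up for this illustration.

```python
from math import comb

n, p = 6, 0.5  # 6 flips of a coin assumed to be fair


def binomial_pmf(x: int) -> float:
    """P(X = x) for X = number of heads in n flips."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


pmf = {x: binomial_pmf(x) for x in range(n + 1)}

p_exactly_3 = pmf[3]                          # Example 4
p_2_to_5 = sum(pmf[x] for x in range(2, 6))   # Example 5, endpoints included

print(f"P(x=3)     = {p_exactly_3:.4f}")
print(f"P(2<=x<=5) = {p_2_to_5:.4f}")
```

The exact value of P(x=3) is 20/64 = 0.3125, which the printed table rounds to 0.313. In the discrete case a probability over a range is simply a sum of the individual probabilities.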
Now the concept of continuous probabilities is just an extension of this...
Let's get continuous
A discrete probability distribution only has probabilities at certain discrete values of x. Above, they are 0, 1, 2, 3, 4, 5 and 6.
A continuous probability distribution is defined at all the values in between as well, so its plot is a smooth curve rather than a set of separate points, as shown below:
Example 6
How do you think the following probability is represented on the above plot: P(2<x<5)?
| Solution |
|---|
Example 7
What about P(x<2.5)?
| Solution |
|---|
Example 8
What about P(x>1.5)?
| Solution |
|---|
Summary of the continuous case
To summarize:
- When you have a continuous random variable, you have a continuous distribution
- This means that there is a relative frequency (the y-axis) for every value that the random variable can take on (the x-axis)
- So the plot has a curve instead of points
- We speak about probabilities for ranges of x
- (not at specific values of x, as in the discrete case)
- The probability for a continuous random variable is the area under the curve
- (not the value or sum of values, as in the discrete case)
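To make "probability is the area under the curve" concrete, here is a rough Python sketch. The curve is an assumption chosen only for illustration: a bell-shaped density with mean 3 and standard deviation 1.22, roughly matching the 6-flip example. The helper `area_under_curve` is hypothetical and uses a simple trapezoidal rule.

```python
from statistics import NormalDist

# Assumption: an illustrative bell-shaped density (mean 3, standard deviation
# 1.22). It is not taken from the notes; any continuous density would do.
curve = NormalDist(mu=3, sigma=1.22)


def area_under_curve(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under curve.pdf between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    heights = [curve.pdf(a + i * width) for i in range(steps + 1)]
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))


p_between = area_under_curve(2, 5)        # P(2 < x < 5) as an area
p_exact = curve.cdf(5) - curve.cdf(2)     # the same area from the built-in CDF
p_single_point = area_under_curve(3, 3)   # a single value has zero width

print(f"{p_between:.4f} {p_exact:.4f} {p_single_point}")
```

The two area calculations agree, and the area above a single value of x is zero, which is why we only speak about probabilities for ranges in the continuous case.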
Lecture Notes: Introduction to the Normal Distribution
These notes are intended to facilitate a first discussion of the basic concepts of the normal curve of Section 6.2 of Kvanli et al., including a review of the basic interpretation of mean and standard deviation given way back in Section 3.6.
The Normal Distribution
Because it plays a role in so many business (and other) applications, the normal distribution is the most important of all the continuous distributions. In fact, a great many business managers assume that the populations of interest in their problems can be represented using a normal distribution.
A plot of the normal distribution is referred to as a normal curve. It is given below.
A normal curve can be described completely if you know only the mean, μ, and the standard deviation, s.
Example 9
Can you identify the mean and draw the range implied by the standard deviation on this normal curve?
| Solution |
|---|
The Empirical Rule
You now know enough for the so-called Empirical Rule to be introduced.
The Empirical Rule is useful because it allows you to estimate an approximate range within which proportions of the data fall for a given normal distribution.
The rule says that for a normally distributed population:
- approximately 68.3 % of the population measurements fall within (plus or minus) one standard deviation of the mean
- approximately 95.4 % of the population measurements fall within (plus or minus) two standard deviations of the mean
- approximately 99.7 % of the population measurements fall within (plus or minus) three standard deviations of the mean
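If you're curious where these percentages come from, they can be checked with Python's standard library. This is only a sketch: it uses the standard normal curve (mean 0, standard deviation 1), which is enough because the proportion within k standard deviations is the same for every normal distribution. The helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k: float) -> float:
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{100 * share_within(k):.1f}%")
```

Each proportion is just the area under the normal curve between the mean minus k standard deviations and the mean plus k standard deviations.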
Example 10
How much of the data are predicted to fall in the range μ +/- s?
| Solution |
|---|
| It's the so-called Empirical Rule you're looking for here. As a rule of thumb, approximately 68% fall in that range. |
Example 11
How about the ranges defined by μ +/- 2s and μ +/- 3s?
| Solution |
|---|
| It's the Empirical Rule again. As a rule of thumb, approximately 95% and 99.7% fall in those ranges. |
Chebychev's Theorem
The Empirical Rule (above) holds for cases in which you know the data are distributed normally.
Unfortunately in real life you don't always know whether your data fit a normal distribution.
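That's where Chebychev's theorem comes in. The theorem allows you to find an interval that is guaranteed to contain at least a specified percentage of the individual measurements in the population, whatever the shape of the distribution.
If you know the mean, μ, and standard deviation, s, of a population, then for any value of k greater than 1, at least 100(1 - 1/k^2)% of the population measurements lie in the interval from μ - ks to μ + ks.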
Example 12
a) If you know your data are normally distributed, then what percentage of the data lie between μ - 2s and μ + 2s?
b) If you don't know the underlying distribution, then at least what percentage of the data lie between μ - 2s and μ + 2s?
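The two parts of Example 12 can be compared side by side in a few lines of Python. This is a sketch rather than an official solution: it uses the usual form of Chebychev's bound, at least 1 - 1/k^2 of the data within k standard deviations for k > 1, and both function names are invented for the illustration.

```python
from statistics import NormalDist


def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion within k standard deviations, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def normal_share(k: float) -> float:
    """Proportion within k standard deviations when the data are normal."""
    return NormalDist().cdf(k) - NormalDist().cdf(-k)


k = 2
print(f"a) normal data: about {100 * normal_share(k):.1f}%")
print(f"b) any distribution: at least {100 * chebyshev_lower_bound(k):.0f}%")
```

Knowing the data are normal lets you make a much stronger statement (about 95.4%) than the guarantee that holds for every distribution (at least 75%).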
Practice Problems
Practice Problem 1
Draw a normal curve with the following values for μ and s:
Practice Problem 2
In what range would approximately 99.7% of the data fall for normally distributed data with a mean of 10 and a standard deviation of 2?
| Solution |
|---|
| Using the 99.7% portion of the Empirical Rule, the requested range is μ +/- 3s = 10 +/- 3(2), that is, from 4 to 16. In other words, approximately 99.7% of the data fall between 4 and 16 for the given normal distribution. |
Practice Problem 3
For a week in October the mean temperature is 10 degrees and the standard deviation is 2 degrees.
a) If you're told the temperature data are normally distributed, what percentage of them fall between 8 and 12 degrees?
b) If you're not sure whether the data are normally distributed, at least what percentage of the data fall between 7 and 13 degrees?
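| Solution |
|---|
| a) The data are normally distributed and the range from 8 to 12 is μ +/- s (10 +/- 2), so by the Empirical Rule approximately 68.3% of the data fall between 8 and 12 degrees. |
| b) Here you need Chebychev's theorem because you don't know the underlying distribution. You aren't given k directly, so you need to compute it. Recall that k is the number of standard deviations away from the mean, so k = (13 - 10)/2 = 1.5. Then you can use the equation 100(1 - 1/k^2)% = 100(1 - 1/1.5^2)% = 55.56%. In other words, we can say that at least 55.56% of the data fall between 7 and 13 degrees. |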
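Here is a short Python check of the arithmetic in this problem, offered as a sketch. The helper `k_from_interval` is hypothetical; it assumes the interval is symmetric about the mean, which is true for both ranges in the question.

```python
from statistics import NormalDist

mu, s = 10, 2  # mean and standard deviation of the temperatures, in degrees


def k_from_interval(lower: float, upper: float) -> float:
    """Number of standard deviations from the mean to either end of the interval."""
    if abs((mu - lower) - (upper - mu)) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    return (upper - mu) / s


# a) Normal data: area under the normal curve between 8 and 12 degrees.
temperatures = NormalDist(mu, s)
share_a = temperatures.cdf(12) - temperatures.cdf(8)

# b) Unknown distribution: Chebychev's bound, at least 1 - 1/k^2.
k = k_from_interval(7, 13)
bound_b = 1 - 1 / k**2

print(f"a) about {100 * share_a:.1f}%")
print(f"b) k = {k}, at least {100 * bound_b:.2f}%")
```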
References
- This is a very introductory topic, so there are no learning activities at this level.













