Topic:Continuous and Normal Random Variables

From SharedExperienceProject

Topic Highlights

(What you will learn)

  • The concept of a continuous random variable
  • The concept of a probability for the continuous case
  • How these differ from their discrete counterparts (which we've been looking at for some time now)
  • How probability problems are approached in the continuous case
  • The normal distribution and some intuition about it

Introduction and Motivation

(Why learn it)

Remember the topic Variables in Data Sets in which we learned about different types of variables? There, we compared the concept of a discrete variable (such as the number of people in a room) to a continuous variable (such as the height of people in the room).

Well, for many topics now we've been looking at the concepts of discrete random variables and the corresponding discrete probability distributions, and in this topic we're going to shift our attention to their continuous counterparts.

Continuous random variables and continuous probability distributions are incredibly common in business statistics. In particular, normal random variables play important roles in many real-world decision-making processes. You need to be familiar with these.

Luckily, the material is not too difficult to grasp if you understand the basic concepts. Providing you with that understanding is the goal of this topic.

Learning Activities

(How the levels of understanding will be gained)

Learning activities for this topic
Type Name Direction
Reading Self-directed
In-class discussion
Self-directed
Practice Problems
Peer-directed
Personal activities Self-directed

 

Learning Objectives

(Levels of understanding to be gained)

Learning objectives for this topic
Level of Understanding Objective(s)
Very best
Highly satisfactory
  • None
Satisfactory
Maybe just enough to pass

Lecture Notes: Continuous Random Variables and Probability Distributions

These notes are written to facilitate an introductory discussion on continuous random variables and probability distributions.

Let's start with discrete

By now you should be pretty comfortable with the world of discrete random variables and discrete probability distributions, which we covered in earlier topics.

Let's start our discussion about the continuous case by taking a quick look back at a discrete case we are familiar with.

Example 1 - A familiar discrete example

Do you recall the experiment in which you flip a coin 3 times?

a) Create the table of outcomes for the discrete random variable X = number of heads

b) Then create the probability distribution table (include the frequency column as we did when we first studied this)

c) Which statistics table would you use to obtain the same result? Can you do it?
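
If you'd like to check your tables, here is a minimal Python sketch of parts a) and b). It assumes the coin is fair, so all 8 outcomes are equally likely; the variable names are just illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: a fair coin, so each of the 2**3 outcomes is equally likely.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# Frequency of each value of X = number of heads.
frequency = Counter(outcome.count("H") for outcome in outcomes)

# Relative frequency = frequency / total number of outcomes.
distribution = {x: Fraction(frequency[x], len(outcomes)) for x in sorted(frequency)}

for x, p in distribution.items():
    print(x, frequency[x], p)
```

The three printed columns correspond to x, the frequency, and the relative frequency (the probability).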



Example 2 - Plotting the distribution

Do you notice anything about the third column of the probability distribution table we obtained above?

Image:Discrete_Probability_Distribution_5.PNG

The third column is just a relative frequency distribution! See Frequency Distributions and Histograms if you need a refresher.

This allows us to create the following plot of x against relative frequency:

Image:Continuous_1.png

Example 3 - Another one

Just to be sure you've got it, have a look at the same thing for the case of flipping the coin 6 times.

You can look up the probabilities in the binomial probability table (Table A.1 in Kvanli et al. and Bowerman et al.):

Image:Continuous_2.PNG

And the following plot can be created:

Image:Continuous_3.png
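
The same probabilities can be computed rather than looked up. The sketch below uses the binomial formula with n = 6 and assumes a fair coin (p = 0.5); `binomial_pmf` is a helper name made up for this illustration, not something from the textbooks.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumption: a fair coin flipped 6 times.
n, p = 6, 0.5
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in table.items():
    print(f"{x}  {prob:.4f}")
```

Plotting x against these values should give the same shape as the plot above.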

So, what's the point of reviewing all this discrete stuff?

Two points:

  1. It's a good review
  2. If you can see the link between the discrete probability tables and plots above, then you're a long way toward understanding the concept of probabilities with continuous variables. Let's dig deeper...

Example 4

What if you are asked to provide the probability of getting exactly 3 heads, P(x=3), for the above example?


Example 5

What if you are asked to provide the probability of getting between 2 and 5 heads (inclusive), P(2<=x<=5)?
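
One way to check your answers to Examples 4 and 5 is sketched below, again assuming a fair coin flipped 6 times. The point to notice is that a single value is read straight from the table, while a range is the sum of the table values it covers.

```python
from fractions import Fraction
from math import comb

n = 6
# Assumption: fair coin, so P(X = x) = C(n, x) / 2**n.
pmf = {x: Fraction(comb(n, x), 2**n) for x in range(n + 1)}

# Example 4: a single value of x is one entry of the table.
p_exactly_3 = pmf[3]

# Example 5: a range of x is the sum of the entries for x = 2, 3, 4, 5.
p_2_to_5 = sum(pmf[x] for x in range(2, 6))

print(p_exactly_3, float(p_exactly_3))  # 5/16 0.3125
print(p_2_to_5, float(p_2_to_5))        # 7/8 0.875
```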


Now the concept of continuous probabilities is just an extension of this...

Let's get continuous

A discrete probability distribution only has probabilities at certain discrete values of x. Above, they are 0, 1, 2, 3, 4, 5 and 6.

A continuous probability distribution has a relative frequency at every value in between as well, as shown below:

Image:Continuous_6.png

Example 6

How do you think the following probability is represented on the above plot: P(2<x<5)?


Example 7

What about P(x<2.5)?


Example 8

What about P(x>1.5)?


Summary of the continuous case

To summarize:

  • When you have a continuous random variable, you have a continuous distribution
  • This means that there is a relative frequency (the y-axis) for every value that the random variable can take on (the x-axis)
    • So the plot has a curve instead of points
  • We speak about probabilities for ranges of x
    • (not at specific values of x, as in the discrete case)
  • The probability that a continuous random variable falls in a range is the area under the curve over that range
    • (not the value or sum of values, as in the discrete case)
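
To make "area under the curve" concrete, here is a rough numerical sketch. The curve in the plot above is not specified, so the code uses a normal-shaped curve with a mean of 3 and a standard deviation of 1.2 purely as an assumed stand-in; the function names are also just illustrative.

```python
from math import exp, pi, sqrt

def density(x, mean=3.0, sd=1.2):
    """A stand-in smooth curve (normal shape); the parameters are assumptions."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def area_under_curve(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width

p_between_2_and_5 = area_under_curve(density, 2, 5)   # P(2 < x < 5)
p_whole_curve = area_under_curve(density, -20, 20)    # should be very close to 1
p_single_point = area_under_curve(density, 2.5, 2.5)  # a range of zero width

print(p_between_2_and_5, p_whole_curve, p_single_point)
```

The zero-width case shows why we only talk about probabilities for ranges: the area above a single point is 0, so including or excluding the endpoints makes no difference in the continuous case.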


Lecture Notes: Introduction to the Normal Distribution

These notes are intended to facilitate a first discussion of the basic concepts of the normal curve (Section 6.2 of Kvanli et al.), including a review of the basic interpretation of the mean and standard deviation given way back in Section 3.6.

The Normal Distribution

Because it plays a role in so many business (and other) applications, the normal distribution is the most important of all the continuous distributions. In fact, a great many business managers assume that the populations of interest in their problems can be represented using a normal distribution.

A plot of the normal distribution is referred to as a normal curve. An example is shown below.

Image:Continuous_10.png

A normal curve can be described completely if you know only the mean, x̄, and standard deviation, s.
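
To see why those two numbers are enough, here is a small sketch of the standard formula for the height of the normal curve. The function name `normal_curve` and the example values (a mean of 4 with standard deviations of 1 and 3) are assumptions chosen only for illustration.

```python
from math import exp, pi, sqrt

def normal_curve(x, mean, sd):
    """Height of the normal curve at x for the given mean and standard deviation."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

# The mean sets where the peak is; the standard deviation sets how spread out the curve is.
peak_narrow = normal_curve(4, mean=4, sd=1)
peak_wide = normal_curve(4, mean=4, sd=3)

# The curve is symmetric: equal distances either side of the mean give equal heights.
left = normal_curve(4 - 2, mean=4, sd=3)
right = normal_curve(4 + 2, mean=4, sd=3)

print(peak_narrow, peak_wide, left, right)
```

A larger standard deviation gives a lower, wider curve, because the total area under the curve always stays at 1.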

Example 9

Can you identify the mean and draw the range implied by the standard deviation on this normal curve?


The Empirical Rule

You now know enough that the so-called Empirical Rule can be introduced.

The Empirical Rule is useful because it lets you estimate the range within which a given proportion of the data falls for a normal distribution.

The rule says that for a normally distributed population:

  • approximately 68.3% of the population measurements fall within (plus or minus) one standard deviation of the mean
    • i.e. approximately 68.3% of the data are in the range [x̄ - s, x̄ + s]
  • approximately 95.4% of the population measurements fall within (plus or minus) two standard deviations of the mean
    • i.e. approximately 95.4% of the data are in the range [x̄ - 2s, x̄ + 2s]
  • approximately 99.7% of the population measurements fall within (plus or minus) three standard deviations of the mean
    • i.e. approximately 99.7% of the data are in the range [x̄ - 3s, x̄ + 3s]
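
These percentages are not arbitrary; they are areas under the normal curve. As a quick check, the sketch below uses the error function from Python's standard library, which gives the share of a normal population within k standard deviations of the mean as erf(k / sqrt(2)). The helper name `share_within` is made up for this example.

```python
from math import erf, sqrt

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```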

Example 10

How much of the data are predicted to fall in the range +/- s?


Example 11

How about the ranges defined by +/- 2s and +/- 3s?


Chebychev's Theorem

The Empirical Rule (above) holds for cases in which you know the data are distributed normally.

Unfortunately in real life you don't always know whether your data fit a normal distribution.

That's where Chebychev's theorem comes in. The theorem allows you to find an interval that contains a specified percentage of the individual measurements in the population.

If you know the mean and standard deviation of a population, then:

  • At least (1 - 1/k^2) x 100% of the data lie in the range [x̄ - ks, x̄ + ks]
  • Where k (greater than 1) is the number of standard deviations away from the mean
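
The arithmetic is simple enough to do by hand, but a short sketch may help you try different values of k. The function name `chebyshev_lower_bound` is just an illustrative choice.

```python
def chebyshev_lower_bound(k):
    """Minimum share of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```

Notice that for k = 2 the guarantee is at least 75%, well below the roughly 95.4% given by the Empirical Rule. That is the price of not assuming anything about the shape of the distribution.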

Example 12

a) If you know your data are normally distributed, then what percentage of the data lie between x̄ - 2s and x̄ + 2s?

b) If you don't know the underlying distribution, then at least what percentage of the data lie between x̄ - 2s and x̄ + 2s?


Practice Problems

Practice Problem 1

Draw a normal curve with the following values for μ and σ:

a) μ = 0, σ = 1

b) μ = 4, σ = 1

c) μ = 4, σ = 3

d) μ = -4, σ = 3

Practice Problem 2

In what range would approximately 99.7% of the data fall for normally distributed data with a mean of 10 and a standard deviation of 2?


Practice Problem 3

For a week in October the mean temperature is 10 degrees and the standard deviation is 2 degrees.

a) If you're told the temperature data are normally distributed, what percentage of them fall between 8 and 12 degrees?


b) If you're not sure whether the data are normally distributed, at least what percentage of the data fall between 7 and 13 degrees?


References

  1. This is a very introductory topic, so there are no learning activities at this level.