Topic:Variables in Data Sets

From SharedExperienceProject



Topic Highlights

(What you will learn)

  • A first look at sampling and data collection
  • A first look at the basics of classifying variables

Introduction and Motivation

(Why learn it)

Especially because of the "Who's in the Room?" activity, this can be used as the first topic in a course on business statistics. As for any first class, it is useful because it allows us to get to know each other while we review the course outline and set expectations.

More essential than those housekeeping issues, though, this first topic is important because:

  1. We will get a taste for the topics of data sets and variables of interest; and
  2. We will get a feel for what it's going to be like to learn together this term.

Learning Activities

(What I need to do to meet the objectives given below)

Learning activities for this topic
Type                 | Name | Direction
Reading              |      | Self-directed
In-class worksheet   |      | Self-directed
In-class activity    |      | Peer-directed
In-class discussion  |      | Instructor-directed
Take-home quiz       |      | Self-directed


Learning Objectives

(Levels of understanding to be gained)

Learning objectives for this topic
(Objectives are grouped by level of understanding and presented as self-assessment questions.)
Very best
  • Can I provide a reason why a businessperson would want to classify variables anyway?
  • Can I answer all the questions in the take-home quiz?
Highly satisfactory
  • Do I know which types of data are typically more useful for statistical analysis? Why is this?
  • Can I answer Example 3d?
  • Can I answer Example 4a?
Satisfactory
  • Do I know the difference between the terms data and variable of interest?
  • Can I provide examples of each?
  • If someone gives me examples of different types of data, as in the in-class activity or the lecture, can I classify them?
  • Can I answer Example 2?
  • Can I answer Examples 3a, 3b and 3c?
  • Can I answer Examples 4b and 4c?
Maybe just enough to pass
  • Can I name the different classes (and sub-classes) of variables?
  • Can I answer Example 1?


Topic Notes: Variables in Data Sets

These topic notes are intended to facilitate a discussion on the topic of variables in data sets.

Introduction to Business Statistics

What is a variable?

A variable is a quantity of interest, estimated from data we can collect.

  • For example:
    • A gambler might be interested in gaining insight into what the average of two dice rolls might be (the variable of interest)
    • He could gain this insight by recording the results of two dice rolls (two data sets), averaging them, and repeating this many times:
      • average = ( roll 1 + roll 2 ) / 2
  • In business:
    • A manager might be interested in predicting the profitability next month of a given business unit (the variable of interest)
    • She could do this using historical revenues and costs (two data sets) as input, subtracting costs from revenues:
      • profit = revenue - cost
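The two formulas above can be sketched in Python. This is a minimal illustration, assuming fair six-sided dice simulated with the standard `random` module; the revenue and cost figures are hypothetical, made up purely to show the arithmetic.

```python
import random

def average_of_two_rolls(rng: random.Random) -> float:
    """Roll two fair six-sided dice and return their average."""
    roll_1 = rng.randint(1, 6)
    roll_2 = rng.randint(1, 6)
    return (roll_1 + roll_2) / 2

def profit(revenue: float, cost: float) -> float:
    """Profit for one period: revenue minus cost."""
    return revenue - cost

rng = random.Random(42)  # fixed seed so the run is repeatable
print(average_of_two_rolls(rng))

# Hypothetical monthly figures for one business unit (not real data)
revenues = [120_000, 135_000, 128_000]
costs = [100_000, 110_000, 104_000]
monthly_profits = [profit(r, c) for r, c in zip(revenues, costs)]
print(monthly_profits)  # [20000, 25000, 24000]
```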

It should make sense to you that a variable is estimated, understood or predicted more accurately when you have more data. For example:

  • More rolls of the dice should improve the gambler's insight into the expected average of two dice rolls
  • Data from five income statements should result in a better prediction of profitability than data from only one income statement
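A quick simulation can illustrate this point for the gambler. The sketch below assumes fair dice, for which the long-run average of two rolls is 3.5, and reports how far the estimate lands from 3.5 as the number of repetitions grows. The helper name `estimate_average` is hypothetical.

```python
import random

TRUE_MEAN = 3.5  # expected average of two fair six-sided dice

def estimate_average(n_trials: int, rng: random.Random) -> float:
    """Hypothetical helper: estimate the expected average of two dice rolls
    from n_trials simulated repetitions."""
    total = 0.0
    for _ in range(n_trials):
        total += (rng.randint(1, 6) + rng.randint(1, 6)) / 2
    return total / n_trials

rng = random.Random(1)
for n in (10, 100, 10_000):
    estimate = estimate_average(n, rng)
    print(f"{n:>6} trials: estimate = {estimate:.3f}, error = {abs(estimate - TRUE_MEAN):.3f}")
```

The error tends to shrink as the number of trials grows, although any single short run can happen to land close to 3.5 by luck.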

Why would I want to classify a variable?

When a manager plans to collect data, it is helpful if he or she can identify the categories into which the variables of interest fall. This is because:

  • The type of variable you are working with affects the type of analysis you can do with it
  • Collecting the wrong data, or observing a so-called weak variable (one measured on a weak scale, such as nominal), can waste time and money

How do I classify a variable?

Statistical variables are commonly classified as shown in the figure below.


Using this classification, we would normally refer to any given variable as (only) one of the following:

  1. Categorical-nominal
  2. Categorical-ordinal
  3. Discrete-interval
  4. Discrete-ratio
  5. Continuous-interval
  6. Continuous-ratio
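The same hierarchy can be sketched as a small tree in Python, with the six combinations listed above generated from it. The nested dictionary is only an illustrative representation of the classification described in these notes, not an official data structure.

```python
# Illustrative tree of the classification described in these notes:
# categorical variables split into nominal/ordinal; numerical variables
# split into discrete/continuous, and each of those into interval/ratio.
CLASSIFICATION = {
    "Categorical": ["nominal", "ordinal"],
    "Numerical": {
        "Discrete": ["interval", "ratio"],
        "Continuous": ["interval", "ratio"],
    },
}

def leaf_classes(tree: dict) -> list[str]:
    """Return every 'Type-scale' label at the bottom of the tree."""
    labels = []
    for scale in tree["Categorical"]:
        labels.append(f"Categorical-{scale}")
    for kind, scales in tree["Numerical"].items():
        for scale in scales:
            labels.append(f"{kind}-{scale}")
    return labels

for number, label in enumerate(leaf_classes(CLASSIFICATION), start=1):
    print(number, label)
```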

Together, the classifications nominal, ordinal, interval and ratio make up what is known as a measurement scale, or level of measurement. Nominal is usually seen as the weakest level and ratio as the strongest. In other words, a manager can perform a wider range of meaningful statistical analyses with a ratio variable than with a nominal variable.
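One way to see why ratio is the "strongest" scale is that each level permits everything the level below it does, plus more. The sketch below encodes a commonly taught version of this idea; the particular operations listed for each level are an assumption for illustration, and textbooks vary on the details.

```python
# Levels of measurement from weakest to strongest. Each level is assumed
# to permit all the operations of the levels before it, plus its own.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Assumed, illustrative list of the operations each level adds
NEW_OPERATIONS = {
    "nominal": ["count", "mode", "test for equality"],
    "ordinal": ["rank / sort", "median", "percentiles"],
    "interval": ["difference", "mean", "standard deviation"],
    "ratio": ["ratio (e.g. twice as much)", "coefficient of variation"],
}

def meaningful_operations(level: str) -> list[str]:
    """Return the operations assumed meaningful for a variable at this level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    operations = []
    for lvl in LEVELS[: LEVELS.index(level) + 1]:
        operations.extend(NEW_OPERATIONS[lvl])
    return operations

for level in LEVELS:
    print(f"{level:>8}: {len(meaningful_operations(level))} operations")
```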

Categorical vs. numerical variables

Numerical variables naturally take on numerical values, i.e. they are quantitative.

Categorical variables do not, i.e. they are qualitative.

Example 1

a) What kind of variable (categorical or numerical) is your cat's gender (male/female)?

b) Its age?

Types of categorical variables

As shown above, categorical variables can be either:

  • Nominal, meaning that:
    • They are merely labels (or assigned values)
    • They have no order
  • Ordinal, meaning that:
    • They are labels that can be arranged in a meaningful order
    • They can be ranked, but the difference between two ranks has no meaning
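The difference between nominal and ordinal data shows up in which summaries make sense. The sketch below uses hypothetical survey data: a nominal variable (preferred payment method) and an ordinal one (satisfaction rated low, medium or high). The mode works for both, but only the ordinal variable has an order that supports a median. The data and the ordering list are assumptions for illustration.

```python
from statistics import median_low, mode

# Hypothetical survey responses (not real data)
payment_method = ["card", "cash", "card", "mobile", "card"]  # nominal
satisfaction = ["low", "high", "medium", "high", "medium"]   # ordinal

# Nominal: only counting and equality are meaningful, so the mode is the natural summary.
print("Most common payment method:", mode(payment_method))

# Ordinal: the labels have an agreed order, which we encode explicitly (assumed ordering).
SATISFACTION_ORDER = ["low", "medium", "high"]
ranks = [SATISFACTION_ORDER.index(answer) for answer in satisfaction]

# median_low always returns an observed rank, so it maps back to a label.
print("Median satisfaction:", SATISFACTION_ORDER[median_low(ranks)])

# The rank numbers are positions only: "high" (2) minus "medium" (1) = 1
# does not mean the gap between them equals the gap from "low" to "medium".
```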

Example 2

a) What kind of variable is gender (i.e. male, female)?

b) What about the order in which runners finish a race (e.g. first, second, third, fourth)?

c) Does anything change if one assigns arbitrary numerical values to either of the above (e.g. male=0, female=1)?

Types of numerical variables

As shown above, numerical variables can be either:

  • Discrete, meaning that:
    • They typically result from counting something
    • Gaps exist between values of discrete variables
  • Continuous, meaning that:
    • They typically result from measuring something, e.g. length or weight
    • They can take on any value in the range
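A small sketch can make the "gaps" idea concrete. It assumes two hypothetical variables: the number of customers entering a shop in an hour (discrete, from counting) and the weight of a parcel in kilograms (continuous, from measuring). Between two neighbouring counts there is no other possible count, while between any two weights there is always another possible weight.

```python
# Hypothetical variables for illustration only.

def values_between_counts(low: int, high: int) -> list[int]:
    """Possible customer counts strictly between two counts (a discrete variable)."""
    return list(range(low + 1, high))

def value_between_weights(low: float, high: float) -> float:
    """A possible parcel weight strictly between two weights (a continuous variable)."""
    return (low + high) / 2

# 3 and 4 customers: no count lies in between, so there is a gap.
print(values_between_counts(3, 4))  # []

# 2.5 kg and 2.6 kg: the midpoint is itself a valid weight. In principle there is
# always another weight in between (floating-point numbers only approximate this).
print(value_between_weights(2.5, 2.6))
```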

Example 3

a) What kind of numerical variable (discrete or continuous) can result from a single die throw, i.e. 1, 2, 3, 4, 5, 6?

b) From measuring your friend's height?

c) From taking the sum of two dice, i.e. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12?

d) From taking the average of two dice, i.e. 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6?

Further classifying discrete and continuous numerical variables

Discrete and continuous variables can each be one of the following:

  • Interval, meaning that:
    • Unlike an ordinal variable, it does naturally take on numerical values
    • The difference between two such values has meaning
  • Ratio, meaning that:
    • It also naturally takes on numerical values
    • It has a meaningful zero point, i.e. zero means none of the quantity being measured
    • Scaling the values has meaning, e.g. twice the value means twice the variable of interest
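The difference between interval and ratio variables shows up when dividing. The sketch below uses calendar years as an interval variable (year 0 is a convention, not an absence of time) and revenue in dollars as a ratio variable (zero dollars means no revenue). These examples and figures are hypothetical, chosen only for illustration.

```python
# Interval variable: calendar year (hypothetical figures). Differences are meaningful...
year_founded, year_now = 1990, 2020
print("Years in business:", year_now - year_founded)  # 30

# ...but ratios are not: the ratio 2020 / 1990 has no useful meaning,
# because year 0 is an arbitrary starting point.
print("Meaningless ratio:", year_now / year_founded)

# Ratio variable: revenue in dollars (hypothetical figures). Zero means no
# revenue, so both differences and ratios are meaningful.
revenue_last_year, revenue_this_year = 50_000, 100_000
print("Change in revenue:", revenue_this_year - revenue_last_year)  # 50000
print("Growth factor:", revenue_this_year / revenue_last_year)      # 2.0
```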

Example 4

a) What type of numerical variable (discrete or continuous - interval or ratio) is temperature, e.g. 25.1 degrees Celsius?

b) The height of your friend, e.g. 180.3 cm?

c) The number of apples you bought, e.g. 12?
