Topic:Variables in Data Sets
From SharedExperienceProject
Topic Highlights
(What you will learn)
- A first look at sampling and data collection
- A first look at the basics of classifying variables
Introduction and Motivation
(Why learn it)
Especially because of the "Who's in the Room?" activity, this can be used as the first topic in a course on business statistics. As for any first class, it is useful because it allows us to get to know each other while we review the course outline and set expectations.
More essential than those housekeeping issues, though, this first topic is important because:
- We will get a taste for the topics of data sets and variables of interest; and
- We will get a feel for what it's going to be like to learn together this term.
Learning Activities
(What I need to do to meet the objectives given below)
| Type | Name | Direction |
|---|---|---|
| Reading | | Self-directed |
| In-class worksheet | | Self-directed |
| In-class activity | | Peer-directed |
| In-class discussion | | Instructor-directed |
| Take-home quiz | | Self-directed |
Learning Objectives
(Levels of understanding to be gained)
| Level of Understanding | Objective(s) (presented as self-assessment questions) |
|---|---|
| Very best | |
| Highly satisfactory | |
| Satisfactory | |
| Maybe just enough to pass | |
Topic Notes: Variables in Data Sets
These topic notes are intended to facilitate a discussion on the topic of variables in data sets.
(Introduction to Business Statistics)
What is a variable?
A variable is a quantity of interest, estimated from data we can collect.
- For example:
  - A gambler might be interested in gaining insight into what the average of two dice rolls might be (the variable of interest)
  - He could gain this insight by recording the results of two dice rolls (two data sets) and averaging them, many times:
    - average = ( roll 1 + roll 2 ) / 2
- In business:
  - A manager might be interested in predicting the profitability next month of a given business unit (the variable of interest)
  - She could do this using historical revenues and costs (two data sets) as input and subtracting costs from revenues:
    - profit = revenue - cost
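The two formulas above can be sketched in Python. This is only a minimal illustration: the function names, the fixed seed and the revenue and cost figures are hypothetical and are not taken from any real data set.

```python
import random


def average_of_two_rolls(rng):
    """Roll two six-sided dice and return the average of the two results."""
    roll_1 = rng.randint(1, 6)  # randint includes both end points
    roll_2 = rng.randint(1, 6)
    return (roll_1 + roll_2) / 2


def profit(revenue, cost):
    """Profit is revenue minus cost."""
    return revenue - cost


rng = random.Random(42)  # fixed seed so the run can be repeated
one_average = average_of_two_rolls(rng)

# Hypothetical monthly figures for one business unit
monthly_profit = profit(revenue=120_000, cost=95_000)

print("average of two rolls:", one_average)
print("profit:", monthly_profit)
```

In both cases the variable of interest (the average, the profit) is computed from the collected data rather than observed directly.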
It should make sense to you that a variable is estimated, understood or predicted more accurately when you have more data. For example:
- More rolls of the dice should improve the gambler's insight into the expected average of two dice rolls
- Data from five income statements should result in a better prediction of profitability than data from only one income statement
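To see why more data helps, we can simulate the gambler's experiment. The sketch below assumes two fair six-sided dice, for which the expected average is 3.5; the function name `estimate_average` and the sample sizes are chosen only for illustration.

```python
import random
from statistics import mean


def estimate_average(n_pairs, seed=0):
    """Estimate the average of two dice rolls from n_pairs simulated pairs."""
    rng = random.Random(seed)  # seeded so the same n_pairs gives the same result
    averages = [
        (rng.randint(1, 6) + rng.randint(1, 6)) / 2
        for _ in range(n_pairs)
    ]
    return mean(averages)


# Compare estimates based on a little data and on a lot of data
for n in (10, 1_000, 100_000):
    print(n, "pairs ->", estimate_average(n))
```

With only a few pairs the estimate can land noticeably above or below 3.5. With many pairs it will usually settle very close to it.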
Why would I want to classify a variable?
When a manager plans to collect data, it is helpful if he or she can identify the categories into which the variables of interest fall. This is because:
- The type of variable you are working with affects the type of analysis you can do with it
- Collecting the wrong data, or observing a so-called weak variable, can waste time and money
How do I classify a variable?
Statistical variables are commonly classified as shown in the figure below.
[Figure: Classification of variables]
Using this classification, we would normally refer to any given variable as (only) one of the following:
- Categorical-nominal
- Categorical-ordinal
- Discrete-interval
- Discrete-ratio
- Continuous-interval
- Continuous-ratio
Together, the classifications nominal, ordinal, interval and ratio imply what is known as a measurement scale or level of measurement. Nominal is usually seen as the weakest measure and ratio as the strongest. In other words, a manager can perform a more meaningful statistical analysis with a ratio variable than with a nominal variable.
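One way to picture the ordering from weakest to strongest is as a ladder in which each scale keeps the comparisons of the scales below it and adds one more. The sketch below is an illustration only; the names `Scale`, `ADDED_BY` and `meaningful_comparisons` are made up for this example.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Levels of measurement, numbered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The kind of comparison each scale adds to those below it
ADDED_BY = {
    Scale.NOMINAL: "equal / not equal",
    Scale.ORDINAL: "greater / less than",
    Scale.INTERVAL: "differences",
    Scale.RATIO: "ratios",
}


def meaningful_comparisons(scale):
    """List every comparison that is meaningful at the given scale."""
    return [ADDED_BY[s] for s in Scale if s <= scale]


print(meaningful_comparisons(Scale.ORDINAL))
print(meaningful_comparisons(Scale.RATIO))
```

A ratio variable supports all four kinds of comparison, which is why it allows a more meaningful analysis than a nominal variable.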
Categorical vs. numerical variables
Numerical variables naturally take on numerical values, i.e. they are quantitative.
Categorical variables do not, i.e. they are qualitative.
Example 1
a) What kind of variable (categorical or numerical) is your cat's gender (male/female)?
b) Its age?
Types of categorical variables
As shown above, categorical variables can be either:
- Nominal, meaning that:
  - They are merely labels (or assigned values)
  - They have no order
- Ordinal, meaning that:
  - They are labels that can be arranged in a meaningful order
  - They can be ranked, but the difference between them means nothing
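The difference between the two can be sketched with a pair of made-up variables: shirt colour (nominal) and shirt size (ordinal). The data values and the names `SIZE_ORDER` and `sort_by_size` are hypothetical.

```python
from collections import Counter

# Nominal: colours are labels only, so counting per label is about all we can do
colours = ["red", "blue", "red", "green"]
colour_counts = Counter(colours)

# Ordinal: sizes are labels with an agreed order, listed from smallest to largest
SIZE_ORDER = ["small", "medium", "large"]


def sort_by_size(sizes):
    """Arrange size labels by their rank in SIZE_ORDER."""
    return sorted(sizes, key=SIZE_ORDER.index)


print(colour_counts["red"])
print(sort_by_size(["large", "small", "medium", "small"]))
```

Python will happily sort the colours alphabetically, but that order says nothing about the shirts. The size order does, although it still does not tell us how much bigger a large is than a medium.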
Example 2
a) What kind of variable is gender (i.e. male, female)?
b) What about the order in which runners finish a race (e.g. first, second, third, fourth)?
c) Does anything change if one assigns arbitrary numerical values to either of the above (e.g. male=0, female=1)?
Types of numerical variables
As shown above, numerical variables can be either:
- Discrete, meaning that:
  - They typically result from counting something
  - Gaps exist between values of discrete variables
- Continuous, meaning that:
  - They typically result from measuring something, e.g. length or weight
  - They can take on any value in the range
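The idea of gaps can be made concrete with a small sketch. It assumes a made-up counting variable, the number of heads in three coin flips, and a made-up measured variable, a length in metres; the helper `midpoint` is introduced only for this illustration.

```python
from itertools import product

# Discrete: count the heads in every possible sequence of three coin flips
possible_counts = sorted(
    {flips.count("H") for flips in product("HT", repeat=3)}
)

# The gaps between neighbouring possible values
gaps = [b - a for a, b in zip(possible_counts, possible_counts[1:])]


def midpoint(a, b):
    """Return the value halfway between two measurements."""
    return (a + b) / 2


# Continuous: between any two lengths there is always another possible length
between = midpoint(1.80, 1.81)

print(possible_counts, gaps)
print(between)
```

The count can never fall between two neighbouring values such as 1 and 2, whereas a length can always fall between two other lengths, limited only by how precisely we can measure.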
Example 3
a) What kind of numerical variable (discrete or continuous) can result from a single die throw, i.e. 1, 2, 3, 4, 5, 6?
b) From measuring your friend's height?
c) From taking the sum of two dice, i.e. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12?
d) The average of two dice, i.e. 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6?
Further classifying discrete and continuous numerical variables
Discrete and continuous variables can each be either one of the following:
- Interval, meaning that:
  - Unlike an ordinal variable, it does naturally take on numerical values
  - The difference between two such values has meaning
- Ratio, meaning that:
  - It also naturally takes on numerical values
  - It has a meaningful zero point
  - Scaling the values has meaning, e.g. a value twice as large represents twice as much of the quantity being measured
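A quick check for a meaningful zero point is to ask what happens when the scale is re-expressed. The sketch below uses two made-up variables: calendar years, whose starting year is a convention (interval), and durations in hours, where zero means no time at all (ratio). The helper `shift_origin` and the offset of 500 are hypothetical.

```python
def shift_origin(value, offset):
    """Re-express a value on a scale whose zero point is moved by offset."""
    return value + offset


# Interval: calendar years, counted from an arbitrary starting year
year_a, year_b = 1000, 2000
offset = 500  # hypothetical calendar that starts counting 500 years earlier

diff_before = year_b - year_a
diff_after = shift_origin(year_b, offset) - shift_origin(year_a, offset)

ratio_before = year_b / year_a
ratio_after = shift_origin(year_b, offset) / shift_origin(year_a, offset)

# Ratio: durations have a true zero, so changing units keeps ratios intact
hours_a, hours_b = 2, 4
ratio_hours = hours_b / hours_a
ratio_minutes = (hours_b * 60) / (hours_a * 60)

print(diff_before, diff_after)      # differences survive the shift
print(ratio_before, ratio_after)    # ratios do not
print(ratio_hours, ratio_minutes)   # ratios survive a change of units
```

The difference between the two years is the same in either calendar, but "year 2000 is twice year 1000" depends entirely on where counting starts. A four-hour duration is twice a two-hour duration no matter which units we use.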
Example 4
a) What type of numerical variable (discrete or continuous - interval or ratio) is temperature, e.g. 25.1 degrees Celsius?
b) The height of your friend, e.g. 180.3 cm?
c) The number of apples you bought, e.g. 12?

