Who
The people, places, or things that the information was gathered on.
What
The information/characteristics gathered.
When
The time period in which the data were collected.
Where
The source of the data.
Why
The goals/purpose of gathering the data.
How
The methods used to collect the data.
A _______ contains columns of information called _______ about each individual row called _______.
Data table
variables
cases
A _______ refers to the entire group of interest in a study.
Population
A subset of the group of interest is called a _______.
Sample
A _______ is a type of variable with labels, called _______, as its possible values.
Categorical variable
Categories
A _______ is a type of variable with numbers, measured in _______, as its possible values.
Quantitative variable
units
Bar Graph
Graphical display of the distribution of one categorical variable. NOT all categories must be displayed.
Pie Chart
Graphical display of the distribution of one categorical variable. ALL categories must be displayed.
Frequency/Relative Frequency Table
Summary of the distribution of one categorical variable containing the number and proportion of observations belonging to each category.
The distribution of a categorical variable is described by:
The categories of the variable.
How frequently each category occurs as measured by either the count or proportion.
The number of observations belonging to a particular category is called a _______. If you divide this count by the total number of observations, this is called a _______.
Count
proportion
The primary difference between a frequency table and a relative frequency table is:
The frequency table contains counts and the relative frequency table contains proportions.
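As a minimal sketch of the count/proportion distinction above, the following Python builds both tables for one categorical variable. The `colors` data and all variable names are made up for illustration and are not from the course material.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (made-up data).
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

# Frequency table: the count of observations in each category.
frequency = Counter(colors)

# Relative frequency table: each count divided by the total number of observations.
total = len(colors)
relative_frequency = {category: count / total for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:>6}  count={count}  proportion={relative_frequency[category]:.3f}")
```

The proportions always sum to 1, which is a quick check that no category was dropped.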
When examining the relationship between two categorical variables, the _______ is used to explain the differences in the distribution of the _______.
Explanatory variable
response variable
In looking at the relationship between two categorical variables, we are interested in determining whether the distribution of the _______ is different for different categories of the _______.
Response variable
explanatory variable
_______ summarize the distribution of each variable separately, ignoring the categories of the other variable.
Marginal distributions
_______ summarize the distribution of one variable contingent upon a particular category for the other variable.
Conditional distributions
When there are differences between conditional distributions and marginal distributions for the response variable, this indicates an _______ between the two variables.
Association
When the conditional distributions and marginal distribution for the response variable are similar, this indicates an _______ between the two variables.
Independence
The cross-classification of observations according to the categories of two categorical variables is called a ___________.
Contingency table
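The ideas of contingency table, marginal distribution, and conditional distribution can be sketched together in a few lines of Python. The data below (class year as the explanatory variable, a yes/no answer as the response) are hypothetical, as are all names used.

```python
from collections import Counter

# Hypothetical (explanatory, response) pairs; made-up data for illustration.
observations = [
    ("freshman", "yes"), ("freshman", "yes"), ("freshman", "no"), ("freshman", "no"),
    ("senior", "yes"), ("senior", "yes"), ("senior", "yes"), ("senior", "no"),
]

# Contingency table: count of observations for each combination of categories.
table = Counter(observations)

# Marginal distribution of the response, ignoring the explanatory variable.
n = len(observations)
response_counts = Counter(response for _, response in observations)
marginal = {r: c / n for r, c in response_counts.items()}

# Conditional distribution of the response within each explanatory category.
group_sizes = Counter(group for group, _ in observations)
conditional = {
    group: {r: table[(group, r)] / size for r in response_counts}
    for group, size in group_sizes.items()
}

print("marginal:", marginal)
for group, dist in conditional.items():
    print(group, dist)
```

In this made-up data the conditional proportions of "yes" (0.5 for freshmen, 0.75 for seniors) differ from the marginal proportion (0.625), which is the pattern the cards describe as an association.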
Unimodal, Bimodal, Multimodal, Uniform, Symmetric, Skewed
Shape
Median, Mean
Center
Range, Interquartile Range (IQR), Standard Deviation
Variability
The distribution of a quantitative variable can be summarized according to its
shape, center and variability
The _______ is a measure of the center of a quantitative variable. It is found by adding up the values of all observations and dividing by the _______. It is the location where the _______ balances.
Sample mean
Sample size
Histogram
The _______ measures the "average" distance between the observations and the sample mean. It has the same units as the observations.
Sample standard deviation
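A short Python sketch of the two calculations described above, using made-up values. It assumes the usual convention of dividing by n - 1 for the sample standard deviation, which the cards do not state explicitly; this is why the "average" distance is only approximate.

```python
import math
import statistics

# Hypothetical quantitative observations (made-up data).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sample mean: sum of all observations divided by the sample size.
n = len(values)
mean = sum(values) / n

# Sample standard deviation: square root of the summed squared deviations
# from the mean, divided by n - 1 (assumed convention).
squared_deviations = [(x - mean) ** 2 for x in values]
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

# The standard library gives the same results.
print(mean, statistics.mean(values))
print(sample_sd, statistics.stdev(values))
```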
If an outlier exists, it lies in/near what part of the distribution?
Tails
The primary difference between a stem-and-leaf plot and a histogram is:
The stem-and-leaf plot shows every individual observation and the histogram does not.
Select all of the following below that are included in the five number summary.
Median
Minimum
Maximum
1st Quartile (Q1)
3rd Quartile (Q3)
Select all of the correct interpretations of the 30th percentile.
30% of all observations are less than the 30th percentile.
70% of all observations are greater than the 30th percentile.
A _______ is a type of variable with _______ as its possible values.
Quantitative variable
numbers
A _______ is a graphical summary of the _______ of a quantitative variable.
Histogram
distribution
The peaks or "humps" in a histogram are called _______.
Modes
A histogram with a single mode is called _______.
Unimodal
A histogram with two modes is called _______.
Bimodal
A histogram with more than two modes is called _______.
Multimodal
A histogram with no modes is called _______.
Uniform
A histogram with two halves that are roughly the same is called _______.
Symmetric
A histogram with a concentration of either low values or high values is called _______.
Skewed
The edges of a histogram are called the _______.
Tails of the distribution
The smallest observation of a quantitative variable is called the _______.
Minimum
The largest observation of a quantitative variable is called the _______.
Maximum
50% of all observations are below the _______.
Median
25% of all observations are below the _______.
1st quartile - Q1
75% of all observations are below the _______.
3rd quartile - Q3
The difference between the maximum and minimum values is called the _______.
Range
The difference between Q3 and Q1 is called the _______.
Interquartile range - IQR
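The minimum, quartiles, median, maximum, range, and IQR defined above can be computed as in the sketch below. The data are made up, and the quartile method is an assumption: textbooks and software use slightly different quartile conventions, and this example uses the "inclusive" method of Python's `statistics.quantiles`.

```python
import statistics

# Hypothetical quantitative observations (made-up data).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

minimum = min(values)
maximum = max(values)

# Quartiles: cut points that split the sorted data into four equal parts.
# method="inclusive" is an assumed convention; others give slightly different values.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

data_range = maximum - minimum  # range = maximum - minimum
iqr = q3 - q1                   # interquartile range = Q3 - Q1

print("five number summary:", minimum, q1, median, q3, maximum)
print("range:", data_range, "IQR:", iqr)
```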
When comparing the distribution of a single quantitative variable between groups, the quantitative variable is the _______ variable and the categorical variable is the _______ variable.
Response
explanatory
A boxplot is a graphical summary of
The five number summary
The five number summary includes all of the following:
Median
minimum
maximum
1st quartile
3rd quartile
In a boxplot, an outlier is noted for any observation
less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.
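The boxplot outlier rule can be sketched as follows: compute the IQR, build a lower and an upper fence at 1.5 IQRs beyond the quartiles, and flag anything outside them. The data and the names `lower_fence` and `upper_fence` are illustrative, and the quartile method ("inclusive") is an assumed convention.

```python
import statistics

# Hypothetical quantitative observations with one unusually large value (made-up data).
values = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Quartiles using an assumed convention (method="inclusive").
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences: observations beyond these limits are noted as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in values if x < lower_fence or x > upper_fence]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Here the flagged value sits in the upper tail of the distribution, consistent with the card on where outliers lie.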
Select the correct statement below
The 5 number summary can be calculated for a quantitative variable, but not for a categorical variable.