Vocabulary Chapters 1-4

Who

The people, places, or things about which the information was gathered.

What

The information/characteristics gathered.

When

The time period in which the data were collected.

Where

The source of the data.

Why

The goals/purpose of gathering the data.

How

The methods used to collect the data.

A _______ contains columns of information, called _______, about the individuals in its rows, called _______.

Data table
variables
cases

A _______ refers to the entire group of interest in a study.

Population

A subset of the group of interest is called a _______.

Sample

A _______ is a type of variable with labels, called _______, as its possible values.

Categorical variable
Categories

A _______ is a type of variable with numbers, measured in _______, as its possible values.

Quantitative variable
units

Bar Graph

Graphical display of the distribution of one categorical variable. NOT every category needs to be displayed.

Pie Chart

Graphical display of the distribution of one categorical variable. ALL categories must be displayed.

Frequency/Relative Frequency Table

Summary of the distribution of one categorical variable containing the number and proportion of observations belonging to each category.

The distribution of a categorical variable is described by:

The categories of the variable.
How frequently each category occurs as measured by either the count or proportion.

The number of observations belonging to a particular category is called a _______. If you divide this count by the total number of observations, this is called a _______.

Count
proportion

The primary difference between a frequency table and a relative frequency table is:

The frequency table contains counts and the relative frequency table contains proportions.
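To make the count/proportion distinction concrete, here is a minimal Python sketch that builds both tables for one categorical variable. The function name `frequency_tables` and the color data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_tables(values):
    """Return (counts, proportions) for one categorical variable."""
    counts = Counter(values)                      # frequency table: category -> count
    total = sum(counts.values())                  # total number of observations
    proportions = {category: count / total        # relative frequency table: count / total
                   for category, count in counts.items()}
    return dict(counts), proportions

# Hypothetical data: favorite color recorded for 8 cases.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]
counts, proportions = frequency_tables(colors)
print(counts)
print(proportions)
```

Here red appears 4 times out of 8, so its proportion is 0.5; the proportions always add up to 1.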

When examining the relationship between two categorical variables, the _______ is used to explain the differences in the distribution of the _______.

Explanatory variable
response variable

In looking at the relationship between two categorical variables, we are interested in determining whether the distribution of the _______ is different for different categories of the _______.

Response variable
explanatory variable

_______ summarize the distribution of each variable separately, ignoring the categories of the other variable.

Marginal distributions

_______ summarize the distribution of one variable contingent upon a particular category for the other variable.

Conditional distributions

When the conditional distributions of the response variable differ from its marginal distribution, this indicates an _______ between the two variables.

Association

When the conditional distributions of the response variable are similar to its marginal distribution, this indicates _______ between the two variables.

Independence

The cross-classification of observations according to the categories of two categorical variables is called a ___________.

Contingency table
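As a sketch of how the last few cards fit together, the Python below cross-classifies paired observations into a contingency table, then computes the marginal distribution of the response and its conditional distribution within each explanatory category. The function names and the A/B "yes"/"no" data are hypothetical.

```python
from collections import Counter

def marginal_distribution(values):
    """Proportion of all observations in each category, ignoring the other variable."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}

def conditional_distributions(explanatory, response):
    """For each explanatory category, the proportions of the response categories."""
    table = Counter(zip(explanatory, response))  # contingency table: (group, outcome) -> count
    group_totals = Counter(explanatory)          # row totals of the contingency table
    result = {}
    for (group, outcome), n in table.items():
        result.setdefault(group, {})[outcome] = n / group_totals[group]
    return result

# Hypothetical data: 8 cases; explanatory = group, response = answer.
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
answer = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]

print(marginal_distribution(answer))             # {'yes': 0.5, 'no': 0.5}
print(conditional_distributions(group, answer))  # {'A': {'yes': 0.75, 'no': 0.25}, 'B': {'yes': 0.25, 'no': 0.75}}
```

Overall, half the answers are "yes", but within group A it is 75% and within group B only 25%. Conditional distributions that differ this much from the marginal distribution point to an association rather than independence.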

Unimodal, Bimodal, Multimodal, Uniform, Symmetric, Skewed

Shape

Median, Mean

Center

Range, Interquartile Range (IQR), Standard Deviation

Variability

The distribution of a quantitative variable can be summarized according to its

shape, center and variability

The _______ is a measure of the center of a quantitative variable. It is found by adding up the values of all observations and dividing by the _______. It is the location where the _______ balances.

Sample mean
Sample size
Histogram
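The sample mean definition translates directly into a short Python sketch: add up all observations, then divide by the sample size. The name `sample_mean` and the height data are hypothetical.

```python
def sample_mean(values):
    """Add up the values of all observations, then divide by the sample size n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

# Hypothetical data: heights (cm) of 5 cases.
heights = [160, 165, 170, 175, 180]
print(sample_mean(heights))  # 170.0
```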

The _______ measures the "average" distance between the observations and the sample mean. It has the same units as the observations.

Sample standard deviation
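The "average" distance in the sample standard deviation is computed by squaring each deviation from the mean, averaging the squares, and taking a square root. This Python sketch assumes the usual sample convention of dividing by n - 1 rather than n. The name `sample_standard_deviation` and the data are hypothetical.

```python
import math

def sample_standard_deviation(values):
    """Square root of the summed squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data: mean is 5, squared deviations sum to 32, so s = sqrt(32 / 7).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_standard_deviation(data))
```

Taking the square root at the end puts the result back in the same units as the observations.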

If an outlier exists, it lies in/near what part of the distribution?

Tails

The primary difference between a stem-and-leaf plot and a histogram is:

The stem-and-leaf plot shows every individual observation and the histogram does not.

Select all of the following that are included in the five number summary.

Median
Minimum
Maximum
1st Quartile (Q1)
3rd Quartile (Q3)
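Here is a minimal Python sketch of the five number summary. Textbooks and software compute quartiles in slightly different ways. This sketch assumes the "median of each half" method: Q1 and Q3 are the medians of the lower and upper halves, and the overall median is left out of both halves when n is odd. The function names and data are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]               # values below the median position
    upper = data[half + n % 2:]       # values above it (skip the median itself if n is odd)
    return {"min": data[0], "Q1": median(lower), "median": median(data),
            "Q3": median(upper), "max": data[-1]}

# Hypothetical data with n = 9.
print(five_number_summary([1, 3, 5, 7, 9, 11, 13, 15, 17]))
# {'min': 1, 'Q1': 4.0, 'median': 9, 'Q3': 14.0, 'max': 17}
```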

Select all of the correct interpretations of the 30th percentile.

30% of all observations are less than the 30th percentile.
70% of all observations are greater than the 30th percentile.

A _______ is a type of variable with _______ as its possible values.

Quantitative variable
numbers

A _______ is a graphical summary of the _______ of a quantitative variable.

Histogram
distribution

The peaks or "humps" in a histogram are called _______.

Modes

A histogram with a single mode is called _______.

Unimodal

A histogram with two modes is called _______.

Bimodal

A histogram with more than two modes is called _______.

Multimodal

A histogram with no distinct modes (bars of roughly equal height) is called _______.

Uniform

A histogram whose two halves are roughly mirror images of each other is called _______.

Symmetric

A histogram whose values are concentrated at either the low end or the high end, with a longer tail stretching toward the other end, is called _______.

Skewed

The edges of a histogram are called the _______.

Tails of the distribution

The smallest observation of a quantitative variable is called the _______.

Minimum

The largest observation of a quantitative variable is called the _______.

Maximum

50% of all observations are below the _______.

Median

25% of all observations are below the _______.

1st quartile (Q1)

75% of all observations are below the _______.

3rd quartile (Q3)

The difference between the maximum and minimum values is called the _______.

Range

The difference between Q3 and Q1 is called the _______.

Interquartile range (IQR)
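Both spread measures from the last two cards are simple differences. This Python sketch assumes the same "median of each half" quartile method as above; other quartile conventions can give a slightly different IQR. The function names and data are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def range_and_iqr(values):
    """Return (maximum - minimum, Q3 - Q1)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(data[:half])           # median of the lower half
    q3 = median(data[half + n % 2:])   # median of the upper half
    return data[-1] - data[0], q3 - q1

# Hypothetical data: min 2, max 12, Q1 = 4.0, Q3 = 9.0.
print(range_and_iqr([2, 4, 4, 5, 7, 8, 10, 12]))  # (10, 5.0)
```

The range depends only on the two most extreme values. The IQR covers the middle 50% of the data, so a single extreme value affects it much less.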

When comparing the distribution of a single quantitative variable between groups, the quantitative variable is the _______ variable and the categorical variable is the _______ variable.

Response
explanatory

A boxplot is a graphical summary of

The five number summary

The five number summary includes all of the following:

Median
minimum
maximum
1st quartile
3rd quartile

In a boxplot, an outlier is noted for any observation

less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.
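The boxplot outlier rule can be sketched in Python: compute the fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, then flag any observation outside them. The quartiles again assume the "median of each half" method. The function name `outliers_by_iqr_rule`, its parameter `k`, and the data are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def outliers_by_iqr_rule(values, k=1.5):
    """Return (lower fence, upper fence, observations outside the fences)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr   # Q1 - 1.5 x IQR
    upper_fence = q3 + k * iqr   # Q3 + 1.5 x IQR
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data: Q1 = 4.0, Q3 = 9.0, IQR = 5.0, so the fences are -3.5 and 16.5.
print(outliers_by_iqr_rule([2, 4, 4, 5, 7, 8, 10, 30]))  # (-3.5, 16.5, [30])
```

The flagged value, 30, lies in the upper tail, which matches the earlier card about where outliers appear.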

Select the correct statement below

The five number summary can be calculated for a quantitative variable, but not for a categorical variable.