AP Stats Vocab (2,3,4)

Context

The context ideally tells Who was measured, What was measured, How the data were collected, Where the data were collected, and When and Why the study was performed

Data

Systematically recorded information, whether numbers or labels, together with its context

Data Table

An arrangement of data in which each row represents a case and each column represents a variable

Case

A ____ is an individual about whom or which we have data

Population

All the cases we wish we knew about

Sample

The cases we actually examine in seeking to understand the much larger population

Variable

A ______ holds information about the same characteristic for many cases

Units

A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams

Categorical Variable

A variable that names categories (whether with words or numerals) is called categorical

Quantitative Variable

A variable in which the numbers act as numerical values is called _____. ______ variables always have units.

Frequency Table (Relative Frequency Table)

A ______ ______ lists the categories in a categorical variable and gives the count (or percentage) of observations for each category
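
As a rough illustration, a frequency table and a relative frequency table can be tallied in a few lines of Python. The ticket-class data below are made up for the example.

```python
from collections import Counter

# Hypothetical categorical data: ticket class for 10 passengers
classes = ["First", "Third", "Crew", "Third", "Second",
           "Crew", "Third", "First", "Crew", "Crew"]

counts = Counter(classes)                          # frequency table (counts)
n = len(classes)
relative = {c: k / n for c, k in counts.items()}   # relative frequency table

# Print categories from most to least common
for category, count in counts.most_common():
    print(f"{category:<7}{count:>3}{relative[category]:>7.0%}")
```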

Distribution

The ______ of a variable gives the possible values of the variable and the relative frequency of each value

Area Principle

In a statistical display, each data value should be represented by the same amount of area.

Bar chart (Relative Frequency Bar Chart)

______ _______ show a bar whose area represents the count (or percentage) of observations for each category of a categorical variable

Pie Chart

____ ______ show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category

Categorical data condition

The methods in this chapter are appropriate for displaying and describing categorical data. Be careful not to use them with quantitative data.

Contingency Table

A ______ _____ displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables. The table categorizes the individuals on all variables at once to reveal possible patterns in one variable that may be contingent on the category of the other.

Marginal Distribution

In a contingency table, the distribution of either variable alone is called the _____ ______. The counts or percentages are the totals found in the margins (last row or column) of the table.

Conditional Distribution

The distribution of a variable restricting the Who to consider only a smaller group of individuals is called a ______ _______.

Independence

Variables are said to be ____ if the conditional distribution of one variable is the same for each category of the other.
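
A small sketch may help tie marginal distributions, conditional distributions, and independence together. The two-way table of counts below is invented for illustration, and the exact-equality check is only a teaching device: in real data, conditional distributions are rarely exactly equal.

```python
from math import isclose

# Hypothetical contingency table: rows = ticket class, columns = survival
table = {
    "First": {"Alive": 20, "Dead": 10},
    "Third": {"Alive": 15, "Dead": 45},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of class: row totals divided by the grand total
marginal_class = {row: total / grand_total for row, total in row_totals.items()}

# Conditional distribution of survival within each class
conditional = {
    row: {col: count / row_totals[row] for col, count in cells.items()}
    for row, cells in table.items()
}

# Independence: every row has the same conditional distribution
rows = list(conditional)
independent = all(
    isclose(conditional[rows[0]][col], conditional[row][col])
    for row in rows[1:]
    for col in conditional[rows[0]]
)

print(marginal_class)
print(conditional)
print("independent:", independent)
```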

Segmented Bar Chart

A _____ ___ _____ displays the conditional distribution of a categorical variable within each category of another variable.

Simpson's Paradox

When averages are taken across different groups, they can appear to contradict the overall averages.
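
The reversal is easier to see with numbers. The sketch below uses invented on-time counts for two hypothetical pilots: Pilot B has the better on-time rate both by day and by night, yet Pilot A has the better rate overall, because the two pilots fly very different mixes of day and night flights.

```python
# Hypothetical (on_time, flights) counts for two pilots
records = {
    "Pilot A": {"day": (90, 100), "night": (10, 20)},
    "Pilot B": {"day": (19, 20), "night": (75, 100)},
}

# On-time rate within each group (day, night)
within = {
    pilot: {group: on_time / flights for group, (on_time, flights) in groups.items()}
    for pilot, groups in records.items()
}

# Overall on-time rate, pooling day and night flights
overall = {
    pilot: sum(o for o, _ in groups.values()) / sum(f for _, f in groups.values())
    for pilot, groups in records.items()
}

print(within)
print(overall)
```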

Distribution

The ____ of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values (or counts) falling in each bin

Histogram (relative frequency histogram)

A ____ uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency (or relative frequency) of values falling into each bin
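
A minimal sketch of the binning behind a histogram, using made-up data, a bin width of 10, and a starting value of 10 (all three are arbitrary choices for the example). Each value is assigned to the bin whose left edge is at or just below it, and the counts are drawn as a text histogram.

```python
# Hypothetical quantitative data
data = [12, 15, 21, 22, 27, 33, 34, 35, 51, 58]

bin_width = 10   # assumed bin width
start = 10       # assumed left edge of the first bin

# Count how many values fall in each equal-width bin [left, left + bin_width)
counts = {}
for x in data:
    left = start + ((x - start) // bin_width) * bin_width
    counts[left] = counts.get(left, 0) + 1

# Text histogram; a bin with no values prints as an empty row
for left in range(start, max(counts) + bin_width, bin_width):
    k = counts.get(left, 0)
    print(f"[{left}, {left + bin_width})  {'#' * k}  ({k})")
```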

Gap

A region of the distribution where there are no values

Stem-and-leaf display

A ________ shows quantitative data values in a way that sketches the distribution of the data. It's best described in detail by example.
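
Here is one such example, sketched in Python with invented two-digit data. It assumes the tens digit is the stem and the ones digit is the leaf; other splits are possible for other data.

```python
from collections import defaultdict

# Hypothetical two-digit data values
data = [12, 15, 21, 22, 27, 33, 34, 35, 48, 51]

# Stem = tens digit, leaf = ones digit (an assumption that suits this data)
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

# One line per stem, including stems with no leaves
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(d) for d in stems.get(stem, []))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
```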

Dotplot

A ____ graphs a dot for each case against a single axis

Shape

To describe the _____ of a distribution, look for single vs. multiple modes, symmetry vs. skewness, and outliers and gaps

Center

The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of the ____ include the mean and median

Spread

A numerical summary of how tightly the values are clustered around the center. Measures of ____ include the IQR and standard deviation

Unimodal

Having one mode. This is a useful term for describing the shape of a histogram when it's generally mound-shaped.

Bimodal

Distributions with two modes

Multimodal

Distributions with more than two modes

Uniform

A distribution that's roughly flat is said to be _______.

Symmetric

A distribution is _____ if the two halves on either side of the center look approximately like mirror images of each other.

Tails

The _____ of a distribution are the parts that typically trail off on either side. Distributions can be characterized as having long ____ (if they straggle off for some distance) or short _____ (if they don't)

Skewed

A distribution is _____ if it's not symmetric and one tail stretches out farther than the other.

Outliers

_____ are extreme values that don't appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or they may be just mistakes; there's no obvious way to tell. Don't delete _____ automatically--you have to think about them.

Median

The ______ is the middle value, with half of the data above and half below it. If n is even, it is the average of the two middle values. It is usually paired with the IQR

Range

The difference between the lowest and highest values in a data set (max - min)

Quartile

The lower ____ is the value with a quarter of the data below it. The upper ____ has three quarters of the data below it. The median and ____ divide data into four parts with equal numbers of data values

Interquartile Range (IQR)

The ___ is the difference between the first and third quartiles. It is usually reported with the median

Percentile

The ith ____ is the number that falls above i% of the data

5-Number Summary

The _____ ______ of a distribution reports the minimum value, Q1, the median, Q3, and the maximum value
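
The median, quartiles, IQR, and range can all be read off a 5-number summary. The sketch below is one way to compute it, assuming the quartiles are the medians of the lower and upper halves of the sorted data, with the median itself left out of both halves when n is odd. Textbooks, calculators, and software do not all agree on that convention, so results may differ slightly elsewhere. The function name and data are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # lower half of the sorted data
    upper = xs[half + n % 2:]    # upper half (skips the middle value if n is odd)
    return min(xs), median(lower), median(xs), median(upper), max(xs)

# Hypothetical data with an even number of values
data = [12, 15, 21, 22, 27, 33, 34, 35, 48, 51]

minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1                    # interquartile range
data_range = maximum - minimum   # range

print(minimum, q1, med, q3, maximum)
print("IQR:", iqr, "Range:", data_range)
```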

Mean

The ____ is found by summing all the data values and dividing by the count. It is usually paired with standard deviation.

Resistant

A calculated summary is said to be ______ if outliers have only a small effect on it.
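
A quick check with invented numbers shows the idea: replacing one value with an extreme one moves the mean a great deal but leaves the median where it was, so the median is resistant and the mean is not.

```python
from statistics import mean, median

# Hypothetical data, with and without one extreme value
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

print("mean:  ", mean(clean), "->", mean(with_outlier))
print("median:", median(clean), "->", median(with_outlier))
```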

Variance

The ____ is the sum of squared deviations from the mean, divided by the count minus 1

Standard deviation

The ____ ____ is the square root of the variance. It is usually reported along with the mean.
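
The mean, variance, and standard deviation definitions can be followed step by step in code. The data are invented, and the last lines compare the hand calculation against Python's `statistics` module, which also divides by the count minus 1.

```python
from math import sqrt
import statistics

# Hypothetical data values
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                                   # sum of values / count
squared_deviations = [(x - mean) ** 2 for x in data]   # squared distance from the mean
variance = sum(squared_deviations) / (n - 1)           # divide by count minus 1
sd = sqrt(variance)                                    # standard deviation

print(mean, variance, sd)
print(statistics.variance(data), statistics.stdev(data))
```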

Boxplot

A ____ displays the 5-number summary as a central box with whiskers that extend to the nonoutlying data values. ____ are particularly effective for comparing groups and for displaying outliers.

Outlier

Any point more than 1.5 IQR from either end of the box in a boxplot is nominated as an ______.

Far Outlier

If a point is more than 3.0 IQR from either end of the box in a boxplot, it is nominated as a ___ _____.
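
The two rules amount to a pair of fences on each side of the box. The sketch below applies them to made-up quartiles (Q1 = 21, Q3 = 35); the function name `classify` is just a label for the example.

```python
def classify(x, q1, q3):
    """Label a value using the 1.5 IQR and 3.0 IQR rules."""
    iqr = q3 - q1
    if x < q1 - 3.0 * iqr or x > q3 + 3.0 * iqr:
        return "far outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "outlier"
    return "not an outlier"

# Hypothetical quartiles: IQR = 14, so the fences sit at 0 and 56 (1.5 IQR)
# and at -21 and 77 (3.0 IQR)
q1, q3 = 21, 35

for value in (51, 60, 80, -5):
    print(value, classify(value, q1, q3))
```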

Comparing Distributions

When ____ ______ of several groups using histograms or stem-and-leaf displays, consider their shape, center, and spread.

Comparing Boxplots

When _____ ______ with boxplots, compare their shapes, medians, and IQRs, and check for possible outliers.

Timeplot

A ____ displays data that change over time. Often, successive values are connected with lines to show trends more clearly. Sometimes a smooth curve is added to the plot to help show long-term patterns and trends