# Chapter 1 Stat Test

statistics

the science and art of collecting, analyzing, and drawing conclusions from data

individual

an object described in a set of data; could be people, animals, or things

variable

an attribute that can take different values for different individuals

categorical variable

a variable that assigns labels that place each individual into a particular group (ex. gender, race, occupation)

quantitative variable

a variable that takes number values that are quantities (counts or measurements) (ex. age, speed)

discrete variable

a variable that takes a fixed set of possible values with gaps between them

continuous variable

a variable that can take any value in an interval on the number line

distribution

tells us what values the variable takes and how often it takes those values
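
A distribution can be tabulated directly from raw data. The sketch below is only an illustration: the `colors` list is made-up data, and it counts how often each value occurs (frequency) and what fraction of all individuals that is (relative frequency).

```python
from collections import Counter

# Hypothetical data: favorite color reported by 8 individuals.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]

counts = Counter(colors)                 # frequency of each value
total = sum(counts.values())             # number of individuals
rel_freq = {value: n / total for value, n in counts.items()}

# Print the distribution: each value and how often it occurs.
for value, n in counts.most_common():
    print(f"{value:<6} count={n}  relative frequency={rel_freq[value]:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the table.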

bar graphs, pie charts, side-by-side bar graphs, segmented bar graphs, and mosaic plots

what 5 graphs could be used to summarize or display categorical data?

bar graph

shows each category as a bar; the heights of the bars show the category frequencies or relative frequencies

pie chart

shows each category as a slice of a circle; the areas of the slices are proportional to the category frequencies or relative frequencies

- do not replace bars in bar graphs with pictures or use special 3D effects; using pictures makes the areas significantly different from the correct heights of the bars, which can be misleading
- make sure the vertical scale starts at 0 so the graph isn't

what 2 things should you be wary of when making/interpreting graphs?

two-way table

a table of counts that summarizes data on the relationship between two categorical variables for some group of individuals

marginal relative frequency

gives the percent or proportion of individuals that have a specific value for one categorical variable

joint relative frequency

gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable (what % of individuals are ____ AND _____)

conditional relative frequency

gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable
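
The three kinds of relative frequency differ only in what goes in the numerator and denominator. Here is a minimal sketch using a made-up two-way table; the variable names and the counts are assumptions chosen for illustration.

```python
# Hypothetical two-way table of counts:
# rows = grade level, columns = preferred superpower.
table = {
    "freshman": {"fly": 30, "invisible": 20},
    "senior":   {"fly": 10, "invisible": 40},
}

total = sum(sum(row.values()) for row in table.values())   # 100 individuals
senior_total = sum(table["senior"].values())               # 50 seniors

# Marginal: one variable only, out of everyone.
marginal_senior = senior_total / total                      # 50 / 100

# Joint: senior AND fly, out of everyone.
joint_senior_fly = table["senior"]["fly"] / total           # 10 / 100

# Conditional: fly AMONG seniors, so the denominator is the senior total.
cond_fly_given_senior = table["senior"]["fly"] / senior_total   # 10 / 50

print(marginal_senior, joint_senior_fly, cond_fly_given_senior)
```

The denominator is the thing to watch: the whole group for marginal and joint relative frequencies, and only the conditioning group for a conditional relative frequency.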

side-by-side bar graph

displays the distribution of a categorical variable for each value of another categorical variable; bars are grouped together based on the values of one of the categorical variables and placed side by side

segmented bar graph

displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

mosaic plot

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

association

there is a(n) _______________ between two variables if knowing the value of one variable helps us predict the value of the other

causation

association does not imply _______________

the corresponding segments will be roughly the same size in every bar

if there is no association between two variables, how will the sizes of each segment in a segmented bar graph compare to one another?

dotplot

shows each data value as a dot above its location on a number line

look for major peaks, clusters of values, and obvious gaps; classify the distribution as roughly symmetric or skewed

what are important things to mention when describing the shape of a distribution?

roughly symmetric

a distribution is _____ if the right side of the graph is approximately a mirror image of the left side

skewed to the right

a distribution is ______ if the right side of the graph is much longer than the left side (long right tail)

skewed to the left

a distribution is ______ if the left side of the graph is much longer than the right side (long left tail)

approximately uniform

a distribution can be described as _______ if the frequencies of each value are all approximately the same

unimodal

a graph with a single peak

outlier

an observation that falls outside the obvious pattern

shape, center, variability, and outliers

what four things do you have to mention when describing a distribution?

include context (use the variable name AND the units for the variable when referencing values)

what is very important to do when describing distributions?

stemplot

shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit; stems are ordered from lowest to highest and arranged in a vertical column; a key must be included
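
The stem-and-leaf split can be sketched in a few lines. This is one possible text layout, assuming a non-empty list of non-negative whole numbers; the function name `stemplot` and the sample scores are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a text stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # stem = all but the final digit, leaf = final digit
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem from lowest to highest, including empty ones, so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical quiz scores.
scores = [12, 15, 21, 23, 23, 40]
lines = stemplot(scores)
print("\n".join(lines))
print("Key: 2 | 3 means 23")
```

Keeping the empty stem for the 30s shows the gap in the data, and the printed key tells the reader how to turn a stem and leaf back into a value.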

each individual observation can be seen when using a stemplot; this is not true of histograms

what is an advantage to using a stemplot rather than a histogram to display a distribution of data?

it can help make the shape of a distribution more clear

what is the value in splitting stems when the data set is large?

histogram

shows each interval of values as a bar; the heights of the bars show the frequencies or relative frequencies of values in each interval
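
The bar heights of a histogram are just counts of values per interval. A minimal sketch, assuming equal-width intervals that include their left endpoint and exclude their right endpoint; the function name, the data, and the interval choices are all hypothetical.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)   # which interval the value falls in
        if 0 <= idx < n_bins:             # values outside the covered range are ignored
            counts[idx] += 1
    return counts

# Hypothetical data, binned into 0-5, 5-10, 10-15, 15-20, 20-25.
data = [3, 7, 8, 12, 14, 15, 19, 22]
counts = histogram_counts(data, start=0, width=5, n_bins=5)

# Relative frequencies make data sets of different sizes comparable.
rel_freqs = [c / len(data) for c in counts]

for i, (c, r) in enumerate(zip(counts, rel_freqs)):
    print(f"{i * 5:>2} to <{(i + 1) * 5:<2}: {'*' * c}  ({r:.3f})")
```

Note that the value 15 lands in the 15 to 20 interval, not the 10 to 15 interval, because of the endpoint convention assumed here.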

what 2 things are important to remember when making a histogram?

when comparing distributions with different numbers of observations

when should you use percents/proportions instead of counts on the y-axis?

they can make it easier to see the overall pattern in a graph of a larger data set

what is an advantage of using a histogram as opposed to a dotplot or stemplot?

mean

the average of all the individual data values

statistic

a number that describes some characteristic of a sample

parameter

a number that describes some characteristic of a population

resistant

a statistical measure is _______ if it isn't sensitive to extreme values

median

the midpoint of a distribution, the number such that about half of the observations are smaller and about half are larger

mean > median

if a distribution is skewed to the right, how will the mean and median compare?

mean < median

if a distribution is skewed to the left, how will the mean and median compare?
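
The pull of a long tail on the mean can be checked numerically. The two small data sets below are made up for illustration; the sketch uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed data: one large value forms a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical left-skewed data: one small value forms a long left tail.
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name}: mean={mean}, median={median}")

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)
```

In both cases the mean moves toward the tail while the median stays near the bulk of the data, which is what it means for the median to be resistant.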

range

the distance between the minimum value and maximum value; the simplest measure of variability

median or mean; median is resistant

what are the two ways we can measure center, and which one is resistant to extreme values?

standard deviation

measures the typical distance of the values in a distribution from the mean

variance

the average squared deviation (where deviation = value - mean)

when all the values in a distribution are the same

under what circumstances would the standard deviation be equal to zero?
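
Variance and standard deviation can be worked out step by step from the deviations. The sketch below uses made-up data and shows two conventions, since "average squared deviation" can mean dividing by n or, as many courses do for sample data, by n - 1; which one applies here is an assumption to check against the course text.

```python
import statistics

# Hypothetical data set of 8 values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                       # 5.0
squared_devs = [(x - mean) ** 2 for x in data]     # (value - mean)^2 for each value

# Convention 1: divide by n (the literal average of the squared deviations).
variance_n = sum(squared_devs) / len(data)
# Convention 2: divide by n - 1 (commonly used when the data are a sample).
variance_n_minus_1 = sum(squared_devs) / (len(data) - 1)

# Standard deviation is the square root of the variance.
sd_n = variance_n ** 0.5
sd_n_minus_1 = variance_n_minus_1 ** 0.5
print(variance_n, sd_n, variance_n_minus_1, sd_n_minus_1)

# When every value is the same, every deviation is 0, so the standard deviation is 0.
constant = [3, 3, 3, 3]
sd_constant = statistics.pstdev(constant)
print(sd_constant)
```

In the standard library, `statistics.pvariance` and `statistics.pstdev` divide by n, while `statistics.variance` and `statistics.stdev` divide by n - 1.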

range and standard deviation are nonresistant; IQR is resistant

what are the ways to measure variability, and which are resistant/nonresistant?

greater variation of the values from the mean

what do larger values of standard deviation indicate?

quartiles

values that divide the ordered data set into 4 groups having roughly the same number of values

first quartile (Q1)

the median of the data values that are to the left of the median in the ordered list (25th percentile)

third quartile (Q3)

the median of the data values that are to the right of the median in the ordered list (75th percentile)

interquartile range (IQR)

the distance between the first and third quartiles of a distribution; measures the variability in the middle half of a distribution

leave out the median

when calculating quartile values, what should you make sure you do?
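
The "median of each half, leaving out the median" rule can be sketched as follows. The function name `quartiles` and the data are hypothetical, and the sketch assumes at least two values; software packages often use other quartile conventions and may give slightly different answers.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd number of values, skip the middle value so the median is left out.
    upper = data[half + 1:] if n % 2 else data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

# Hypothetical data with an odd number of values (9).
data = [1, 3, 4, 6, 7, 9, 10, 12, 15]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Here the median 7 belongs to neither half, so Q1 is the median of 1, 3, 4, 6 and Q3 is the median of 9, 10, 12, 15.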

low cutoff = Q1 - (1.5 x IQR) ; high cutoff = Q3 + (1.5 x IQR)

how do you find cutoffs for outliers using the 1.5 * IQR rule?
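
A short worked example of the 1.5 × IQR rule. The data are made up, and the quartiles Q1 = 3.5 and Q3 = 11 are assumed to have been found beforehand by taking the median of each half of the ordered data.

```python
def outlier_cutoffs(q1, q3):
    """Return (low cutoff, high cutoff) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data; Q1 and Q3 below are assumed to be already computed.
data = [1, 3, 4, 6, 7, 9, 10, 12, 40]
q1, q3 = 3.5, 11.0

low, high = outlier_cutoffs(q1, q3)
# Any value below the low cutoff or above the high cutoff is flagged as an outlier.
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

With an IQR of 7.5, the cutoffs are 3.5 - 11.25 and 11 + 11.25, so only the value 40 is flagged.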

five-number summary

consists of the minimum, the first quartile, the median, the third quartile, and the maximum

boxplot (box and whisker plot)

a visual representation of the five-number summary

what are some important things to note about box and whisker plots?

normal distribution

a distribution described by a symmetric, single-peaked, bell-shaped curve