statistics
the science of data
data analysis
organizing, displaying, summarizing, and asking questions about data
individuals
the objects described by a set of data. May be people, animals, or things
variable
any characteristic of an individual; it can take different values for different individuals
categorical variable
places an individual into one of several groups or categories
quantitative variable
takes numerical values for which it makes sense to find an average
distribution (of a variable)
tells us what values the variable takes and how often it takes these values
inference
drawing conclusions that go beyond the data at hand
frequency table
displays the count of individuals in each category
relative frequency table
shows the percent (or proportion) of individuals in each category
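A minimal sketch of both kinds of table, built from a small made-up data set. The `colors` list and its values are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical data: favorite color reported by 8 individuals.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]

# Frequency table: count of individuals in each category.
counts = Counter(colors)

# Relative frequency table: percent of individuals in each category.
total = sum(counts.values())
percents = {category: 100 * n / total for category, n in counts.items()}

for category in counts:
    print(f"{category:6s} {counts[category]:3d} {percents[category]:6.1f}%")
```

The percents in a relative frequency table should add up to 100 (apart from rounding).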
marginal distribution
the distribution of values of one of the categorical variables in a two-way table of counts, among all individuals described by the table
marginal distribution
on its own, tells us nothing about the relationship between the two variables in a two-way table
conditional distribution
describes the values of a specific variable among individuals who have a specific value of another variable.
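As a rough illustration of the difference between a marginal and a conditional distribution, the sketch below computes both from one two-way table. The table, its row and column labels, and all counts are hypothetical.

```python
# Hypothetical two-way table of counts: rows are gender, columns are
# the answer to a yes/no survey question.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}
answers = ["yes", "no"]

# Grand total of individuals described by the table.
total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the answer: column totals over the grand total.
marginal_answer = {
    a: sum(row[a] for row in table.values()) / total for a in answers
}

# Conditional distribution of the answer among females only:
# each count in the "female" row over that row's total.
female_total = sum(table["female"].values())
cond_answer_given_female = {
    a: table["female"][a] / female_total for a in answers
}

print("marginal:", marginal_answer)
print("given female:", cond_answer_given_female)
```

Comparing the conditional distributions across rows (here, female versus male) is what reveals an association; the marginal distribution alone cannot.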
association
this occurs between two variables if specific values of one variable tend to occur in common with specific values of the other
Simpson's paradox
an association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined.
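The reversal can be seen with a small numeric sketch. All counts below are invented for illustration: two treatments, A and B, compared within two severity levels (the third variable) and then with the levels combined.

```python
# Hypothetical (successes, patients) counts for two treatments,
# split by a third variable: severity of the condition.
data = {
    "mild": {"A": (18, 20), "B": (64, 80)},
    "severe": {"A": (40, 80), "B": (8, 20)},
}


def rate(successes, n):
    """Success rate as a proportion."""
    return successes / n


# Within each severity level, treatment A has the higher success rate.
for severity, by_treatment in data.items():
    a = rate(*by_treatment["A"])
    b = rate(*by_treatment["B"])
    print(f"{severity}: A={a:.0%}, B={b:.0%}")

# Combining the severity levels reverses the direction of the association.
overall = {}
for t in ("A", "B"):
    successes = sum(data[s][t][0] for s in data)
    n = sum(data[s][t][1] for s in data)
    overall[t] = rate(successes, n)
print(f"combined: A={overall['A']:.0%}, B={overall['B']:.0%}")
```

In this made-up example the reversal happens because treatment A was given mostly to severe cases, which have lower success rates under either treatment.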
Four Step Process
state, plan, do, conclude
skewed right
a distribution is .......... if the right side of the graph (the larger values) has a much longer tail than the left side, so most of the data sit on the left
skewed left
a distribution is ......... if the left side of the graph (the smaller values) has a much longer tail than the right side, so most of the data sit on the right
unimodal
having a single peak
bimodal
having two clear peaks
multimodal
having more than two clear peaks
stemplot
gives us a quick picture of the shape of a distribution while including the actual numerical values in a graph
histogram
most common graph of the distribution for one quantitative variable
first quartile
the median of the values to the left of the median in the ordered list of observations
third quartile
the median of the values to the right of the median in the ordered list of observations
median, mean
the .................... is more resistant to outliers than the .......
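A quick check of resistance, using Python's standard `statistics` module on made-up numbers: replacing one observation with an extreme value moves the mean a lot and leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical data, then the same data with the largest value
# replaced by an outlier.
original = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 100]

print("mean:  ", mean(original), "->", mean(with_outlier))
print("median:", median(original), "->", median(with_outlier))
```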
interquartile range (IQR)
measures the range of the middle 50% of the data; calculated as the third quartile minus the first quartile
five-number summary
consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest
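The sketch below computes a five-number summary and the IQR using the quartile definitions given in these cards (the median of the values on each side of the overall median). The function name `five_number_summary` and the data are hypothetical; note that statistical software often uses slightly different quartile conventions, so results may not match exactly.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two observations."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values to the left of the median
    upper = data[half + n % 2:]  # values to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical data set of 10 observations.
data = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print(minimum, q1, med, q3, maximum)
print("IQR:", iqr)
```

When the number of observations is odd, the middle observation is the median and is left out of both halves.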
standard deviation
measures the typical distance of the observations from their mean; calculated as the square root of the variance
variance
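the average squared deviation of the observations from their mean (for a sample, the sum of squared deviations is divided by n - 1)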
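A worked sketch of the two calculations on made-up data. It assumes the sample convention of dividing by n - 1, which is an assumption rather than something these cards state; Python's `statistics.variance` and `statistics.stdev` follow the same convention and are used as a cross-check.

```python
import math
import statistics

# Hypothetical sample of 8 observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1 (assumed).
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("variance:", sample_variance, "check:", statistics.variance(data))
print("std dev: ", sample_sd, "check:", statistics.stdev(data))
```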