Individuals
are the objects described by a set of data. Individuals may be people, animals, or things; what you are measuring
Variable
any characteristic of an individual; can take different values for different individuals; characteristic being measured
Categorical variable
places an individual into one of several groups or categories
Quantitative variable
takes numerical values for which it makes sense to find an average
Distribution
of a variable tells what values the variable takes and how often it takes these values; where the data occur and how often
Inference
drawing conclusions that go beyond the data at hand; a conclusion about a larger population based on a sample
Statistics
the science of data: collection, summarization, and interpretation
Population
the entire group of individuals being studied
Discrete
countable values; jump in increments (ex. We can't have half a baby)
Continuous
can take any value in an interval, including the numbers in between, such as length, weight, and time
Frequency Table/Distribution
displays the counts (frequencies) of individuals in each category
Relative Frequency Table/Distribution
displays the percents (relative frequencies) of individuals in each category
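A minimal sketch of building both tables from raw categorical data. The category labels and the variable names (`data`, `counts`, `rel_freq`) are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical categorical data: one category label per individual.
data = ["rock", "news", "rock", "country", "news", "rock", "talk", "rock"]

counts = Counter(data)                 # frequency table: category -> count
total = sum(counts.values())           # number of individuals

# Relative frequency table: each count as a percent of the total.
rel_freq = {cat: 100 * n / total for cat, n in counts.items()}

for cat in counts:
    print(f"{cat:8s} {counts[cat]:3d} {rel_freq[cat]:6.1f}%")
```

The relative frequencies are just the counts divided by the total, so the two tables always describe the same distribution.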
Roundoff error
when each percent is rounded (here, to the nearest tenth), the rounded percents may not add up to exactly 100 even though the exact percents do; this does not point to a mistake in our work, just to the effect of rounding
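A small illustration of roundoff error, assuming three hypothetical categories with equal counts. Each exact percent is 33.333..., so the rounded percents fall just short of 100.

```python
# Hypothetical counts: three equally sized categories.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each percent to the nearest tenth, as in a printed table.
rounded = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Sum of the rounded percents (rounded again only to hide float noise).
rounded_sum = round(sum(rounded.values()), 1)

print(rounded)      # each category shows 33.3
print(rounded_sum)  # 99.9, not 100
```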
Two-way table
describes two categorical variables
Marginal distribution
of one of the categorical variables in a two-way table is the distribution of values of that variable among all individuals described by the table
Conditional Distribution
of a variable describes the values of that variable among individuals who have a specific value of another variable; there is a separate conditional distribution for each value of the other variable
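A sketch of how marginal and conditional distributions could be computed from a two-way table of counts. The table, its labels, and the variable names are hypothetical.

```python
# Hypothetical two-way table of counts: rows = group, columns = answer.
table = {
    "female": {"yes": 30, "no": 20},
    "male":   {"yes": 10, "no": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of answer: column totals divided by the grand total.
marginal_answer = {
    col: sum(row[col] for row in table.values()) / grand_total
    for col in ("yes", "no")
}

# Conditional distribution of answer within each group:
# each row divided by its own row total.
conditional = {
    group: {col: n / sum(row.values()) for col, n in row.items()}
    for group, row in table.items()
}

print(marginal_answer)
print(conditional)
```

In this made-up table the conditional distributions differ between the two groups (60% "yes" versus 20% "yes"), which is what an association between the two variables looks like.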
Association
exists between two variables if specific values of one variable tend to occur in common with specific values of the other
Simpson's Paradox
An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined
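A sketch of the reversal using invented numbers: two hypothetical treatments, A and B, compared within two severity groups (the third variable). None of these figures come from a real study.

```python
# Hypothetical (successes, trials) for treatments A and B in each severity group.
results = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Within every group, A has the higher success rate.
a_wins_each_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in results.values()
)

# Combine the groups by adding successes and trials for each treatment.
combined = {
    t: (
        sum(group[t][0] for group in results.values()),
        sum(group[t][1] for group in results.values()),
    )
    for t in ("A", "B")
}

# After combining, the comparison reverses: B now has the higher rate.
a_wins_combined = rate(*combined["A"]) > rate(*combined["B"])

print(a_wins_each_group, a_wins_combined)  # True False
```

The reversal happens because A was used mostly on severe cases and B mostly on mild ones, so the combined rates mix groups of very different sizes.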
Dotplot
each data value is shown as a dot above its location on a number line
SOCS
Shape, Outliers, Center, Spread
Symmetric
right and left sides of graph are approximately mirror images of each other
Skewed to the right
right side of graph is longer than left side
Skewed to the left
left side of graph is longer than right side
Unimodal
a distribution shape with a single peak
Bimodal
a distribution shape with two clear peaks
Multimodal
a distribution shape with more than two clear peaks
Stemplot
graphical display for small data sets; each value is split into a stem (all but the final digit) and a leaf (the final digit)
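A rough sketch of a text stemplot, assuming non-negative whole numbers with the tens part as the stem and the ones digit as the leaf. The function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: stem = value // 10, leaf = value % 10."""
    stems = defaultdict(list)
    for v in sorted(values):           # sorting keeps leaves in increasing order
        stems[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

data = [12, 15, 21, 21, 27, 43]        # hypothetical data
lines = stemplot(data)
print("\n".join(lines))
```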
Mean
x̄ ; the sample mean; to find it for a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, ..., xn, their mean is x̄ = (x1 + x2 + ... + xn) / n
Median
M ; the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger
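A short sketch computing the mean and the median for a hypothetical list of observations. The mean follows the "add and divide by n" recipe directly; the median uses `statistics.median` from the Python standard library, which averages the two middle values when n is even.

```python
import statistics

observations = [2, 4, 4, 5, 7, 9, 11]            # hypothetical data, n = 7

x_bar = sum(observations) / len(observations)    # mean: sum of observations / n
M = statistics.median(observations)              # median: middle value of the sorted data

# With an even number of observations the median is the average of the two middle values.
M_even = statistics.median([1, 2, 3, 4])

print(x_bar, M, M_even)  # 6.0 5 2.5
```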
Interquartile range
(IQR) measures the range of the middle 50% of the data; IQR = Q3 - Q1
1.5 x IQR rule for outliers
Any value below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR is an outlier
Five number summary
consists of the smallest observation, first quartile, median, third quartile, and largest observation; Minimum, Q1, M, Q3, Maximum
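A sketch of the five-number summary, the IQR, and the 1.5 x IQR rule together. It assumes one common textbook convention for quartiles: Q1 and Q3 are the medians of the observations below and above the overall median, leaving out the middle value when n is odd. Software often uses other quartile conventions, so results may differ slightly. The function names and data are hypothetical, and the sketch needs at least two observations.

```python
import statistics

def five_number_summary(values):
    """Return (Minimum, Q1, M, Q3, Maximum) using median-of-halves quartiles."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations to the left of the median
    upper = xs[half + n % 2:]     # observations to the right (skips the middle value if n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def outliers(values):
    """Values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

data = [5, 7, 8, 9, 10, 11, 12, 13, 40]   # hypothetical data
summary = five_number_summary(data)
flagged = outliers(data)

print(summary)  # (5, 7.5, 10, 12.5, 40)
print(flagged)  # [40]
```

Here IQR = 12.5 - 7.5 = 5, so the cutoffs are 0 and 20, and only 40 falls outside them.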
Resistant
relatively unaffected by extreme observations
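A quick illustration with hypothetical data: adding one extreme observation moves the mean a lot but barely moves the median, which is why the median is called resistant and the mean is not.

```python
import statistics

data = [10, 12, 13, 15, 20]        # hypothetical data
with_extreme = data + [200]        # same data plus one extreme observation

# How far each measure of center moves when the extreme value is added.
mean_shift = statistics.mean(with_extreme) - statistics.mean(data)
median_shift = statistics.median(with_extreme) - statistics.median(data)

print(mean_shift)    # the mean moves from 14 to 45
print(median_shift)  # the median moves from 13 to 14
```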