mean
arithmetic average; calculated by summing the observations and then dividing by the number of observations; "balance point" of the distribution; not a resistant measure of center (affected by outliers)
median
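midpoint of a distribution; with the n observations ordered from smallest to largest, the median sits at position (n+1)/2 (for even n, it is the average of the two middle observations); resistant measure of center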
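A small sketch of why the mean is not resistant while the median is. The data values are made up for illustration; only Python's standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical sample, already in order; n = 5, so the median is at position (5+1)/2 = 3.
data = [2, 3, 4, 5, 6]
# Same sample with one extreme observation added.
with_outlier = data + [100]

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # 4 4
print(mean_after, median_after)    # 20 4.5
```

One outlier pulls the mean from 4 to 20, while the median only moves from 4 to 4.5.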
first quartile (Q1)
value in the sample that has 25% of the data at or below it; median of the observations to the left of M
second quartile (Q2)
median
third quartile (Q3)
the value in the sample that has 75% of the data at or below it; median of the observations to the right of M
interquartile range (IQR)
measure of spread; IQR = Q3 - Q1
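A minimal sketch of the quartile definitions above, taking Q1 and Q3 as the medians of the observations to the left and right of M. The function name `quartiles` and the sample values are illustrative assumptions. Library routines such as `statistics.quantiles` or `numpy.percentile` use other interpolation conventions and may return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule; assumes at least 2 values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # observations to the left of M
    upper = xs[half + 1:] if n % 2 else xs[half:]   # observations to the right of M
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
iqr = q3 - q1
print(q1, m, q3, iqr)  # 3 7 11 8
```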
five-number summary of a distribution
minimum (smallest observation), Q1, M (median), Q3, maximum (largest observation)
boxplot
graphical display of the five-number summary; central box spans the middle 50% of the data (marked by the first and third quartiles); line in the box marks the median; lines extend from the box to the smallest and largest observations; will show the skew
modified boxplot
similar to a boxplot, but it shows the suspected outliers as dots
suspected outlier
falls more than 1.5 IQRs away from either Q1 or Q3; lies below Q1-1.5(IQR) or above Q3+1.5(IQR)
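The 1.5 x IQR rule can be sketched as follows. The helper name `suspected_outliers` and the data are hypothetical, and the quartiles are computed with the median-of-halves rule, which is only one of several conventions.

```python
from statistics import median

def suspected_outliers(values):
    """Return the observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])                              # median of the lower half
    q3 = median(xs[half + 1:] if n % 2 else xs[half:])  # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]

# Q1 = 3, Q3 = 11, IQR = 8, so the fences are -9 and 23.
flagged = suspected_outliers([1, 3, 5, 7, 9, 11, 40])
print(flagged)  # [40]
```

A modified boxplot would draw 40 as a separate dot and end the upper whisker at 11, the largest observation that is not flagged.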
standard deviation
used to describe the variation around the mean; not resistant (impacted by skewness and outliers); positive square root of the variance (s^2), which is the sum of the squared deviations of the observations from their mean, divided by n-1; should only be used when the mean is appropriate
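A short sketch of the sample standard deviation computed step by step, assuming the usual n-1 divisor. The data are made up; `statistics.stdev` from the standard library is shown only as a cross-check.

```python
from math import sqrt
from statistics import stdev

data = [1, 2, 3, 4, 5]  # hypothetical observations
n = len(data)
xbar = sum(data) / n                          # mean = 3.0

# Squared deviations of each observation from the mean: 4, 1, 0, 1, 4
squared_devs = [(x - xbar) ** 2 for x in data]

variance = sum(squared_devs) / (n - 1)        # s^2 = 10 / 4 = 2.5
s = sqrt(variance)                            # s is the positive square root of s^2

print(variance)                 # 2.5
print(round(s, 4))              # 1.5811
print(round(stdev(data), 4))    # 1.5811
```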
is the five-number summary resistant to strong outliers?
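Yes, for the most part: the median is a resistant measure of center and the IQR is a resistant measure of spread, though the minimum and maximum are by definition the most extreme observations; the mean and standard deviation should be reserved for reasonably symmetric distributions without outliers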