Chapter 2 (Describing Distributions with Numbers)

mean

arithmetic average; calculated by summing the observations and then dividing by the number of observations; the "balance point" of the distribution; not a resistant measure of center (affected by outliers)

median

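midpoint of a distribution; in the ordered list of n observations, the median sits at position (n+1)/2 (this is the location of the median, not its value; when n is even, average the two middle observations); resistant measure of center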
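A minimal sketch of why the median is called resistant and the mean is not, using Python's standard statistics module. The data values and the helper name median_position are made up for illustration.

```python
from statistics import mean, median

def median_position(n):
    # Location of the median in the ordered list of n observations.
    # A fractional result (e.g. 3.5) means: average the two middle values.
    return (n + 1) / 2

data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]   # same data, largest value replaced by an outlier

print("position:", median_position(len(data)))
print("original:    ", mean(data), median(data))
print("with outlier:", mean(with_outlier), median(with_outlier))
```

Replacing one observation with an extreme value pulls the mean from 4 to 14.8, while the median stays at 4.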

first quartile (Q1)

the value in the sample that has 25% of the data at or below it; median of the observations to the left of M (the overall median) in the ordered list

second quartile (Q2)

median

third quartile (Q3)

the value in the sample that has 75% of the data at or below it; median of the observations to the right of M (the overall median) in the ordered list

interquartile range (IQR)

measure of spread; IQR = Q3 - Q1
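The quartile definitions above (Q1 and Q3 as medians of the observations on either side of M) can be sketched as follows. The function name quartiles and the sample data are assumptions for illustration; statistical software often uses other interpolation rules and may report slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations to the left of M
    upper = xs[half + n % 2:]     # observations to the right of M (M itself is skipped when n is odd)
    return median(lower), median(xs), median(upper)

data = [1, 3, 4, 6, 7, 9, 10, 12, 15]
q1, m, q3 = quartiles(data)
iqr = q3 - q1                     # IQR = Q3 - Q1
print(q1, m, q3, iqr)
```

For this data the lower half is 1, 3, 4, 6 and the upper half is 9, 10, 12, 15, so Q1 = 3.5, M = 7, Q3 = 11, and IQR = 7.5.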

five-number summary of a distribution

minimum (smallest observation), Q1, M (median), Q3, maximum (largest observation)

boxplot

graphical display of the five-number summary; central box spans the middle 50% of the data (from the first quartile to the third quartile); line in the box marks the median; lines extend from the box to the smallest and largest observations; shows skewness through unequal box halves or unequal line lengths

modified boxplot

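similar to a boxplot, but suspected outliers are plotted as individual dots, and the lines extend only to the smallest and largest observations that are not suspected outliers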

suspected outlier

falls more than 1.5 IQRs away from either Q1 or Q3; lies below Q1-1.5(IQR) or above Q3+1.5(IQR)
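A short sketch of the 1.5 x IQR rule, assuming quartiles are found as medians of the lower and upper halves of the ordered data. The function name suspected_outliers and the data are hypothetical.

```python
from statistics import median

def suspected_outliers(values):
    """Return observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    xs = sorted(values)
    n = len(xs)
    q1 = median(xs[:n // 2])              # median of the lower half
    q3 = median(xs[n // 2 + n % 2:])      # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]

data = [5, 7, 8, 9, 10, 11, 12, 13, 40]
print(suspected_outliers(data))
```

Here Q1 = 7.5 and Q3 = 12.5, so IQR = 5 and the cutoffs are 0 and 20; only the observation 40 is flagged.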

standard deviation

used to describe the variation around the mean; not resistant (impacted by skewness and outliers); positive square root of the variance (s^2), which is the sum of the squared deviations of the observations from their mean, divided by n - 1; should only be used when the mean is appropriate
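As a worked sketch, the sample standard deviation can be computed step by step and compared with statistics.stdev from the standard library, which also divides by n - 1. The function name sample_sd and the data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    xbar = mean(values)
    squared_devs = [(x - xbar) ** 2 for x in values]   # squared deviations from the mean
    variance = sum(squared_devs) / (len(values) - 1)   # s^2, dividing by n - 1
    return sqrt(variance)                              # s = positive square root of s^2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data), stdev(data))

# One extreme observation inflates s, which is why it is not resistant.
print(sample_sd(data + [50]))
```

With these values the mean is 5 and the squared deviations sum to 32, so s^2 = 32/7.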

is the five-number summary resistant to strong outliers?

Yes! The median is a resistant measure of center, while the IQR is a resistant measure of spread (the minimum and maximum themselves are not resistant); the mean and standard deviation should be used only for reasonably symmetric distributions that are free of outliers