frequency table/relative frequency table
a table that lists the categories in a categorical variable and gives the count or percentage of observations for each category
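As a rough illustration of the definition above, a frequency table and a relative frequency table can be built by counting categories. The data values and the variable names (`colors`, `counts`, `relative`) are made up for this sketch.

```python
from collections import Counter

# Hypothetical categorical data (illustrative values only).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency table: count of observations in each category.
counts = Counter(colors)

# Relative frequency table: each count divided by the total.
total = sum(counts.values())
relative = {category: n / total for category, n in counts.items()}

for category in counts:
    print(f"{category:<6} {counts[category]:>3} {relative[category]:>7.1%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no observation was dropped.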
distribution
includes the possible values of the variable and/or the relative frequency of each value
area principle
each data value should be represented by the same amount of area
bar chart
a graphical display representing the count of each category in a categorical variable
pie chart
shows how a "whole" divides into categories by displaying a wedge of a circle whose area corresponds to the proportion in each category
contingency table
displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables
marginal distribution
the distribution of either variable alone; the counts or percentages are the totals found in the margins (last row or column) of the table
conditional distribution
distribution of a variable restricting the Who to consider only a smaller group of individuals
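One way to see how marginal and conditional distributions come out of a contingency table is the sketch below. The table, its labels (`group_a`, `group_b`, `yes`, `no`), and all counts are hypothetical.

```python
# Hypothetical contingency table: rows are one variable, columns another.
table = {
    "group_a": {"yes": 20, "no": 10},
    "group_b": {"yes": 15, "no": 25},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row totals over the grand total.
row_marginal = {r: sum(cols.values()) / grand_total for r, cols in table.items()}

# Marginal distribution of the column variable: column totals over the grand total.
col_totals = {}
for cols in table.values():
    for c, n in cols.items():
        col_totals[c] = col_totals.get(c, 0) + n
col_marginal = {c: n / grand_total for c, n in col_totals.items()}

# Conditional distribution of the column variable, restricted to group_a only.
group_a = table["group_a"]
cond_given_a = {c: n / sum(group_a.values()) for c, n in group_a.items()}

print(row_marginal, col_marginal, cond_given_a)
```

Here the marginal distribution uses everyone in the table, while the conditional distribution divides by the total of the restricted group only.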
Simpson's paradox
when averages are taken across different groups, they can appear to contradict the overall averages
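A small numeric sketch may make the paradox concrete. The methods `A` and `B`, the `easy` and `hard` groups, and every count below are invented for illustration; success rates are used here as the averages being compared.

```python
# Made-up (successes, trials) counts for two methods in two groups.
data = {
    "A": {"easy": (9, 10), "hard": (30, 90)},
    "B": {"easy": (80, 90), "hard": (3, 10)},
}

def rate(successes, trials):
    """Success rate within a single group."""
    return successes / trials

def overall(groups):
    """Success rate after pooling all groups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials

for method, groups in data.items():
    per_group = {g: round(rate(*counts), 3) for g, counts in groups.items()}
    print(method, per_group, "overall:", round(overall(groups), 3))
```

In this invented data, A has the higher rate within each group, yet B has the higher pooled rate, because the two methods were applied to very different mixes of easy and hard cases.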
distribution
possible values of the variable; the frequency or relative frequency of each value
histogram
a graphical display that uses adjacent bars to show the distribution of values in a quantitative variable; each bar represents the frequency of values falling in an interval (bin)
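The counting behind a histogram can be sketched as follows. The data values and the bin width of 10 are assumptions chosen for illustration, and the text bars stand in for a real plot.

```python
from collections import Counter

# Hypothetical quantitative data (illustrative values only).
values = [3, 7, 12, 15, 18, 21, 25, 29, 34, 48]
bin_width = 10  # assumed bin width for this sketch

# Assign each value to the lower edge of its bin, then count per bin.
bins = Counter((v // bin_width) * bin_width for v in values)

# Print one text bar per bin; bar length is the frequency in that bin.
for lower in range(0, max(values) + 1, bin_width):
    print(f"{lower:>2}-{lower + bin_width - 1:<2} {'#' * bins[lower]}")
```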
stem-and-leaf display
shows quantitative data values in a way that sketches the distribution of the data.
dotplot
graphs a dot for each case against a single axis
center
a value that attempts the impossible by summarizing the entire distribution with a single number, a "typical" value
spread
a numerical summary of how tightly the values are clustered around the center
mode
a hump or local high in the shape of the distribution of a variable
unimodal
a distribution with one mode
uniform
a distribution that's roughly flat
symmetric
a distribution whose two halves on either side of the center look approximately like mirror images of each other
skewed
one tail stretches out farther than the other
skewed right
distribution whose longer tail stretches to the right
skewed left
distribution whose longer tail stretches to the left
outliers
extreme values that don't appear to belong with the rest of the data; they may be unusual values that deserve further investigation or just mistakes
timeplot
displays data that change over time; successive values are connected with lines to show trends more clearly
center
mean or median
median
middle value with half of the data below and half above it
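A minimal sketch of finding the median by sorting, assuming the usual convention of averaging the two middle values when the count is even. The helper name `median_of` and the sample lists are hypothetical.

```python
import statistics

def median_of(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 1, 4, 1, 5]   # sorted: 1, 1, 3, 4, 5
even_data = [3, 1, 4, 1]     # sorted: 1, 1, 3, 4

print(median_of(odd_data), median_of(even_data))

# The standard library follows the same convention.
print(statistics.median(odd_data), statistics.median(even_data))
```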
spread
standard deviation, interquartile range, and range
range
the difference between the lowest and highest values in a data set
lower quartile (Q1)
25% of the data lie below it
upper quartile (Q3)
75% of the data lie below it
Interquartile Range (IQR)
the difference between the first and third quartiles: IQR = Q3 - Q1
5-number summary
consists of the minimum, the quartiles Q1 and Q3, the median, and the maximum
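The range, quartiles, IQR, and 5-number summary can be computed together, as in the sketch below. The data are made up, and the quartile convention is an assumption: textbooks and software use several slightly different rules, so values for other data sets may differ from a hand calculation.

```python
import statistics

# Hypothetical data set (illustrative values only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Assumption: the "inclusive" method is one common quartile convention,
# not the only one.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

data_range = max(data) - min(data)   # range = highest value - lowest value
iqr = q3 - q1                        # IQR = Q3 - Q1
five_number = (min(data), q1, median, q3, max(data))

print("5-number summary:", five_number)
print("IQR:", iqr, "range:", data_range)
```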
boxplot
displays the 5-number summary as a central box, with whiskers that extend to the non-outlying data values; effective for comparing groups
mean
found by summing all the data values and dividing by the count
variance
sum of squared deviations from the mean, divided by the count minus one
standard deviation
square root of the variance
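The three definitions above (mean, variance, standard deviation) translate directly into a few lines of arithmetic. The data values are made up, and the standard library calls are included only as a cross-check.

```python
import math
import statistics

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Mean: sum of the data values divided by the count.
mean = sum(data) / n

# Variance: sum of squared deviations from the mean, divided by (count - 1).
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# Cross-check: statistics.variance and statistics.stdev also divide by n - 1.
print(statistics.variance(data), statistics.stdev(data))
```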
comparing distributions
considers the shape, center, spread, and outliers
comparing boxplots
includes comparing the medians, the sizes of the IQRs, and any outliers