Unit 1 Stats

frequency table/relative frequency table

a table that lists the categories in a categorical variable and gives the count or percentage of observations for each category
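A frequency table is just a count per category, and the relative frequency table divides each count by the total. The minimal sketch below uses Python's `collections.Counter`. The function name `frequency_tables` and the sample answers are hypothetical, made up for illustration.

```python
from collections import Counter

def frequency_tables(values):
    """Return (counts, percentages) for each category, in first-seen order."""
    counts = Counter(values)
    n = len(values)
    # relative frequency as a percentage of all observations
    percentages = {cat: 100 * c / n for cat, c in counts.items()}
    return dict(counts), percentages

# Hypothetical survey answers (made-up data)
answers = ["red", "blue", "red", "green", "blue", "red"]
counts, pct = frequency_tables(answers)
print(counts)  # {'red': 3, 'blue': 2, 'green': 1}
print(pct)
```

The percentages always add up to 100 (up to rounding), which is a quick check that no category was missed.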

distribution

gives the possible values of the variable and the frequency or relative frequency of each value

area principle

each data value should be represented by the same amount of area

bar chart

a graphical display representing the count of each category in a categorical variable

pie chart

shows how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category

contingency table

displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables

marginal distribution

the distribution of either variable alone; the counts or percentages are the totals found in the margins (last row or column of the table)
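The marginal distributions are the row totals and column totals of a contingency table. The sketch below stores a small two-way table as a nested dictionary; the table, its labels, and the function name `marginal_distributions` are hypothetical, chosen only for illustration.

```python
# Hypothetical contingency table: rows = class year, columns = preferred snack
table = {
    "junior": {"chips": 20, "fruit": 10},
    "senior": {"chips": 15, "fruit": 25},
}

def marginal_distributions(table):
    """Return row totals and column totals (the table's margins)."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals

rows, cols = marginal_distributions(table)
grand_total = sum(rows.values())
print(rows)  # {'junior': 30, 'senior': 40}
print(cols)  # {'chips': 35, 'fruit': 35}
# marginal distribution of snack as proportions of the grand total
print({k: v / grand_total for k, v in cols.items()})  # {'chips': 0.5, 'fruit': 0.5}
```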

conditional distribution

the distribution of a variable when the Who is restricted to a smaller group of individuals, such as a single row or column of a contingency table
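A conditional distribution keeps only one group (here, one row) and divides by that group's total rather than the grand total. This sketch reuses a small hypothetical table; the name `conditional_distribution` is made up for illustration.

```python
# Hypothetical contingency table: rows = class year, columns = preferred snack
table = {
    "junior": {"chips": 20, "fruit": 10},
    "senior": {"chips": 15, "fruit": 25},
}

def conditional_distribution(table, row):
    """Distribution of the column variable among only the individuals in `row`."""
    cells = table[row]
    total = sum(cells.values())  # row total, not the grand total
    return {col: count / total for col, count in cells.items()}

print(conditional_distribution(table, "junior"))  # chips 2/3, fruit 1/3
print(conditional_distribution(table, "senior"))  # {'chips': 0.375, 'fruit': 0.625}
```

Comparing the two conditional distributions (juniors mostly prefer chips, seniors mostly fruit) is how you look for an association between the two variables.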

Simpson's paradox

when averages (or rates) computed within separate groups appear to contradict the overall average for the groups combined
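Simpson's paradox is easiest to see with numbers. The counts below are entirely made up: doctor A has the higher success rate for both easy and hard cases, yet doctor B has the higher overall rate. The function names `group_rates` and `overall_rate` are hypothetical.

```python
# Hypothetical (successes, attempts) for two doctors; numbers are made up
results = {
    "A": {"easy": (19, 20), "hard": (30, 80)},
    "B": {"easy": (72, 80), "hard": (5, 20)},
}

def group_rates(groups):
    """Success rate within each group separately."""
    return {name: s / a for name, (s, a) in groups.items()}

def overall_rate(groups):
    """Success rate after pooling all groups together."""
    total_s = sum(s for s, _ in groups.values())
    total_a = sum(a for _, a in groups.values())
    return total_s / total_a

for doctor, groups in results.items():
    print(doctor, group_rates(groups), "overall:", overall_rate(groups))
# A {'easy': 0.95, 'hard': 0.375} overall: 0.49
# B {'easy': 0.9, 'hard': 0.25} overall: 0.77
```

The reversal happens because A takes mostly hard cases (80 of 100) while B takes mostly easy ones, so the pooled rates mix very different groups.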

distribution

possible values of the variable; the frequency or relative frequency of each value

histogram

a graphical display that uses adjacent bars to show the distribution of values in a quantitative variable; each bar represents the frequency (or relative frequency) of values falling in an interval, or bin
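Building a histogram means choosing equal-width bins and counting the values in each one. This sketch assumes half-open bins [low, low + width), so a value on a boundary goes into the higher bin; other software may use a different boundary rule. The function name `histogram_counts` and the scores are hypothetical.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each bin [start + k*width, start + (k+1)*width).
    Values outside all bins are ignored."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical test scores (made-up data)
scores = [62, 67, 71, 74, 75, 78, 83, 85, 91, 98]
counts = histogram_counts(scores, start=60, width=10, n_bins=4)
for k, c in enumerate(counts):
    low = 60 + k * 10
    print(f"{low}-{low + 9}: {'#' * c}")  # text version of each bar
```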

stem-and-leaf display

shows quantitative data values in a way that sketches the distribution of the data.
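For two-digit whole numbers, a common choice is to use the tens digit as the stem and the ones digit as the leaf. This sketch assumes non-negative integers and prints empty stems so gaps in the data stay visible; `stem_and_leaf` and the data are hypothetical.

```python
def stem_and_leaf(values):
    """Return lines like '7 | 1458': tens digit = stem, ones digit = leaf.
    Assumes non-negative integers."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    # include every stem in the range, even empty ones, to show gaps
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical test scores (made-up data)
scores = [62, 67, 71, 74, 75, 78, 83, 85, 91, 98]
for line in stem_and_leaf(scores):
    print(line)
```

Turned on its side, the stem-and-leaf display has the same shape as a histogram with bins of width 10, but it also keeps every individual data value.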

dotplot

graphs a dot for each case against a single axis

center

a value that attempts the impossible by summarizing the entire distribution with a single number, a "typical" value

spread

a numerical summary of how tightly the values are clustered around the center

mode

a hump or local high in the shape of the distribution of a variable

unimodal

a distribution with one mode

uniform

a distribution that's roughly flat

symmetric

a distribution whose two halves on either side of the center look approximately like mirror images of each other

skewed

one tail stretches out farther than the other

skewed right

distribution whose longer tail stretches to the right

skewed left

distribution whose longer tail stretches to the left

outliers

extreme values that don't appear to belong with the rest of the data; they may be unusual values that deserve further investigation or just mistakes

timeplot

displays data that change over time; successive values are connected with lines to show trends more clearly

center

mean or median

median

the middle value, with half of the data above it and half below it
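To find the median, sort the data and take the middle value. When the count is even there are two middle values, and the usual convention is to average them. A minimal sketch (the function name `median` is just illustrative; Python's `statistics.median` does the same thing):

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```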

spread

standard deviation, interquartile range, and range

range

the difference between the lowest and highest values in a data set
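The range is a single subtraction, highest value minus lowest value. Because it depends only on the two most extreme values, one outlier can change it a lot. The function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

print(data_range([12, 5, 20, 9]))  # 15
```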

lower quartile (Q1)

25% of the data lie below it

upper quartile (Q3)

75% of the data lie below it

Interquartile Range (IQR)

the difference between the first and third quartiles: IQR = Q3 - Q1
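There are several conventions for computing quartiles. This sketch assumes one common hand-calculation rule: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. Calculators and software (for example, rules that interpolate) can give slightly different values. The function names `quartiles` and `iqr` are hypothetical; at least 2 data values are needed.

```python
from statistics import median

def quartiles(values):
    """(Q1, Q3) as medians of the lower and upper halves of the sorted data.
    When n is odd, the middle value is excluded from both halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]  # skip the middle value if n is odd
    return median(lower), median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 6, 7, 8, 9, 12, 15]
print(quartiles(data))  # (3.5, 10.5)
print(iqr(data))        # 7.0
```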

5-number summary

consists of the minimum, Q1, the median, Q3, and the maximum

boxplot

displays the 5-number summary as a central box, with whiskers that extend to the non-outlying data values; effective for comparing groups
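The values a boxplot needs can be computed directly. This sketch assumes the widely used 1.5 × IQR rule for flagging outliers (points more than 1.5 IQRs below Q1 or above Q3), and the same median-of-halves quartile convention as a hand calculation; other conventions exist. The function names are hypothetical and the data are made up.

```python
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max); quartiles are medians of the lower and
    upper halves, excluding the middle value when n is odd."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])
    q3 = median(s[half + len(s) % 2:])
    return s[0], q1, median(s), q3, s[-1]

def outlier_fences(values):
    """Assumed rule: fences lie 1.5 * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def boxplot_parts(values):
    """Whisker ends (most extreme non-outliers) and the list of outliers."""
    lo, hi = outlier_fences(values)
    inside = [v for v in values if lo <= v <= hi]
    outliers = sorted(v for v in values if v < lo or v > hi)
    return min(inside), max(inside), outliers

data = [1, 3, 4, 6, 7, 8, 9, 12, 30]
print(five_number_summary(data))  # (1, 3.5, 7, 10.5, 30)
print(outlier_fences(data))       # (-7.0, 21.0)
print(boxplot_parts(data))        # (1, 12, [30])
```

Here the upper whisker stops at 12, not at the maximum of 30, because 30 lies beyond the upper fence and is drawn as a separate outlier point.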

mean

found by summing all the data values and dividing by the count
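The mean is the sum of the values divided by how many values there are. A one-line sketch (the name `mean` is illustrative; Python's `statistics.mean` also exists):

```python
def mean(values):
    """Sum of all data values divided by the count."""
    return sum(values) / len(values)

print(mean([2, 4, 4, 4, 5, 5, 7, 9]))  # 5.0
```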

variance

sum of squared deviations from the mean, divided by the count minus one

standard deviation

square root of the variance
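The variance and standard deviation definitions can be followed step by step: find the mean, square each deviation from it, add the squares, divide by n - 1, then take the square root. The sketch checks the result against Python's `statistics.variance` and `statistics.stdev`, which also use n - 1. The function names `sample_variance` and `sample_sd` are hypothetical; at least 2 values are needed.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
# mean is 5; squared deviations add up to 32, so the variance is 32 / 7
print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
```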

comparing distributions

considers the shape, center, spread, and outliers

comparing boxplots

includes comparing the medians, the IQRs (the lengths of the boxes), and any outliers