Statistics
the science of studying data
Individuals (or subjects)
_ are the objects described by a set of data. _ may be people, animals, or things.
Population
A _ describes the set of all individuals/subjects of interest.
Sample
A _ describes the set of individuals'subjects actually selected for measurement. (A subset of a population)
Variable
A _ is any characteristic of an individual. A _ may take different values for different individuals.
categorical variable
A _ places an individual into one of several categories.
Examples of categorical variables:
Class year, eye color, state of residence, favorite restaurant.
numerical (quantitative) variable
A _ takes numerical values for which operations like adding and averaging make sense.
Example of numerical (quantitative variables):
Boone weather
The types of quantitative variables:
discrete and continuous
Discrete variables
These variables have a finite number of outcomes possible; generally only integer outcomes are possible. ex) number of lightbulbs produced per hour, number of students in a class.
Continuous variables
In _ variables, fractional values are possible. Examples include a person's weight, time for elevator to arrive, anything measured with a stopwatch.
Is a zip code categorical or quantitative?
categorical - it tells you about a location.
Is a year (2015, etc) categorical or quantitative?
quantitative - you can do mathematical problems with the year.
Bar graph or pie chart
What would be the best graph(s) to use for one categorical variables?
Bar graph with categories
What would be the best graph(s) to use for two categorical variables?
Dotplot, stem & leaf plot, histogram, and boxplot
What would be the best graph(s) to use for one quantitative variables?
scatterplot
What would be the best graph(s) to use for two quantitative variables?
Side-by-side boxplot
What would be the best graph(s) to use for one categorical variable and one quantitative variable?
Categorical variables
to graph _, we use frequency tables, bar graphs, or pie charts.
Numerical (quantitative) variables
to graph _, we use histograms, stem and leaf plots, and dotplots.
Stem and leaf plots
_ are mainly useful for small data sets (n < 15) and the variable is discrete.
Dotplots
_ are useful for moderate sized data sets (n < 100), regardless of whether the variable is discrete or continuous.
Histograms
_ are best for larger data sets (n > 100) and also work well for both discrete and continuous variables.
overall pattern, deviations from that pattern
To describe a distribution, we look for _, and any striking _.
shape, center, and spread
The overall pattern of a histogram, or stem/leaf plot, can be described by the _, _, and _.
outlier
A deviation in which the individual value falls outside the overall pattern.
shape
the _ of a distribution can be described using words like symmetric, skewed to the right, skewed to the left, bimodal, outlier.
outlier
An _ is an individual observation falling far outside the overall pattern of the distribution.
symmetric
a distribution is _ if the right and left sides of the histogram are approximately mirror images of each other.
skewed to the right
a distribution is called _ if the right side of the histogram extends much father out than the left side.
skewed to the left
a distribution is _ if the left side of the histogram extends much father out than the right side.
bimodal
A distribution is called _ if the distribution has two separate peaks.
symmetry, skew, outliers
the characteristics used to describe the shape of a distribution include:
mean, median
the characteristics used to describe the center of a distribution include:
IQR, standard deviation
the characteristics used to describe the spread of a distribution include
mean
the average of a set of n observations
median
_ is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger
mean
when the data is reasonable symmetric with no outliers, the _ is preferred.
median
when the data has notable skew and/or outliers, the _ is preferred.
IQR and standard deviation
the two common measures of the spread of a distribution is the IQR and the standard deviation
five-number summary
the _ is a summary of a dataset containing the following: min value, Q1, Q2, Q3, and max value.
boxplots
_ are a graphical picture of the five-number summary.
standard deviation
_ looks at the typical variation from the mean (regardless of direction) denoted as s.
small standard deviation
consistent data will have a _ standard deviation
large standard deviation
considerable variation in the data will have a _ standard deviation
standard deviation
_ is affected by outliers.
z-score (standardized score)
standard deviations give a method of computing relative standings called a _. A _ tells how many standard deviations above or below the average that an observation lies.