Statistics Exam 1 | Statistics

Statistics

the science of studying data

Individuals (or subjects)

_ are the objects described by a set of data. _ may be people, animals, or things.

Population

A _ describes the set of all individuals/subjects of interest.

Sample

A _ describes the set of individuals/subjects actually selected for measurement. (A subset of a population.)

Variable

A _ is any characteristic of an individual. A _ may take different values for different individuals.

categorical variable

A _ places an individual into one of several categories.

Examples of categorical variables:

Class year, eye color, state of residence, favorite restaurant.

numerical (quantitative) variable

A _ takes numerical values for which operations like adding and averaging make sense.

Example of a numerical (quantitative) variable:

Boone weather

The types of quantitative variables:

discrete and continuous

Discrete variables

These variables have a countable number of possible outcomes; generally only integer outcomes are possible. Ex.: number of lightbulbs produced per hour, number of students in a class.

Continuous variables

In _ variables, fractional values are possible. Examples include a person's weight, the time for an elevator to arrive, anything measured with a stopwatch.

Is a zip code categorical or quantitative?

categorical - it tells you about a location.

Is a year (2015, etc) categorical or quantitative?

quantitative - arithmetic on years is meaningful (for example, subtracting two years gives the time elapsed).

Bar graph or pie chart

What would be the best graph(s) to use for one categorical variable?

Bar graph with categories

What would be the best graph(s) to use for two categorical variables?

Dotplot, stem & leaf plot, histogram, and boxplot

What would be the best graph(s) to use for one quantitative variable?

scatterplot

What would be the best graph(s) to use for two quantitative variables?

Side-by-side boxplot

What would be the best graph(s) to use for one categorical variable and one quantitative variable?

Categorical variables

to graph _, we use frequency tables, bar graphs, or pie charts.
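A frequency table can be sketched in a few lines of Python. This is only an illustration: the eye-color data and the helper name `frequency_table` are made up for the example and do not come from the course material.

```python
from collections import Counter

# Hypothetical eye-color observations for ten individuals (made-up data).
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "hazel", "brown", "blue", "green"]

def frequency_table(values):
    """Return {category: (count, relative frequency)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}

table = frequency_table(eye_colors)
for category, (count, share) in sorted(table.items()):
    # One row per category: name, count, percentage of all observations.
    print(f"{category:<6} {count:>2} {share:.0%}")
```

The counts are what a bar graph would plot as bar heights; the relative frequencies are what a pie chart would plot as slice sizes.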

Numerical (quantitative) variables

to graph _, we use histograms, stem and leaf plots, and dotplots.

Stem and leaf plots

_ are mainly useful for small data sets (n < 15) where the variable is discrete.
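As a rough sketch of how a stem and leaf plot is built, the snippet below splits each value into a stem (tens digit) and a leaf (ones digit). It assumes non-negative two-digit integers; the quiz scores and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# Made-up quiz scores for a small class.
quiz_scores = [62, 68, 71, 75, 75, 79, 83, 84, 90]

plot = stem_and_leaf(quiz_scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```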

Dotplots

_ are useful for moderate-sized data sets (n < 100), regardless of whether the variable is discrete or continuous.

Histograms

_ are best for larger data sets (n > 100) and also work well for both discrete and continuous variables.
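A histogram groups values into equal-width bins and counts how many fall in each. The sketch below shows the counting step only, with a text bar per bin; the ages, the bin width of 10, and the name `histogram_counts` are assumptions chosen for illustration (the data set is kept tiny so the result is easy to check by hand).

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up ages, for illustration only.
ages = [21, 22, 25, 29, 31, 33, 34, 38, 41, 47, 52]
bin_width = 10

counts = histogram_counts(ages, bin_width)
for start, count in counts.items():
    # Each star stands for one observation in the bin.
    print(f"[{start}, {start + bin_width}): {'*' * count}")
```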

overall pattern, deviations from that pattern

To describe a distribution, we look for _, and any striking _.

shape, center, and spread

The overall pattern of a histogram, or stem/leaf plot, can be described by the _, _, and _.

outlier

A deviation in which the individual value falls outside the overall pattern.

shape

the _ of a distribution can be described using words like symmetric, skewed to the right, skewed to the left, bimodal, outlier.

outlier

An _ is an individual observation falling far outside the overall pattern of the distribution.

symmetric

a distribution is _ if the right and left sides of the histogram are approximately mirror images of each other.

skewed to the right

a distribution is called _ if the right side of the histogram extends much farther out than the left side.

skewed to the left

a distribution is _ if the left side of the histogram extends much farther out than the right side.

bimodal

A distribution is called _ if the distribution has two separate peaks.

symmetry, skew, outliers

the characteristics used to describe the shape of a distribution include:

mean, median

the characteristics used to describe the center of a distribution include:

IQR, standard deviation

the characteristics used to describe the spread of a distribution include:

mean

the average of a set of n observations

median

_ is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger
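The two measures of center can be computed as in the following sketch. The scores are made-up values; `statistics.median` is from Python's standard library and averages the two middle values when n is even.

```python
import statistics

# Made-up exam scores.
scores = [70, 85, 90, 75, 80]

# Mean: add up the n observations and divide by n.
mean = sum(scores) / len(scores)

# Median: the middle value once the observations are sorted.
median_odd = statistics.median(scores)

# With an even number of observations, the median is the
# average of the two middle values (75 and 80 here).
median_even = statistics.median([70, 75, 80, 85])

print(mean, median_odd, median_even)
```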

mean

when the data is reasonably symmetric with no outliers, the _ is preferred.

median

when the data has notable skew and/or outliers, the _ is preferred.
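One way to see why the median is preferred for skewed data is to add a single extreme value and compare how far each measure moves. The numbers below are invented for this sketch.

```python
import statistics

# Made-up values (say, incomes in thousands), roughly symmetric.
incomes = [30, 32, 35, 38, 40]

# The same data with one hypothetical extreme value added.
incomes_with_outlier = incomes + [400]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

The mean is pulled far toward the outlier, while the median barely changes.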

IQR and standard deviation

the two common measures of the spread of a distribution are the IQR and the standard deviation

five-number summary

the _ is a summary of a dataset containing the following: min value, Q1, Q2 (the median), Q3, and max value.
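The sketch below computes a five-number summary and the IQR (Q3 minus Q1) using Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption and may not match the values obtained by the hand method taught in a given course. The data and the name `five_number_summary` are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the 'inclusive' quartile method."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Made-up data set.
data = [1, 3, 5, 7, 9, 11, 13]

minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # spread of the middle half of the data

print(minimum, q1, median, q3, maximum, iqr)
```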

boxplots

_ are a graphical picture of the five-number summary.

standard deviation

_ measures the typical variation from the mean (regardless of direction); it is denoted s.
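A minimal sketch of the calculation, assuming s is the usual sample standard deviation (dividing by n - 1, which the card itself does not spell out). The data and the helper name `sample_std_dev` are made up; the result is checked against the standard library's `statistics.stdev`.

```python
import math
import statistics

def sample_std_dev(values):
    """Sample standard deviation s, dividing by n - 1 (an assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    # Squaring removes the sign, so direction from the mean does not matter.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Made-up data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = sample_std_dev(data)
print(round(s, 3), round(statistics.stdev(data), 3))
```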

small standard deviation

consistent data will have a _.

large standard deviation

data with considerable variation will have a _.

standard deviation

_ is affected by outliers.
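To illustrate the sensitivity to outliers, the sketch below compares the sample standard deviation of a small made-up data set before and after adding one extreme value. The numbers are hypothetical.

```python
import statistics

# Made-up values with little variation.
values = [30, 32, 35, 38, 40]

# The same values plus one hypothetical outlier.
values_with_outlier = values + [400]

sd_before = statistics.stdev(values)
sd_after = statistics.stdev(values_with_outlier)

print(f"s without outlier: {sd_before:.2f}")
print(f"s with outlier:    {sd_after:.2f}")
```

A single extreme observation inflates s many times over, because deviations from the mean are squared.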

z-score (standardized score)

standard deviations give a method of computing relative standing called a _. A _ tells how many standard deviations above or below the average an observation lies.
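As a sketch, a z-score is the observation minus the mean, divided by the standard deviation. The function name `z_score` and the exam numbers (mean 70, standard deviation 10) are made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical exam with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)   # 1.5 standard deviations above the mean
below = z_score(55, 70, 10)   # 1.5 standard deviations below the mean
at_mean = z_score(70, 70, 10) # exactly at the mean

print(above, below, at_mean)
```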