Chapter 3.1 & 3.2 & 3.3 | Mean And Standard Deviation

Average

Is ambiguous, since several different methods can be used to obtain an average
Loosely stated, the average means the center of the distribution or the most typical case
Measures of Average are also called the Measures of Central Tendency (Section 3-2)
Mea

Measures of Center (Central Tendency)

Summarize data using measures of central tendency, such as the mean, median, mode, and midrange

Population

Consists of all subjects (human or otherwise) that are being studied

Sample

A group (subgroup) of subjects randomly selected from a population

Parameter

A characteristic or numerical measurement obtained by using all the data values from a specified population

Statistic

A characteristic or numerical measurement obtained by using the data values from a sample

Meet the M&M's

Mean
Median
Mode
Midrange

Mean

Aka Arithmetic Average
Is found by adding the data values and dividing by the total number of values
In general, mean is the most important of all numerical measurements used to describe data
Is what most people call an "average"
Is unique and in most cas

Median

Is the middle value when the raw data values are arranged in order from smallest to largest or vice versa
Is used when one must find the center or midpoint of a data set
Is used when one must determine whether the data values fall into the upper half or lower half of the distribution

Mode

Is the data value(s) that occurs most often in a data set
Sometimes said to be the most typical case
Is the easiest average to compute
Can be used when the data are nominal, such as religious preference, gender, or political affiliation
Is not always unique

Midrange

Is a rough estimate of the midpoint for the data set
Is found by adding the lowest and highest data values and dividing by 2
Is easy to compute
Gives the midpoint
Is affected by extremely high or low data values
Is rarely used
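
The four measures of center defined above can be sketched in Python. The data set is hypothetical and chosen only for illustration; the standard-library `statistics` module is one convenient way to compute them, not the only one.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)            # sum of the values / number of values
median = statistics.median(data)        # middle value of the sorted data
modes = statistics.multimode(data)      # a list, since the mode need not be unique
midrange = (min(data) + max(data)) / 2  # (lowest value + highest value) / 2

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
print("midrange:", midrange)
```

With an even number of values, the median is the average of the two middle values (here 3 and 5). The midrange depends only on the two extreme values, which is why it is so sensitive to very high or very low data values.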

Insights

Since measures of central tendency are equal, one might conclude that neither customer waiting line system is better.
But, if examined graphically, a somewhat different conclusion might be drawn. The waiting times for customers at Branch B (multiple lines

Measures of Variation

Range
Variance
Standard Deviation

Range

Range is the simplest of the three measures
Range is the highest value (maximum) minus the lowest value (minimum)
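
A minimal sketch of the range calculation, using a hypothetical data set:

```python
# Hypothetical data set, chosen only for illustration.
data = [12, 7, 3, 15, 9]

# Range = highest value (maximum) - lowest value (minimum)
data_range = max(data) - min(data)

print("range:", data_range)  # 15 - 3 = 12
```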

Variance

The sample variance is an "unbiased estimator" of the population variance (it tends to target the population variance instead of systematically underestimating or overestimating it)
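
As a rough illustration, the variance can be computed step by step as the average squared deviation from the mean. The sketch below assumes the usual convention that the sample variance divides by n - 1 (which is what makes it unbiased) while the population variance divides by n. The data set is hypothetical.

```python
import statistics

# Hypothetical data set, chosen only for illustration (mean = 40 / 8 = 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]  # squared distance from the mean

sample_var = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1
pop_var = sum(squared_devs) / n           # population variance: divide by n

# Cross-check against the standard library.
print(sample_var, statistics.variance(data))
print(pop_var, statistics.pvariance(data))
```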

Standard Deviation

Is the square root of the variance (gives the same units as raw data)
Provides a measure of how much we might expect a typical member of the data set to differ from the mean.
The greater the standard deviation, the more the data are "spread out"
Standard d
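
A short sketch of the standard deviation as the square root of the variance, using a hypothetical data set and the standard-library `statistics` module:

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(data)    # square root of the population variance
sample_sd = statistics.stdev(data)  # square root of the sample variance

# Each standard deviation is the square root of the matching variance.
print(pop_sd, math.sqrt(statistics.pvariance(data)))
print(sample_sd, math.sqrt(statistics.variance(data)))
```

Because of the square root, the result is in the same units as the raw data, unlike the variance, which is in squared units.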

Empirical (Normal) Rule

Only applies to bell-shaped (normal) symmetric distributions
Used to estimate the percentage of values within a few standard deviations of the mean
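
The empirical rule is usually quoted as roughly 68%, 95%, and 99.7% of values falling within 1, 2, and 3 standard deviations of the mean. The sketch below turns that into intervals for a hypothetical bell-shaped distribution; the mean of 100 and standard deviation of 15 are assumptions made only for illustration.

```python
# Hypothetical bell-shaped distribution (values assumed for illustration).
mean, sd = 100, 15

# Approximate percentage of values within k standard deviations of the mean.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

# Interval for each k: (mean - k * sd, mean + k * sd)
intervals = {k: (mean - k * sd, mean + k * sd) for k in EMPIRICAL_RULE}

for k, pct in EMPIRICAL_RULE.items():
    low, high = intervals[k]
    print(f"about {pct}% of values fall between {low} and {high}")
```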

Z score or Standard Score

Can be used to compare data values from different data sets by "converting" raw data to a standardized scale
Calculation involves the mean and standard deviation of the data set
Represents the number of standard deviations that a data value is from the mean

Z-Score

Is obtained by subtracting the mean from the given data value and dividing the result by the standard deviation.
The symbol for both the population and the sample z-score is z
Can be positive, negative or zero
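
A minimal sketch of the z-score calculation. The function name `z_score` and the two sets of test scores are hypothetical, used only to show how values from different data sets can be compared on one standardized scale.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical scores from two different tests with different scales.
math_z = z_score(65, mean=50, sd=10)    # (65 - 50) / 10
history_z = z_score(30, mean=25, sd=5)  # (30 - 25) / 5

print("math z:", math_z)
print("history z:", history_z)
```

Here the math score sits further above its mean (in standard deviation units) than the history score does, even though the raw scores are on different scales. A value equal to the mean has a z-score of zero, and a value below the mean has a negative z-score.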

Percentiles

Are position measures used in educational and health-related fields to indicate the position of an individual in a group
Divide the data set into 100 ("per cent") equal groups
Used to compare an individual data value with the national "norm"
Symbolized by

Quartiles

Same concept as percentiles, except the data set is divided into four groups (quarters)
Quartile rank indicates the percentage of data values that fall below the specified rank
Symbolized by Q1, Q2, Q3
Equivalencies with percentiles: Q1 = P25, Q2 = P50 (the median), Q3 = P75
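
Quartiles can be sketched with the standard-library function `statistics.quantiles`. Note that textbooks and software use slightly different rules for locating quartiles, so results on small data sets may not match a hand calculation exactly. The data set is hypothetical.

```python
import statistics

# Hypothetical data set of 11 values, chosen only for illustration.
data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 20]

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("Q1:", q1)
print("Q2:", q2, "(same as the median:", statistics.median(data), ")")
print("Q3:", q3)
```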

Interquartile Range

"Rough" measurement of variability
Used to identify outliers
Used as a measure of variability in Exploratory Data Analysis
Defined as the difference between Q3 and Q1 (Q3 - Q1)
Is the range of the middle 50% ("average") of the data set
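
A short sketch of the interquartile range, using a hypothetical data set and the quartile rule built into `statistics.quantiles` (other quartile conventions may give slightly different values):

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 20]

q1, _, q3 = statistics.quantiles(data, n=4)

# IQR = Q3 - Q1, the spread of the middle 50% of the data
iqr = q3 - q1

print("IQR:", iqr)
```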

Outliers

Is an extremely high or an extremely low data value when compared with the rest of the data values
A data set should be checked for "outliers" since "outliers" can influence the measures of central tendency and variation (mean and standard deviation)
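
One common convention for checking outliers, assumed here because the notes do not state a specific rule, flags any value more than 1.5 x IQR below Q1 or above Q3. The sketch below applies it to a hypothetical data set and shows how a single outlier pulls the mean.

```python
import statistics

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 60]

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
cleaned = [x for x in data if x not in outliers]

print("outliers:", outliers)
print("mean with outliers:", statistics.mean(data))
print("mean without outliers:", statistics.mean(cleaned))
```

Flagging a value this way is only a prompt to investigate it; whether to keep or drop the value depends on why it is extreme.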