Elementary Statistics Chapter 2

Frequency Distribution

Table that shows classes or intervals of data entries with a count of the number of entries in each class

Lower Class Limit

Least number that can belong to a class

Upper Class Limit

Greatest number that can belong to a class

Class Width

Distance between the lower (or upper) limits of consecutive classes
Ex: Lower limit of class 2 - lower limit of class 1 = class width
Upper limit of class 2 - upper limit of class 1 = class width
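
A minimal Python sketch of the class width calculation. The class limits below are made up for illustration and do not come from the chapter.

```python
# Hypothetical (lower limit, upper limit) pairs, for illustration only.
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]

# Class width: difference between consecutive lower limits.
widths_from_lower = [b[0] - a[0] for a, b in zip(classes, classes[1:])]

# The same width results from consecutive upper limits.
widths_from_upper = [b[1] - a[1] for a, b in zip(classes, classes[1:])]

print(widths_from_lower)  # [5, 5, 5]
print(widths_from_upper)  # [5, 5, 5]
```

Note that the width here is 5, not 4: subtracting the lower limit from the upper limit of the same class (5 - 1) would undercount.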

Range

Difference between the maximum and minimum data entries

Midpoint

Sum of the lower and upper limits of the class divided by 2

Relative Frequency

The portion or percentage of the data that falls in that class
Relative Frequency = class frequency/sample size = f/n

Cumulative Frequency

Sum of the frequencies of that class and all previous classes
Cumulative frequency of the last class should equal the sample size
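
The midpoint, relative frequency, and cumulative frequency definitions can be sketched together in Python. The classes and frequencies are assumed values chosen only to make the arithmetic easy to follow.

```python
from itertools import accumulate

# Hypothetical classes and frequencies, for illustration only.
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]
freqs = [4, 8, 6, 2]

n = sum(freqs)                                     # sample size
midpoints = [(lo + hi) / 2 for lo, hi in classes]  # (lower + upper) / 2
rel_freqs = [f / n for f in freqs]                 # f / n
cum_freqs = list(accumulate(freqs))                # running total of f

for (lo, hi), m, f, rf, cf in zip(classes, midpoints, freqs, rel_freqs, cum_freqs):
    print(f"{lo}-{hi}  midpoint={m}  f={f}  rel={rf:.2f}  cum={cf}")
```

The last cumulative frequency equals n, and the relative frequencies add up to 1.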

Frequency Histogram

A bar graph that represents the frequency distribution of a data set
1. The horizontal scale is quantitative and measures the data entries
2. The vertical scale measures the frequencies of the classes
3. Consecutive bars must touch

Class boundaries

Numbers that separate classes without forming gaps between them

Frequency Polygon

A line graph that emphasizes the continuous change in frequencies

Relative Frequency Histogram

A bar graph that represents the relative frequency distribution of a data set

Cumulative Frequency Graph
Ogive

Line graph that displays the cumulative frequency of each class at its upper class boundary

Stem-and-leaf plot

Each number is separated into a stem and a leaf; the plot has as many leaves as there are entries in the original data set, and each leaf is a single digit

Stem

Entry's leftmost digits

Leaf

Entry's rightmost digit
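
A short Python sketch of how a stem-and-leaf plot could be built for two-digit entries. The data values are assumed for illustration, and the tens digit is taken as the stem.

```python
from collections import defaultdict

# Hypothetical two-digit data entries, for illustration only.
data = [23, 25, 31, 37, 38, 42, 42, 45, 50]

stems = defaultdict(list)
for x in sorted(data):
    stem, leaf = divmod(x, 10)  # stem = tens digit, leaf = ones digit
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

Every entry contributes exactly one leaf, so the number of leaves matches the number of data entries.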

Dot Plot

Each data entry is plotted using a point above a horizontal axis

Pareto Chart

Vertical bar graph in which the height of each bar represents the frequency or relative frequency, positioned in order of decreasing height with the tallest bar at the left

Paired Data sets

When each entry in one data set corresponds to one entry in a second data set

Scatter Plot

Ordered pairs are graphed as points in a coordinate plane to show the relationship between two quantitative variables

Time Series

Data set that is composed of quantitative entries taken at regular intervals

Time Series chart

Graph of a time series, with time on the horizontal axis and the quantitative entries on the vertical axis
Year / Number / Average Number

Measure of central tendency

Value that represents a typical, or central, entry of a data set
Mean, median, and mode

Mean

Sum of the data entries divided by the number of entries

Median

Value that lies in the middle of the data when the data set is ordered

Mode

A data entry that occurs with the greatest frequency

Bimodal

When a data set has two entries occurring with the same greatest frequency
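
The three measures of central tendency can be checked with Python's standard `statistics` module. The data set is assumed for illustration and was chosen to be bimodal.

```python
import statistics

# Hypothetical data set, for illustration only.
data = [2, 3, 3, 5, 7, 7, 8]

mean = statistics.mean(data)        # sum of entries / number of entries
median = statistics.median(data)    # middle value of the ordered data
modes = statistics.multimode(data)  # every entry with the greatest frequency

print(mean)    # 5
print(median)  # 5
print(modes)   # [3, 7]
```

Because 3 and 7 both occur twice, `multimode` returns two values, which is what makes this data set bimodal.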

Outlier

A data entry that is far removed from the other entries in the data set

Gaps

Space in a distribution caused by outliers

Weighted mean

Mean of a data set whose entries have varying weights
x bar = sigma(x * w)/sigma(w), where w is the weight of each entry x
Ex: finding your grade in a class with sections (like tests) weighted differently
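
A minimal Python sketch of the weighted mean for a course grade. The section scores and weights are hypothetical.

```python
# Hypothetical section scores and their weights, for illustration only.
scores = [86, 96, 82]          # e.g. tests, homework, final exam
weights = [0.50, 0.30, 0.20]

# x bar = sigma(x * w) / sigma(w)
weighted_mean = sum(x * w for x, w in zip(scores, weights)) / sum(weights)

print(round(weighted_mean, 2))  # 88.2
```

Dividing by the sum of the weights means the formula still works when the weights do not add up to 1.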

Mean of a frequency distribution

x bar = sigma(x * f)/n
where x and f are midpoint and frequency of each class
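
The same idea as a Python sketch, using class midpoints as x and class frequencies as f. The values are assumed for illustration.

```python
# Hypothetical class midpoints and frequencies, for illustration only.
midpoints = [3, 8, 13, 18]
freqs = [4, 8, 6, 2]

n = sum(freqs)  # sample size

# x bar = sigma(x * f) / n
mean_fd = sum(x * f for x, f in zip(midpoints, freqs)) / n

print(mean_fd)  # 9.5
```

The result is an approximation of the mean, since every entry in a class is treated as if it sat at the class midpoint.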

Symmetric

When a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images

Uniform/rectangular

When all entries, or classes, in the distribution have equal or about equal frequencies

Skewed

When the "tail" of a graph elongates more to one side than the other

Negatively skewed

Skewed left; the tail extends to the left

Positively skewed

Skewed right; the tail extends to the right

Deviation

The difference between the entry and the mean of the data set

Sum of squares

Sum of the squares of the deviations

Population variance

Average of the squares of the deviations

Standard deviation

Square root of the variance
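
The chain from deviation to standard deviation can be traced step by step in Python. The data set is assumed for illustration and is treated as a whole population. The sample versions, which divide the sum of squares by n - 1 instead of n, are included for comparison.

```python
import math

# Hypothetical population data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                         # 5.0
deviations = [x - mean for x in data]        # entry minus mean
sum_sq = sum(d ** 2 for d in deviations)     # sum of squares

pop_variance = sum_sq / n                    # average squared deviation
pop_std = math.sqrt(pop_variance)

sample_variance = sum_sq / (n - 1)           # divides by n - 1 instead of n
sample_std = math.sqrt(sample_variance)

print(sum_sq, pop_variance, pop_std)  # 32.0 4.0 2.0
```

The deviations always sum to zero, which is why they are squared before averaging.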

Sample Variance

Sum of squares divided by n - 1, where n is the sample size

Sample Standard Deviation

Square root of the sample variance

Empirical Rule

68-95-99.7 Rule
For data sets with distributions that are about symmetric and bell shaped
1. About 68% of the data lie within one standard deviation of the mean
2. About 95% of the data lie within two standard deviations of the mean
3. About 99.7% of the data lie within three standard deviations of the mean
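
A small Python sketch of the intervals the rule describes. The mean and standard deviation are hypothetical values, and the rule only applies if the distribution is roughly bell shaped.

```python
# Hypothetical mean and standard deviation, for illustration only.
mean, sd = 64.0, 2.5

# Interval within k standard deviations of the mean, for k = 1, 2, 3.
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    lo, hi = intervals[k]
    print(f"about {pct}% of the data lie between {lo} and {hi}")
```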

Chebychev's Theorem

Gives an inequality statement that applies to all distributions
The portion of any data set lying within k standard deviations of the mean (k > 1) is at least 1 - 1/k^2
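
The bound is easy to evaluate for specific values of k. A minimal Python sketch follows; the function name `chebychev_lower_bound` is made up for this example.

```python
def chebychev_lower_bound(k: float) -> float:
    """Minimum portion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

print(chebychev_lower_bound(2))  # 0.75
print(round(chebychev_lower_bound(3), 3))  # 0.889
```

So at least 75% of any data set lies within two standard deviations of the mean, compared with about 95% under the Empirical Rule for bell-shaped data.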

Fractiles

Numbers that partition, or divide, an ordered data set into equal parts

Quartiles

Divide an ordered data set into four equal parts

First Quartile

Q1; 1/4 of data fall on or below

Second Quartile

Q2; 1/2 of data fall on or below
Same as median

Third Quartile

Q3; 3/4 of data fall on or below

Interquartile Range

A measure of variation that gives the range of the middle portion (about half) of the data
IQR= Q3-Q1

Box-and-whisker plot
Boxplot

Exploratory data analysis tool that highlights the important features of a data set

Five number summary

1. The minimum entry
2. First quartile
3. Median (Q2)
4. Third Quartile
5. Maximum entry
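
A Python sketch of the five-number summary and the interquartile range. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of entries is odd. Other quartile conventions exist and can give slightly different values. The data set and the function name `five_number_summary` are made up for illustration.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # entries below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # entries above the median
    return (
        xs[0],
        statistics.median(lower),
        statistics.median(xs),
        statistics.median(upper),
        xs[-1],
    )

# Hypothetical data set, for illustration only.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print(minimum, q1, q2, q3, maximum)  # 1 3.5 7 11.0 15
print(iqr)                           # 7.5
```

These five values are the ones a box-and-whisker plot draws: the box spans Q1 to Q3 with a line at the median, and the whiskers reach to the minimum and maximum entries.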