Stat 231 Ch. 2 terms

Frequency Distribution

a table that shows classes or intervals of data entries with a count of the number of entries in each class.
*Helps organize data so it is easier to interpret and construct graphs.

Frequency

(f) is the number of data entries in the class.

Lower Class Limits

smallest value of each class

Upper Class Limits

largest value of each class

Class Width

the difference between two consecutive LCLs
*stays the same down the table
*never the difference within a class

Range

the difference between the maximum and minimum data entries.
(Range = Maximum - Minimum)

Class Midpoints

is the sum of the lower and upper limits of the class divided by two. The midpoint is sometimes called
the class mark.

Cumulative Frequency

for each class, it is the frequency of that class added to the frequency of all previous classes

Relative Frequency

is the portion, or percentage, of the data that falls in that class. To find relative frequency of a
class, divide the frequency f by the sample size n.
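
The frequency, midpoint, relative frequency, and cumulative frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the data values, the class limits, and the function name `frequency_table` are hypothetical, and the sketch assumes integer-valued data so that each upper class limit is one less than the next lower class limit.

```python
def frequency_table(data, lower_limits, class_width):
    """Return one row per class:
    (lower limit, upper limit, midpoint, f, relative f, cumulative f)."""
    n = len(data)  # sample size
    rows = []
    cumulative = 0
    for lower in lower_limits:
        # Assumption: integer data, so the upper limit is width - 1 above the lower limit.
        upper = lower + class_width - 1
        f = sum(1 for x in data if lower <= x <= upper)  # class frequency
        cumulative += f                                  # running total of frequencies
        midpoint = (lower + upper) / 2                   # class mark
        rows.append((lower, upper, midpoint, f, f / n, cumulative))
    return rows


# Hypothetical data set with classes 0-9, 10-19, 20-29, 30-39 (class width 10).
data = [7, 12, 15, 21, 22, 25, 29, 31, 34, 38]
rows = frequency_table(data, lower_limits=[0, 10, 20, 30], class_width=10)
for row in rows:
    print(row)
```

The cumulative frequency of the last class equals the sample size n, and the relative frequencies add up to 1, which makes a quick check on any table built this way.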

Frequency Histogram

is a bar graph that represents the frequency distribution of a data set.
*The horizontal scale is quantitative and measures data entries.
*The vertical scale measures the frequencies of the classes.
*Consecutive bars must touch.

Class Boundaries

the value between the end of one class and the beginning of the next.
*Will be one decimal place further than the given classes
*Must account for the boundary before 1st LCL and last UCL
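
As a rough sketch of the class boundary rule above: take half the gap between one class's upper limit and the next class's lower limit, then subtract it from every lower limit and add it to every upper limit. The class limits and the function name `class_boundaries` are hypothetical, and the sketch assumes at least two classes with the same gap between them.

```python
def class_boundaries(lower_limits, upper_limits):
    """Return (lower boundary, upper boundary) for each class."""
    # Gap between the end of the first class and the start of the second.
    gap = lower_limits[1] - upper_limits[0]
    half = gap / 2
    # Also covers the boundary before the first LCL and after the last UCL.
    return [(lo - half, up + half) for lo, up in zip(lower_limits, upper_limits)]


# Hypothetical classes 0-9, 10-19, 20-29, 30-39.
bounds = class_boundaries([0, 10, 20, 30], [9, 19, 29, 39])
print(bounds)
```

With integer class limits the boundaries land on halves (9.5, 19.5, and so on), which is the "one decimal place further" noted above.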

Frequency Polygon

a line graph that emphasizes the continuous change in frequencies.

Relative Frequency Histogram

has the same shape and the same horizontal scale as the corresponding frequency
histogram.
*the vertical scale measures the relative frequencies, not frequencies.

Cumulative Frequency Graph or Ogive

is a line graph that displays the cumulative frequency of each class at its upper class boundary.
*The upper boundaries are marked on the horizontal axis
*The cumulative frequencies are marked on the vertical axis.

Stem-and-leaf plot

each number is separated into a stem and a leaf.
*like a histogram
*Still contains original data values.
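
One possible way to build a text stem-and-leaf plot is sketched below. It assumes non-negative integers, uses the tens (and higher) digits as the stem and the ones digit as the leaf, and skips stems that have no leaves. The data and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem to its list of leaves, in ascending order."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 38 -> stem 3, leaf 8
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf([12, 15, 21, 22, 25, 29, 31, 34, 38, 7])
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every leaf is kept, the original data values can be read back from the plot, which a histogram does not allow.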

Dot plot

each data entry is plotted, using a point, above a horizontal axis.

Pie Chart

a circle is divided into sectors that represent categories.
*The area of each sector is proportional to the frequency of each category

Pareto Chart

a vertical bar graph in which the height of each bar represents frequency or relative frequency.
*bars are positioned in descending order.

Paired Data Sets

Each entry in one data set corresponds to one entry in a second data set.
* Graph using a scatter plot.
* Used to show the relationship between two quantitative variables.

Time Series

A data set composed of quantitative entries taken at regular intervals over a period of time.
*Use a time series chart to graph

Measure of central tendency

a value that represents a typical, or central, entry of a data set.

Three most commonly used measures of central tendency:

1. Mean
2. Median
3. Mode

Mean

of a data set is the sum of the data entries divided by the number of entries.

Median

The value that lies in the middle of the data when the data set is ordered.
*Measures the center of an ordered data set by dividing it into two equal parts
*If the data set has an odd number of entries, the median is the middle data entry.
*If the data set has an even number of entries, the median is the mean of the two middle data entries.

Mode

is the data entry that occurs with the greatest frequency.
*data set can have one mode, more than one mode, or no mode.
*is the only measure of central tendency that can be used to describe data at the nominal level of
measurement.
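
A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module for the mean and median. The `modes` helper and the data are hypothetical; the helper follows the convention stated above that a data set in which no entry repeats has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return every entry that occurs with the greatest frequency.
    Returns an empty list when no entry repeats (no mode)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return [x for x, c in counts.items() if c == top]


data = [2, 3, 3, 5, 7, 10]      # hypothetical data set with an even number of entries
print(mean(data))               # sum of entries divided by number of entries
print(median(data))             # mean of the two middle entries, 3 and 5
print(modes(data))              # entries with the greatest frequency
```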

Outlier

is a data entry that is far removed from the other entries in the data set.
*some outliers are valid data; other outliers may occur due to data-recording errors
*a data set can have one or more outliers, causing gaps in a distribution.
*conclusions drawn from a data set that contains outliers may be flawed.

Weighted Mean

the mean of a data set whose entries have varying weights.
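
The weighted mean is usually computed as the sum of each entry times its weight, divided by the sum of the weights. A small sketch, with hypothetical scores, weights, and function name:

```python
def weighted_mean(entries, weights):
    """Sum of (entry * weight) divided by the sum of the weights."""
    total = sum(x * w for x, w in zip(entries, weights))
    return total / sum(weights)


# Hypothetical course grade: tests count 50%, homework 20%, final exam 30%.
scores = [86, 96, 82]
weights = [0.5, 0.2, 0.3]
grade = weighted_mean(scores, weights)
print(round(grade, 1))
```

When all weights are equal, this reduces to the ordinary mean.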

Symmetric

when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves
are approximately mirror images.
**When a distribution is symmetric and unimodal, the mean, median, and mode are equal.

Uniform

when all entries, or classes, in the distribution have equal or approximately equal frequencies.
*Is also symmetric

Skewed left

when its tail extends to the left.
*When a distribution is skewed left, the mean is less than the median and the median is usually less than
the mode.

Skewed right

when its tail extends to the right.
*When a distribution is skewed right, the mean is greater than the median and the median is usually greater than
the mode.

Measure of variation

a value that represents the spread or dispersion of data values in a data set.

Three most commonly used measures of variation:

1. Range
2. Variance
3. Standard deviation

Range

the difference between the maximum and minimum data entries in the set.
Range = Maximum data entry - Minimum data entry

Deviation

is the difference between the entry x and the mean μ of the data set.
Deviation of x = x - μ

Standard deviation

measures how far the data values are spread out from the mean
- The smaller the standard deviation, the closer the values are to the mean
- The larger the standard deviation, the further apart the values are from the mean
- Standard deviation is never negative
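
The range, deviations, variance, and standard deviation can be worked through step by step as sketched below. The sketch treats the data as a whole population (mean μ, dividing by the number of entries N), and the data and the function name `population_variation` are hypothetical.

```python
import math


def population_variation(data):
    """Return (range, mean, variance, standard deviation) for a population."""
    data_range = max(data) - min(data)       # maximum entry - minimum entry
    mu = sum(data) / len(data)               # population mean
    deviations = [x - mu for x in data]      # deviation of x = x - mu
    # Variance: mean of the squared deviations (divide by N for a population).
    variance = sum(d ** 2 for d in deviations) / len(data)
    return data_range, mu, variance, math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population
data_range, mu, variance, sigma = population_variation(data)
print(data_range, mu, variance, sigma)
```

For a sample rather than a population, the sum of squared deviations is divided by n - 1 instead of N. In both cases the standard deviation is a square root, so it cannot be negative.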

Fractiles

are numbers that partition (divide) an ordered data set into equal parts.

Percentiles

denoted P1, P2, ..., P99,
which divide a set of data into 100 groups with about 1% of the values in each group.

Quartiles

denoted Q1, Q2, and Q3,
which divide a set of data into 4 groups with about 25% of the values in each group.

Interquartile range (IQR)

is a measure of variation that gives the range of the middle portion of the data.
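
A sketch of finding the quartiles and the IQR (Q3 - Q1). It assumes one common textbook convention: Q2 is the median, Q1 is the median of the entries below Q2, and Q3 is the median of the entries above Q2. Software packages may use other quartile rules and give slightly different values. The data and the function name `quartiles` are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]                                     # entries below the median
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]  # entries above the median
    return median(lower), median(s), median(upper)


data = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # range of the middle portion of the data
print(q1, q2, q3, iqr)
```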

Box-and-whisker plot (boxplot)

an exploratory data analysis tool that highlights the important features of a data set.
To graph a boxplot, you need to find the five-number summary.

Five-number Summary:

1. The minimum entry
2. The first quartile
3. The second quartile (median)
4. The third quartile
5. The maximum entry

Standard Score (Z-score)

represents the number of standard deviations a given value x falls from the mean μ.

Z-score

*can be used to identify an unusual value of a data set that is approximately bell-shaped
*can be negative, positive, or zero
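
A z-score is computed as z = (x - μ) / σ. The sketch below shows the calculation and one way to flag unusual values. The cutoff of 2 standard deviations is a common rule of thumb for bell-shaped data and is an assumption here, as are the example mean, standard deviation, and function names.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


def is_unusual(x, mu, sigma, cutoff=2.0):
    """Assumed rule of thumb: unusual if more than `cutoff` standard deviations from the mean."""
    return abs(z_score(x, mu, sigma)) > cutoff


# Hypothetical test scores with mean 70 and standard deviation 8.
for score in (62, 70, 90):
    print(score, z_score(score, 70, 8), is_unusual(score, 70, 8))
```

A value below the mean gives a negative z-score, a value above the mean gives a positive one, and a value equal to the mean gives zero.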