Math 2342

Frequency distribution

The organization of raw data in table form, using classes and frequencies.

Categorical frequency distribution

Used for data that can be placed in specific categories, such as nominal- or ordinal-level data. Examples include political affiliation, religious affiliation, and major field of study.
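A categorical frequency distribution can be tallied in a few lines. This is a minimal sketch. The helper name `categorical_frequency` and the list of majors are hypothetical. It lists each category with its frequency and percent, most frequent first.

```python
from collections import Counter

def categorical_frequency(values):
    """Return (category, frequency, percent) rows, most frequent first (hypothetical helper)."""
    counts = Counter(values)
    n = len(values)
    return [(cat, f, 100 * f / n) for cat, f in counts.most_common()]

# Hypothetical survey of students' major field of study
majors = ["Math", "Biology", "Math", "History", "Biology", "Math"]
for category, freq, pct in categorical_frequency(majors):
    print(f"{category:<8} {freq:>3} {pct:6.1f}%")
```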

Cumulative frequency distribution

A distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).
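Cumulative frequencies are running totals of the class frequencies, taken from the lowest class upward. This is a minimal sketch with hypothetical class frequencies:

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Running total: each entry counts the values <= that class's upper boundary."""
    return list(accumulate(frequencies))

# Hypothetical class frequencies, listed from the lowest class to the highest
freqs = [2, 5, 8, 4, 1]
print(cumulative_frequencies(freqs))  # [2, 7, 15, 19, 20]
```

The last cumulative frequency always equals the total number of data values.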

Grouped frequency distribution

When the range of the data is large, the data must be grouped into classes that are more than one unit in width.
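The next sketch groups whole-number data into classes of equal width. The helper `grouped_frequency` is hypothetical, and so are its parameters (`start` for the lowest class limit, `width`, `num_classes`) and the temperature data. Upper class limits are computed as `lower + width - 1`, which assumes the data are whole numbers.

```python
def grouped_frequency(data, start, width, num_classes):
    """Count whole-number data into classes [start, start+width-1], ... (hypothetical helper)."""
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1  # upper class limit for whole-number data
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return table

# Hypothetical temperature readings
temps = [100, 102, 104, 107, 109, 110, 112, 115, 118, 121]
for lower, upper, f in grouped_frequency(temps, start=100, width=5, num_classes=5):
    print(f"{lower}-{upper}: {f}")
```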

Ungrouped frequency distribution

When the range of the data values is relatively small, a frequency distribution can be constructed using single data values for each class.

Histogram

A graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.

Frequency polygon

A graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.
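The points of a frequency polygon can be computed before plotting. This is a minimal sketch with a hypothetical helper `polygon_points` and a hypothetical grouped distribution. Each class midpoint is taken as the average of its lower and upper limits, and the resulting (midpoint, frequency) pairs can be passed to any plotting tool.

```python
def polygon_points(classes):
    """Map each (lower_limit, upper_limit, frequency) class to a (midpoint, frequency) point."""
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]

# Hypothetical grouped distribution: (lower limit, upper limit, frequency)
classes = [(100, 104, 3), (105, 109, 2), (110, 114, 2), (115, 119, 2), (120, 124, 1)]
print(polygon_points(classes))
# [(102.0, 3), (107.0, 2), (112.0, 2), (117.0, 2), (122.0, 1)]
```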

Ogive

A graph that represents the cumulative frequencies for the classes in a frequency distribution.

Bar graph

Represents the data by using vertical or horizontal bars whose heights or lengths represent frequencies of the data.

Pareto chart

Used to represent a frequency distribution for a categorical variable; the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

Time series graph

Represents data that occur over a specific period of time.

Pie graph

A circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.

Stem and leaf plot

A data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.
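For whole numbers, one common split uses the tens digit as the stem and the ones digit as the leaf. This is a minimal sketch under that assumption, with hypothetical scores. Stems that have no leaves are simply omitted here.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Split whole numbers into stem (tens) and leaf (ones digit), leaves in ascending order."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical scores
scores = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23]
for stem, leaves in sorted(stem_and_leaf(scores).items()):
    print(f"{stem} | {' '.join(map(str, leaves))}")
```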

Scatter plot

A graph of ordered pairs of data values that is used to determine whether a relationship exists between the two variables.

Statistic

A characteristic or measure obtained by using the data values from a sample.

Parameter

A characteristic or measure obtained by using all the data values from a specific population.

Mean

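The sum of the values divided by the total number of values. The sample mean is written $\bar{X}$ (X with a bar on top), and the population mean is written $\mu$ (the Greek letter mu): $\bar{X} = \frac{\sum X}{n}$ and $\mu = \frac{\sum X}{N}$, where n = total number of values in the sample and N = total number of values in the population.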
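The formula is the same for a sample and a population. Only the symbols differ. This is a minimal sketch with hypothetical values:

```python
def mean(values):
    """Sum of the values divided by the number of values (sample or population)."""
    return sum(values) / len(values)

print(mean([3, 5, 8, 10, 14]))  # 8.0
```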

Median

The midpoint of the data array (the data values arranged in order). Symbol = MD
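To find the median, sort the data first. With an odd number of values, the median is the middle one. With an even number, it is the average of the two middle values. This is a minimal sketch:

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 10]))  # 6.0
```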

Mode

The value that occurs most often in the data set.
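A data set can have one mode, more than one mode, or none at all. This is a minimal sketch. It returns every value that is tied for the highest frequency. By assumption, it returns an empty list when no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once, so the data set has no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 4, 4, 6, 9]))  # [4]
print(modes([1, 1, 3, 3, 5]))  # [1, 3]  (bimodal)
print(modes([1, 2, 3]))        # []
```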

Midrange

The sum of the lowest and highest values in the data set, divided by 2. Symbol = MR
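The midrange uses only two values, the lowest and the highest. This is a minimal sketch with hypothetical data:

```python
def midrange(values):
    """MR = (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2

print(midrange([2, 3, 6, 8, 4, 1]))  # 4.5
```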

Positively skewed or right-skewed distribution

The majority of the data values fall to the left of the mean and cluster at the lower end of the distribution; the "tail" is to the right. Also, the mean is to the right of the median, and the mode is to the left of the median.

Symmetric distribution

The data values are evenly distributed on both sides of the mean. In addition, when the distribution is unimodal, the mean, median, and mode are the same and are at the center of the distribution.

Negatively skewed or left-skewed distribution

The majority of the data values fall to the right of the mean and cluster at the upper end of the distribution; the "tail" is to the left. Also, the mean is to the left of the median, and the mode is to the right of the median.

Range

The highest value minus the lowest value. Symbol = R

Variance

The average of the squares of the distance each value is from the mean. Symbol = $\sigma^2$ (the lowercase Greek letter sigma, squared). Formula: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$, where X = individual value, $\mu$ = population mean, and N = population size.

Standard deviation

The square root of the variance. Symbol = $\sigma$ (the lowercase Greek letter sigma). Formula: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$.
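The population formulas divide the sum of squared deviations by N. This is a minimal sketch with a hypothetical data set. The mean is 35, the squared deviations sum to 1750, and so the variance is 1750/6.

```python
import math

def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N"""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def population_std(values):
    """sigma = sqrt(sigma^2)"""
    return math.sqrt(population_variance(values))

data = [10, 60, 50, 30, 40, 20]
print(population_variance(data))  # approximately 291.67
print(population_std(data))       # approximately 17.08
```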

Sample variance

Symbol = $s^2$. Formula: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$, where X = individual value, $\bar{X}$ (X with a bar on top) = sample mean, and n = sample size.

Standard deviation of a sample

Symbol = s. Formula: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$, where X = individual value, $\bar{X}$ = sample mean, and n = sample size.
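The sample formulas differ from the population ones only in the divisor, which is n − 1 instead of N. This is a minimal sketch with hypothetical data. The mean is 5 and the squared deviations sum to 32, so $s^2 = 32/7$, whereas the population formula would give 32/8 = 4.

```python
import math

def sample_variance(values):
    """s^2 = sum((X - X_bar)^2) / (n - 1)"""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """s = sqrt(s^2)"""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # approximately 4.571
print(sample_std(data))       # approximately 2.138
```

Python's standard `statistics.variance` and `statistics.stdev` functions also use the n − 1 divisor.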

Coefficient of variation

Denoted by CVar, the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. For samples: $\text{CVar} = \frac{s}{\bar{X}} \cdot 100$; for populations: $\text{CVar} = \frac{\sigma}{\mu} \cdot 100$.
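Because CVar is a percentage of the mean, it can be used to compare variability between data sets whose means differ. This is a minimal sketch. The two (standard deviation, mean) pairs are hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """CVar = (standard deviation / mean) * 100, as a percent."""
    return 100 * std_dev / mean

print(coefficient_of_variation(5, 50))  # 10.0  (data set A)
print(coefficient_of_variation(8, 40))  # 20.0  (data set B varies more relative to its mean)
```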

Range Rule of Thumb

A rough estimate of the standard deviation is $s \approx \frac{\text{range}}{4}$.
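The rule of thumb needs only the highest and lowest values. This is a minimal sketch with hypothetical data. The result is a quick check on a computed s, not a replacement for it.

```python
def range_rule_estimate(values):
    """Rough estimate of the standard deviation: s is approximately range / 4."""
    return (max(values) - min(values)) / 4

print(range_rule_estimate([5, 12, 20, 25, 33, 45]))  # 10.0
```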

Chebyshev's theorem

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - \frac{1}{k^2}$, where k is a number greater than 1 (k is not necessarily an integer).
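Chebyshev's bound applies to any data set. This is a minimal sketch that evaluates it for a few values of k:

```python
def chebyshev_minimum_proportion(k):
    """At least 1 - 1/k^2 of the values lie within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, chebyshev_minimum_proportion(k))
# k = 2 gives 0.75, so at least 75% of the values lie within 2 standard deviations
```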

The Empirical Rule

When the distribution is bell-shaped:
Approximately 68% of the data values will fall within 1 standard deviation of the mean.
Approximately 95% of the data values will fall within 2 standard deviations of the mean.
Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
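The three intervals follow directly from the mean and standard deviation. This is a minimal sketch. The mean of 100 and standard deviation of 15 are hypothetical, and the percentages hold only approximately, for bell-shaped data.

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals expected to hold about 68%, 95%, and 99.7% of a bell-shaped data set."""
    return {
        "68%": (mean - std_dev, mean + std_dev),
        "95%": (mean - 2 * std_dev, mean + 2 * std_dev),
        "99.7%": (mean - 3 * std_dev, mean + 3 * std_dev),
    }

print(empirical_rule_intervals(100, 15))
# {'68%': (85, 115), '95%': (70, 130), '99.7%': (55, 145)}
```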

z score or standard score

The standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. Symbol = z. Formula: $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$; for samples: $z = \frac{X - \bar{X}}{s}$; for populations: $z = \frac{X - \mu}{\sigma}$.
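z scores let you compare values that come from different distributions. This is a minimal sketch with two hypothetical test scores. The first score is 1.5 standard deviations above its mean and the second only 1.0, so the first is relatively higher.

```python
def z_score(x, mean, std_dev):
    """z = (value - mean) / standard deviation."""
    return (x - mean) / std_dev

# Hypothetical: 65 on a test with mean 50 and s = 10, versus 30 on a test with mean 25 and s = 5
print(z_score(65, 50, 10))  # 1.5
print(z_score(30, 25, 5))   # 1.0
```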

Percentiles

Percentiles divide the data set into 100 equal groups. The percentile of a given value X indicates approximately what percentage of the data values fall below X.
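This is a minimal sketch of one common textbook formula, which is an assumption here and not stated in the definition above: percentile of X = (number of values below X + 0.5) / (total number of values) × 100. The scores are hypothetical, and other textbooks and software may use slightly different conventions.

```python
def percentile_rank(data, x):
    """(number of values below X + 0.5) / n * 100 -- assumed textbook formula."""
    below = sum(value < x for value in data)
    return (below + 0.5) * 100 / len(data)

# Hypothetical test scores
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0, so a score of 12 is at about the 65th percentile
```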

Quartiles

Quartiles divide the distribution into four groups, separated by Q1, Q2, and Q3. Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile, or the median; Q3 corresponds to the 75th percentile.

Interquartile range (IQR)

The difference between Q3 and Q1 (IQR = Q3 − Q1); it is the range of the middle 50% of the data.
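This is a minimal sketch of the quartiles and the IQR. It uses the median-of-halves method, which is one common hand method and is assumed here: Q2 is the median, Q1 is the median of the values below it, and Q3 is the median of the values above it. When n is odd, the median itself is excluded from both halves. Software such as Python's `statistics.quantiles` interpolates differently and may give slightly different values. The data are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """(Q1, Q2, Q3) by the median-of-halves method; needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]         # values below the median
    upper = ordered[(n + 1) // 2 :]   # values above the median
    return _median(lower), _median(ordered), _median(upper)

def iqr(values):
    """IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [15, 13, 6, 5, 12, 50, 22, 18]
print(quartiles(data))  # (9.0, 14.0, 20.0)
print(iqr(data))        # 11.0
```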

Outlier

An extremely high or an extremely low data value when compared with the rest of the data values.
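The definition above does not give a cutoff. One widely used convention, assumed here, flags any value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. This is a minimal sketch that takes Q1 and Q3 as inputs. The data and quartiles are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences under the 1.5 * IQR convention (assumed)."""
    spread = q3 - q1
    return q1 - 1.5 * spread, q3 + 1.5 * spread

def find_outliers(values, q1, q3):
    """Values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [5, 6, 12, 13, 15, 18, 22, 50]
print(outlier_fences(9, 20))          # (-7.5, 36.5)
print(find_outliers(data, q1=9, q3=20))  # [50]
```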

Five-number summary

The five specific values used to graphically represent a data set in a boxplot: 1) the lowest value of the data set (Min); 2) Q1; 3) the median; 4) Q3; 5) the highest value of the data set (Max).
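This is a minimal sketch that collects the five numbers. It computes Q1 and Q3 by the median-of-halves method, which is one common hand method and is an assumption here, so software results may differ slightly. The data are hypothetical.

```python
def five_number_summary(values):
    """(Min, Q1, median, Q3, Max) using the median-of-halves method for Q1 and Q3."""
    def med(s):
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

    ordered = sorted(values)
    n = len(ordered)
    q1 = med(ordered[: n // 2])
    q3 = med(ordered[(n + 1) // 2 :])
    return ordered[0], q1, med(ordered), q3, ordered[-1]

print(five_number_summary([15, 13, 6, 5, 12, 50, 22, 18]))  # (5, 9.0, 14.0, 20.0, 50)
```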

Boxplot

A graph of a data set obtained by drawing a horizontal line from the Min data value to Q1, drawing a horizontal line from Q3 to the Max data value, and drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box passing through the median (Q2).

Resistant statistic

A statistic that is relatively less affected by outliers than a nonresistant statistic. The median and the interquartile range are resistant statistics; the mean and standard deviation are nonresistant statistics.
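Adding a single extreme value shows the difference. This is a minimal sketch using Python's standard `statistics` module on hypothetical incomes (in thousands). The mean jumps from 37.2 to about 97.7, while the median moves only from 38 to 39.

```python
from statistics import mean, median

incomes = [32, 35, 38, 40, 41]   # hypothetical values, in thousands
with_outlier = incomes + [400]   # add one extreme value

print(mean(incomes), median(incomes))            # 37.2 38
print(mean(with_outlier), median(with_outlier))  # mean is about 97.67; median is 39.0
```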