Ch 2 Stat

frequency distribution

a table that shows classes, or intervals, of data entries with a count of the number of entries in each class; the frequency (f) of a class is the number of data entries in the class

lower class limit

the least number that can belong to the class

upper class limit

the greatest number that can belong to the class

class width

the distance between lower (or upper) limits of consecutive classes

range

difference between the max and min data entries

midpoint

the class mark; the average of the lower and upper limits of a class: (lower class limit + upper class limit) / 2

relative frequency

the portion, or percentage, of the data that falls in a class; to get the ____ of a class, divide the frequency by the sample size (f/n)

cumulative frequency

the sum of the frequencies of that class and all previous classes; the ____ of the last class is equal to the sample size (n)
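A minimal Python sketch of the frequency-distribution cards above, assuming whole-number data. `frequency_distribution` is a hypothetical helper, and the class-width rule (range divided by the number of classes, rounded up to the next whole number) is a common textbook convention, not something these notes specify.

```python
from itertools import accumulate

def frequency_distribution(data, num_classes):
    """Hypothetical helper: limits, f, midpoint, relative and cumulative frequency."""
    lo, hi = min(data), max(data)
    # assumption: integer data; round range/classes UP so the max entry always fits
    width = (hi - lo) // num_classes + 1
    n = len(data)
    rows = []
    for i in range(num_classes):
        lower = lo + i * width              # lower class limit
        upper = lower + width - 1           # upper class limit (whole numbers)
        f = sum(lower <= x <= upper for x in data)
        rows.append({
            "limits": (lower, upper),
            "f": f,
            "midpoint": (lower + upper) / 2,  # class mark
            "rel_freq": f / n,                # f / n
        })
    # cumulative frequency: this class plus all previous classes
    for row, cf in zip(rows, accumulate(r["f"] for r in rows)):
        row["cum_freq"] = cf
    return rows

data = [1, 2, 2, 3, 5, 7, 8, 8, 9, 10]
table = frequency_distribution(data, 3)
```

For this sample the classes are 1-4, 5-8, and 9-12, and the last cumulative frequency equals the sample size n = 10.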

frequency histogram

a bar graph that represents the frequency distribution of a data set; has three properties:
1. the horizontal scale is quantitative and measures the data values
2. the vertical scale measures the frequencies of the classes
3. the consecutive bars must touch

class boundaries

the numbers that separate classes without forming gaps between them

frequency polygon

is a line graph that emphasizes the continuous change in frequencies

relative frequency histogram

has the same shape and the same horizontal scale as the corresponding frequency histogram, but the vertical scale measures relative frequencies instead of frequencies

cumulative frequency graph

ogive; a line graph that displays the cumulative frequency of each class at its upper class boundary; the upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis
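A small sketch of where the ogive's points come from, assuming whole-number class limits so each class boundary sits 0.5 away from the limits. `class_boundaries` and `ogive_points` are hypothetical names; starting the graph at the first lower boundary with a cumulative frequency of 0 is the usual textbook convention.

```python
from itertools import accumulate

def class_boundaries(limits):
    """Boundaries that close the gaps between classes (integer limits assumed)."""
    return [(lower - 0.5, upper + 0.5) for lower, upper in limits]

def ogive_points(limits, freqs):
    """(upper class boundary, cumulative frequency) pairs for an ogive."""
    bounds = class_boundaries(limits)
    cum = list(accumulate(freqs))
    # the graph starts at the first lower boundary with cumulative frequency 0
    return [(bounds[0][0], 0)] + [(ub, cf) for (_, ub), cf in zip(bounds, cum)]

limits = [(1, 4), (5, 8), (9, 12)]
freqs = [4, 4, 2]
points = ogive_points(limits, freqs)
```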

stem and leaf plot

an example of exploratory data analysis; each number is separated into a stem (the leftmost digits) and a leaf (the rightmost digit); similar to a histogram, but has the advantage that the graph still contains the original data values; also an easy way to sort data
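A minimal sketch of a stem-and-leaf plot, assuming non-negative whole numbers where the stem is every digit but the last and the leaf is the last digit. `stem_and_leaf` is a hypothetical helper; it also shows why the plot doubles as a sorting method, since the leaves come out in order.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return plot rows like '1 | 257' (non-negative integers assumed)."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)     # 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    lines = []
    # include stems with no leaves so gaps in the data stay visible
    for stem in range(min(plot), max(plot) + 1):
        leaves = "".join(str(leaf) for leaf in plot[stem])
        lines.append(f"{stem} | {leaves}")
    return lines

rows = stem_and_leaf([12, 15, 21, 38, 34, 17])
```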

dot plot

(to graph quantitative data) each entry is plotted, using a point, above a horizontal axis; like a stem & leaf plot, a ___ allows you to see how data is distributed, determine specific data entries, and identify unusual data values

pie charts

provide a convenient way to present qualitative data graphically as percents of a whole; a chart divided into sectors that represent categories; the area of each sector is proportional to the frequency of each category; in most cases you will be interpret…
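Because each sector's area is proportional to its category's frequency, the sector's central angle is 360° times the relative frequency. A minimal sketch with a hypothetical `pie_sectors` helper and made-up counts:

```python
def pie_sectors(counts):
    """Map each category to (percent of the whole, sector angle in degrees)."""
    n = sum(counts.values())
    return {cat: (100 * f / n, 360 * f / n) for cat, f in counts.items()}

sectors = pie_sectors({"A": 50, "B": 30, "C": 20})
```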

Pareto chart

another way to graph qualitative data; is a vertical bar graph in which the height of each bar represents frequency or relative frequency; the bars are positioned in order of decreasing height, with the tallest bar positioned at the left; helps highlight…
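The only step that separates a Pareto chart from an ordinary bar graph is the ordering. A one-function sketch (hypothetical name `pareto_order`) that puts the categories in the left-to-right order the bars would be drawn:

```python
def pareto_order(counts):
    """Categories sorted by decreasing frequency: tallest bar first (leftmost)."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

bars = pareto_order({"a": 3, "b": 10, "c": 7})
```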

paired data sets

when each entry in one data set corresponds to one entry in a second data set

scatter plot

a way to graph paired data sets in which the ordered pairs are graphed as points in a coordinate plane; used to show the relationship between two quantitative variables

time series

a data set composed of quantitative entries taken at regular intervals over a period of time

measure of central tendency

a value that represents a typical, or central, entry of a data set; commonly used: mode, mean, and median

mean

the sum of the data entries divided by the number of entries; the average

population mean

...

sample mean

...
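The population mean and sample mean are computed the same way (sum of the entries divided by the count); only the notation differs: μ = Σx / N for a population and x̄ = Σx / n for a sample. A minimal sketch with made-up data:

```python
def mean(data):
    """Sum of the data entries divided by the number of entries."""
    return sum(data) / len(data)

population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(population)       # population mean: divide by N = 8
sample = population[:4]     # a sample drawn from the population
x_bar = mean(sample)        # sample mean: divide by n = 4
```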

median

the value that lies in the middle of the data when the data set is ordered; it measures the center of an ordered data set by dividing it into two equal parts; if the data set has an even number of entries, the median is the mean of the two entries in the center
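A minimal sketch of the median rule, covering both the odd and even cases (the standard library's `statistics.median` follows the same rule):

```python
def median(data):
    ordered = sorted(data)          # the median needs an ordered data set
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd: the single middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2     # even: mean of the two middle entries
```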

mode

the data entry that occurs with the greatest frequency; if two entries occur with the same greatest frequency, each entry is a ___ and the data set is called bimodal
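A sketch that returns every mode, so a bimodal data set yields two entries. The "no entry repeats, so no mode" branch follows a common textbook convention and is an assumption here; `modes` is a hypothetical name.

```python
from collections import Counter

def modes(data):
    """All entries that occur with the greatest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []   # assumption: no entry repeats, so the data set has no mode
    return [x for x, c in counts.items() if c == top]
```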

outliers

a data entry that is far removed from the other entries in the data set; usually greatly affects the mean; conclusions drawn from data that contain ___ may be flawed

gaps

a data set can have one or more outliers, causing ___ in a distribution

mean of frequency distribution

...
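For grouped data, the usual approach treats every entry in a class as if it were the class midpoint x, giving x̄ = Σ(x · f) / n with n = Σf. A minimal sketch (hypothetical helper name), reusing a 1-4 / 5-8 / 9-12 class layout:

```python
def mean_of_frequency_distribution(midpoints, freqs):
    """Approximate mean: sum of (midpoint * frequency) divided by n = sum of frequencies."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(midpoints, freqs)) / n

approx_mean = mean_of_frequency_distribution([2.5, 6.5, 10.5], [4, 4, 2])
```

The result is only an approximation of the raw-data mean, because the midpoints stand in for the actual entries.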

symmetric

when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images

uniform (rectangular)

when all entries, or classes, in the distribution have equal or approximately equal frequencies; a ____ distribution is also symmetric

skewed left (negatively skewed)

the tail extends to the left

skewed right (positively skewed)

the tail extends to the right

range

is the difference between the max and min data entries in the set; the data must be quantitative to find the ___

deviation

of an entry x in a population data set, the difference between the entry and the mean μ of the data set: deviation of x = x − μ

sum of squares

the sum of the squares of the deviations (SS); for a population, SS = Σ(x − μ)²
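A minimal sketch of deviations and the sum of squares for a made-up population. Note that the deviations themselves always add up to 0, which is why they are squared before summing.

```python
def deviations(data):
    mu = sum(data) / len(data)          # population mean
    return [x - mu for x in data]       # x - mu for each entry

def sum_of_squares(data):
    return sum(d ** 2 for d in deviations(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]
```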

population variance

the mean of the squares of the deviations in a population data set of N entries: σ² = Σ(x − μ)² / N

population standard deviation

of a population data set of N entries, the square root of the population variance: σ = √σ²

sample variance

...

sample standard deviation

...
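A sketch contrasting the population and sample versions, assuming the standard definitions σ² = SS / N and s² = SS / (n − 1) (the sample versions use the sample mean x̄ and divide by n − 1). The standard library's `statistics.pstdev` and `statistics.stdev` compute the same two quantities and serve as a cross-check.

```python
import math
import statistics

def population_variance(data):
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N            # sigma^2 = SS / N

def sample_variance(data):
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)   # s^2 = SS / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = math.sqrt(population_variance(data))   # population standard deviation
s = math.sqrt(sample_variance(data))           # sample standard deviation

# cross-check against the standard library
assert abs(sigma - statistics.pstdev(data)) < 1e-9
assert abs(s - statistics.stdev(data)) < 1e-9
```

Dividing by n − 1 makes s² slightly larger than σ² for the same entries.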

empirical rule

for data with a symmetric bell-shaped distribution, the standard deviation has the following characteristics:
1. about 68% of the data lie within one standard deviation of the mean
2. about 95% of the data lie within two standard deviations of the mean
3. about 99.7% of the data lie within three standard deviations of the mean
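A small sketch that turns the empirical rule into intervals (mean ± 1, 2, 3 standard deviations). The mean of 100 and standard deviation of 15 are made-up example values, and the percentages only hold approximately for bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

intervals = empirical_rule_intervals(100, 15)
```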

Chebychev's theorem

the portion of any data set lying within k standard deviations (k > 1) of the mean is at least 1 − 1/k²
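A one-function sketch of Chebychev's bound. Unlike the empirical rule, it applies to any distribution, so its guarantee is weaker: at least 75% within two standard deviations, compared with about 95% for bell-shaped data.

```python
def chebychev_minimum(k):
    """Least portion of ANY data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's theorem requires k > 1")
    return 1 - 1 / k ** 2
```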

fractiles

numbers that partition, or divide, an ordered data set into equal parts

quartiles

Q1, Q2, and Q3 approximately divide an ordered data set into 4 equal parts
- about 1/4 of the data fall on or below the 1st quartile
- about 1/2 of the data fall on or below the 2nd quartile
^ same as the median of the data set
- about 3/4 of the data fall on or below the 3rd quartile

inter-quartile range (IQR)

is a measure of variation that gives the range of about the middle 50% of the data; the difference between the third and first quartiles
IQR = Q3 − Q1

box-and-whisker plot (box plot)

an exploratory data analysis tool that highlights the important features of a data set; to graph a box plot, you must know the following values:
1. the min entry
2. the 1st quartile Q1
3. the median Q2
4. the 3rd quartile Q3
5. the max entry
^these five numbers are called the five-number summary of the data set
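A sketch covering the quartile, IQR, and box-plot cards together. It assumes the "median of halves" method (Q1 and Q3 are the medians of the entries below and above Q2, with the median itself left out when n is odd); other conventions, such as the interpolating methods of Python's `statistics.quantiles`, can give slightly different quartiles. Helper names are hypothetical, and the data set needs at least 2 entries.

```python
def _median(values):
    mid = len(values) // 2
    return values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """(Q1, Q2, Q3) by the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # entries below the median
    upper = ordered[(n + 1) // 2 :]    # entries above the median
    return _median(lower), _median(ordered), _median(upper)

def iqr(data):
    q1, _, q3 = quartiles(data)
    return q3 - q1                     # IQR = Q3 - Q1

def five_number_summary(data):
    """(min, Q1, Q2, Q3, max): the values needed to draw a box plot."""
    q1, q2, q3 = quartiles(data)
    return min(data), q1, q2, q3, max(data)
```

In the box plot, the box spans Q1 to Q3 with a line at Q2, and the whiskers reach out to the min and max entries.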

standard score (z-score)

represents the number of standard deviations a given value x falls from the mean μ: z = (x − μ) / σ; a ___ can be negative, positive, or zero; can be used to identify an unusual value of a data set that is approximately bell-shaped

if the z-score is negative

the corresponding x-value is less than the mean

if the z-score is positive

the corresponding x-value is greater than the mean (if the z-score is zero, the x-value is equal to the mean)
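A minimal z-score sketch. The cutoff of |z| > 2 for "unusual" is an assumption (a common textbook convention for roughly bell-shaped data, not stated in these notes), and `is_unusual` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean: z = (x - mu) / sigma."""
    return (x - mean) / sd

def is_unusual(x, mean, sd, cutoff=2.0):
    # assumption: |z| > 2 treated as unusual for approximately bell-shaped data
    return abs(z_score(x, mean, sd)) > cutoff
```

A negative z means x is below the mean, a positive z means it is above, and z = 0 means x equals the mean.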

Hawthorne effect

occurs in an experiment when subjects change their behavior simply because they know they are participating in an experiment.