What are the five important characteristics of data?
center, variation, distribution, outliers, time (mnemonic CVDOT: "Computer Viruses Destroy Or Terminate")
What is a center?
a representative or average value that indicates where the middle of the data set is located
What is variation?
a measure of the amount that the data values vary among themselves
What is distribution?
the nature or shape of the distribution of the data (such as bell-shaped, uniform or skewed)
What are outliers?
sample values that lie very far away from the vast majority of other sample values
What is time?
changing characteristics of the data over time
What is a method whose objective is to summarize or describe the important characteristics of a set of data?
descriptive statistics
What is a method that is used with sample data to make inferences (or generalizations) about a population that go beyond the data?
inferential statistics
What is a table that lists data values (either individually or by groups of intervals), along with their corresponding frequencies (or counts)?
frequency distribution
What is the number of original values that fall into a particular class?
frequency
What are the smallest numbers that can belong to the different classes in a frequency distribution?
lower class limits
What are the largest numbers that can belong to the different classes in a frequency distribution?
upper class limits
What are the numbers used to separate classes, but without the gaps created by class limits? How are they found?
class boundaries/ take the gap between the upper limit of one class and the lower limit of the next, divide it by 2, then add that amount to each upper class limit and subtract it from each lower class limit
What are the midpoints of the classes in a frequency distribution?
class midpoints/ add the lower class limit to the upper class limit and divide the sum by 2
What is the difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution?
class width
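A minimal Python sketch of the class limit, boundary, midpoint, and width cards above. The class limits here are hypothetical values chosen only for illustration; they are not from the source.

```python
# Hypothetical (lower limit, upper limit) pairs for four classes.
class_limits = [(0, 9), (10, 19), (20, 29), (30, 39)]

# Half of the gap between one class's upper limit and the next class's lower limit.
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2

# Boundaries: subtract half_gap from each lower limit, add it to each upper limit.
class_boundaries = [(lower - half_gap, upper + half_gap) for lower, upper in class_limits]

# Midpoint: (lower class limit + upper class limit) / 2.
class_midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Width: difference between two consecutive lower class limits.
class_width = class_limits[1][0] - class_limits[0][0]

print(class_boundaries)
print(class_midpoints)
print(class_width)
```

With these limits the boundaries meet at 9.5, 19.5, and 29.5, so the gaps between classes disappear.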
What are 3 reasons to construct frequency distributions?
To summarize a large data set, gain some insight into the nature of data and have a basis for constructing important graphs
What is found by dividing each class frequency by the total of all frequencies?
relative frequencies
What is the difference between a relative frequency distribution and a frequency distribution?
a relative frequency distribution uses the same class limits as a frequency distribution, but lists relative frequencies (often shown as percents) instead of actual frequencies
What is the sum of the frequencies for that class and all previous classes called?
cumulative frequency
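A short Python sketch of how relative and cumulative frequencies follow from class frequencies. The frequency counts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies (counts per class), for illustration only.
frequencies = [4, 6, 7, 3]

# Relative frequency: each class frequency divided by the total of all frequencies.
total = sum(frequencies)
relative_frequencies = [f / total for f in frequencies]

# Cumulative frequency: the frequency for a class plus all previous classes.
cumulative_frequencies = list(accumulate(frequencies))

print(relative_frequencies)
print(cumulative_frequencies)
```

The relative frequencies sum to 1 (or 100% when shown as percents), and the last cumulative frequency equals the total count.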
What do the horizontal and vertical scales of a histogram represent?
a histogram is a bar graph whose horizontal scale represents classes of data values and whose vertical scale represents actual frequencies
What do the heights of the bars on the histogram correspond to?
frequency values, and the bars are drawn adjacent to each other (without gaps)
What has the same shape and horizontal scale as a histogram, but its vertical scale is marked with relative frequencies?
relative frequency histogram
What uses line segments connected to points located directly above class midpoint values?
frequency polygon
What is a line graph that depicts cumulative frequency, just as the cumulative frequency distribution lists cumulative frequencies?
ogive
What is a graph where each data value is plotted as a point along a scale of values?
dotplot
Who saved lives with statistics by showing people that most soldiers died due to unsanitary hospitals?
Florence Nightingale
What represents data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit)?
stem-and-leaf plot
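One possible Python sketch of a stem-and-leaf plot. It assumes non-negative integer data and treats the last digit as the leaf and everything before it as the stem; the function name and sample values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical sample values, for illustration only.
plot = stem_and_leaf([27, 12, 34, 15, 21, 27])
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting first keeps the leaves in increasing order within each stem, which is how the plot is usually drawn.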
What is a bar graph for qualitative data with the bars arranged in order according to frequencies?
Pareto chart
What is a graph depicting qualitative data as slices of a pie?
pie chart
What is a plot of paired (x,y) data with a horizontal x-axis and a vertical y-axis, pairing values from two different variables?
scatter diagram or scatterplot
What are data that have been collected at different points in time?
Time-series data
What is a value at the center or middle of a data set?
measure of center
What are the different ways to define the measure of center?
mean, median, mode and midrange
What is the measure of center found by adding the values and dividing the total by the number of values?
arithmetic mean
What does Σ mean?
the sum of a set of values
What does x mean?
the variable usually used to represent the individual data values
What does n mean?
number of values in a sample
What does N mean?
number of values in a population
What is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude?
median
What is the value of a data set that occurs most frequently?
mode
What is a data set called when two values occur with the same greatest frequency? When more than two do? When no value is repeated?
bimodal, multimodal, no mode
What is the measure of center that is the midway between the highest and lowest values in the original data set? How is it found?
midrange/ add the maximum and minimum value and then divide the sum by 2
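A minimal Python sketch of the four measures of center above. The data values and the helper name `find_modes` are hypothetical; the helper returns an empty list when no value repeats, matching the "no mode" card.

```python
import statistics
from collections import Counter

def find_modes(values):
    """Return the value(s) with the greatest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated: no mode
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical sample values, for illustration only.
values = [2, 3, 3, 5, 7, 10]

mean = sum(values) / len(values)             # add the values, divide by n
median = statistics.median(values)           # middle value of the sorted data
modes = find_modes(values)                   # most frequent value(s)
midrange = (max(values) + min(values)) / 2   # (maximum + minimum) / 2

print(mean, median, modes, midrange)
```

With an even number of values, as here, the median is the mean of the two middle values.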
What is the round-off rule?
carry one more decimal place than is present in the original set of values
What is a mean computed with the different values assigned different weights?
weighted mean (e.g., finding a grade average in a class)
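A short Python sketch of a weighted mean, computed as the sum of weight times value divided by the sum of the weights. The scores and weights are invented for illustration.

```python
# Hypothetical scores and their weights (percent of the final grade).
scores = [85, 90, 70]
weights = [20, 30, 50]

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(weighted_mean)
```

The plain mean of these three scores would be about 81.67; the weighted mean is lower because the lowest score carries the largest weight.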
What is it called if a distribution of data is not symmetric and extends more to one side than the other?
skewed (if not skewed it is symmetric)
What does it mean when something is skewed to the left?
it is negatively skewed and the mean and median are to the left of the mode, there is also a longer left tail
What does it mean when something is symmetric?
there is zero skewness and the mean, median and mode are the same
What does it mean when something is skewed to the right?
it is positively skewed and the mean and median are to the right of the mode, there is also a longer right tail
What is the difference between the maximum and minimum value in a set of data?
range
What is a measure of variation of values about the mean?
standard deviation, a kind of average deviation of values from the mean
What is a measure of variation equal to the square of the standard deviation?
variance
What is s?
sample standard deviation
What is s squared?
sample variance
What is expressed as a percent, applies to a set of non-negative sample or population data, and describes the standard deviation relative to the mean?
coefficient of variation
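A minimal Python sketch of the sample variance, sample standard deviation, and coefficient of variation. The sample values are hypothetical, and the sample formulas divide by n - 1.

```python
import math
import statistics

# Hypothetical sample values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_std_dev = math.sqrt(sample_variance)

# Coefficient of variation: standard deviation relative to the mean, as a percent.
coefficient_of_variation = sample_std_dev / mean * 100

print(round(sample_variance, 3), round(sample_std_dev, 3), round(coefficient_of_variation, 1))
```

Python's standard library offers `statistics.variance` and `statistics.stdev` for the same sample quantities, which can serve as a cross-check.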
What is based on the principle that for many data sets the vast majority (about 95%) of sample values are within 2 standard deviations of the mean?
range rule of thumb
What can be used to roughly estimate standard deviation?
s ≈ range/4
How can you find rough estimates of the minimum and maximum usual sample values?
minimum usual value = mean - (2 x std dev)
maximum usual value = mean + (2 x std dev)
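A small Python sketch of the range rule of thumb. The function names and the numbers passed in are hypothetical, and the results are rough estimates only.

```python
def estimate_std_dev(values):
    """Rough estimate of the standard deviation: range / 4."""
    return (max(values) - min(values)) / 4

def usual_value_limits(mean, std_dev):
    """Rough minimum and maximum usual values: mean -/+ 2 standard deviations."""
    return mean - 2 * std_dev, mean + 2 * std_dev

# Hypothetical inputs, for illustration only.
print(estimate_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
print(usual_value_limits(mean=100, std_dev=15))
```

Values outside the returned limits would be considered unusual under this rule.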
What rule states that for data sets having a bell-shaped distribution 68% of it is within 1 std dev of the mean, 95% within 2 std dev of the mean and 99.7% within 3 std dev of the mean?
empirical rule
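The 68%, 95%, and 99.7% figures can be checked against a normal distribution, which is one bell-shaped distribution. This sketch assumes a standard normal model and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

The empirical rule rounds these proportions; the exact normal values are slightly different (for example, about 95.4% within 2 standard deviations).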
What theorem applies to any data set and says at least 75% of all values are within 2 std dev of the mean and at least 89% are within 3 std dev of the mean?
Chebyshev's Theorem (found by using 1 - 1/K^2, where K > 1 is the number of std dev)
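A minimal Python sketch of the 1 - 1/K^2 bound from Chebyshev's theorem. The function name is hypothetical.

```python
def chebyshev_minimum_share(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, round(chebyshev_minimum_share(k) * 100, 1))
```

For K = 2 the bound is 75%, and for K = 3 it is about 88.9%, which the card rounds to 89%.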
What is the number of standard deviations that a given value x is above or below the mean?
standardized score or z score, round z to 2 decimal places
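A short Python sketch of the z score, computed as (x - mean) / standard deviation and rounded to 2 decimal places. The function name and the input numbers are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / std_dev, 2)

# Hypothetical inputs, for illustration only.
print(z_score(130, mean=100, std_dev=15))
print(z_score(120, mean=100, std_dev=15))
print(z_score(85, mean=100, std_dev=15))
```

A negative z score means the value is below the mean; a positive one means it is above.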
What do percentiles measure?
relative standing
What is the process of using statistical tools (such as graphs, measures of center and variation) to investigate data sets in order to understand their important characteristics?
exploratory data analysis
What is a value that is located very far away from almost all of the other values?
outlier