Frequency Distribution
Lists each category of data and the number of occurrences of each category of data.
Relative frequency
Is the proportion (or percent) of observations within a category and is found using the formula
relative frequency = frequency / (sum of all frequencies)
Relative frequency distribution
Lists each category of data together with the relative frequency
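As a minimal sketch in Python (the survey responses below are hypothetical), a frequency distribution and a relative frequency distribution can be tallied from raw qualitative data like this:

```python
from collections import Counter

def frequency_distribution(data):
    """Return {category: frequency} for a list of qualitative observations."""
    return dict(Counter(data))  # Counter keeps categories in first-seen order

def relative_frequency_distribution(data):
    """Return {category: frequency / sum of all frequencies}."""
    freq = Counter(data)
    total = sum(freq.values())  # sum of all frequencies = number of observations
    return {category: count / total for category, count in freq.items()}

# Hypothetical survey responses, used only for illustration.
responses = ["A", "B", "A", "C", "A", "B", "A", "C"]
print(frequency_distribution(responses))           # {'A': 4, 'B': 2, 'C': 2}
print(relative_frequency_distribution(responses))  # {'A': 0.5, 'B': 0.25, 'C': 0.25}
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.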
Pareto chart
Is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency
Pie chart
Is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category
Lower class limit (of a class)
Is the smallest value within the class
Upper class limit (of a class)
Is the largest value within the class.
Class width
Is the difference between consecutive lower class limits. To choose a class width, compute
(largest data value - smallest data value) / number of classes
and round up to a convenient number
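A minimal Python sketch of choosing a class width and the resulting class limits. It assumes integer data and the convention of rounding up to the next whole number (textbooks differ on how to round); the data and function names are illustrative only:

```python
import math

def class_width(data, num_classes):
    """(largest - smallest) / number of classes, rounded up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    # floor + 1 keeps width * num_classes above the range, so the maximum fits
    # inside the last class even when raw is already a whole number.
    return math.floor(raw) + 1

def class_limits(data, num_classes):
    """Return (lower, upper) class limits for integer data, starting at the minimum."""
    width = class_width(data, num_classes)
    lowers = [min(data) + i * width for i in range(num_classes)]
    # For integer data the upper limit is one less than the next lower limit.
    return [(low, low + width - 1) for low in lowers]

data = [13, 15, 21, 27, 33, 38, 41, 47]  # hypothetical values
print(class_width(data, 5))   # 7
print(class_limits(data, 5))  # [(13, 19), (20, 26), (27, 33), (34, 40), (41, 47)]
```

Consecutive lower class limits (13, 20, 27, 34, 41) differ by exactly the class width.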
Histogram
Is constructed by drawing a rectangle for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.
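The class frequencies that set the rectangle heights can be counted with a short Python sketch. It assumes classes of the form [lower, lower + width) and prints a text bar for each class instead of drawing rectangles; the names and data are hypothetical:

```python
def class_frequencies(data, lower, width, num_classes):
    """Count observations in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)  # which class x falls into
        if 0 <= index < num_classes:       # values outside every class are skipped
            counts[index] += 1
    return counts

def text_histogram(data, lower, width, num_classes):
    """Print one bar per class; the bar length equals the class frequency."""
    counts = class_frequencies(data, lower, width, num_classes)
    for i, count in enumerate(counts):
        start = lower + i * width
        print(f"[{start}, {start + width}) {'#' * count}")
    return counts

data = [13, 15, 21, 27, 33, 38, 41, 47]  # hypothetical values
text_histogram(data, lower=13, width=7, num_classes=5)
```

Because the classes share endpoints with no gaps, adjacent bars "touch" in the same way the rectangles of a histogram do.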
Bar chart
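Is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category, with height equal to the category's frequency or relative frequency.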
Statistics
Is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, it is about providing a measure of confidence in any conclusion.
Describe the relationship between population, sample, and individual.
The entire group of individuals to be studied is called the population. An individual is a person or object that is a member of the population being studied. A sample is a subset of the population that is being studied.
Descriptive statistics
Consists of organizing and summarizing data. Describes data through numerical summaries, tables, and graphs
Inferential statistics
Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result
Statistic
Numerical summary of a sample
Parameter
Numerical summary of an entire population
Qualitative variables
Allow for classification of individuals based on some attribute or characteristic, e.g., gender
Quantitative variables
Provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results, e.g., temperature
Discrete variable
Is a quantitative variable that has either a finite number of possible values or a countable number of possible values.
Continuous variable
Is a quantitative variable that has an infinite number of possible values that are not countable
Nominal level of measurement
The values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order. e.g., gender: it only allows for categorization as male or female, and the categories cannot be ranked.
Ordinal level of measurement
Has properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order. e.g., letter grade: values of the variable can be ranked, but differences in values have no meaning.
Interval level of measurement
Has properties of the ordinal level of measurement and the differences in the values of the variable have meaning. A value of zero does not mean the absence of quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.
Ratio level of measurement
Has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.
Placebo
An innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication.
Blinding
Refers to nondisclosure of the treatment an experimental unit is receiving.
Double-blind
An experiment is double-blind if neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.
Observational study
Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals in the study without trying to influence the outcome of the study.
Designed experiment
If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group
*Can be used to establish causation, because the researcher controls the suspected cause
Simple random sampling
A sample of size n from a population of size N is obtained through ___ if every possible sample of size n is equally likely to occur.
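Python's standard `random` module can draw a simple random sample without replacement: `random.Random.sample` gives every subset of size n the same chance of being chosen. A minimal sketch (the `seed` parameter is only there to make runs reproducible, and the numbered population is hypothetical):

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every size-n sample is equally likely."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.sample(population, n)

population = list(range(1, 101))  # individuals numbered 1..N with N = 100
print(simple_random_sample(population, 10, seed=42))
```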
Stratified sample
Is obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be similar in some way.
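A minimal sketch of stratified sampling, assuming the same number of individuals is drawn from every stratum (allocating in proportion to stratum size is another common choice). `stratum_of` is a hypothetical function that maps an individual to its stratum:

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take a simple random sample from each."""
    rng = random.Random(seed)
    strata = {}
    for individual in population:
        # Each individual lands in exactly one stratum (nonoverlapping groups).
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # SRS within the stratum
    return sample

# Hypothetical population of (id, class standing) pairs.
population = [(i, "undergraduate" if i <= 60 else "graduate") for i in range(1, 101)]
sample = stratified_sample(population, stratum_of=lambda p: p[1], n_per_stratum=5, seed=1)
```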
Systematic sample
Is obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
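In Python, list slicing with a step of k expresses "every kth individual" directly. A minimal sketch, assuming the population is already an ordered list (the function name and data are illustrative):

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start between 1 and k, then every kth individual after it."""
    rng = random.Random(seed)
    start = rng.randint(1, k)        # randint includes both endpoints
    return population[start - 1::k]  # positions start, start + k, start + 2k, ...

population = list(range(1, 101))     # hypothetical list of N = 100 individuals
sample = systematic_sample(population, k=10, seed=3)
```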
Cluster sample
Is obtained by selecting all individuals within a randomly selected collection or group of individuals
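Unlike stratified sampling, randomness here picks the groups rather than the individuals within them. A minimal sketch, assuming clusters are given as a dictionary from a hypothetical cluster label to its members:

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and keep every individual inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster labels
    return [person for label in chosen for person in clusters[label]]

# Hypothetical: 10 city blocks (clusters), each with 5 households.
clusters = {f"block{i}": [f"block{i}-house{j}" for j in range(1, 6)] for i in range(1, 11)}
sample = cluster_sample(clusters, num_clusters=3, seed=7)
```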
Convenience sample
Is a sample in which the individuals are easily obtained and not based on randomness
Sampling bias
Means that the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another
Nonresponse bias
Exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. Nonresponse can occur because individuals selected for the sample do not wish to respond or the interviewer was unable to contact them.
Bias
If the results of the sample are not representative of the population
Response bias
Exists when the answers on a survey do not reflect the true feelings of the respondent. Response bias can find its way into survey results in a number of ways:
• interviewer error
• misrepresented answers
• wording of questions
• ordering of questions or words
Nonsampling error
Is the error that results from undercoverage, nonresponse bias, response bias, or data-entry error. Such errors could also be present in a complete census of the population.
Sampling error
Is the error that results from using a sample to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.
Uniform distribution
The frequency of each value of the variable is evenly spread out across the values of the variable.
Bell-shaped distribution
Highest frequency occurs in the middle and frequencies tail off to the left and right of the middle
Symmetric;
mean = median
Skewed right
The tail to the right of the peak is longer than the tail to the left of the peak
Mean substantially larger than median;
mean > median
Skewed left
The tail to the left of the peak is longer than the tail to the right of the peak
Mean substantially smaller than median;
mean < median
Mean
Is computed by determining the sum of all the values of the variable in the data set and dividing by the number of observations.
Population mean: μ ("mu")
Sample mean: x̄ ("x-bar")
Median
Is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median
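A minimal Python sketch of both measures. When the number of observations is even there are two middle values, and the usual convention (assumed here) is to average them. The same arithmetic gives μ for a population and x̄ for a sample; only the data differ:

```python
def mean(values):
    """Sum of all values divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the data arranged in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd n: single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the two middle values

print(mean([3, 8, 1, 9, 4]), median([3, 8, 1, 9, 4]))  # 5.0 4
print(mean([3, 8, 1, 9, 4, 10]), median([3, 8, 1, 9, 4, 10]))
```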
Mode
The mode of a variable is the most frequent observation of the variable in the data set.
No mode
A data set has no mode if no observation of the variable occurs more than once.
Bimodal
Two modes
Multimodal
Three or more data values that occur with the highest frequency
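A minimal sketch that reports the mode(s) of a data set. It assumes a data set has no mode only when every value occurs exactly once; conventions for other kinds of ties vary between textbooks:

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means the data have no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no observation occurs more than once: no mode
    return [value for value, count in counts.items() if count == highest]

def describe_modes(values):
    """Label the data as having no mode, one mode, bimodal, or multimodal."""
    found = modes(values)
    if not found:
        return "no mode"
    if len(found) == 1:
        return f"mode = {found[0]}"
    if len(found) == 2:
        return f"bimodal: {found}"
    return f"multimodal: {found}"

print(describe_modes([2, 4, 4, 7]))           # mode = 4
print(describe_modes([1, 2, 3]))              # no mode
print(describe_modes([1, 1, 2, 2, 3]))        # bimodal: [1, 2]
print(describe_modes([1, 1, 2, 2, 3, 3, 4]))  # multimodal: [1, 2, 3]
```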
A descriptive measure of a population
Parameter
Range of a variable
Is the difference between the largest data value and the smallest data value.
Sample variance
s^2 is computed by determining the sum of the squared deviations about the sample mean and dividing this result by n-1. That is,
s^2 = Σ(x - x̄)^2 / (n-1)
A statistic is said to be biased if it
Systematically underestimates or overestimates a parameter
Degrees of freedom
We call n-1 the ___ because the first n-1 observations have freedom to be whatever value they wish, but the nth value has no freedom: it must be whatever value forces the sum of the deviations about the mean to equal zero
Sample standard deviation
s is obtained by taking the square root of the sample variance. That is,
s = √(s^2)
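A minimal Python sketch of the range, the sample variance (dividing by n-1), and the sample standard deviation; the data values are hypothetical. Python's `statistics.variance` and `statistics.stdev` also divide by n-1, while `statistics.pvariance` and `statistics.pstdev` compute the population versions:

```python
import math

def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return squared_deviations / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [4, 8, 6, 5, 7]        # hypothetical sample with mean 6
print(data_range(data))       # 4
print(sample_variance(data))  # 2.5
print(sample_std(data))
```

Here the deviations about the mean are -2, 2, 0, -1, 1; they sum to zero, and their squares sum to 10, so s^2 = 10 / 4 = 2.5.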
Measures of central tendency
Mean, median, mode
Measures of dispersion
Range, variance (sample and population), standard deviation (sample and population)
68%
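For bell-shaped data (Empirical Rule), approximately 68% of the data lie between mean - 1 standard deviation and mean + 1 standard deviation
95%
For bell-shaped data, approximately 95% of the data lie between mean - 2 standard deviations and mean + 2 standard deviations
99.7%
For bell-shaped data, approximately 99.7% of the data lie between mean - 3 standard deviations and mean + 3 standard deviations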
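The rule can be sketched in Python: compute the three intervals for a given mean and standard deviation, then compare the percentages with an exactly normal distribution using `statistics.NormalDist` (Python 3.8+). The mean of 100 and standard deviation of 15 are hypothetical; real bell-shaped data match these percentages only approximately:

```python
from statistics import NormalDist

def empirical_rule_intervals(mean, std):
    """Intervals from mean - k*std to mean + k*std for k = 1, 2, 3."""
    return {k: (mean - k * std, mean + k * std) for k in (1, 2, 3)}

print(empirical_rule_intervals(100, 15))
# {1: (85, 115), 2: (70, 130), 3: (55, 145)}

# Compare with an exactly normal (standard normal) distribution.
z = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # proportion within k standard deviations of the mean
    print(k, round(share, 4))     # about 0.6827, 0.9545, 0.9973
```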
Z-score
Represents the distance that a value is from the mean in terms of the number of standard deviations. It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation.
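In symbols, z = (x - μ) / σ for a population or z = (x - x̄) / s for a sample. A one-function Python sketch with hypothetical exam numbers:

```python
def z_score(x, mean, std):
    """Standard deviations between x and the mean; negative means below the mean."""
    return (x - mean) / std

# Hypothetical exam: mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
print(z_score(55, 70, 10))  # -1.5
```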
Percentile
The kth percentile of a set of data is a value such that k percent of the observations are less than or equal to the value
Quartile
Quartiles divide data sets into fourths, or four equal parts
Five number summary
Consists of the smallest data value, the first quartile, the median, the third quartile, and the largest data value.
*minimum, Q1, M (= Q2), Q3, maximum
IQR (interquartile range)
Is the range of the middle 50% of the observations in a data set. That is, the difference between the third and first quartiles: IQR = Q3 - Q1.
Outlier
Extreme observations
Lower fence:
Q1- 1.5(IQR)
Upper fence:
Q3 + 1.5(IQR)
*An observation is an outlier if it is less than the lower fence or greater than the upper fence
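A Python sketch tying together the five-number summary, the IQR, and the fences. It assumes one common textbook rule for quartiles (Q1 and Q3 are the medians of the lower and upper halves, leaving out the median when n is odd); other rules, such as the methods offered by `statistics.quantiles`, can give slightly different quartiles. The data are hypothetical, and at least two observations are needed:

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, median M, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return ordered[0], median(lower_half), median(ordered), median(upper_half), ordered[-1]

def fences(values):
    """Lower fence Q1 - 1.5(IQR) and upper fence Q3 + 1.5(IQR)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values):
    """Observations below the lower fence or above the upper fence."""
    lower, upper = fences(values)
    return [x for x in values if x < lower or x > upper]

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical
print(five_number_summary(data))     # (2, 4.0, 6, 8.5, 30)
print(fences(data))                  # (-2.75, 15.25)
print(outliers(data))                # [30]
```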
The ___ variable is the variable whose value can be explained by the ____ variable
Response, predictor
Lurking variable
Is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. It is typically related to explanatory variables considered in the study.
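Negative correlation
The closer r is to -1, the stronger the negative linear association between the two variables
Positive correlation
The closer r is to 1, the stronger the positive linear association between the two variables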
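For reference, Pearson's linear correlation coefficient r can be computed from deviations about the means. A minimal Python sketch with hypothetical data (Python 3.10+ also offers `statistics.correlation`):

```python
import math

def correlation(xs, ys):
    """Pearson's linear correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(correlation(xs, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(correlation(xs, [2, 4, 5, 4, 5]))   # positive, but weaker than 1
```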
A variable that is related to either the response variable or the predictor variable or both, but which is excluded from the analysis is a
Lurking variable
A scatter diagram plots each pair of values as a point in a two-dimensional plane. The diagram locates the ___ variable on the horizontal axis and the ___ variable on the vertical axis.
Predictor; response
A residual is the difference between
The observed value of y and the predicted value of y (residual = observed y - predicted y).
The least squares regression line
Minimizes the sum of the squared residuals
For a given data set, the equation of the least squares regression line will always pass through
(x̄, ȳ), the point whose coordinates are the sample means of x and y
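A minimal Python sketch of the least-squares slope and intercept, using the standard formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and b0 = ȳ - b1·x̄. Because of how b0 is defined, the fitted line passes through (x̄, ȳ), and the residuals sum to zero. The data are hypothetical:

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1 * x, minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept; makes the line pass through (x-bar, y-bar)
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
print([round(e, 4) for e in residuals(xs, ys, b0, b1)])
```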
What does R^2 stand for? What does it measure?
The coefficient of determination. It measures the proportion of total variation in the response variable that is explained by the least-squares regression line
Total deviation
The deviation between the observed value of the response variable, y, and the mean value of the response variable
Explained deviation
The deviation between the predicted value of the response variable and the mean value of the response variable
Unexplained deviation
The deviation between the observed value of the response variable and the predicted value of the response variable
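For each observation, total deviation = explained deviation + unexplained deviation, that is, y - ȳ = (ŷ - ȳ) + (y - ŷ). For the least-squares line the sums of the squared deviations add up in the same way, and R^2 = explained variation / total variation. A minimal Python sketch with hypothetical data (in simple linear regression, R^2 also equals r^2):

```python
def deviation_sums(xs, ys):
    """Return (total, explained, unexplained) sums of squared deviations and R^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]                             # predicted values
    total = sum((y - y_bar) ** 2 for y in ys)                     # observed - mean
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # predicted - mean
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # observed - predicted
    return total, explained, unexplained, explained / total

total, explained, unexplained, r_squared = deviation_sums([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(total, 4), round(explained, 4), round(unexplained, 4), round(r_squared, 4))
# 6.0 3.6 2.4 0.6
```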