Statistics Test 1 | Elementary Statistics

Frequency Distribution

Lists each category of data and the number of occurrences of each category of data.

Relative frequency

Is the proportion (or percent) of observations within a category and is found using the formula
relative frequency = frequency / sum of all frequencies

Relative frequency distribution

Lists each category of data together with the relative frequency
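A minimal Python sketch of turning raw qualitative data into a relative frequency distribution. The function name `relative_frequency_distribution` and the survey responses are hypothetical; each category's relative frequency is its frequency divided by the sum of all frequencies, so the values add up to 1.

```python
from collections import Counter

def relative_frequency_distribution(data):
    """Map each category to frequency / sum of all frequencies."""
    counts = Counter(data)            # frequency distribution: category -> count
    total = sum(counts.values())      # sum of all frequencies (= number of observations)
    return {category: freq / total for category, freq in counts.items()}

# Hypothetical survey responses (qualitative data)
responses = ["A", "B", "A", "C", "A", "B", "A", "C"]
print(relative_frequency_distribution(responses))
# {'A': 0.5, 'B': 0.25, 'C': 0.25}
```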

Pareto chart

Is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency

Pie chart

Is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category

Lower class limit (of a class)

Is the smallest value within the class

Upper class limit (of a class)

Is the largest value within the class.

Class width

Difference between consecutive lower class limits
≈ (largest data value - smallest data value) / number of classes, rounded up to a convenient number
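A small sketch of choosing a class width and the resulting class limits, assuming integer data and assuming "round up" means going to the next whole number (even when the division comes out exact, so the largest value still lands in the last class). The helper names `class_width` and `class_limits` and the ages are hypothetical.

```python
import math

def class_width(data, num_classes):
    # Raw width from the formula: (largest value - smallest value) / number of classes
    raw = (max(data) - min(data)) / num_classes
    # Round up to the next whole number; if raw is already whole, still go up by one
    # so the largest value is not left out of the last class (assumes integer data).
    return math.floor(raw) + 1

def class_limits(data, num_classes):
    """Return (lower, upper) class limits, starting at the smallest data value."""
    width = class_width(data, num_classes)
    low = min(data)
    return [(low + i * width, low + (i + 1) * width - 1) for i in range(num_classes)]

ages = [18, 22, 25, 31, 34, 40, 47, 52, 58, 63]  # hypothetical data
print(class_width(ages, 5))    # (63 - 18) / 5 = 9.0, rounded up to 10
print(class_limits(ages, 5))   # [(18, 27), (28, 37), (38, 47), (48, 57), (58, 67)]
```

Consecutive lower class limits (18, 28, 38, ...) differ by exactly the class width, matching the definition above.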

Histogram

Is constructed by drawing a rectangle for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.

Bar chart

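Is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category, with heights equal to the category's frequency or relative frequency.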

Statistics

Is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, it is about providing a measure of confidence in any conclusion.

Describe the relationship between population, sample, and individual.

The entire group of individuals to be studied is called the population. An individual is a person or object that is a member of the population being studied. A sample is a subset of the population that is being studied.

Descriptive statistics

Consists of organizing and summarizing data. Describes data through numerical summaries, tables, and graphs

Inferential statistics

Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result

Statistic

Numerical summary of a sample

Parameter

Numerical summary of a population

Qualitative variables

Allow for classification of individuals based on some attribute or characteristic, e.g., gender

Quantitative variables

Provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results, e.g., temperature

Discrete variable

Is a quantitative variable that has either a finite number of possible values or a countable number of possible values.

Continuous variable

Is a quantitative variable that has an infinite number of possible values that are not countable

Nominal level of measurement

If the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order, e.g., gender only allows for categorization as male or female. Plus, it is not

Ordinal level of measurement

Has properties of the nominal level of measurement, and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order, e.g., letter grade: values of the variable can be ranked, but differences in values have no meaning.

Interval level of measurement

Has properties of the ordinal level of measurement, and the differences in the values of the variable have meaning. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.

Ratio level of measurement

Has the properties of the interval level of measurement, and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

Placebo

An innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication.

Blinding

Refers to nondisclosure of the treatment an experimental unit is receiving.

Double-blind

A double-blind experiment is one in which neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.

Observational study

Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals in the study without trying to influence the results of the study.

Designed experiment

If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group
*Used to establish causality, because the researcher controls the explanatory variable (the cause)

Simple random sampling

A sample of size n from a population of size N is obtained through ___ if every possible sample of size n has an equally likely chance of occurring.
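A minimal sketch of simple random sampling with Python's standard `random` module. `random.Random.sample` draws without replacement, so every subset of size n is equally likely; the helper name `simple_random_sample` and the student roster are hypothetical, and the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)         # seed only makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement

students = [f"student_{i}" for i in range(1, 31)]  # hypothetical population, N = 30
print(simple_random_sample(students, 5, seed=1))
```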

Stratified sample

Is obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be similar in some way.
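A sketch of a stratified sample: a separate simple random sample is drawn from each stratum. The strata, the helper name `stratified_sample`, and the choice of the same sample size in every stratum are assumptions for illustration (sizes proportional to each stratum are also common).

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum individuals from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical nonoverlapping strata (e.g., students grouped by class year)
strata = {
    "freshman": ["F1", "F2", "F3", "F4", "F5"],
    "sophomore": ["S1", "S2", "S3", "S4"],
    "junior": ["J1", "J2", "J3", "J4", "J5", "J6"],
}
print(stratified_sample(strata, 2, seed=7))
```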

Systematic sample

Is obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
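A sketch of systematic sampling, assuming the population is already in a list. The starting position is a random integer from 1 to k (`random.randint` includes both endpoints), and every kth individual after it is taken. The helper name `systematic_sample` and the customer list are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Select every kth individual, starting from a random position between 1 and k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)          # random integer with 1 <= start <= k
    return population[start - 1::k]    # convert to 0-based index, then step by k

customers = list(range(1, 101))        # hypothetical population of 100 numbered customers
print(systematic_sample(customers, 10, seed=3))
```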

Cluster sample

Is obtained by selecting all individuals within a randomly selected collection or group of individuals
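A sketch of cluster sampling: whole groups are chosen at random, and everyone in each chosen group is kept. The city-block clusters and the helper name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose num_clusters groups and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # random selection of whole groups
    return {name: clusters[name] for name in chosen}       # all individuals in each chosen group

# Hypothetical clusters: households on different city blocks
blocks = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2"],
}
print(cluster_sample(blocks, 2, seed=5))
```

Compare with the stratified sketch: stratified sampling samples some individuals from every group, while cluster sampling takes every individual from some groups.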

Convenience sample

Is a sample in which the individuals are easily obtained and not based on randomness

Sampling bias

Means that the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another

Nonresponse bias

Exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. Nonresponse can occur because individuals selected for the sample do not wish to respond or because the interviewer was unable to contact them.

Bias

If the results of the sample are not representative of the population

Response bias

Exists when the answers on a survey do not reflect the true feelings of the respondent. Response bias can find its way into survey results in a number of ways:
• interviewer error
• misrepresented answers
• wording of questions
• ordering of questions or words

Nonsampling error

Is the error that results from undercoverage, nonresponse bias, or data-entry error. Such errors could also be present in a complete census of the population.

Sampling error

Is the error that results from using a sample to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.

Uniform distribution

The frequency of each value of the variable is evenly spread out across the values of the variable.

Bell- shaped distribution

Highest frequency occurs in the middle and frequencies tail off to the left and right of the middle
Symmetric;
mean = median

Skewed right

The tail to the right of the peak is longer than the tail to the left of the peak
Mean substantially larger than median;
mean > median

Skewed left

The tail to the left of the peak is longer than the tail to the right of the peak
Mean substantially smaller than median;
mean < median

Mean

Is computed by determining the sum of all the values of the variable in the data set and dividing by the number of observations.
Population: μ ("mu")
Sample: x̄ ("x-bar")
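The arithmetic is the same for a population mean and a sample mean; only the data set and the symbol change. A minimal sketch with hypothetical data (the helper name `mean` is illustrative):

```python
def mean(values):
    """Sum of all values divided by the number of observations."""
    return sum(values) / len(values)

population = [4, 8, 6, 5, 7]   # hypothetical: every individual in the population
sample = [8, 6, 7]             # hypothetical: a subset of that population
mu = mean(population)          # population mean (mu)
x_bar = mean(sample)           # sample mean (x-bar)
print(mu, x_bar)               # 6.0 7.0
```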

Median

Is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median
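A sketch of finding M. The card covers the odd case; when the number of observations is even, the standard convention is to average the two middle values, which this hypothetical `median` helper does.

```python
def median(values):
    """Middle value of the data arranged in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                 # odd n: the single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle observations

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```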

Mode

The mode of a variable is the most frequent observation of the variable that occurs in the data set

No mode

If no observation of the variable occurs more than once in the data set

Bimodal

Two modes

Multimodal

Three or more data values that occur with the highest frequency
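A sketch that follows the mode cards literally: find the value(s) with the highest frequency, report "no mode" when nothing repeats, and label the result by how many modes there are. The helper names are hypothetical, and textbooks differ on edge cases such as every value repeating equally often (treated here as multimodal).

```python
from collections import Counter

def modes(values):
    """Return the most frequent observation(s); an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                       # no observation occurs more than once
        return []
    return [v for v, c in counts.items() if c == top]

def describe_modes(values):
    labels = {0: "no mode", 1: "one mode", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")

print(describe_modes([1, 2, 3]))            # no mode
print(describe_modes([1, 1, 2]))            # one mode
print(describe_modes([1, 1, 2, 2, 3]))      # bimodal
print(describe_modes([1, 1, 2, 2, 3, 3]))   # multimodal
```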

A descriptive measure of a population

Parameter

Range of a variable

Is the difference between the largest data value and the smallest data value.

Sample variance

s², is computed by determining the sum of the squared deviations about the sample mean and dividing this result by n - 1
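A minimal sketch of the sample variance exactly as the card describes it: square each deviation from x̄, add them up, and divide by n - 1. The data set and helper name are hypothetical.

```python
def sample_variance(values):
    """s^2: sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, x-bar = 5
print(sample_variance(data))      # 32 / 7, about 4.571
```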

A statistic is said to be biased if it

Systematically underestimates or overestimates a parameter

Degrees of freedom

We call n-1 the ___ because the first n-1 observations have freedom to be whatever value they wish, but the nth value has no freedom: it is determined, because the deviations about the mean must sum to zero

Sample standard deviation

s, is obtained by taking the square root of the sample variance. That is,
s = √(s²)
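A sketch of the sample standard deviation as the square root of the n - 1 sample variance, using the same hypothetical data as a variance example would (s² = 32/7). The helper name is illustrative.

```python
import math

def sample_standard_deviation(values):
    """s = square root of the sample variance s^2 (which divides by n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    s_squared = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(s_squared)

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical sample; s^2 = 32 / 7
print(sample_standard_deviation(data))     # about 2.138
```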

Measures of central tendency

Mean, median, mode

Measures of dispersion

Range, variance (sample and population), standard deviation (sample and population)

68%

For a bell-shaped distribution, approximately 68% of the data lie between the mean - 1 standard deviation and the mean + 1 standard deviation

95%

For a bell-shaped distribution, approximately 95% of the data lie between the mean - 2 standard deviations and the mean + 2 standard deviations

99.7%

For a bell-shaped distribution, approximately 99.7% of the data lie between the mean - 3 standard deviations and the mean + 3 standard deviations
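A sketch that turns the three percentages above into intervals for a given mean and standard deviation. The mean of 100 and standard deviation of 15 are hypothetical, and the intervals are only meaningful for roughly bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

# Hypothetical bell-shaped data with mean 100 and standard deviation 15
for share, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {share} of the data between {low} and {high}")
```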

Z-score

Represents the distance that a value is from the mean in terms of the number of standard deviations. It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation.
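The z-score card as a one-line formula, z = (x - mean) / standard deviation, with hypothetical numbers. A positive z is above the mean; a negative z is below it.

```python
def z_score(x, mean, sd):
    """Number of standard deviations the value x lies from the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))   # 2.0: 130 is two standard deviations above the mean
print(z_score(85, 100, 15))    # -1.0: 85 is one standard deviation below the mean
```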

Percentile

The kth percentile of a set of data is a value such that k percent of the observations are less than or equal to the value

Quartile

Quartiles divide data sets into fourths, or four equal parts

Five number summary

Consists of the smallest data value, the first quartile, the median, the third quartile, and the largest data value.
*minimum, Q1, M (Q2), Q3, maximum
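A sketch of the five-number summary. It assumes one common by-hand quartile method: Q1 is the median of the lower half and Q3 the median of the upper half, leaving out the middle value when n is odd. Calculators and software (for example Python's `statistics.quantiles`) may interpolate differently and give slightly different quartiles. The helper names and data are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, M, Q3, maximum); needs at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]              # values below the median position
    upper_half = ordered[half + n % 2:]      # skip the middle value when n is odd
    return (ordered[0], _median(lower_half), _median(ordered),
            _median(upper_half), ordered[-1])

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]     # hypothetical data
print(five_number_summary(data))            # (3, 6.0, 12, 16.0, 21)
```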

IQR (interquartile range)

Is the range of the middle 50% of the observations in a data set. That is, the difference between the third and first quartiles: IQR = Q3 - Q1.

Outlier

Extreme observations
Lower fence = Q1 - 1.5(IQR)
Upper fence = Q3 + 1.5(IQR)
*Outliers are data values less than the lower fence or greater than the upper fence
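A sketch of flagging outliers with the fences above. It assumes the same median-of-halves quartile method used by hand (software may compute slightly different quartiles), and the helper names and data are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def iqr_outliers(values):
    """Return IQR, (lower fence, upper fence), and the values outside the fences."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = _median(ordered[:half])                 # first quartile
    q3 = _median(ordered[half + n % 2:])         # third quartile
    iqr = q3 - q1                                # range of the middle 50%
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return iqr, (lower_fence, upper_fence), outliers

data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 45]     # hypothetical data
print(iqr_outliers(data))   # (11, (-9.5, 34.5), [45])
```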

The ___ variable is the variable whose value can be explained by the ____ variable

Response, predictor

Lurking variable

Is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. It is typically related to the explanatory variables considered in the study.

Negative correlation

r < 0; the closer r is to -1, the stronger the negative linear relation

Positive correlation

r > 0; the closer r is to 1, the stronger the positive linear relation
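The correlation cards refer to r without a formula. One standard way to compute the linear correlation coefficient is r = Σ(z_x · z_y) / (n - 1), the sum of products of paired z-scores divided by n - 1. A minimal sketch with hypothetical data and a hypothetical helper name:

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 5, 4, 5]))    # about 0.775: positive
print(correlation(xs, [10, 8, 6, 4, 2]))   # about -1.0: perfectly linear, negative
```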

A variable that is related to either the response variable or the predictor variable or both, but which is excluded from the analysis is a

Lurking variable

A scatter diagram locates a point in a two-dimensional plane for each pair of observations. The diagram locates the ___ variable on the horizontal axis and the ___ variable on the vertical axis.

Predictor; response

A residual is the difference between

The observed value of y and the predicted value of y (residual = y - ŷ).

The least squares regression line

Minimizes the sum of the squared residuals
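A sketch of fitting the least-squares line ŷ = b0 + b1·x. It uses the standard formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² (equivalent to b1 = r · s_y / s_x) and b0 = ȳ - b1·x̄; the second formula is why the line always passes through (x̄, ȳ). The data and helper name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope b1 and intercept b0 of the line that minimizes the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar            # so the line passes through (x-bar, y-bar)
    return b1, b0

xs = [1, 2, 3, 4, 5]                   # hypothetical predictor values
ys = [2, 4, 5, 4, 5]                   # hypothetical response values
b1, b0 = least_squares_line(xs, ys)    # b1 = 0.6, b0 = 2.2 (up to rounding)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]   # observed y minus predicted y
print(b1, b0)
print(sum(r ** 2 for r in residuals))  # about 2.4
```

Nudging the slope away from b1 (and keeping b0) makes the sum of squared residuals larger, which is what "least squares" means.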

For a given data set, the equation of the least squares regression line will always pass through

(x̄, ȳ)

What does R^2 stand for? What does it measure?

The coefficient of determination; it measures the proportion of total variation in the response variable that is explained by the least-squares regression line

Total deviation

The deviation between the observed value of the response variable, y, and the mean value of the response variable, ȳ: y - ȳ

Explained deviation

The deviation between the predicted value of the response variable, ŷ, and the mean value of the response variable, ȳ: ŷ - ȳ

Unexplained deviation

The deviation between the observed value of the response variable and the predicted value of the response variable: y - ŷ (the residual)
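A sketch tying the three deviations to R². Squaring and summing each kind of deviation gives total, explained, and unexplained variation; for a least-squares line, total = explained + unexplained, and R² = explained / total. The data and the helper name `deviation_breakdown` are hypothetical.

```python
def deviation_breakdown(xs, ys):
    """Sum the squared total, explained, and unexplained deviations; return them with R^2."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]                               # predicted values
    total = sum((y - y_bar) ** 2 for y in ys)                       # y - y-bar
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)              # y-hat - y-bar
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # y - y-hat (residuals)
    return total, explained, unexplained, explained / total        # last value is R^2

total, explained, unexplained, r_squared = deviation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(total, explained, unexplained, r_squared)   # about 6, 3.6, 2.4, 0.6
```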