Elementary Statistics Ch 1-3 for Test 1

Statistics

the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Variable

a characteristic or attribute that can assume different values.

Data

the values (measurements or observations) that the variables can assume. Information.

Random Variables

variables whose values are determined by chance.

Data Set

a collection of data values.

Data Value (Datum)

each value in the data set.

Descriptive Statistics

consists of the collection, organization, summarization, and presentation of data. Describing a situation.

Inferential Statistics

consists of generalizing from samples to populations, performing estimations & hypothesis tests, determining relationships among variables, & making predictions.

Probability

the chance of an event occurring.

Population

consists of all subjects (human or otherwise) that are being studied.

Sample

a group of subjects selected from a population.

Hypothesis Testing

a decision-making process for evaluating claims about a population, based on information obtained from samples.

Placebo

substance with no medical benefit or harm.

Qualitative Variable

variables that can be placed into distinct categories according to some characteristic or attribute (e.g., gender: male/female; religion: Catholic, Muslim, Hindu, Mormon).

Quantitative Variable

numerical variables that can be ordered or ranked (e.g., age or height). They can be classified into 2 groups: Discrete & Continuous.

Discrete Variable

quantitative variables that assume values that can be counted.

Continuous Variable

a quantitative variable that can assume an infinite number of values between any 2 specific values. These values are obtained by measuring and often include fractions or decimals.

Nominal Level of Measurement

classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data (e.g., political parties).

Ordinal Level of Measurement

classifies data into categories that can be ranked; however, precise differences between the ranks do not exist (e.g., letter grades A, B, C, D, or 1st, 2nd, & 3rd place).

Interval Level of Measurement

ranks data, and precise differences between units of measurement do exist; however, there is no meaningful zero (e.g., temperature: 0 degrees Fahrenheit does not mean the absence of heat).

Ratio Level of Measurement

possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on 2 different members of the population.

Random Sample

selected by using chance methods or random numbers.

Systematic Sample

obtained by numbering each subject of the population and then selecting every kth subject.
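
As a rough illustration of the definition above, the sketch below numbers 20 hypothetical subjects and keeps every 5th one. The function name `systematic_sample` and the random starting point within the first k subjects are assumptions added for the example; the card itself only says to select every kth subject.

```python
import random


def systematic_sample(population, k, start=None):
    """Return every k-th subject, beginning at index `start` (0-based).

    If no start is given, one is chosen at random among the first k subjects
    (an assumption of this sketch, not part of the card's definition).
    """
    if start is None:
        start = random.randrange(k)
    return population[start::k]


# Hypothetical population: subjects numbered 1 through 20.
subjects = list(range(1, 21))
sample = systematic_sample(subjects, k=5, start=2)
print(sample)  # [3, 8, 13, 18]
```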

Stratified Sample

obtained by dividing the population into groups (called strata) according to some characteristic that is important to the study, then sampling from each group.

Cluster Sample

the population is divided into groups called clusters by some means. Some of the clusters are selected, and all of the members of the cluster are used.
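
The difference between a stratified sample and a cluster sample can be sketched as follows: a stratified sample takes some subjects from every group, while a cluster sample takes every subject from some groups. The group names, the membership lists, and the helper names `stratified_sample` and `cluster_sample` are made up for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Draw n_per_stratum subjects at random from EVERY stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Pick n_clusters clusters at random and keep ALL of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical strata: students grouped by class rank.
strata = {"freshman": list(range(0, 10)), "senior": list(range(10, 20))}
strat = stratified_sample(strata, n_per_stratum=2, rng=rng)

# Hypothetical clusters: students grouped by dormitory.
clusters = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9]}
clust = cluster_sample(clusters, n_clusters=2, rng=rng)

print(strat)  # two subjects from each stratum
print(clust)  # every subject from two of the clusters
```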

Observational Study

the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.

Experimental Study

the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.

Quasi-Experimental Study

an experimental study that uses already intact groups.

Independent Variable

a variable that is being manipulated by the researcher. Also called the Explanatory Variable.

Dependent Variable

the variable that is studied to see if it has changed significantly due to the manipulation of the independent variable. Also called the Outcome Variable.

Treatment Group

a group, in a study, that receives special instructions, or some type of special treatment.

Control Group

a group in an experimental study that is not given any specific instructions or special treatment.

Hawthorne Effect

the tendency of subjects who know they are participating in an experimental study to change their behavior in ways that affect the results of the study.

Confounding Variable

a variable that influences the outcome variable but was not separated from the independent variable, so its effect cannot be distinguished from the effect of the independent variable.

Detached Statistic

a claim in which no comparison is made.

Implied Connection

a claim that attempts to imply a connection between variables that may not actually exist.

Raw Data

data in its original form.

Frequency Distribution

the organization of raw data in table form, using classes and frequencies.

Class

a quantitative or qualitative category into which raw data values are placed.

Frequency

the number of data values contained in a specific class.

Categorical Frequency Distribution

used for data that can be placed into specific categories, such as nominal or ordinal level data.

Grouped Frequency Distribution

used when the range of the data is large, and must be grouped into classes that are more than one unit in width.

Lower Class Limit

represents the smallest data value that can be included in a class.

Upper Class Limit

represents the largest data value that can be included in a class.

Class Boundary

used to separate classes so that there are no gaps in the frequency distribution.

Class Width

for a grouped frequency distribution, found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
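
The class limit, class boundary, class width, and frequency definitions above can be tied together with a small sketch. It assumes whole-number data, so each upper limit is one unit below the next lower limit and each boundary sits 0.5 beyond its limit. The data values and the function name `grouped_frequency` are invented for the example.

```python
def grouped_frequency(data, lowest_limit, width, n_classes):
    """Tally whole-number data into classes of equal width.

    Each row is (lower limit, upper limit, lower boundary,
    upper boundary, frequency).
    """
    rows = []
    for i in range(n_classes):
        lower = lowest_limit + i * width
        upper = lower + width - 1  # valid for whole-number data only
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq))
    return rows


# Made-up raw data for illustration.
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 34, 38]
rows = grouped_frequency(data, lowest_limit=10, width=10, n_classes=3)
for row in rows:
    print(row)
# (10, 19, 9.5, 19.5, 3)
# (20, 29, 19.5, 29.5, 5)
# (30, 39, 29.5, 39.5, 4)

# Class width: lower limit of one class minus lower limit of the previous one.
print(rows[1][0] - rows[0][0])  # 10
```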

Class Limit Recommendations

A) Classes must be equal in width. B) There should be between 5 & 20 classes. C) Preferably an odd class width (this keeps each class midpoint at the same place value as the data). D) Classes must be mutually exclusive. E) Classes must be continuous. F) classes must be e

Cumulative Frequency Distribution

a distribution that shows the number of data values less than or equal to a specific value, usually an upper boundary.
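
A cumulative frequency distribution is a running total of the class frequencies. The sketch below uses the standard-library `itertools.accumulate`; the frequencies and upper boundaries are made-up values, and `cumulative_frequencies` is a hypothetical helper name.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running total of class frequencies, in class order."""
    return list(accumulate(frequencies))


# Made-up class data for illustration.
upper_boundaries = [19.5, 29.5, 39.5]
frequencies = [3, 5, 4]

cumulative = cumulative_frequencies(frequencies)
for boundary, total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total}")
# less than 19.5: 3
# less than 29.5: 8
# less than 39.5: 12
```

The last cumulative frequency always equals the total number of data values, which is a quick check on the tally.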

Ungrouped Frequency Distribution

a frequency distribution that can be constructed using single data values for each class. This is used when the range of data values is relatively small.

Histogram

a graph that displays the data by using contiguous vertical bars of various heights to represent the frequencies of the class.

Frequency Polygon

a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The heights of the points represent the frequencies.

Cumulative Frequencies

the sum of the frequencies accumulated up to the upper boundary of a class in the distribution.

Ogive

a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Relative Frequency Graph

a graph using proportions instead of raw counts as frequencies.

Bar Graph

represents the data by using vertical or horizontal bars whose heights or lengths represent the frequency of the data.

Pareto Chart

used to represent a frequency distribution for a categorical variable. The frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

Time Series Graph

represents data that occur over a specific period of time.

Pie Graph

a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.

Stem & Leaf Plots

a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.
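
A minimal sketch of the idea, assuming two-digit whole numbers where the tens digit is the stem and the ones digit is the leaf. The data values and the function name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem); ones digits are the leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Made-up data for illustration.
data = [25, 13, 32, 20, 14, 31, 23, 32]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 3 4
# 2 | 0 3 5
# 3 | 1 2 2
```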

a Statistic

a characteristic or measure obtained by using the data values from a sample.

a Parameter

a characteristic or measure obtained by using all the data values from a specific population.

Mean

Also known as the arithmetic average, the mean is the sum of the values divided by the total number of values.

Median

the midpoint of the data array. Before you can find this point, the data must be arranged in numerical order from lowest to highest.

Mode

the value that occurs most often in the data value set.

Bimodal

data values consisting of 2 modes.

Multimodal

data values consisting of 3 or more modes.
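
The mean, median, and mode cards can be checked with Python's standard `statistics` module. The data set is made up; it is chosen so that two values tie for most frequent, which makes it bimodal.

```python
import statistics

# Made-up data for illustration.
values = [2, 3, 3, 5, 7, 7, 8]

mean = statistics.mean(values)        # sum 35 divided by 7 values
median = statistics.median(values)    # middle value of the ordered data
modes = statistics.multimode(values)  # every value tied for most frequent

print(mean)    # 5
print(median)  # 5
print(modes)   # [3, 7]  (two modes, so the data set is bimodal)
```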

Modal Class

mode for grouped data. The class with the largest frequency.

Outliers

an extremely high or extremely low data value in the data set.

Identifying Outliers

1) arrange the data in order, and find Q1 & Q3. 2) Find the IQR. 3) Multiply the IQR by 1.5. 4) Subtract that value from Q1 & add that value to Q3. 5) Check the data set for any data value that is smaller than Q1-1.5(IQR), or larger than Q3+1.5(IQR).
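
The five steps above can be sketched as follows. Textbooks and software compute quartiles in slightly different ways; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data (leaving out the middle value when the count is odd). The data set and the helper names `quartiles` and `find_outliers` are made up for illustration.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


def find_outliers(data):
    """Values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Made-up data for illustration.
data = [5, 6, 12, 13, 15, 18, 22, 50]
q1, q2, q3 = quartiles(data)
print(q1, q3, q3 - q1)      # 9.0 20.0 11.0
print(find_outliers(data))  # [50]
```

Here 1.5(IQR) = 16.5, so the fences are 9 - 16.5 = -7.5 and 20 + 16.5 = 36.5, and only 50 falls outside them.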

Midrange

a rough estimate of the middle. Found by adding the lowest & the highest data values in the data set, and dividing by 2.

Weighted Mean

used when the values are not all equally represented. This is found by multiplying each value by its corresponding weight & dividing the sum of the products by the sum of the weights.
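
The midrange and weighted mean formulas are short enough to write directly. The grade points, credit hours, and function names below are made up for illustration.

```python
def midrange(values):
    """(lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up example: grade points weighted by credit hours.
grades = [4.0, 2.0, 3.0]
credits = [3, 3, 4]

print(weighted_mean(grades, credits))           # (12 + 6 + 12) / 10 = 3.0
print(midrange([5, 6, 12, 13, 15, 18, 22, 50])) # (5 + 50) / 2 = 27.5
```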

Range

the highest value minus the lowest value.

Variance

the average of the squares of the distance each value is from the mean.

Standard Deviation

the square root of the variance.

Coefficient of Variation

the standard deviation divided by the mean, with the result expressed as a percentage.
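
The variance, standard deviation, and coefficient of variation can be computed with the standard `statistics` module. The variance card describes a plain average of squared distances, which corresponds to the population formula (`pvariance`, dividing by N); the sample formula (`variance`) divides by n - 1 instead. The data set is made up for illustration.

```python
import statistics

# Made-up data for illustration; the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
pop_variance = statistics.pvariance(data)    # squared distances sum to 32; 32 / 8
pop_std_dev = statistics.pstdev(data)        # square root of the variance
sample_variance = statistics.variance(data)  # 32 / 7, slightly larger

# Coefficient of variation, expressed as a percentage.
cv_percent = 100 * pop_std_dev / mean

print(pop_variance)  # 4
print(pop_std_dev)   # 2.0
print(cv_percent)    # 40.0
```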

The Empirical Rule

when applied to a bell-shaped distribution: A) approximately 68% of the data will fall within 1 standard deviation of the mean. B) approximately 95% of the data will fall within 2 standard deviations of the mean. C) approximately 99.7% of the data will fall within 3 standard deviations of the mean.
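
The three percentages can be checked against an exact normal curve, which is the idealized bell shape. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper name `share_within` is an assumption of the example.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
# 1 0.683
# 2 0.954
# 3 0.997
```

Real data sets are only roughly bell shaped, so observed percentages will be close to these values rather than equal to them.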

Standard or Z Score

a score for a value obtained by subtracting the mean from the value and dividing the result by the standard deviation. If Z = 0, then the data value = the mean. Z = (value - mean) / standard deviation.
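
A short sketch of the Z score formula. The test score, mean, and standard deviation are made-up numbers, and `z_score` is a hypothetical helper name.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Made-up example: a score of 65 on a test with mean 50 and standard deviation 10.
print(z_score(65, 50, 10))  # 1.5
print(z_score(50, 50, 10))  # 0.0  (the value equals the mean)
print(z_score(40, 50, 10))  # -1.0
```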

Percentiles

divide the data set into 100 equal groups.

Quartiles

found by dividing the distribution into 4 groups, separated by Q1, Q2, & Q3. Can be used as a rough estimate of variability.

Interquartile Range (IQR)

defined as Q3 minus Q1; it is the range of the middle 50% of the data.

Deciles

Found by dividing the distribution into 10 groups.
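
Percentiles, quartiles, and deciles are the same idea with a different number of groups. The sketch below uses `statistics.quantiles` from the standard library (Python 3.8 or later), which returns the cut points between groups, so n groups give n - 1 cut points. Its default interpolation method may differ slightly from the hand method in a given textbook; the data set is made up.

```python
import statistics

# Made-up data for illustration: the whole numbers 1 through 100.
data = list(range(1, 101))

quartile_cuts = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
decile_cuts = statistics.quantiles(data, n=10)       # D1 ... D9
percentile_cuts = statistics.quantiles(data, n=100)  # P1 ... P99

print(len(quartile_cuts), len(decile_cuts), len(percentile_cuts))  # 3 9 99

# Q2, D5, and P50 all mark the median.
print(quartile_cuts[1], decile_cuts[4], percentile_cuts[49])  # 50.5 50.5 50.5
```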

Exploratory Data Analysis

the act of analyzing data to determine what information can be obtained from it, using stem & leaf plots, medians, IQRs, & boxplots. In EDA, data can be organized using a stem & leaf plot.