Statistic
a numerical characteristic of a sample, obtained by using the data values from that sample
Parameter
a characteristic or measure obtained by using all the data values for a specific population; a numerical characteristic of the population, as distinct from a statistic, which describes a sample; a statistic is used to estimate a parameter
Sample
a group of subjects selected from a population to represent the population
Population
the totality of all subjects possessing certain common characteristics that are being studied
Statistical Inference
the process of deducing properties of an underlying distribution by analysis of data; inferential statistical analysis infers properties about a population: this includes testing hypotheses and deriving estimates
Variable
a characteristic or attribute that can assume different values when observed in different persons, places, or things
Continuous Variable
a variable that can assume any value between any two specific values; a variable obtained by measuring
Discrete Variable
a variable that assumes values that can be counted; it cannot take on every value in an interval
Categorical Variable
a variable that can take on ONE of a limited, and usually fixed, number of possible values, thus assigning each individual to a particular group or "category"
Quantitative Variable
a variable that is numerical in nature and that can be ordered or ranked
Qualitative Variable
a variable that can be placed into distinct categories, according to some characteristic or attribute
Observational Study
a study in which the researcher merely observes what is happening or what has happened in the past and draws conclusions based on these observations
Experimental Study
a study in which the researcher MANIPULATES one of the variables and tries to determine how the manipulation influences other variables
Simple Random Sampling
samples obtained by using random or chance methods; a sample for which every member of the population has an equal chance of being selected (may not be perfectly representative of the population) (probabilistic)
Systematic Random Sampling
samples obtained by numbering each element in the population and then selecting every nth member of the population to be included in the sample
Stratified Random Sampling
samples obtained by dividing the population into subgroups, called strata, according to various homogeneous characteristics and then randomly selecting members from each stratum/group
Cluster Sampling
samples obtained by selecting a preexisting or natural group, called a cluster, and using the members in the cluster for the sample
Nonprobabilistic Sampling
nonrandom, cannot be used to infer from the sample to the general population, includes convenience/accidental sampling, snowball sampling, judgement sampling, deviant cases, case studies, and ad-hoc quotas
Convenience Sampling
choosing individuals who are easiest to reach
Judgement Sampling
samples in which the selection criteria are based on the researcher's personal judgment about the representativeness of the population under study; the researcher selects who should be in the study/who would be most appropriate for the study (nonprobabilistic)
Central Limit Theorem
as the sample size increases, the distribution of the sample mean of a randomly selected sample approaches the normal distribution; about 95% of sample means (X-bar) will fall within ±1.96 standard errors (SE) of the population mean (μ); generally applied when the sample size is larger than 30
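A minimal Python sketch of this idea, using NumPy and a hypothetical skewed population (both are assumptions, not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.exponential(scale=2.0, size=100_000)  # skewed population (illustrative)

    n = 40                                                  # sample size > 30
    sample_means = rng.choice(population, size=(5_000, n)).mean(axis=1)

    mu = population.mean()
    se = population.std() / np.sqrt(n)                      # standard error of the mean
    within = np.mean(np.abs(sample_means - mu) <= 1.96 * se)
    print(f"share of sample means within +/-1.96 SE of mu: {within:.3f}")  # roughly 0.95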
Histogram
a graph that displays data by using vertical bars of various heights to represent the frequencies of a distribution; the columns are positioned over a label that represents a quantitative variable; the column label can be a single value or a range of values; the height of the column indicates the size of the group defined by the column label.
Pie Chart
a circle divided into sections according to the percentage of frequencies in each category of the distribution (% x 3.6 = degrees of the angle; e.g., a category with 25% of the frequencies gets a 25 x 3.6 = 90° slice)
Bar Graph
a graph that also displays data by using vertical bars; the columns are positioned over a label that represents a categorical variable; the height of the column indicates the size of the group defined by the column label.
Boxplot
a graph used to represent a data set when the data set contains a small number of values; a boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" that runs from the first quartile (Q1) to the third quartile (Q3); within the box, a line is drawn at Q2, the median of the data set. Two lines, called whiskers, extend from the ends of the box: one whisker goes from Q1 to the smallest non-outlier in the data set, and the other goes from Q3 to the largest non-outlier. Used in Exploratory Data Analysis (EDA) based on the median.
Frequency Distribution
a table listing the possible values of the variable and their frequencies (counts of the number of times each value occurs)
Relative Frequency Distribution
tabular summary; a table listing the possible values of the variable along with their relative frequencies (proportions in fraction, percent, or ratio); shows the fraction of the total number of items in several classes
Tabular Summary
presented in rows and columns (Spreadsheet)
Mean
the sum of the values divided by the total number of values (μ for a population, X-bar for a sample); its magnitude can be affected by a single extremely large or small value; a nonresistant statistic, affected by outliers
Median
the midpoint of a data array (corresponds to the 50th percentile); a resistant statistic, less affected by outliers
Mode
the value that occurs most often in a data set; the peak of a curve; meaningful when data is qualitative
Variance
the sum of the squared distances of the data values from the mean, divided by the number of values minus 1 (for a sample)
Coefficient of Variation
CVar; a standardized measure of dispersion of a probability distribution or frequency distribution; the ratio of the standard deviation to the mean
Standard Deviation
the square root of the variance (σ for a population, s for a sample); a nonresistant statistic, affected by outliers; as sample size decreases, the standard deviation of the sample mean (the standard error) increases
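A short Python sketch (standard library only; the data set is hypothetical) computing the measures defined in the Mean through Standard Deviation entries above:

    import statistics as st

    data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 6]           # hypothetical sample

    mean   = st.mean(data)                          # sum of values / number of values
    median = st.median(data)                        # midpoint of the ordered data
    mode   = st.mode(data)                          # most frequent value
    var    = st.variance(data)                      # sample variance (divides by n - 1)
    sd     = st.stdev(data)                         # square root of the variance
    cvar   = sd / mean                              # coefficient of variation (often reported as a %)

    print(mean, median, mode, round(var, 2), round(sd, 2), round(cvar, 3))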
Bimodal Distribution
a distribution with two peaks/two modes
Positive Skewed/Right Skewed Distribution
non-symmetric distribution; the tail in the positive direction extends further than the tail in the negative direction; mean and median are larger than the mode
Negative Skewed/Left Skewed Distribution
non-symmetric distribution; the tail in the negative direction extends further than the tail in the positive direction; mean and median are smaller than the mode
Leptokurtic Distribution
a distribution that has relatively more scores in its tails (heavier tails) than a normal distribution
Symmetric Distribution
the mean is the same as the median and mode
Mean Absolute Deviation
the mean of the absolute distances of the data values from their mean
Finding Percentiles
multiply the percentile (as percent/100) by the total number of values to get the index; round the index up and count that many ordered values from left to right to find the value at that percentile
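A sketch of the index method just described (one common textbook convention; the data are hypothetical, and some texts average adjacent values when the index comes out to a whole number):

    import math

    def percentile_value(values, percent):
        """Value at the given percentile via the index method:
        index = (percent/100) * n, rounded up; count from the left in the ordered data."""
        ordered = sorted(values)
        index = math.ceil(percent / 100 * len(ordered))
        return ordered[index - 1]                    # lists are 0-indexed

    print(percentile_value([2, 3, 5, 6, 8, 10, 12, 15, 18, 20], 30))  # -> 5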
Chebyshev's Theorem
for any distribution, the proportion of data values within k standard deviations of the mean is at least 1 - 1/k²; in particular, at least 75% of the data values will fall within 2 standard deviations of the mean
Interquartile Range
the difference between the third and first quartiles (Q3 - Q1)
Outlier
any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile
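A brief sketch of the 1.5 x IQR rule with hypothetical data; note that statistics.quantiles may place the quartiles slightly differently than hand methods:

    import statistics as st

    data = [5, 7, 8, 9, 10, 11, 12, 13, 14, 40]      # hypothetical; 40 looks suspect

    q1, q2, q3 = st.quantiles(data, n=4)             # quartiles
    iqr = q3 - q1                                    # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    print(outliers)                                  # [40] for this data set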
Empirical Rule
for normal distribution/curve the approximate % of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean
Normal Distribution
a unimodal, symmetric, bell-shaped distribution of the data for a selected variable; the total area under a normal curve is always equal to one; the curve never quite reaches y = 0 and extends toward positive and negative infinity; the mean = median = mode; about 68% of values lie within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD.
Standard Normal Curve
a normally distributed variable having mean 0 and standard deviation 1 is said to have the standard normal distribution; its associated normal curve is called the standard normal curve
Dispersion
variability, scatter, or spread
Linear Regression
models the relationship of a response or dependent variable (y) to a single independent feature or measurement variable (x) with a linear equation.
Predictor Variable
the variable in a correlational study that is used to predict the score on another variable (the criterion variable)
Independent/Predictor/Explanatory Variable
x; the variable that is varied or manipulated by the researcher; a variable whose values are independent of changes in the values of other variables; the variable that explains the response
Dependent/Response Variable
y; variables of interest in an experiment (those that are measured or observed)
Method of Least Squares
a statistical method for finding the line that best fits a set of data; in cost accounting it is used to break out the fixed and variable components of a mixed cost; the resulting line yields the smallest sum of squared residuals for all y values
Residual
the difference between the actual (observed) value and the predicted value
Correlation Coefficient
r; a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation; positive will slope upward to the right; negative will slope downward to the right.
Proportion of the Total Variation
r squared; coefficient of determination; a number that indicates how well data fit a statistical model
Positive Relationship
when both variables (x & y) increase or decrease at the same time
Negative Relationship
when one variable increases and the other decreases
Coefficient of Nondetermination
1.00 - r squared; the percent of variation which is unexplained by the regression equation; the unexplained variation divided by the total variation
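A compact sketch tying together the regression entries above (least squares line, residuals, r, r squared, and 1 - r squared); it assumes Python 3.10+ for statistics.linear_regression and statistics.correlation, and the data are hypothetical:

    import statistics as st

    x = [1, 2, 3, 4, 5, 6]                           # hypothetical predictor values
    y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]              # hypothetical response values

    slope, intercept = st.linear_regression(x, y)    # least squares line: y-hat = intercept + slope*x
    r = st.correlation(x, y)                         # correlation coefficient
    r_sq = r ** 2                                    # coefficient of determination
    nondetermination = 1 - r_sq                      # unexplained variation / total variation

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]  # actual - predicted
    print(round(slope, 3), round(intercept, 3), round(r, 3), round(r_sq, 3))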
Probability
ranges from 0.00 to 1.00, where 1.00 means the event is certain to happen and 0.00 means it cannot happen
Classical Probability
uses sample spaces to determine the numerical probability that an event will happen. It assumes all outcomes in the sample space are equally likely to occur.
Conditional Probability
the chance that a second event will happen, given that the first event has already happened
Mutually Exclusive/Disjoint
a statistical term describing two or more events that cannot occur at the same time; the occurrence of one event rules out the occurrence of the other
Independent Events
events for which the outcome of one event does not affect the probability of the other
Complement of an Event
set of all outcomes in the sample space that are not in the event
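A tiny numeric sketch (the probabilities are hypothetical) illustrating conditional probability, the independence check, and the complement rule from the entries above:

    # Hypothetical probabilities for two events A and B
    p_a = 0.60
    p_b = 0.30
    p_a_and_b = 0.24

    p_b_given_a = p_a_and_b / p_a                      # conditional probability P(B|A)
    independent = abs(p_a_and_b - p_a * p_b) < 1e-9    # independent if P(A and B) = P(A)P(B)
    p_not_a = 1 - p_a                                  # probability of the complement of A

    print(round(p_b_given_a, 2), independent, p_not_a) # 0.4, False, 0.4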
Confidence Interval
a range of values for a variable of interest; a specific interval estimate of a parameter determined by using data obtained from a sample and by using the specific confidence level of the estimate; the specified probability is called the confidence level and the end points of the confidence interval are called the confidence limits; round to 3 decimal places
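A minimal sketch of a z confidence interval for a mean, using a hypothetical sample summary and the 95% critical value 1.96 (see the Confidence Interval critical values listed further below):

    import math

    x_bar = 52.3        # hypothetical sample mean
    sigma = 4.5         # assumed known population standard deviation
    n = 36              # sample size
    z = 1.96            # critical value for a 95% confidence level

    margin = z * sigma / math.sqrt(n)             # margin of error
    lower, upper = x_bar - margin, x_bar + margin
    print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # rounded to 3 decimal places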
Standard Normal Distribution
a normal distribution with a mean of 0 and a standard deviation of 1
t Distribution
a distribution specified by degrees of freedom, used to model test statistics for the one-sample t test, the two-sample t test, etc., where the population standard deviation(s) σ is (are) unknown; also used to obtain a confidence interval for estimating a population mean, the difference between two population means, etc.
True Proportion
a proportion in which the ratios have been proven to be equal by multiplying the means and extremes
Null Hypothesis
states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters; hypothesis that predicts NO relationship between variables; the aim of research is to reject this hypothesis; referred to as the "status quo" or a statement of "no effect or no difference"
Type I Error
occurs when a null hypothesis that is true is rejected
p-value
the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; forms the basis for deciding whether results are statistically significant (not due to chance)
Rejection Region
area of a sampling distribution that corresponds to test statistic values that lead to rejection of the null hypothesis
Two Tailed Test
used when we predict that there is a relationship but do not predict the direction; used to test a nondirectional research hypothesis
Level of Significance
the maximum probability of committing a type I error; represented by alpha (α); when a null hypothesis is rejected, the probability of a type I error will be .10, .05, or .01 depending on which level of significance is used
Z-value/Z-score
the number of standard deviations a given observation lies above or below the population mean; z = (x - μ)/σ
Confidence Level
the probability that the interval estimate will contain the parameter, assuming that a large number of samples are selected and that the estimation process on the same parameter is repeated
Estimator
a rule, method, or criterion for arriving at an estimate of the value of the parameter; a statistic based on sample observations that is used to estimate the numerical value of an unknown population parameter
Interval Estimate
an interval or range of values used to estimate the parameter
Interval Estimation
the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter, in contrast to point estimation, which is a single number
Degrees of Freedom (d.f.)
d.f. = n-1; number of values that are free to vary after a sample statistic has been computed
Least Squares Method
in linear regression, yields the values of the y-intercept and the slope that minimize the sum of the squared deviations between the observed values of the dependent variable and the estimated values of the dependent variable
Point Estimate of Population Parameter
a single value of a statistic, e.g., the sample mean (X-bar) is a point estimate of the population mean (μ); similarly, the sample proportion (p-hat) is a point estimate of the population proportion (P); contrast with an interval estimate.
t-distribution
SMALL SAMPLE SIZE (n<30), s (standard deviation of sample); a distribution in which sigma is replaced with s; the t distribution is symmetric around zero and is bell-shaped, but the spread is greater because there is more variation
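A hedged sketch of a t interval for a small sample; it assumes SciPy is available for the t critical value, and the data are hypothetical:

    import statistics as st
    from scipy.stats import t   # assumption: SciPy is installed

    data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]  # hypothetical small sample (n < 30)
    n = len(data)
    x_bar = st.mean(data)
    s = st.stdev(data)                                       # s replaces sigma
    t_crit = t.ppf(0.975, df=n - 1)                          # 95% confidence, d.f. = n - 1

    margin = t_crit * s / n ** 0.5
    print(f"95% CI: ({x_bar - margin:.3f}, {x_bar + margin:.3f})")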
Sampling Error
the difference between a sample statistic used to estimate a population parameter and the actual value of that parameter
Scatter Plot
the visual representation of data in simple regression analysis
Simple Regression Analysis
analysis of the extent to which the relationship between two variables can be represented by a straight line; often visualized with a scatter plot
Statistical Hypothesis
a conjecture about a population parameter; may or may not be true
Alternative/Research Hypothesis
(H-subscript-1) is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or that there is a difference between two parameters
p-value
a measure of statistical significance; the lower the p-value, the more likely it is that the results of an experiment did not occur simply by chance.
Two-tailed Test
H0: μ = 82 and H1: μ ≠ 82; H1 says the mean will differ from 82 in either direction (positive or negative)
Right-tailed Test
H0:μ = 36 and H1:μ > 36; H1 says the mean will be greater than 36
Left-tailed Test
H0:μ = 78 and H1:μ < 78; H1 says the mean will be less than 78
Type II Error
occurs when the null hypothesis is false but is not rejected; the probability of a type II error is represented by beta (β)
Critical Value (C.V.)
separates the critical region from the noncritical region
Critical/Rejection Region
the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected
Noncritical/Nonrejection Region
the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis SHOULD NOT be rejected
One-tailed Test
indicates that a null hypothesis should be rejected when the test value is in the critical region on one side of the mean; will be either right- or left-tailed test, depending on the direction of the inequality of the alternative hypothesis
Confidence Interval
When CI is 90%, Zα/2=1.65; 95%, Zα/2=1.96; 99%, Zα/2=2.58
Dependent Events
when the occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed
Discrete Probability Distribution
f(x) ≥ 0 for every value of x, and Σ f(x) = 1
Binomial Experiment
An experiment in which there are exactly two possible outcomes for each trial, a fixed number of INDEPENDENT trials, and the probabilities for each trial are the same.
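A standard-library sketch of the binomial probability formula for such an experiment (the numbers are hypothetical):

    from math import comb

    def binomial_prob(n, p, k):
        """P(exactly k successes in n independent trials, each with success probability p)."""
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    print(round(binomial_prob(n=10, p=0.5, k=3), 4))   # e.g., 3 heads in 10 fair coin flips -> 0.1172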
Probability Density Function
a function used to compute probabilities for a continuous random variable. The area under the graph of a probability density function over an interval represents probability