Statistics

Mean

This "measure of center" is the AVERAGE of the values in a data set.
(Mean is sensitive to extreme values.)

Median

A measure of center in a set of numerical data. The median of a list of values is the value appearing at the center of a sorted version of the list - or the mean of the two central values if the list contains an even number of values.
(Median is NOT sensitive to extreme values.)

Mode

The value that occurs most often in a set of data.
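A minimal sketch of the three measures of center, using Python's standard `statistics` module. The data values are made up for illustration; the value 100 is included only to show how an extreme value pulls the mean but not the median.

```python
from statistics import mean, median, multimode

# Hypothetical data set; 100 is an extreme value.
data = [2, 3, 3, 5, 7, 100]

center_mean = mean(data)      # sum of the values divided by their count
center_median = median(data)  # even count, so the mean of the two central values
modes = multimode(data)       # list of the most frequent value(s)

# Dropping the extreme value moves the mean a lot and the median only a little.
trimmed = [x for x in data if x != 100]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(center_mean, center_median, modes)  # 20 4.0 [3]
print(trimmed_mean, trimmed_median)       # 4 3
```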

Skewness Effect - Right-Skewed Distribution

Mean > Median > Mode

Skewness Effect

A measure of the shape of an asymmetrical distribution.

Skewness Effect - Symmetric Unimodal Distribution

Mean = Median = Mode. Unimodal means it has ONE MODE. The NORMAL DISTRIBUTION is an example of this shape.

Range

The difference between the largest value and smallest value of a data set. (Range = Largest Value - Smallest Value)
(A larger range is an indication of greater VARIABILITY, or greater spread, in the data set)

Deviation

The difference between a data value and the mean of the data set.
(The distance between the data value and the mean)
If data value x > mean, deviation will be positive.
If data value x < mean, deviation will be negative.
If data value x = mean, deviation will be zero.

Population Variance σ²

The mean of the squared deviations in the population.

Population Standard Deviation σ

The positive square root of the population variance.

Sample Variance s²

Approximately the mean of the squared deviations in the sample: the sum of the squared deviations divided by n - 1 rather than n.

Sample Standard Deviation s

The positive square root of the sample variance s².
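The deviation, variance, and standard deviation definitions above can be worked through by hand. This is a sketch on a made-up data set; the only difference between the population and sample versions is the divisor (n versus n - 1).

```python
import math

# Hypothetical data set chosen so the arithmetic comes out cleanly.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
data_mean = sum(data) / n

# Deviation: data value minus the mean (positive above, negative below).
deviations = [x - data_mean for x in data]
sum_sq = sum(d ** 2 for d in deviations)

pop_var = sum_sq / n             # population variance: mean of squared deviations
pop_sd = math.sqrt(pop_var)      # population standard deviation
sample_var = sum_sq / (n - 1)    # sample variance: divide by n - 1
sample_sd = math.sqrt(sample_var)

print(pop_var, pop_sd)  # 4.0 2.0
```

The deviations always sum to zero, which is why they are squared before averaging.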

Standard Deviation

A common measure of the variability, or spread, of a data set. It is a typical deviation from the mean.

z-Score

A standardized score calculated by subtracting the mean from an individual score and dividing the result by the standard deviation; it represents how many standard deviations the score lies above or below the mean.

Outlier

An extremely large or extremely small data value relative to the rest of the data set.

Detecting Outliers - Z-score Method

Identify an outlier by determining if it is farther than 3 standard deviations from the mean, i.e., a z-score less than -3 or greater than 3.
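A sketch of the z-score method. The function name `z_scores` and the data are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption; the definition above does not say which standard deviation to use.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation
    return [(x - m) / s for x in values]

# Hypothetical data: twenty values near 50 plus one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [100]
scores = z_scores(data)

# Flag values whose z-score is less than -3 or greater than 3.
outliers = [x for x, z in zip(data, scores) if abs(z) > 3]
print(outliers)  # [100]
```

In very small data sets no value can reach a z-score of 3, so this method needs a reasonable number of observations to flag anything.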

Percentile

The location of a data value relative to other values in the data set; e.g., a score at the 90th percentile means that 90% of all scores are at or below that score, and 10% are higher.

Percentile Calculation

i = (P/100)n. MORE TO THIS....

Percentile Rank

Percentage of scores falling at or below a specific score. A percentile rank of 95 means that 95% of all of the scores fall at or below this point. In other words, the score is as good as or better than 95% of the scores.
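A small sketch of percentile rank using the "at or below" definition given above. The function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores falling at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank_of_90 = percentile_rank(scores, 90)
print(rank_of_90)  # 80.0
```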

Quartiles

The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.

Interquartile Range (IQR)

A robust measure of variability, calculated as IQR = Q3 - Q1. It is interpreted as the spread of the middle 50% of the data, and it is NOT affected by outliers since it ignores the highest 25% and the lowest 25% of the data set.
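A sketch of computing the quartiles and the IQR with `statistics.quantiles`. Textbooks and software use several slightly different quartile conventions; the default ("exclusive") method used here is one of them and may not match every course's hand calculation. The data are made up.

```python
from statistics import quantiles

# Hypothetical data set of 11 values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)  # 3.0 6.0 9.0 6.0
```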

Five-Number Summary

An exploratory data analysis technique that uses five numbers to summarize the data: 1. smallest value, 2. first quartile, 3. median (second quartile), 4. third quartile, and 5. largest value.

Boxplot

A graphic display that represents the distribution of data by focusing on five key measures: Min, Q1, Q2, Q3, Max.

Boxplot Upper and Lower Fences

Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)

Detecting Outliers - IQR Method

A data value is an outlier if
a. it is located 1.5(IQR) or more below Q1, or
b. it is located 1.5(IQR) or more above Q3.
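The two conditions above can be sketched as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions.

```python
from statistics import quantiles

# Hypothetical data set with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]

q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr  # 1.5(IQR) below Q1
upper_fence = q3 + 1.5 * iqr  # 1.5(IQR) above Q3

# A value is an outlier if it lies at or beyond either fence.
outliers = [x for x in data if x <= lower_fence or x >= upper_fence]
print(lower_fence, upper_fence, outliers)  # -6.0 18.0 [30]
```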

Chebyshev's Rule

The proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least 1 - 1/k^2, where k is any positive number greater than 1.
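A quick sketch of the bound for a few values of k. The function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

within_2 = chebyshev_bound(2)  # at least 75% of the data
within_3 = chebyshev_bound(3)  # at least about 88.9% of the data
print(within_2, round(within_3, 3))  # 0.75 0.889
```

Because the rule holds for any data set, these are lower bounds; bell-shaped data usually have much more of their values inside the same intervals.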

The Empirical Rule

This says that, in a normal bell-shaped curve, approximately 68% of the data fall within one standard deviation of the mean, 95% within two, and 99.7% within three.

The Empirical Rule in terms of z-Scores

Approximately 68% of the data will have z-scores between -1 and 1, 95% between -2 and 2, and 99.7% between -3 and 3.
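The three percentages can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution with z-scores between -k and k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```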

Scatterplot

A graphed cluster of dots, each of which represents the values of two variables. The slope of the points suggests the direction of the relationship between the two variables. The amount of scatter suggests the strength of the correlation (little scatter indicates high correlation).

Scatterplot Variables x and y

x is horizontal axis, and y is vertical axis. x is the "predictor" variable, and y is the "response" variable.

Correlation Coefficient

A statistic, r, that summarizes the strength and direction of the linear relationship between two variables. It always takes on a value between -1 and 1, inclusive.
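A sketch of computing r by hand from deviations. The function name `pearson_r` and the data are hypothetical; the formula is the usual Pearson product-moment coefficient, which is assumed to be the r meant here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor (x) and response (y) values.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(round(r, 2))  # 0.8
```

A positive r matches an upward-sloping scatterplot, and a value near -1 or 1 matches points that hug a straight line.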

Comparison Test for Linear Correlation

1. Find the absolute value of the correlation coefficient r, |r|. |0.5|=0.5 and |-0.4|=0.4
2. Use the Table of Critical Values for the Correlation Coefficient and select the row corresponding to sample size n.
3. Compare the absolute value |r| from Step 1

Empirical study

A study based on observation or experience.

Absolute value:

The positive numeric value of a number (the minus sign in front of the number is disregarded).

Alternative hypothesis (Ha):

The hypothesis that states a statistically significant relationship exists between the variables. It is the hypothesis opposite to the null hypothesis. It is also referred to as the "acting" hypothesis or the research hypothesis.

Analysis of covariance (ANCOVA):

A combination of regression and analysis of variance techniques that allows comparison of group means after adjustment for the effect of the covariate.

Analysis of variance (ANOVA):

A parametric statistical technique used to compare the means of three or more groups as defined by one or more factors.

Bar graph:

A graph used for nominal or ordinal data. A space separates the bars.

Bartlett's test:

A chi-square statistic used to test the significance of lambda.

Baseline:

Measures taken at the start of a study before any interventions; sometimes referred to as the pretest.

Bell shaped:

A graphical shape, typical of the normal distribution.

Box plots:

A graphic display that uses descriptive statistics based on percentiles

Continuous variable:

A variable that can take on any possible value within a range. For example, weight is a continuous variable because a weight of 152.5 lb makes sense. In contrast, number of children is a discrete variable because it can take on only certain values (0, 1,

Control group:

The group that is used for comparison in an experimental or quasi-experimental study

Population:

The entire group having some characteristic (eg, all people with depression, all residents of the United States). Often a sample is taken of the population and then the results are generalized to that population.

Sample:

A group selected from the population in the hope that the smaller group will be representative of the entire population.

Parameters:

Characteristics of the population.

Statistics:

The field of study that is concerned with obtaining, describing, and interpreting data; the characteristics of samples.

Variance:

A measure of the dispersion of scores around the mean. It is equal to the standard deviation squared.

Null hypothesis:

The hypothesis that states that two or more variables being compared will not be related to each other (ie, no significant relationship between the variables will be found).

Normal distribution:

A theoretical probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability that these values will occur. Normal distributions are unimodal
(mean, median, and mode are t

Data set:

Collection of different values of all the variables used to measure the characteristics of the sample or population

Independent variable:

The variable that is seen as having an effect on the dependent variable. In experimental designs, the treatment is manipulated.

Dependent variable:

The variable that measures the effect of some other variable (eg, the variable whose values are expected to be predicted by the independent variable). Also referred to as the outcome variable or the response variable.

Histogram:

A way of graphically displaying ordinal-, interval-, and ratio-level data. It shows the shape of the distribution.

Ordinal scale:

A measurement scale that ranks participants on some variable. The interval between the ranks does not necessarily have to be equal. Examples of ordinal variables are scale items that measure any subjective state (eg, happiness: very happy, somewhat happy,

Nominal:

The lowest level of measurement; consists of organizing data into discrete units.

Nominal measure:

A measurement scale in which the numbers have no intrinsic meaning but are merely used to label different categories. Ethnic identity, religion, and health insurance status (eg, none, Medicaid, Medicare, private) are all examples of nominal-level data.

Ratio-level measurement:

The highest level of measurement. In addition to equal intervals between data points, there is an absolute zero.

Ratio scale:

A measurement scale in which there are both equal intervals between units and a true zero. Most biologic measures (eg, weight, pulse rate) are ratio-level variables.

Interval-level measurement:

A rank-order scale with equal intervals between units but no true zero. IQ scores, SAT scores, and GRE scores are all examples of interval-level data.

Interquartile range:

The range of values extending from the 25th to the 75th percentile.

Quartile:

The four "quarters" of the data distribution. The first quartile is the 25th percentile, the second quartile is the 50th percentile, the third quartile is the 75th percentile, and the fourth quartile is the 100th percentile

Zero-order correlation:

The measured relationship between two variables

Variable:

A measured characteristic that can take on different values

Type I error:

Rejecting the null hypothesis when it is true.

Type II error:

Failing to reject (accepting) the null hypothesis when it is false.

Standard scores:

z-scores; represent the deviation of scores around the mean in a distribution with a mean of "0" and a standard deviation of "1."

Central limit theorem:

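When many samples of the same size are drawn from a population, the means of these samples tend to be normally distributed, even if the population itself is not, and the approximation improves as the sample size grows.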
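A small simulation can illustrate the idea. Everything here is an assumption chosen for illustration: a strongly right-skewed population (exponential with mean 1), a sample size of 50, 2,000 repeated samples, and a fixed random seed.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one sample of size n from a skewed (exponential) population."""
    return mean([random.expovariate(1.0) for _ in range(n)])

# Draw many samples and keep only their means.
means = [sample_mean(50) for _ in range(2000)]

center = mean(means)
spread = stdev(means)

# For a roughly normal shape, about 68% of the sample means should lie
# within one standard deviation of their own mean.
within_one_sd = sum(1 for m in means if abs(m - center) <= spread) / len(means)
print(round(center, 2), round(within_one_sd, 2))
```

Although individual values from this population are far from bell shaped, the sample means cluster symmetrically around the population mean of 1.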