Statistics
provide a way of understanding, illustrating, or otherwise making sense of quantitative data.
never prove anything; they only increase the confidence that a treatment resulted in an outcome.
Randomized controlled trials (RCTs)
are true experimental designs where the subjects are randomly assigned to control and treatment groups.
systematic reviews
are processes whereby published research from RCTs on a specific topic is pulled together using strict inclusion criteria, reviewed collectively, and presented in a meaningful way so the reader understands the topic in light of many studies viewed together.
independent variable
the variable manipulated in the experiment; the change agent
dependent variable
the outcome or characteristic of interest that is measured
Simple random sampling
is the strongest method because it randomly selects a sample from a larger group. This theoretically reduces the introduction of any human bias into this part of the process.
population
the entire group of subjects of interest, from which the study sample is drawn
convenience sampling
This approach uses a group for the simple reason of accessibility to the researcher.
Descriptive statistics
are test results that describe or characterize the data.
For example: the study consisted of 30% males and 70% females, the mean age was 24, and the average number of courses taken is five.
Inferential statistics
are used to imply something (or predict something) of a larger group based on the results from a sample.
For example: based on the results of a study with a sample of 24-year-olds in the U.S., there is a statistically significant difference in GRE scores.
Nominal level data
is the lowest order. It is a naming level such as sex (male or female), race (African American, Caucasian, Hispanic, Pacific Islander, etc.), and blood type (A, B, AB, O).
Ordinal level data
is one step above nominal data. Ordinal level data is a ranking level; that is, the numbers indicate placing but do not have a significant value otherwise.
You cannot perform mathematical functions on the numbers. Examples are placing in a contest (1st, 2nd, 3rd).
Interval level data
is one of the two higher order levels. The numbers have mathematical value and the intervals between two numbers are meaningful, but the scale does not include an absolute zero.
For example: temperature is an interval level measure. Ninety degrees to 50 degrees is a range of temperatures, but zero degrees does not mean an absence of temperature.
Ratio level data
is the other higher order level. Ratio level data is the same as interval data except it has an absolute zero. Blood pressure is an example of ratio level data: a zero blood pressure means an absence of blood pressure. The dollar amount in a checking account is another example.
presenting data
is the illustration of data, for example in tables, charts, and graphs
measures of central tendency
include the mean, median, and mode (descriptive measures of data)
mean
is the mathematical average
median
is the middle number in a set of numbers in ascending or descending order
mode
is the most frequently occurring number in a set of data
measures of dispersion (variability)
include the range, variance, and standard deviation
range
is the representation of how wide the distribution of scores is
width of range
is expressed as a value and is found by subtracting the low score from the high score.
variance
is the amount of spread of the data set
standard deviation
indicates how far, on average, a score is from the mean.
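The measures of central tendency and dispersion above can be computed with Python's standard `statistics` module; the data set here is hypothetical, chosen only to illustrate:

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical data set

mean = statistics.mean(scores)           # mathematical average
median = statistics.median(scores)       # middle value of the sorted scores
mode = statistics.mode(scores)           # most frequently occurring value
range_width = max(scores) - min(scores)  # high score minus low score
variance = statistics.pvariance(scores)  # spread of the data set
std_dev = statistics.pstdev(scores)      # average distance from the mean
```

Note that `pvariance`/`pstdev` treat the list as the whole population; `variance`/`stdev` would be used for a sample.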
Probability
is the chance of something of interest occurring.
formula: P = number of nominated outcomes (outcomes of interest) / number of possible outcomes (all possible outcomes).
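The formula can be applied directly; the card-deck numbers here are a standard illustration, not from the text:

```python
# Probability of drawing an ace from a 52-card deck:
# P = outcomes of interest / all possible outcomes
nominated_outcomes = 4       # four aces in the deck
possible_outcomes = 52
p = nominated_outcomes / possible_outcomes  # 4/52, or about 0.077
```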
critical probability
the point at which the stated outcome is considered unlikely to be the result of chance alone
It is usually represented as p < 0.01 and means the outcome of the experiment is expected to occur less than one time in 100 by chance (p = 1/100). This would be considered statistically significant.
normal distribution curve
is a way of illustrating a common outcome of statistical tests
statistical outliers
Observations that are more than ±3 standard deviations from the mean
Z scores
express how many standard deviations a raw score lies from the mean.
z = (raw score - mean score) / standard deviation.
T scores
simply use increments of 10. The mean is given a T score of 50 and the standard deviations increase/decrease by 10. A T score of 60 is one standard deviation above the mean.
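The two formulas can be sketched together; the raw score, mean, and standard deviation below are hypothetical:

```python
def z_score(raw, mean, sd):
    # number of standard deviations the raw score lies from the mean
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    # mean maps to 50, each standard deviation adds or subtracts 10
    return 50 + 10 * z_score(raw, mean, sd)

# hypothetical test: mean 70, standard deviation 15
z = z_score(85, 70, 15)  # 85 is one SD above the mean -> z = 1.0
t = t_score(85, 70, 15)  # -> T = 60
```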
skew of a distribution
refers to how the curve leans. When a curve has extreme scores on the right-hand side of the distribution, it is said to be positively skewed. When the tail of the curve is pulled to the left by extreme low scores, it is said to be negatively skewed. The tail indicates the direction of the skew.
Sampling error
is the error that results when using a sample mean to estimate a population characteristic.
sampling distribution
the way each of these sample means clusters around the population mean
The Central Limit Theorem
states that the means of a large number of samples drawn randomly from the same population will be normally distributed
This theorem further states if one calculated the mean of those sample means, it would equal the mean of the population in question.
the standard error of the mean
is what one obtains by calculating a standard deviation of these sample means
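A small simulation illustrates both ideas; the uniform population and sample sizes are hypothetical choices for demonstration:

```python
import random
import statistics

random.seed(0)
population = [random.uniform(0, 100) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Draw many random samples and record each sample mean.
sample_means = [statistics.mean(random.sample(population, 50))
                for _ in range(1_000)]

# Central Limit Theorem: the mean of the sample means
# approximates the population mean.
mean_of_means = statistics.mean(sample_means)

# The standard deviation of the sample means is the
# standard error of the mean.
sem = statistics.stdev(sample_means)
```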
Confidence interval
indicates how accurate one believes the estimate to be.
the larger the sample size, the greater the confidence in the results.
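A minimal sketch of a 95% confidence interval for a mean, using the large-sample normal critical value 1.96 and hypothetical data:

```python
import math
import statistics

sample = [98.2, 98.6, 98.4, 99.0, 98.8, 98.5, 98.7, 98.3]  # hypothetical
n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% CI: mean plus or minus 1.96 standard errors
ci = (mean - 1.96 * sem, mean + 1.96 * sem)
```

A larger n shrinks the standard error, so the interval narrows, which is the sense in which a larger sample gives greater confidence in the estimate.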
directional hypothesis
(one-tailed hypothesis) stating some significant correlation
null hypothesis
(non-directional, two-tailed hypothesis) stating that no relationship exists
A type I error is
the rejection of a true null hypothesis
A type II error
is the failure to reject a false null hypothesis
Degrees of freedom (df)
are based on the t-distribution. The t-distribution is a way of reflecting our confidence in a sample mean and standard deviation while accurately reflecting a population. This confidence is based on sample size: the smaller the sample, the less confident we can be.
chi-square tests
This type of test is common with nominal level data and is considered a lower order statistical test. Basically, it tells whether there is a statistically significant difference between the observed and expected frequencies among the groups.
correlation
statistically significant relationship between two variables.
Spearman's rank order test
is used when one of the two variables is ordinal level data and the other is interval level data. For example, you would use a Spearman's if you were looking for a correlation between class rank (ordinal) and GPA (interval).
"rs" for Spearman.
The Pearson's product moment (product moment correlation coefficient)
is used when both variables are interval level data. For example, you would use a Pearson's to look for a correlation between GPA (interval) and SAT scores (interval).
"r" for Pearson
the coefficient of determination (r2)
What this is telling the reader is the percentage of one variable explained by the second variable. If, for example, you found a correlation of 0.7 between GPA and SAT (r = 0.7), the coefficient of determination would be 0.49 (r2 = 0.49). What this is saying is that 49% of the variance in one variable is explained by the other.
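Pearson's r and the coefficient of determination can be computed directly from the defining sums; the paired GPA/SAT values below are hypothetical:

```python
import math

# hypothetical paired data for five students
gpa = [2.8, 3.0, 3.2, 3.6, 3.9]
sat = [1050, 1100, 1150, 1300, 1400]

n = len(gpa)
mean_x = sum(gpa) / n
mean_y = sum(sat) / n

# sums of cross-products and squared deviations
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(gpa, sat))
var_x = sum((x - mean_x) ** 2 for x in gpa)
var_y = sum((y - mean_y) ** 2 for y in sat)

r = cov / math.sqrt(var_x * var_y)  # Pearson's product moment correlation
r_squared = r ** 2                  # coefficient of determination
```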
regression analysis
basically evaluates how one set of data relates to another by fitting a line (or model) that predicts one variable from the other.
If you have ordinal level data and the two groups are unrelated (independent of each other), you would use a
Mann-Whitney U test.
If you have ordinal level data and the samples are related, you would use a
Wilcoxon signed-rank test.
If you have interval level data, you would use a
t-test.
ANOVA
allows us to look at both the average amount of difference between groups (same as a t-test) as well as the average amount of difference within each group. The additional advantage of an ANOVA is it can look at differences between more than two groups (t-tests are limited to two groups).
Prevalence:
the proportion of the population that has a disease in question at a specific point in time.
Incidence:
the number of new cases identified during a particular time period.
Relative risk:
the ratio of the incidence rates among exposed to unexposed individuals in a population.
2x2 tables
are used to assess treatments with dichotomous outcomes (yes or no; did or did not; etc.).
Experimental event rate (EER):
a measure of how often a particular event (response or outcome) occurs within the experimental group during a study.
Control event rate (CER):
a measure of how often a particular event (response or outcome) occurs within the control group during a study.
Absolute risk reduction (ARR):
also known as attributable risk reduction; the difference in the risk of the outcome between patients who have undergone one therapy and those who have undergone another. Again, using the 2x2 table as an example, the formula for determining ARR is: [C/(C+D)] - [A/(A+B)].
Relative risk reduction:
an estimate of the percentage of baseline risk that is removed as a result of the therapy; it is calculated as the ARR between the treatment and control groups divided by the absolute risk among patients in the control group (see ARR). The formula is: ([C/(C+D)] - [A/(A+B)]) / [C/(C+D)].
Odds ratio:
simply the odds of an event occurring. The formula is:
(A/C)/(B/D).
Number Needed to Treat (NNT):
the number of patients who need to be treated to prevent one adverse event. It is the reciprocal of the ARR (1/ARR).
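All of the 2x2-table measures above follow from the four cell counts; the counts below are hypothetical:

```python
# Hypothetical 2x2 table (rows: treatment vs. control;
# columns: event vs. no event)
#                event   no event
# treatment      A = 10   B = 90
# control        C = 20   D = 80
A, B, C, D = 10, 90, 20, 80

eer = A / (A + B)               # experimental event rate = 0.10
cer = C / (C + D)               # control event rate      = 0.20
arr = cer - eer                 # absolute risk reduction = 0.10
rrr = arr / cer                 # relative risk reduction = 0.50
odds_ratio = (A / C) / (B / D)  # odds of event, treated vs. control
nnt = 1 / arr                   # number needed to treat  = 10
```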