Statistical Analysis
Descriptive statistics: used to describe and synthesize data
Inferential statistics: used to make inferences about the population based on sample data
Descriptive Indexes
Parameter: a characteristic of a population (e.g., the average age of EMTs in the US)
Statistic: an estimate of a parameter, calculated from a sample (e.g., the average age of EMTs in Jefferson County, KY)
Descriptive Statistics: Frequency Distributions
A systematic arrangement of numeric values on a variable from lowest to highest, and a count of the number of times (and/or percentage) each value was obtained
Can be presented in a table (Ns and percentages) or graphically (e.g., frequency polygons)
Frequency distributions can be described in terms of
Shape
Central tendency
Variability
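A frequency distribution sorts the values from lowest to highest and counts how often each one occurs. The sketch below builds one with the standard library. The scores are the example values used later in this outline for the mode, median, and mean, and the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count, percent) rows ordered from lowest to highest value."""
    counts = Counter(values)
    n = len(values)
    return [(value, counts[value], 100 * counts[value] / n) for value in sorted(counts)]

# Example scores (same values as the central tendency examples in this outline)
scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]
for value, count, percent in frequency_distribution(scores):
    print(f"{value:>5} {count:>5} {percent:>6.1f}%")
```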
Shapes of Distributions
Symmetric
Skewed (asymmetric)
Positive skew (long tail points to the right)
Negative skew (long tail points to the left)
Peakedness (how sharp the peak is)
Modality (number of peaks)
Unimodal (1 peak)
Bimodal (2 peaks)
Multimodal (2+ peaks)
Normal Distribution
Characteristics:
Symmetric
Unimodal
Not too peaked, not too flat
More popularly referred to as a bell-shaped curve
Important distribution in inferential statistics
Central Tendency
Index of the "typicalness" of a set of scores, taken from the center of the distribution
Mode
Median
Mean
Mode
the most frequently occurring score in a distribution
Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mode = 3
useful mainly as a gross descriptor, especially of nominal measures
Median
the point in a distribution above which and below which 50% of cases fall
Ex: 2, 3, 3, 3, 4 | 5, 6, 7, 8, 9 Median = 4.5
useful mainly as a descriptor of the typical value when a distribution is skewed (e.g., household income)
Mean
equals the sum of all scores divided by the total number of scores
Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mean = 5.0
most stable and widely used indicator of central tendency
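The three examples above can be checked with Python's standard `statistics` module. This is a minimal sketch. When n is even, `statistics.median` averages the two middle scores, which gives the 4.5 shown above.

```python
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]  # example scores from the slides above

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # mean of the two middle scores when n is even
mean = statistics.mean(scores)      # sum of scores divided by number of scores

print(mode, median, mean)
```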
Variability
The degree to which scores in a distribution are spread out or dispersed
Homogeneity: little variability
Heterogeneity: great variability
Indexes of Variability
Range: highest value minus lowest value
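Standard deviation (SD): an index of how far scores typically deviate from the mean; computed as the square root of the variance (the average squared deviation from the mean)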
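The sketch below computes the range and SD for the example scores. It calculates the SD twice, once with `statistics.stdev` and once by hand, to show that the SD is the square root of the averaged squared deviations. The sample SD divides by n - 1 and the population SD divides by N. Which one a given article reports is an assumption to check.

```python
import math
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

score_range = max(scores) - min(scores)    # highest value minus lowest value
sample_sd = statistics.stdev(scores)       # squared deviations divided by n - 1
population_sd = statistics.pstdev(scores)  # squared deviations divided by N

# The same sample SD computed by hand from the deviations around the mean
mean = sum(scores) / len(scores)
manual_sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (len(scores) - 1))
```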
Bivariate Descriptive Statistics
Used for describing the relationship between two variables
Two common approaches:
Contingency tables (Crosstabs)
Correlation coefficients
Contingency Table
A two-dimensional frequency distribution; frequencies of two variables are cross-tabulated
"Cells" at intersection of rows and columns display counts and percentages
Variables usually nominal or ordinal
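A contingency table cross-tabulates two categorical variables. The sketch below counts each combination and reports row percentages. The data (sex by smoking status for 10 respondents) are hypothetical, and the function name `crosstab` is illustrative.

```python
from collections import Counter

def crosstab(pairs):
    """Cross-tabulate (row_value, column_value) pairs into (count, row percent) cells."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        row_total = sum(counts[(r, c)] for c in cols)
        table[r] = {c: (counts[(r, c)], 100 * counts[(r, c)] / row_total) for c in cols}
    return table

# Hypothetical respondents: (sex, smoking status)
data = [("F", "smoker"), ("F", "nonsmoker"), ("F", "nonsmoker"), ("F", "nonsmoker"),
        ("M", "smoker"), ("M", "smoker"), ("M", "nonsmoker"), ("M", "nonsmoker"),
        ("F", "nonsmoker"), ("M", "smoker")]

table = crosstab(data)
for row_value, cells in table.items():
    print(row_value, {c: f"{n} ({pct:.0f}%)" for c, (n, pct) in cells.items()})
```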
Correlation Coefficients
Indicate direction and magnitude of relationship between two variables
The most widely used correlation coefficient is Pearson's r.
Correlation coefficients range from -1.00 to +1.00.
The greater the absolute value of the coefficient, the stronger the relationship.
Pearson's r
used when both variables are interval- or ratio-level measures.
Negative relationship (0.00 to -1.00)
one variable increases in value as the other decreases, e.g., amount of exercise and weight
Positive relationship (0.00 to +1.00)
as one variable increases, so does the other, e.g., calorie consumption and weight
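Pearson's r is the covariation of the two variables divided by the product of their spreads. The sketch below implements that formula directly. The exercise and weight values are hypothetical and were chosen to show a strong negative relationship.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / math.sqrt(ss_x * ss_y)

# Hypothetical data: weekly hours of exercise and body weight (kg)
exercise = [1, 2, 3, 4, 5]
weight = [90, 88, 84, 80, 78]
r = pearson_r(exercise, weight)  # negative: weight falls as exercise rises
```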
Describing Risk
Clinical decision-making for EBP may involve the calculation of risk indexes, so that decisions can be made about relative risks for alternative treatments or exposures.
Some frequently used indexes:
Absolute Risk
Absolute Risk Reduction (ARR)
Odds Ratio
Absolute Risk (AR)
Risk for those exposed and not exposed
The proportion of people in each group who experienced an adverse outcome
E.g., what is the AR of developing diabetes mellitus (DM) in an aggressive exercise group versus a couch potato group? After 24 months, the AR in the aggressive exercise group is .20
The Odds Ratio (OR)
The odds: the proportion of people with an adverse outcome divided by the proportion without it
The odds ratio is computed to compare the odds of an adverse outcome for two groups being compared (e.g., men vs. women)
E.g., smokers vs. non-smokers for lung cancer: OR = 3 (smokers' odds of lung cancer are three times those of non-smokers)
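The risk indexes above can be computed from the counts in each group. In the sketch below, 20 of 100 reproduces the AR of .20 quoted for the aggressive exercise group. The couch potato counts (40 of 100) are invented for illustration, because the outline does not give them. ARR is taken here as the AR in the comparison (couch potato) group minus the AR in the exercise group.

```python
def absolute_risk(events, total):
    """Proportion of the group that experienced the adverse outcome."""
    return events / total

def odds(events, total):
    """Number with the outcome divided by number without it."""
    return events / (total - events)

# Hypothetical 24-month counts (couch potato numbers are invented)
exercise_events, exercise_n = 20, 100
couch_events, couch_n = 40, 100

ar_exercise = absolute_risk(exercise_events, exercise_n)  # .20, as on the slide
ar_couch = absolute_risk(couch_events, couch_n)           # hypothetical .40
arr = ar_couch - ar_exercise                              # absolute risk reduction
odds_ratio = odds(couch_events, couch_n) / odds(exercise_events, exercise_n)
```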
Inferential Statistics
Used to make objective decisions about population parameters using sample data
Based on laws of probability
Uses the concept of theoretical distributions
e.g., the sampling distribution of the mean
Sampling Distribution of the Mean
A theoretical distribution of means for an infinite number of samples drawn from the same population
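Is normally distributed when the population is normal, and approximately normal for sufficiently large samples (central limit theorem)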
Its mean equals the population mean.
Its standard deviation is called the standard error of the mean (SEM).
In practice, the SEM is estimated from a single sample: SEM = SD / √N
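A sampling distribution cannot be observed directly, but a simulation can approximate one by drawing many samples. In the sketch below, the population (mean 50, SD 10), the sample size of 25, and the 2,000 replications are all assumptions chosen for illustration. The mean of the sample means should land near 50, and their SD should land near σ/√n = 2.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is reproducible

POP_MEAN, POP_SD = 50.0, 10.0  # hypothetical population parameters
N = 25                         # size of each sample
REPLICATIONS = 2000            # finite stand-in for "an infinite number" of samples

def draw_sample(n):
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

sample_means = [statistics.mean(draw_sample(N)) for _ in range(REPLICATIONS)]

mean_of_means = statistics.mean(sample_means)  # close to POP_MEAN
sd_of_means = statistics.stdev(sample_means)   # empirical standard error of the mean
theoretical_sem = POP_SD / math.sqrt(N)        # sigma / sqrt(n)

# With only one sample in hand, the SEM is estimated from that sample's SD
one_sample = draw_sample(N)
estimated_sem = statistics.stdev(one_sample) / math.sqrt(N)
```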
Statistical Inference: Two Forms
Parameter estimation
Hypothesis testing (more common among nurse researchers than among medical researchers)
Estimation of Parameters
Point estimation: a single descriptive statistic that estimates the population value (e.g., a mean, percentage, or OR)
Interval estimation: a range of values within which a population value probably lies; involves computing a confidence interval (CI)
Confidence Intervals
CIs indicate the upper and lower confidence limits and the level of confidence (e.g., 95%) that the population value lies between those limits.
For example, a 95% CI of 40-50 for a sample mean of 45 indicates that we can be 95% confident that the population mean is between 40 and 50.
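A CI for a mean is the sample mean plus or minus a multiplier times the SEM. The sketch below uses the large-sample normal approximation, with a multiplier of about 1.96 for 95%, taken from `statistics.NormalDist`. The sample values and the function name `confidence_interval` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def confidence_interval(sample, level=0.95):
    """Normal-approximation CI for a population mean (assumes a reasonably large sample)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return mean - z * sem, mean + z * sem

sample = [40, 42, 44, 46, 48, 50, 52, 54, 56, 58]  # hypothetical scores, mean 49
low, high = confidence_interval(sample)
```

For a sample this small, researchers would normally use a multiplier from the t distribution (2.262 for df = 9), which gives a slightly wider interval. The z multiplier is used here only to keep the sketch within the standard library.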
Hypothesis Testing
Based on rules of negative inference: research hypotheses are supported if null hypotheses can be rejected.
Involves statistical decision-making to either reject or fail to reject (retain) the null hypothesis
Researchers compute a test statistic with their data and then determine whether the result is statistically significant.
Parametric Statistics
Involve estimation of a parameter; assume variables are normally distributed in the population; measurements are on an interval or ratio scale
Nonparametric Statistics
Do not involve estimation of a parameter; measurements are typically on a nominal or ordinal scale; do not assume a normal distribution in the population
Overview of Hypothesis-Testing Procedures
Select an appropriate test statistic.
Establish significance criterion (e.g., alpha = .05).
Compute test statistic with actual data.
Calculate degrees of freedom (df) for the test statistic.
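Obtain a critical value for the statistical test (e.g., from a table of critical values).
Compare the test statistic with the critical value; if it is more extreme than the critical value, reject the null hypothesis.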
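The procedure can be traced with a t-test for independent groups as the selected test statistic. This is a minimal sketch under several assumptions: the group scores are hypothetical, the test uses pooled variance, and the critical values are a few two-tailed alpha = .05 rows copied from a standard t table.

```python
import math
import statistics

# Two-tailed critical values of t at alpha = .05 (a few rows of a standard t table)
T_CRITICAL_05 = {8: 2.306, 10: 2.228, 18: 2.101, 20: 2.086}

def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    ss_a = statistics.variance(group_a) * (n_a - 1)  # sum of squared deviations, group A
    ss_b = statistics.variance(group_b) * (n_b - 1)  # sum of squared deviations, group B
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, df

# Select the test (independent-groups t) and set alpha = .05
group_a = [6, 7, 5, 6, 8, 6, 7, 5, 6, 4]       # hypothetical scores
group_b = [9, 10, 8, 11, 9, 10, 12, 8, 9, 14]  # hypothetical scores

t, df = independent_t(group_a, group_b)  # compute the statistic and its df
critical = T_CRITICAL_05[df]             # look up the critical value
reject_null = abs(t) > critical          # compare |t| with the critical value
```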
Commonly Used Bivariate Statistical Tests
t-Test
Analysis of variance (ANOVA)
Pearson's r
Chi-squared test
t-Test
Tests the difference between two means
t-test for INDEPENDENT groups: between-subjects test
e.g., means for men vs. women
t-test for DEPENDENT (paired) groups: within-subjects test
e.g., means for patients before and after surgery
Analysis of Variance, F-test (ANOVA)
Tests the differences among three or more means
One-way ANOVA (e.g., 3 groups)
Multifactor (e.g., two-way) ANOVA
Repeated measures ANOVA (RM-ANOVA): within subjects
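A one-way ANOVA compares the variability between group means with the variability within groups. The F ratio is the between-groups mean square divided by the within-groups mean square. The three groups below are hypothetical. The critical value of 3.89 for df = (2, 12) at alpha = .05 comes from a standard F table.

```python
import statistics

F_CRITICAL_05 = 3.89  # F table value for df = (2, 12), alpha = .05

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    group_means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups
g1 = [4, 5, 6, 5, 5]
g2 = [7, 8, 6, 7, 7]
g3 = [9, 10, 8, 9, 9]
f, df_b, df_w = one_way_anova(g1, g2, g3)
significant = f > F_CRITICAL_05
```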
Chi-Squared Test (χ²)
Tests the difference in proportions in categories within a contingency table
Compares observed frequencies in each cell with expected frequencies, i.e., the frequencies expected if there were no relationship between the variables
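The expected frequency for each cell is (row total × column total) / N. χ² then sums (observed - expected)² / expected over all cells. The sketch below applies this to a hypothetical 2 × 2 table without a continuity correction. It compares the result with 3.841, the alpha = .05 critical value for df = 1.

```python
CHI2_CRITICAL_05_DF1 = 3.841  # chi-squared table value for df = 1, alpha = .05

def chi_squared(table):
    """Pearson chi-squared statistic (no continuity correction) and its df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if no relationship
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[30, 20],   # hypothetical counts, group 1: outcome yes / no
            [10, 40]]   # hypothetical counts, group 2: outcome yes / no
chi2, df = chi_squared(observed)
significant = chi2 > CHI2_CRITICAL_05_DF1
```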
Correlation r
Pearson's r is both a descriptive and an inferential statistic.
Tests whether the relationship between two variables differs from zero.
Effect Size
Effect size is an important concept in power analysis.
Effect size indexes summarize the magnitude of the effect of the independent variable on the dependent variable.
In a comparison of two group means (i.e., in a t-test situation), the effect size index is d: the difference between the two means divided by the pooled standard deviation.
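The two-group effect size (Cohen's d) divides the mean difference by a standard deviation. The sketch below assumes the pooled SD of both groups is used, although some variants use the control group's SD instead. The scores are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

group_a = [6, 7, 5, 6, 8, 6, 7, 5, 6, 4]       # hypothetical scores
group_b = [9, 10, 8, 11, 9, 10, 12, 8, 9, 14]  # hypothetical scores
d = cohens_d(group_a, group_b)  # negative because group A has the lower mean
```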
Multivariate Statistics
Statistical procedures for analyzing relationships among 3 or more variables
Two commonly used procedures in nursing research:
Multiple regression
Analysis of covariance (ANCOVA)
Multiple Linear Regression
Used to predict a dependent variable based on two or more independent (predictor) variables
Dependent variable is continuous (interval or ratio-level data).
Predictor variables are continuous (interval or ratio) or dichotomous.
Multiple Correlation Coefficient (R)
The correlation index for a dependent variable and 2+ independent (predictor) variables: R
Does not have negative values: shows strength of relationships, not direction
R² is an estimate of the proportion of variability in the dependent variable accounted for by the predictors
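The sketch below fits an ordinary least-squares regression with two predictors and reports R² and R. It solves the normal equations by Gaussian elimination so that it needs only the standard library. The data and function names are hypothetical. Statistical packages use more numerically robust methods, but for small, well-conditioned data they give the same estimates.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def multiple_regression(predictors, y):
    """OLS fit; returns ([intercept, b1, b2, ...], R squared)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    return coefs, 1 - ss_residual / ss_total

# Hypothetical data: two predictors and a continuous dependent variable
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [10, 8, 18, 18, 30, 27]
coefs, r_squared = multiple_regression([x1, x2], y)
multiple_r = math.sqrt(r_squared)  # R: strength only, never negative
```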
Analysis of Covariance (ANCOVA)
Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are statistically significant
Levels of measurement of variables:
Dependent variable is continuous (ratio or interval level)
Independent variable is nominal (group membership); covariates are typically continuous or dichotomous
Logistic Regression
Analyzes relationships between a nominal-level dependent variable and 2+ independent variables
Yields an odds ratio: the odds of an outcome occurring given one condition versus the odds of it occurring given a different condition
The OR is calculated after controlling for the effects of the other independent variables in the model
Factor Analysis
Used to reduce a large set of variables into a smaller set of underlying dimensions (factors)
Used primarily in developing scales and complex instruments
Multivariate Analysis of Variance
The extension of ANOVA to more than one dependent variable
Abbreviated as MANOVA
Can be used with covariates: Multivariate analysis of covariance (MANCOVA)
Causal Modeling
Tests a hypothesized multivariable causal explanation of a phenomenon
Includes:
Path analysis
Structural equations modeling
Reading About Stats
Find statistics in the Results section of an article
Heavy reliance on tables
p > .05: not statistically significant (NS)
p ≤ .05: statistically significant
Think about statistical versus clinical significance
Findings can be both statistically and clinically significant
Or neither; clinically but not statistically significant; or statistically but not clinically significant
The researcher subtracts the lowest value of data from the highest value of data to obtain which of the following?
Mode
Median
Mean
Range
Answer: Range
Which test would be used to compare the observed frequencies with expected frequencies within a contingency table?
Pearson's r
Chi-squared test
t-Test
ANOVA
Answer: Chi-squared test
True/False: A correlation coefficient of -.38 is stronger than a correlation coefficient of +.32
Answer: True
True/False: A bell-shaped curve is also called a normal distribution
Answer: True
True/False: Parametric statistical testing usually involves measurements on the nominal scale
Answer: False
Clinically significant, statistically significant, both, or neither?
Tx A, days of hospitalization = 6, Tx B days = 14, p = .04
Tx A = 4.2, Tx B = 4.4, p = .057
Tx A = 7, Tx B = 9.7, p = .07
Only the result with p = .04 is statistically significant (p ≤ .05)
The absent-minded statistician waded into a river with a mean depth of 3 feet and almost drowned.
What did the statistician forget?
a. the median
b. the correlation coefficient
c. the standard deviation
d. the null hypothesis
Answer: c. the standard deviation