Ch. 12

Statistical Analysis

Descriptive statistics: used to describe and synthesize data
Inferential statistics: used to make inferences about the population based on sample data

Descriptive Indexes

Parameter: a characteristic of a population (e.g., the average age of EMTs in the US)
Statistic: an estimate of a parameter, calculated from a sample (e.g., the average age of EMTs in Jefferson County, KY)

Descriptive Statistics: Frequency Distributions

A systematic arrangement of numeric values on a variable from lowest to highest, and a count of the number of times (and/or percentage) each value was obtained
Can be presented in a table (Ns and percentages) or graphically (e.g., frequency polygons)

Frequency distributions can be described in terms of

Shape
Central tendency
Variability

Shapes of Distributions

Symmetric
Skewed (asymmetric)
Positive skew (long tail points to the right)
Negative skew (long tail points to the left)
Peakedness (how sharp the peak is)
Modality (number of peaks)
Unimodal (1 peak)
Bimodal (2 peaks)
Multimodal (2+ peaks)

Normal Distribution

Characteristics:
Symmetric
Unimodal
Not too peaked, not too flat
More popularly referred to as a bell-shaped curve
Important distribution in inferential statistics

Central Tendency

An index of the "typicalness" of a set of scores, coming from the center of the distribution
Mode
Median
Mean

Mode

The most frequently occurring score in a distribution
Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mode = 3
Useful mainly as a gross descriptor, especially of nominal measures

Median

The point in a distribution above which and below which 50% of cases fall
Ex: 2, 3, 3, 3, 4 | 5, 6, 7, 8, 9 Median = 4.5
Useful mainly as a descriptor of the typical value when the distribution is skewed (e.g., household income)

Mean

Equals the sum of all scores divided by the total number of scores
Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mean = 5.0
The most stable and widely used indicator of central tendency
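As a quick illustration (not from the original slides), all three indexes can be computed with Python's standard statistics module, using the example scores above:

```python
# Central tendency for the example scores used in these slides,
# computed with the Python standard library.
from statistics import mode, median, mean

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

print(mode(scores))    # 3   -> most frequently occurring score
print(median(scores))  # 4.5 -> midpoint between the 5th and 6th ordered scores
print(mean(scores))    # 5.0 -> sum of scores divided by the number of scores
```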

Variability

The degree to which scores in a distribution are spread out or dispersed
Homogeneity: little variability
Heterogeneity: great variability

Indexes of Variability

Range: highest value minus lowest value
Standard deviation (SD): the average amount of deviation of scores from the mean of a distribution
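A minimal sketch of both indexes, reusing the same example scores (note that statistics.stdev uses the sample formula, dividing by n - 1):

```python
# Variability for the example scores: range and standard deviation.
from statistics import stdev, pstdev

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

value_range = max(scores) - min(scores)  # 9 - 2 = 7
sample_sd = stdev(scores)                # divides by n - 1 (about 2.40)
population_sd = pstdev(scores)           # divides by n (about 2.28)

print(value_range, round(sample_sd, 2), round(population_sd, 2))
```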

Bivariate Descriptive Statistics

Used for describing the relationship between two variables
Two common approaches:
Contingency tables (Crosstabs)
Correlation coefficients

Contingency Table

A two-dimensional frequency distribution; frequencies of two variables are cross-tabulated
"Cells" at intersection of rows and columns display counts and percentages
Variables usually nominal or ordinal
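A minimal sketch of a crosstab built from two hypothetical nominal variables (sex and smoking status), using only the standard library:

```python
# Cross-tabulate two nominal variables into cell counts and percentages.
from collections import Counter

sex    = ["M", "M", "F", "F", "F", "M", "F", "M"]
smoker = ["yes", "no", "no", "yes", "no", "no", "no", "yes"]

cells = Counter(zip(sex, smoker))   # one count per row/column combination
total = len(sex)
for (row, col), n in sorted(cells.items()):
    print(f"{row} / {col}: {n} ({100 * n / total:.0f}%)")
```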

Correlation Coefficients

Indicate direction and magnitude of relationship between two variables
The most widely used correlation coefficient is Pearson's r.
Correlation coefficients can range from
-1.00 to +1.00
The greater the absolute value of the coefficient, the stronger the relationship.

Pearson's r

Used when both variables are interval- or ratio-level measures.

Negative relationship (0.00 to -1.00)

One variable increases in value as the other decreases (e.g., amount of exercise and weight)

Positive relationship (0.00 to +1.00)

Both variables increase or decrease together (e.g., calorie consumption and weight)
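A short sketch of Pearson's r for two hypothetical interval-level variables (weekly hours of exercise and weight); statistics.correlation requires Python 3.10 or later:

```python
# Pearson's r for a (hypothetical) negative relationship:
# weight tends to fall as exercise rises, so r is close to -1.00.
from statistics import correlation

exercise_hours = [0, 1, 2, 3, 4, 5, 6, 7]
weight_kg      = [95, 92, 90, 88, 85, 83, 80, 78]

print(round(correlation(exercise_hours, weight_kg), 2))
```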

Describing Risk

Clinical decision-making for EBP may involve the calculation of risk indexes, so that decisions can be made about relative risks for alternative treatments or exposures.
Some frequently used indexes:
Absolute Risk
Absolute Risk Reduction (ARR)
Odds Ratio

Absolute Risk (AR)

Risk for those exposed and those not exposed
The proportion of people in each group who experienced an adverse outcome
Ex: What is the AR of developing DM in an aggressive-exercise group versus a couch-potato group? After 24 months, the AR in the aggressive-exercise group is .20

The Odds Ratio (OR)

The odds = the proportion of people with an adverse outcome relative to those without it
The odds ratio is computed to compare the odds of an adverse outcome for two groups being compared (e.g., men vs. women)
E.g., smokers vs. non-smokers and lung cancer: OR = 3
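A worked sketch of these indexes with hypothetical counts (the exposed group is set up so its absolute risk matches the .20 in the exercise example above; the comparison group's counts are invented):

```python
# Absolute risk (AR), absolute risk reduction (ARR), and odds ratio (OR)
# from a hypothetical 2 x 2 table of outcomes.
exposed_bad, exposed_ok = 20, 80   # exposed group: 20 of 100 had the adverse outcome
control_bad, control_ok = 40, 60   # non-exposed group: 40 of 100 had the adverse outcome

ar_exposed = exposed_bad / (exposed_bad + exposed_ok)   # .20
ar_control = control_bad / (control_bad + control_ok)   # .40
arr = ar_control - ar_exposed                           # .20

odds_exposed = exposed_bad / exposed_ok                 # .25
odds_control = control_bad / control_ok                 # about .67
odds_ratio = odds_exposed / odds_control                # about .38

print(ar_exposed, ar_control, round(arr, 2), round(odds_ratio, 2))
```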

Inferential Statistics

Used to make objective decisions about population parameters using sample data
Based on laws of probability
Uses the concept of theoretical distributions
e.g., the sampling distribution of the mean

Sampling Distribution of the Mean

A theoretical distribution of means for an infinite number of samples drawn from the same population
Is always normally distributed (for sufficiently large samples)
Its mean equals the population mean.
Its standard deviation is called the standard error of the mean (SEM).
The SEM is estimated from sample data: the sample SD divided by the square root of n
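A minimal sketch of that SEM estimate, with hypothetical sample values:

```python
# Estimate the standard error of the mean from a single sample: SD / sqrt(n).
from math import sqrt
from statistics import stdev

sample = [112, 118, 121, 109, 115, 124, 117, 110]   # hypothetical readings
sem = stdev(sample) / sqrt(len(sample))
print(round(sem, 2))
```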

Statistical Inference: Two Forms

Parameter estimation
Hypothesis testing (more common among nurse researchers than among medical researchers)

Estimation of Parameters

Point estimation: a single descriptive statistic that estimates the population value (e.g., a mean, percentage, or OR)
Interval estimation: a range of values within which a population value probably lies; involves computing a confidence interval (CI)

Confidence Intervals

CIs indicate the upper and lower confidence limits and the probability that the population value is between those limits.
For example, a 95% CI of 40-50 for a sample mean of 45 indicates there is a 95% probability that the population mean is between 40 and 50.
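A minimal sketch of a 95% CI around a sample mean using the normal approximation (mean ± 1.96 × SEM); with small samples a t-based multiplier would be used instead. Values are hypothetical:

```python
# 95% confidence interval for a sample mean (normal approximation).
from math import sqrt
from statistics import mean, stdev

sample = [42, 47, 44, 49, 41, 46, 45, 43, 48, 45]
m = mean(sample)
sem = stdev(sample) / sqrt(len(sample))
ci_lower, ci_upper = m - 1.96 * sem, m + 1.96 * sem
print(round(m, 1), (round(ci_lower, 1), round(ci_upper, 1)))
```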

Hypothesis Testing

Based on rules of negative inference: research hypotheses are supported if null hypotheses can be rejected.
Involves statistical decision making to either accept (fail to reject) or reject the null hypothesis
Researchers compute a test statistic with their data and then determine whether it is statistically significant (i.e., whether the null hypothesis can be rejected).

Parametric Statistics

Use involves estimation of a parameter; assumes variables are normally distributed in the population; measurements are on interval/ratio scale

Nonparametric Statistics

Use does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn't assume normal distribution in the population

Overview of Hypothesis-Testing Procedures

Select an appropriate test statistic.
Establish significance criterion (e.g., alpha = .05).
Compute test statistic with actual data.
Calculate degrees of freedom (df) for the test statistic.
Obtain a critical value for the statistical test (e.g., from a table of t values) and compare the computed test statistic to it to decide whether to reject the null hypothesis.
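A sketch of these steps applied to an independent-groups t-test, with hypothetical data; SciPy is assumed to be available:

```python
# Hypothesis-testing steps: choose a test, set alpha, compute the statistic,
# get df and a critical value, then compare.
from scipy import stats

group_a = [4.1, 5.2, 4.8, 5.9, 5.1, 4.7]
group_b = [6.3, 5.8, 6.9, 6.1, 7.0, 6.4]

alpha = 0.05                                          # significance criterion
t_stat, p_value = stats.ttest_ind(group_a, group_b)   # compute the test statistic
df = len(group_a) + len(group_b) - 2                  # degrees of freedom
t_critical = stats.t.ppf(1 - alpha / 2, df)           # two-tailed critical value

print(round(t_stat, 2), round(t_critical, 2), p_value < alpha)  # reject H0 if True
```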

Commonly Used Bivariate Statistical Tests

t-Test
Analysis of variance (ANOVA)
Pearson's r
Chi-squared test

t-Test

Tests the difference between two means
t-test for INDEPENDENT groups: between-subjects test
e.g., means for men vs. women
t-test for DEPENDENT (paired) groups: within-subjects test
e.g., means for patients before and after surgery
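The independent-groups version was sketched above; a dependent (paired) t-test on hypothetical before/after scores for the same patients looks like this (SciPy assumed):

```python
# Paired (within-subjects) t-test: before vs. after for the same patients.
from scipy import stats

before = [140, 132, 150, 128, 145, 138]
after  = [133, 130, 141, 125, 139, 131]

t_stat, p_value = stats.ttest_rel(before, after)
print(round(t_stat, 2), round(p_value, 3))
```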

Analysis of Variance, F-test (ANOVA)

Tests the difference between more than 2 means
One-way ANOVA (e.g., 3 groups)
Multifactor (e.g., two-way) ANOVA
Repeated measures ANOVA (RM-ANOVA): within subjects
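A minimal sketch of a one-way ANOVA comparing three hypothetical groups (SciPy assumed):

```python
# One-way ANOVA: does at least one group mean differ from the others?
from scipy import stats

group_1 = [5.1, 4.8, 5.5, 5.0, 4.9]
group_2 = [6.2, 6.0, 5.8, 6.4, 6.1]
group_3 = [7.0, 6.8, 7.3, 6.9, 7.1]

f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(round(f_stat, 2), round(p_value, 4))
```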

Chi-Squared Test (χ²)

Tests the difference in proportions in categories within a contingency table
Compares observed frequencies in each cell with expected frequencies: the frequencies that would be expected if there were no relationship
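A minimal sketch of the chi-squared test on a hypothetical 2 × 2 contingency table (SciPy assumed); chi2_contingency also returns the expected frequencies it compares against:

```python
# Chi-squared test of independence for a 2 x 2 contingency table.
from scipy import stats

observed = [[30, 70],   # e.g., exposed: with / without the outcome
            [15, 85]]   # e.g., not exposed: with / without the outcome

chi2, p_value, df, expected = stats.chi2_contingency(observed)
print(round(chi2, 2), round(p_value, 4), df)
print(expected)   # frequencies expected if there were no relationship
```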

Correlation r

Pearson's r is both a descriptive and an inferential statistic.
Tests that the relationship between two variables is not zero.
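A short sketch of r used inferentially: scipy.stats.pearsonr returns both the coefficient and a p-value for the test that the population correlation is zero (data hypothetical):

```python
# Pearson's r with its significance test.
from scipy import stats

calories = [1800, 2000, 2200, 2400, 2600, 2800, 3000]
weight   = [60, 63, 66, 70, 72, 76, 79]

r, p_value = stats.pearsonr(calories, weight)
print(round(r, 2), round(p_value, 4))
```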

Effect Size

Effect size is an important concept in power analysis.
Effect size indexes summarize the magnitude of the effect of the independent variable on the dependent variable.
In a comparison of two group means (i.e., in a t-test situation), the effect size index is d: the difference between the two group means divided by the pooled standard deviation.
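A minimal sketch of that index (Cohen's d, computed with the pooled SD), reusing the hypothetical two-group data from the t-test sketch above:

```python
# Cohen's d: standardized difference between two group means.
from statistics import mean, stdev

group_a = [4.1, 5.2, 4.8, 5.9, 5.1, 4.7]
group_b = [6.3, 5.8, 6.9, 6.1, 7.0, 6.4]

n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
              (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
d = (mean(group_b) - mean(group_a)) / pooled_var ** 0.5
print(round(d, 2))
```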

Multivariate Statistics

Statistical procedures for analyzing relationships among 3 or more variables
Two commonly used procedures in nursing research:
Multiple regression
Analysis of covariance (ANCOVA)

Multiple Linear Regression

Used to predict a dependent variable based on two or more independent (predictor) variables
Dependent variable is continuous (interval or ratio-level data).
Predictor variables are continuous (interval or ratio) or dichotomous.
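A minimal sketch of multiple linear regression with two hypothetical predictors (age in years and a dichotomous smoking indicator) and a continuous outcome (systolic BP); scikit-learn is assumed. The model's .score() method returns R², the index described in the next section:

```python
# Multiple linear regression: predict a continuous outcome from 2+ predictors.
from sklearn.linear_model import LinearRegression

X = [[34, 0], [45, 1], [52, 1], [29, 0], [61, 1], [48, 0], [39, 1], [55, 0]]
y = [118, 135, 142, 112, 150, 128, 131, 138]

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # one regression coefficient per predictor
print(round(model.score(X, y), 2))     # R-squared: proportion of variance explained
```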

Multiple Correlation Coefficient (R)

The correlation index for a dependent variable and 2+ independent (predictor) variables: R
Does not have negative values: shows strength of relationships, not direction
R² is an estimate of the proportion of variability in the dependent variable accounted for by all the predictor variables combined.

Analysis of Covariance (ANCOVA)

Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are statistically significant
Levels of measurement of variables:
Dependent variable is continuous (ratio or interval level)
Independent variable is nominal (group membership); covariates are continuous or dichotomous

Logistic Regression

Analyzes relationships between a nominal-level dependent variable and 2+ independent variables
Yields an odds ratio: the risk of an outcome occurring given one condition, versus the risk of it occurring given a different condition
The OR is calculated after controlling for (holding constant) the other predictors in the model.
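A minimal sketch of a logistic regression yielding an odds ratio for a single hypothetical exposure (scikit-learn assumed); note that scikit-learn applies mild regularization by default, so the OR is approximate rather than the exact crude OR:

```python
# Logistic regression: exponentiating a coefficient gives an odds ratio.
from math import exp
from sklearn.linear_model import LogisticRegression

X = [[0], [0], [0], [0], [1], [1], [1], [1], [1], [0]]   # 1 = exposed
y = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]                       # 1 = adverse outcome

model = LogisticRegression().fit(X, y)
odds_ratio = exp(model.coef_[0][0])   # OR for exposure vs. non-exposure
print(round(odds_ratio, 2))
```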

Factor Analysis

Used to reduce a large set of variables into a smaller set of underlying dimensions (factors)
Used primarily in developing scales and complex instruments

Multivariate Analysis of Variance

The extension of ANOVA to more than one dependent variable
Abbreviated as MANOVA
Can be used with covariates: Multivariate analysis of covariance (MANCOVA)

Causal Modeling

Tests a hypothesized multivariable causal explanation of a phenomenon
Includes:
Path analysis
Structural equations modeling

Reading About Stats

Find stats under Results in article
Heavy reliance on tables
p > .05: not significant (NS)
p ≤ .05: statistically significant
Think about statistical versus clinical significance
Findings can be both
Or neither; or clinically but not statistically significant; or statistically but not clinically significant

The researcher subtracts the lowest value of data from the highest value of data to obtain which of the following?
Mode
Median
Mean
Range

Range

Which test would be used to compare the observed frequencies with expected frequencies within a contingency table?
Pearson's r
Chi-squared test
t-Test
ANOVA

Chi-squared test

T/F A correlation coefficient of -.38 is stronger than a correlation coefficient of +.32

True

T/F A bell-shaped curve is also called a normal distribution

True

T/F Parametric statistical testing usually involves measurements on the nominal scale

False

Which result is clinically and statistically significant?
Tx A, days of hospitalization = 6, Tx B days = 14, p = .04
Tx A = 4.2, Tx B = 4.4, p = .057
Tx A = 7, Tx B = 9.7, p = .07

p = .04

The absent-minded statistician waded into a river with a mean depth of 3 feet and almost drowned.
What did the statistician forget?
a. the median
b. the correlation coefficient
c. the standard deviation
d. the null hypothesis

c. the standard deviation
I THINK