Research & Outcome Measures


Discrete Variable

Described in whole numbers (ex: HR, # of steps)

Continuous Variable

Any value along a range (ex: goniometer, ruler, stopwatch)

Systematic Error

Predictable, constant errors that do NOT cancel out over time (ex: consistently over- or under-estimating)

Scales of Measurements

1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale

Nominal Scale

Mutually exclusive; dichotomous; info regarding frequency of occurrence; no person can be assigned to more than one category (ex: gender, yes/no)
a. Non-parametric
b. Test = chi-square

Ordinal Scale

Rank order based on operationally defined characteristics.
No equidistance between ranks.
Expressed as > or <
Info that describes a characteristic
ex: Oswestry (min, mod, severe), MMT scores
a. Non-parametric
b. Test = Spearman rho
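As an illustration of what Spearman rho computes, here is a minimal pure-Python sketch using the classic rank-difference formula (helper name is illustrative and it assumes no tied scores; in practice scipy.stats.spearmanr handles ties and p values):

```python
def spearman_rho(x, y):
    # rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), d = rank difference (no ties)
    n = len(x)
    rx = [sorted(x).index(v) + 1 for v in x]  # rank of each x value
    ry = [sorted(y).index(v) + 1 for v in y]  # rank of each y value
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Identical rank orders give rho = 1; fully reversed orders give rho = -1.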

Interval Scale

Rank order where intervals between units are equidistant, but no true zero exists
ex: temperature, calendar year
a. Parametric
b. Test = Pearson correlation coefficient
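A minimal pure-Python sketch of the Pearson correlation coefficient (illustrative helper name; scipy.stats.pearsonr adds the p value). The temperature example from the card works well because Celsius and Fahrenheit are perfectly linearly related:

```python
import math

def pearson_r(x, y):
    # Covariance of x and y divided by the product of their spreads
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den
```

For paired Celsius/Fahrenheit readings, r comes out exactly 1.0.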

Ratio Scale

BEST scale
Intervals between units are equidistant and there is an absolute true zero
ex: ROM, height, weight, force
a. Parametric
b. Test = ANOVA

Tests of Significance

Tests that look for a significant difference with a p value of < 0.05.

Parametric Tests
1. T-Test
Non-Parametric Tests
1. Mann-Whitney U
2. Kruskal-Wallis ANOVA
3. Chi Square


T-Test

2 groups; the question is whether the groups differ on the dependent variable
a. Independent t-test: different subjects in each group
b. Paired t-test: subjects compared with themselves
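As a sketch of what the two designs compute, both t statistics fit in a few lines of pure Python (illustrative helper names; in practice scipy.stats.ttest_ind and ttest_rel also return the p values):

```python
import math
from statistics import mean, variance  # variance uses the n-1 denominator

def independent_t(g1, g2):
    # Pooled-variance t statistic: different subjects in each group
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_t(before, after):
    # t statistic on within-subject differences: subjects vs themselves
    d = [b - a for a, b in zip(before, after)]
    return mean(d) / (math.sqrt(variance(d)) / math.sqrt(len(d)))
```
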


ANOVA

2 or more groups with a NORMAL distribution


ANCOVA

Compares groups on the dependent variable when groups vary on a relevant characteristic before treatment (called a covariate)

Mann-Whitney U

A. Analogue to Independent t-test
B. For 2 samples
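The U statistic can be sketched with the pairwise-counting formulation (illustrative helper name; scipy.stats.mannwhitneyu is the usual tool): count how often a value in one sample beats a value in the other, with ties counting one half.

```python
def mann_whitney_u(a, b):
    # U = pairwise "wins" of sample a over sample b; ties count 1/2
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return min(u_a, len(a) * len(b) - u_a)  # report the smaller U
```

Completely separated samples give U = 0; overlapping samples give larger values.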

Kruskal-Wallis ANOVA

For 3 or more samples

Chi Square

Measures goodness of fit
Measures proportions or frequencies within categories
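A minimal goodness-of-fit sketch, assuming observed and expected counts per category (illustrative helper name; scipy.stats.chisquare also returns the p value):

```python
def chi_square_gof(observed, expected):
    # Goodness of fit: sum over categories of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For 45 "yes" / 55 "no" responses against an expected 50/50 split, the statistic is 1.0.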

A Priori Test

The likelihood of an event occurring when there is a finite number of outcomes and each is equally likely to occur.
Probability derived purely by deductive reasoning.
Principle of indifference: if there are N mutually exclusive and collectively exhaustive events, each is assigned a probability of 1/N.

Post-hoc tests

Unplanned comparisons

Choosing the correct statistical test.

Depends on the question being asked:
1. For association between variables: correlation coefficient
2. Questions about prediction: regression analysis
3. Treatment effect questions: chi-square, ANOVA, or t-test


Reliability

Consistency or repeatability of measurements; the degree to which measurements are free from error, and the degree to which repeated measurements will agree.
Reliability does NOT mean a test is valid.
(Reliability studies quantify how much a measurement varies when it is repeated.)

acceptable reliability (reliability coefficients)

1 = measurement has no error
> 0.90 = desired
> 0.75 = good
0.50-0.75 = moderate
< 0.50 = poor
0 = all measurement variability is error
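The bands above can be captured in a small helper (illustrative function name, encoding exactly the cutoffs on this card):

```python
def interpret_reliability(r):
    # Map a reliability coefficient (0-1) to the qualitative bands above
    if r > 0.90:
        return "desired"
    if r > 0.75:
        return "good"
    if r >= 0.50:
        return "moderate"
    return "poor"
```
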

internal consistency

the extent to which items that contribute to a measurement reflect one basic phenomenon
ex: pain, balance

interrater/tester reliability

the consistency of measurements when more than one rater takes them; indicates agreement between measurements taken by different examiners

intrarater/tester reliability

the consistency of measurement when one rater takes repeated measurements in time
indicates the agreement of measurement taken over time

parallel-forms (alternate-forms) reliability

the consistency of agreement of measurements obtained with different forms of a test
indicates whether measurements obtained with different forms of a test can be used interchangeably

Test-Retest Reliability

The consistency of repeated measurements in time
indicates stability (reliability) over time

Measurements of Reliability

1. Estimates of internal consistency
2. Reliability coefficient
3. Intraclass Correlation Coefficient (ICC)
4. Kappa Statistic
5. Standard Error of Measurement (SEM)

Estimate of Internal Consistency

A. Split-half reliability- Randomly divides items into 2 subsets, then examines the consistency in total scores across the 2 subsets
B. Cronbach's Alpha- Determines if items in the scale are measuring the same construct.
Commonly used.
Can be directly computed from a single test administration.
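Cronbach's alpha can be sketched directly from the item variances (illustrative pure-Python version of the standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)):

```python
from statistics import variance  # sample variance (n-1 denominator)

def cronbach_alpha(items):
    # items: one list of scores per item, same respondents in each list
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total per respondent
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))
```

Identical items (perfect internal consistency) give alpha = 1.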

reliability coefficient

The proportion of measurement variability attributed to true differences rather than error
0 = all measurement variability is due to error
1 = no error

intraclass correlation coefficient (ICC)

provides a reliability coefficient based on an analysis of variance
0 to 1 scale
commonly used since it is a comprehensive estimate of reliability
A. Used for PARAMETRIC data
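As an illustration of the ANOVA basis, the one-way random-effects form ICC(1,1) can be computed from between-subject and within-subject mean squares (illustrative helper name; real analyses typically use statistical software):

```python
def icc_1_1(data):
    # data: one row of repeated measurements per subject
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)          # between subjects
    msw = sum((x - means[i]) ** 2
              for i, row in enumerate(data) for x in row) / (n * (k - 1))  # within subjects
    return (msb - msw) / (msb + (k - 1) * msw)
```

Identical repeated measurements give ICC = 1; added disagreement pulls it toward 0.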

Kappa Statistic

Examines the proportion of observed agreements in relation to the number of possible agreements
compares agreement to what might be expected by chance
good test to assess reliability among therapists when one is 'expert' and one is 'novice'.
A kappa of 1 indicates perfect agreement; 0 indicates agreement no better than chance.
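A sketch of Cohen's kappa for two raters (illustrative helper name; the formula compares observed agreement with the agreement expected by chance):

```python
def cohens_kappa(rater1, rater2):
    # kappa = (P_observed - P_chance) / (1 - P_chance)
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    cats = set(rater1) | set(rater2)
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)
```

Identical ratings give kappa = 1 even though chance agreement alone is 0.5 here.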

Standard Error of Measurement (SEM)

Measures stability of repeated measurements over time
measures observed score deviation from a true score when repeated measures are taken in a single client
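A common computational form (standard psychometric formula, not stated on the card) is SEM = SD x sqrt(1 - r), where SD is the standard deviation of the scores and r is a reliability coefficient:

```python
import math

def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - reliability coefficient)
    return sd * math.sqrt(1 - reliability)
```

With SD = 10 and reliability 0.91, the SEM is 3.0; higher reliability shrinks the SEM.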


Validity

the degree to which a useful interpretation can be inferred from a measurement
the degree to which a measurement measures what it is intended to measure
a valid test is also reliable

Criterion-based validity

the correctness of an inferred interpretation can be tested by comparing a measurement with either a different measurement or data obtained from other forms of testing
target test = new test
criterion test = gold/reference standard
3 forms of criterion-based validity: concurrent, predictive, and prescriptive

Concurrent Validity

An inferred interpretation is justified by comparing a measurement with supporting evidence that was obtained at approximately the same time as the measurement being validated

Predictive Validity

An inferred interpretation is justified by comparing a measurement with supporting evidence that is obtained at a later point in time
Examines the justification of using a measurement to say something about future events or conditions

Prescriptive Validity

An inferred interpretation of a measurement is the determination of the form of treatment a person is to receive
It is justified based on the successful outcome of the chosen treatment

Construct Validity

Ability of an instrument to measure an abstract concept (ie: health, pain, functional status)
The conceptual basis for using a measurement to make an inferred interpretation
Evidence is through logical argumentation based on theoretical and research evidence

Content Validity

The extent to which a measurement is judged to reflect ALL meaningful elements of a construct and not any extraneous elements
ex: McGill pain questionnaire b/c it addresses: location, quality, time and intensity of pain, & cultural aspects of pain

Face Validity

The test characteristic can be directly observed
All or none phenomenon
Does the test appear to test what it is supposed to
least rigorous method to document a test's validity
ex: MMT, sensation, gait, balance

Reliability & Validity Review

1. A good measurement is reliable, but MOST importantly VALID
2. A valid measure measures what it is intended to measure
3. Validity indicates the extent to which a tool measures a particular construct in a particular context
4. A measure may be valid in one context but not in another