# Psyc 325 test 2

Big Picture: What Do We Want to Learn From Our Research?

1. Effect Size: What is the size of the effect?
   - How big is the correlation between two measures?
   - How different is the treatment group from the control group?
2. Precision: How precisely have we pinned down the effect?
   - Confidence intervals
   - Margin of error
3. Significance: Is the effect significant?
   - Hypothesis testing
   - Strength of the evidence against the null hypothesis
4. Importance: Does the effect matter?
   - Scientific interpretation

Inferential Statistics

Predicting unknown parameters from known statistics:
1. Select a sample
2. Collect data
3. Calculate sample statistics
4. Draw inferences about population parameters

Common Statistics and Parameters:

How much coffee do students drink per week?

My guess (hypothesis) is 16 cups. Take a random sample of 25
students and measure how many cups they drink. Calculate the sample
mean: Ȳ = 20. What inferences can be drawn about the population
mean (μ)?
- How much trust can we place in our sample mean?
- The answer lies in the sampling distribution

Sampling Distributions

To make sensible inferences about a population parameter (e.g., a
population mean), we need to understand the behavior of the
corresponding sample statistic (e.g., a sample mean).
The sampling distribution of the statistic describes its behavior
across repeated sampling:
- Is it biased? An underestimate? An overestimate?
- How much does it vary across samples? Does it jump around a lot?
- What is its shape?
The sampling distribution is the linchpin of all statistical
inference.

Sampling Distribution of the Mean Simulation

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Sampling Distribution of the Mean

Draw an infinite number of samples of a given size and plot the
means -> Sampling Distribution of the Mean
Obviously we can't do that.
Fortunately, we don't have to, because statistical theory gives us
the properties of the sampling distribution.

Theoretical Properties of the Sampling Distribution of Means

What is the mean of the sampling distribution?
E(Ȳ) = μ
Ȳ is an unbiased estimator of μ

How Much Do Sample Means Vary?

What is the Variance of the Sampling Distribution of the Mean?

The variance of the sampling distribution depends on two things:
The variance of the raw scores in the
population The sample size

Variance of the Sampling Distribution

σ²_Ȳ = σ²/n

Standard Deviation of the Sampling Distribution

σ_Ȳ = σ/√n (the standard error of the mean)
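A quick numeric sketch of the standard error σ/√n, using the population SD of 5 from the coffee example (illustrative Python, not part of the course materials):

```python
# Standard error of the mean: sigma / sqrt(n).
# Population SD of 5 assumed (sigma^2 = 25, as in the coffee example).
sigma = 5.0
for n in (4, 25, 100):
    print(n, sigma / n ** 0.5)  # SE shrinks as n grows: 2.5, 1.0, 0.5
```

Larger samples pin down the mean more precisely, which is why precision (confidence intervals) improves with n.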

Sampling Distribution of the Mean

What is the Shape of the Sampling Distribution?

The Central Limit Theorem: If the sample size (n) is reasonably
large, the sampling distribution of the mean will be approximately
NORMAL regardless of the distribution of the raw scores.
As n increases, the sampling distribution approaches normality. For
markedly non-normal distributions we need a somewhat larger n before
the sampling distribution becomes normal.
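A small simulation sketch of the CLT (hypothetical Python; an exponential population is chosen simply because it is markedly skewed):

```python
import random
import statistics

# Draw many samples from a skewed (exponential) population and look
# at the distribution of the sample means.
random.seed(0)

def sample_means(n, reps=5000):
    return [statistics.mean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = sample_means(n=30)

# The mean of the sample means sits near the population mean (1.0),
# and their SD near sigma / sqrt(n) = 1 / sqrt(30), about 0.18.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```

Plotting `means` as a histogram would show the familiar bell shape even though the raw scores are heavily skewed.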

Central Limit Theorem

Summary

The Normal Distribution

-> Unimodal, symmetric, bell-shaped, asymptotic tails

The 68-95-99.7 Rule

In a normal distribution:
- 68% fall within 1 SD of the mean
- 95% fall within 2 SD of the mean
- 99.7% fall within 3 SD of the mean
The rule can easily be used to compute percentile ranks if the data
are normally distributed.
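The rule can be checked against the standard normal CDF; a sketch using only the Python standard library:

```python
import math

# Proportion of a normal distribution within k SDs of the mean,
# computed from the standard normal CDF via the error function.
def within_k_sd(k):
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 1))  # ~68.3, 95.4, 99.7
```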

Z Scores in the Sampling Distribution

Why is the Sampling Distribution Important?

Sampling distribution of the mean is normal (CLT).
Normal tables give us the probability of obtaining values above or
below any point in a standard normal distribution.
If we know μ (or have a hypothesis about μ), we can convert our
sample mean to a Z score.
We can then refer to normal tables to determine the precise
probability of obtaining a sample mean like the one we did.

Classical Hypothesis Testing

Rarely know μ, but sometimes have a hypothesis about μ.
Construct a sampling distribution around the hypothesized μ.
Determine the likelihood of obtaining a sample mean like ours, given
that the hypothesis is true.

Logic of Hypothesis Testing

High vs. Low Probability Events

How do we define high vs. low probability?
- Outcomes with < 5% probability are considered rare
- Setting an alpha level (α)
If sample means like ours would occur < 5% of the time if the
hypothesis were true, we reject the hypothesis. If they would occur
> 5% of the time, we retain it.

Why is the Sampling Distribution Important?

Unless we know something about the sampling distribution, we have no
way of knowing whether our sample mean is a high vs. low probability
event. Without the sampling distribution we can't test our
hypothesis.
With respect to hypothesis testing, the sampling distribution is the
only show in town.

Z Test for a Population Mean

Use what we've just learned to test the hypothesis that population
mean (μ) weekly coffee consumption is 16 cups -> Z Test
Suppose we know that the population variance σ² = 25.
Obtain a sample of 25 students (n = 25). Sample mean (Ȳ) = 20 cups.
Given the data, should we reject or retain the hypothesis?
Depends on whether the sample mean is a high or low probability
outcome, if the hypothesis is true.

Z Test for a Population Mean

Z = (20 - 16) / (5/√25) = 4
4 > 1.96 -> Reject the hypothesis that mean coffee consumption is
16, p < .05
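The full Z test for the coffee example can be sketched in a few lines of Python (stdlib only; the two-tailed p-value comes from the normal CDF via `erfc`):

```python
import math

# Z test for the coffee example: hypothesized mu = 16, known
# population variance sigma^2 = 25, n = 25, sample mean = 20.
mu0, sigma2, n, ybar = 16, 25, 25, 20

se = math.sqrt(sigma2 / n)            # standard error = 5/5 = 1
z = (ybar - mu0) / se                 # z = 4.0
p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value

print(z)         # 4.0
print(z > 1.96)  # True -> reject H0 at alpha = .05
```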

Seven Steps in Hypothesis Testing

1. Research problem
2. Statistical hypotheses
3. Assumptions
4. Decision rule
5. Calculate test statistic
6. Decision
7. Interpretation

7 Steps in Hypothesis Testing


Why Not Test H1 Directly?

The alternative hypothesis is often more interesting than the null
hypothesis. Why can't we test it directly?
H1 is too vague:
- H1 doesn't specify a precise value for the population mean
- Can't generate a unique sampling distribution
In contrast, H0 (the null hypothesis) is extremely precise:
- Specifies a population mean
- Can generate a unique sampling distribution
So we test H1 indirectly:
- Frame conclusions in terms of retaining or rejecting H0
- By implication, this has consequences for the tenability of H1

What Could Possibly Go Wrong?

Factors Affecting Type I Error

Factors Affecting Type II Errors

1. Effect Size
2. Sample Size
3. Alpha

Statistical vs. Theoretical Significance

Increasing n -> more powerful test. But:
Large sample sizes can make a test "too" sensitive
-> May end up rejecting H0 even when the true distribution is only
trivially different from the null distribution
Statistical Significance ≠ Theoretical Significance
Note: We always like large samples, but we need to be careful to
distinguish statistical and theoretical significance.

Measuring Effect Size

How different is our sample mean (20 cups) from the hypothesized
value (16 cups)? We could just look at the difference between means
(4 cups), but this difference often depends on the scale of
measurement. Instead, we express this difference relative to the raw
standard deviation.
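This standardized difference is Cohen's d. A minimal sketch for the coffee example (sample mean 20, hypothesized mean 16, SD of 5 from the population variance of 25):

```python
# Cohen's d for a one-sample comparison: standardized difference
# between the sample mean and the hypothesized population mean.
def cohens_d(sample_mean, hypothesized_mean, sd):
    return (sample_mean - hypothesized_mean) / sd

print(cohens_d(20, 16, 5))  # 0.8 -- a "large" effect by Cohen's conventions
```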

Precision: Confidence Intervals

Confidence Intervals Around the Mean

Confidence Intervals Around the Mean

Interpretation of CIs

Construct the sampling distribution of the mean. For each sample,
construct a 95% CI. How many of the CIs contain the population mean?

Interpretation of CIs

In practice, we almost never know the value of the population mean
(μ). Hence, we never know for sure whether our particular CI
contains μ. All we know is that 95% of CIs contain μ and 5% do not.
It is in that sense that we can be 95% confident that our CI
contains μ.
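A simulation sketch of this interpretation (hypothetical Python, sampling from a normal population with a known mean so coverage can be checked):

```python
import random
import statistics

# Repeatedly sample from a normal population with known mu and sigma,
# build a 95% CI from each sample, and count how many CIs contain mu.
random.seed(1)
mu, sigma, n, reps = 16, 5, 25, 2000
se = sigma / n ** 0.5

covered = 0
for _ in range(reps):
    ybar = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    lo, hi = ybar - 1.96 * se, ybar + 1.96 * se
    covered += lo <= mu <= hi

print(covered / reps)  # close to 0.95
```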

CIs and Hypothesis Testing

Big Picture: What Do We Want to Learn From Our Research?

Reporting p-values, CIs, and Effect Sizes

One sample t test

Z Tests vs. T Tests

One Sample T

Procedures for the t test are similar to those for the Z test, BUT:
use an estimate of the population variance rather than the actual
population parameter.
The sample variance (s²) is an unbiased estimator of σ².
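A hand-rolled one-sample t sketch on made-up coffee counts (the data, and therefore the result, are purely illustrative):

```python
import statistics

# One-sample t: same logic as the Z test, but the population SD is
# replaced by the sample SD (hypothetical data).
cups = [18, 22, 25, 17, 20, 19, 23, 16]
mu0 = 16

n = len(cups)
ybar = statistics.mean(cups)   # 20.0
s = statistics.stdev(cups)     # uses n - 1 (unbiased)
t = (ybar - mu0) / (s / n ** 0.5)
df = n - 1

print(round(t, 2), df)  # 3.63 7
```

Compare t against the critical value from a t table with df = n - 1 (not the normal table, unless n is large).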

Sampling Distribution of the Mean

Hypothesis Tests

T Distribution

Degrees of Freedom

Degrees of Freedom

Example One Sample T Test

Example One Sample T Test

SPSS Output

Top bar: descriptive data
Mean difference: the difference between the sample mean and the
hypothesized mean

Confidence Intervals for t

Effect Size

One-sample Cohen's d: the difference between the sample mean and the
population mean, divided by the estimated population standard
deviation.
Lowest possible effect size: 0
No upper bound on Cohen's d

Reporting Results for One Sample T-Test

What Do We Want to Learn From Our Research?

Statistical Inference With More Than One Sample

Statistical Inference With More Than One Sample

Independent Samples T Test

Are parents more conservative than adolescents?
1. Take a random sample of 8 parents and a random sample of 8
   adolescents
2. Give them each a conservatism scale
3. Find that parents in the sample are more conservative than
   adolescents in the sample
Is that a genuine difference or just normal variability across
samples?
-> Conduct an independent samples t test

Statistical Hypotheses

Assumptions

Sampling Distribution of Difference Between Means

Sampling Distribution of Difference Between Means

1. Estimate the population variance from each sample
2. Pool the two estimates to get a better estimate
3. Estimate the variance and standard deviation of the sampling
   distribution using the pooled variance
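The steps above can be sketched as follows (conservatism scores invented purely for illustration):

```python
import statistics

# Pooled variance for an independent-samples t test: weight each
# sample's variance estimate by its degrees of freedom, then combine.
parents     = [62, 58, 65, 70, 61, 66, 64, 60]   # hypothetical scores
adolescents = [50, 55, 48, 53, 57, 52, 49, 54]

n1, n2 = len(parents), len(adolescents)
s1_sq = statistics.variance(parents)       # n - 1 in the denominator
s2_sq = statistics.variance(adolescents)

pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# SE of the difference between means, built from the pooled variance
se_diff = (pooled / n1 + pooled / n2) ** 0.5
print(round(pooled, 2), round(se_diff, 2))
```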

Estimating

One Sample T vs. Independent Samples T

Independent Samples T

IST Example

SPSS Output

The wide range of the CI indicates low precision.

Confidence Intervals for IST

Effect Size

APA Format

What Do We Want to Learn From Our Research?

Statistical Inference With More Than One Sample

Dependent Samples T Test (DST)

DST

DST

DST Procedures

DST

DST Example

SPSS Output

Adolescents' views are highly correlated with parents' views
- Because the samples are related/dependent on each other

Confidence Intervals and Effect Size

APA Format

What Do We Want to Learn From Our Research?

The dependent samples t test only gains power when there is a decent
correlation between the pairs.

Analysis of Variance

Many Varieties of ANOVA:
- Between Groups One-Way ANOVA (single-factor ANOVA)
  -> e.g., comparing 3 different therapies for depression
- Between Groups Factorial ANOVA
  -> e.g., Age Group (2) X Therapy (3)
- Repeated Measures One-Way ANOVA
  -> e.g., effect of alcohol on performance at 3 different time
  delays
- Mixed Factorial Designs (between and within factors)
  -> e.g., Treatment Group (2) X Time Delay (3)

Why Not Do Multiple T Tests?

Two problems:
1. The more tests, the greater the likelihood of making at least one
   Type I error
2. Not very powerful
-> Analysis of variance controls the errors and is more powerful

Analysis of Variance (ANOVA)

Case 1: Differences among means but huge variability within samples
Case 2: Same differences but much less variability within samples
Example: three alcohol conditions (1 = no alcohol, 2, 3); measure
errors on a cognitive task

Conceptual Basis of ANOVA

We are more confident that the samples are drawn from different
populations in Case 2. ANOVA compares the differences among means
with the variability within groups. If the differences are great
relative to the underlying variability, then it is unlikely that the
population means are the same. Does the signal stand out from the
underlying noise?
-> Conceptual basis of ANOVA

ANOVA

Goal of ANOVA

Estimate the population variance (σ²)

Within Group Variability

Independent Samples T: 2 good within-group estimates of the
population variance
-> Pool to get a better estimate of σ²
ANOVA: J different within-group estimates of the population variance
-> Pool these to get a better estimate of σ²

Between Group Variability

F Ratio

F Ratio

F Ratio

F Distributions

As with t, there are an infinite number of F distributions,
differing in shape depending on df. More complex because with F
there are 2 kinds of df:
- df_between = J - 1 = Number of Groups - 1
- df_within = N - J = Total Sample Size - Number of Groups
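The two df formulas as a tiny helper (illustrative; the group sizes for the depth-of-processing study are assumed, not taken from the notes):

```python
# Degrees of freedom for a one-way between-groups ANOVA:
# df_between = J - 1, df_within = N - J.
def anova_df(group_sizes):
    J = len(group_sizes)   # number of groups
    N = sum(group_sizes)   # total sample size
    return J - 1, N - J

# e.g., 5 conditions with 10 participants each (sizes assumed)
print(anova_df([10, 10, 10, 10, 10]))  # (4, 45)
```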

Effects of Depth of Processing on Memory

Depth of Processing

Analysis

One-Way ANOVA: 5 levels of the independent variable (Condition)
Dependent variable: words recalled

SPSS Output

Effect Size

Cohen's d tells you how many standard deviations apart the means
are.
Eta-squared: the percentage of variability attributable to the
differences between means = SS_between / SS_total
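Eta-squared in one line (the SS values here are invented just to show the arithmetic):

```python
# Eta-squared: proportion of total variability attributable to the
# differences between condition means.
def eta_squared(ss_between, ss_total):
    return ss_between / ss_total

# hypothetical sums of squares
print(round(eta_squared(350.0, 786.0), 2))  # 0.45
```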

APA Format

What Do We Want to Learn From Our Research?

F and T

Cohen's d and Eta-Squared

Multiple Comparisons

Depth of Processing

SPSS Output: Depth of Processing

Types of Comparisons

A Priori vs. Post Hoc Contrasts

Simple vs. Complex Contrasts

Orthogonal vs. Non-Orthogonal Contrasts

Error Rates

J conditions (different conditions)
Per comparison (Type I error, alpha level): for any single
comparison, the likelihood of a Type I error
Familywise: the likelihood of a Type I error across a family of
comparisons; the more comparisons, the higher the likelihood

Familywise Error Rate

Familywise error rate: the likelihood of at least one Type I error
across the family of comparisons
FW ≈ number of comparisons × alpha
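Comparing the notes' approximation with the exact familywise rate for independent comparisons, 1 − (1 − α)^c:

```python
# Familywise error rate across c independent comparisons at alpha = .05.
alpha = 0.05

for c in (1, 3, 6, 10):
    exact = 1 - (1 - alpha) ** c   # exact, assuming independence
    approx = c * alpha             # the c * alpha approximation
    print(c, round(exact, 3), round(approx, 3))
```

The approximation is close for small c and increasingly overestimates as c grows.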

Comparisons

Planned Contrasts

A logically planned set of contrasts for the depth of processing data:
1. Is there a difference between the shallow conditions?
   -> Counting vs Rhyming
2. Is there a difference between the deep conditions?
   -> Adjective vs Imagery
3. Is there a difference between shallow and deep conditions?
   -> (Counting and Rhyming) vs (Adjective and Imagery)
4. Is there a difference between Intentional and Incidental
   conditions?
   -> Intentional vs (Counting, Rhyming, Adjective, and Imagery)
1 and 2 are simple contrasts; 3 and 4 are complex contrasts.

SPSS Output: Planned Contrasts

Depth of Processing

Trend Analysis

Relation Between Digit Span and Age

SPSS: Oneway ANOVA

Relation Between Digit Span and Age

SPSS: Trend Analysis

Relation Between Digit Span and Age

Post Hoc Tests

Factorial ANOVA (Between Groups)

So far: Single Factor ANOVA
-> One factor with 2 or more levels
-> 5 depth of processing conditions
Often we want to look at multiple factors in a single experiment:
-> Add a second factor (hi vs. low IQ)
-> 5 (depth of processing conditions) X 2 (IQ) factorial ANOVA
-> Add a third factor (short vs long word lists)
-> 5 (condition) X 2 (IQ) X 2 (list length) factorial ANOVA
-> Add a fourth, fifth, sixth, etc.

A Simple 2 X 2 Factorial ANOVA

Effects of smoking (smoker vs. non-smoker) and gender (male vs.
female) on pulse rate after exercise.
Two factors (smoking and gender) -> two-way ANOVA -> 2 X 2 ANOVA
Examine main effects of smoking and gender.
Examine the interaction between smoking and gender.

How Might the Interaction Turn Out?

Interactions

When lines are not parallel -> Interaction:
-> Effects of one factor depend on levels of the other
-> Such interactions may or may not be significant

Interactions

Examples:
Gender effect for non smokers and for smokers but effect
is greater for smokers

Interactions

Gender effect but only for non-smokers

Interactions

Gender effect, but in opposite directions for smokers and
non-smokers:
For non-smokers, female pulse > male pulse
For smokers, male pulse > female pulse
A "crossover" interaction

Exercise Data

40 participants: 20 women, 20 men; 20 smokers, 20 non-smokers.
Pulse rate measured after 15 minutes of exercise.
2 X 2 ANOVA

SPSS Line Chart

2 X 2 ANOVA: SPSS Output

A 3 X 2 Factorial ANOVA

Two-Way ANOVA

Statistical Hypotheses

Assumptions

Sources of Variability: Sums of Squares

Degrees of Freedom

Variance Estimates: Mean Squares

F Tests

Teaching Method Data: SPSS Source Table

Main effect of Method: F(2,24) = 5.30, p = .012, ηp² = .31
Main effect of Experience: F(1,24) = 7.06, p = .014, ηp² = .23
Interaction of Method and Experience: F(2,24) = 5.93, p = .008,
ηp² = .33

Two-Way ANOVA

Evidence of interaction between method and experience
-> Method affects performance but it depends on experience
-> Experience affects performance but it depends on method

Rationale for the F
Test

Effect Size in Factorial ANOVA

Why Use Factorial ANOVA?

Analyzing Interactions: Two-Way ANOVA

Analyzing Interactions: Simple Effects

Simple Effects of Experience

1. No significant effect of Experience in the Lecture condition:
   F(1,24) = 1.51, p = .232, ηp² = .06
2. An effect of Experience in the Discussion condition:
   F(1,24) = 11.39, p = .003, ηp² = .32
3. An effect of Experience in the Study condition:
   F(1,24) = 6.02, p = .022, ηp² = .20

Simple Effects of Experience

Simple effects of experience for Discussion and Study but not Lecture

Simple Effects of Method

An effect of Method in the No Experience group: F(2,24) = 7.45,
p = .033, ηp² = .38
An effect of Method in the Experience group: F(2,24) = 3.78,
p = .037, ηp² = .24
Note: These are single-factor ANOVAs within each experience group.
- Need to do post hoc follow-up tests to explore these effects

Simple Effects of Method

Simple effects of Method in both groups, but different effects:
For the No Experience group (blue), Lecture looks best.
For the Experience group (red), Discussion looks best.
Post hoc tests will show these effects.

Interpreting Main Effects in Presence of Interactions

Exercise data, hypothetical Case 1:
An interaction between the 2 factors such that the effect of smoking
is present only for females.
BUT: overall, smokers have a higher pulse than non-smokers.
Interpretation of the main effect must be qualified by the presence
of the interaction.

Interpreting Main Effects in Presence of Interactions

Exercise data, hypothetical Case 2:
Interaction between the 2 factors
-> Smoking effect for both males and females
-> BUT: stronger for females
-> Main effect not (fully) qualified by the presence of the
interaction

Repeated Measures Designs (Within Subjects)

1. Dependent Samples T Test -> 2 conditions
   -> Compare depression before and immediately after
   Cognitive-Behavioural Therapy
2. One-Way Repeated Measures ANOVA -> 2 or more conditions
   -> Compare depression before therapy, immediately after CBT, and
   6 weeks later
3. Two-Way Repeated Measures ANOVA (Factorial) -> 2 repeated
   measures factors, each with 2 or more levels
   -> Compare depression before, immediately after therapy, and 6
   weeks later, for both CBT and acupuncture
   -> Participants receive both CBT and acupuncture in
   counterbalanced order
4. Two-Way Mixed Designs -> one within factor (repeated measures)
   with 2 or more levels
   -> One between factor with 2 or more levels
   -> As in 3 above, but different participants in the CBT and
   acupuncture conditions

Advantage of Repeated Measures Designs

Remember the Dependent Samples T vs. the Independent Samples T: the
DST is very often more powerful than the IST because individual
differences are controlled, so the signal stands out more clearly
relative to the noise. The same is true for repeated measures
designs more generally: greater power, and cheaper, more efficient!
Recommended unless there are carry-over effects.

The Logic of Repeated Measures ANOVA

The F-ratio has the general structure:

F = variability between treatments / variability expected by chance

In repeated measures designs, individual differences are removed
from both parts:

F = variability between treatments (with individual differences
removed) / variability expected by chance (with individual
differences removed)

The Logic of Repeated Measures ANOVA

Dependent Samples T Test

A repeated measures design with 2 levels can be analysed as a
dependent samples t test or as a one-way repeated measures ANOVA,
with identical p values: F = t².
This is the same as the relation between the independent samples t
test and one-way independent-groups ANOVA.
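A quick numeric check of F = t² on made-up two-group data (using independent groups here, since the algebra is the same):

```python
import statistics

# Verify F = t^2 for two groups: independent-samples t vs one-way
# ANOVA on the same (made-up) data.
g1 = [4.0, 5.0, 6.0, 7.0]
g2 = [6.0, 7.0, 8.0, 9.0]

n1, n2 = len(g1), len(g2)
m1, m2 = statistics.mean(g1), statistics.mean(g2)
pooled = ((n1 - 1) * statistics.variance(g1)
          + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
t = (m1 - m2) / (pooled / n1 + pooled / n2) ** 0.5

grand = statistics.mean(g1 + g2)
ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ss_within = (sum((x - m1) ** 2 for x in g1)
             + sum((x - m2) ** 2 for x in g2))
F = (ss_between / 1) / (ss_within / (n1 + n2 - 2))  # df_between = 1

print(round(t ** 2, 6), round(F, 6))  # identical: 4.8 4.8
```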

One-Way Repeated Measures ANOVA

Effect of Sleep Deprivation on Cognitive Performance

12 hours vs. 18 hours vs. 24 hours:
- One independent variable with 3 levels
- 20 participants
- Each participant does each condition (counterbalanced)
- One dependent variable (# of errors)
-> One-way repeated-measures ANOVA

One-Way Repeated-Measures ANOVA

Assumptions:
1. Independence -> OK if it's a random sample
2. Normality -> robust unless the sample is small
3. Sphericity
   - Variances of the differences between combinations of levels are
     equal
   - Only applies if you have 3+ levels
   - NOT robust to violations of sphericity
   - Use Mauchly's Test to assess whether sphericity holds
   - If not, use the Greenhouse-Geisser correction provided by SPSS

One-Way Repeated Measures ANOVA

Mauchly's Test of Sphericity Assumption

Mauchly's Test not significant -> no evidence of a sphericity
violation (the sphericity effect here is not significant).

One-Way Repeated Measures ANOVA

Use the Sphericity Assumed F if Mauchly's test is NOT significant
-> Significant effect of sleep deprivation, F(2, 38) = 90.57,
p < .0005, partial eta-squared = .83
If Mauchly's test had been significant, use the Greenhouse-Geisser
correction -> Significant effect of sleep deprivation,
F(1.83, 34.73) = 90.57, p < .0005
SPSS output gets more complex in repeated measures:
- Gives results with sphericity assumed
- Also gives other measures, which are alternative ways of handling
  a sphericity violation
- Focus on the top two
- Greenhouse-Geisser adjusts the degrees of freedom, so it is more
  conservative than the sphericity-assumed test

Follow-Up Tests

Significant Linear and Quadratic Trends

Post Hoc Tests

Comparing each condition against every other condition