Psyc 325 test 2


Big Picture: What Do We Want to Learn From Our Research?

1. Effect Size: What is the size of the effect?
- How big is the correlation between two measures?
- How different is the treatment group from the control group?
2. Precision: How precisely have we pinned down the effect?
- Confidence Intervals
- Margin of Error
3. Significance: Is the effect significant?
- Hypothesis Testing
- Strength of the evidence against the null hypothesis
4. Importance: Does the effect matter?
- Scientific Interpretation


Inferential Statistics

Predicting unknown population parameters from known sample statistics:
1. Select a sample
2. Collect data
3. Calculate sample statistics
4. Draw inferences about population parameters


Common Statistics and Parameters:


How much coffee do students drink per week?

My guess (hypothesis) is 16 cups
Take a random sample of 25 students and measure how many cups they drink
Calculate the sample mean: Ȳ = 20
What inferences can be drawn about the population mean (μ)?
- How much trust can we place in our sample mean?
- The answer lies in the sampling distribution


Sampling Distributions

To make sensible inferences about a population parameter (e.g., a population mean), we need to understand the behavior of the corresponding sample statistic (e.g., a sample mean).
The sampling distribution of the statistic describes its behavior across repeated sampling:
- Is it biased? An underestimate? An overestimate?
- How much does it vary across samples? Does it jump around a lot?
- What is its shape?
The sampling distribution is the linchpin of all statistical inference.


Sampling Distribution of the Mean Simulation

http://onlinestatbook.com/stat_sim/sampling_dist/index.html
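The applet above can be approximated in a few lines of Python; this is a rough sketch (numpy assumed, population values invented) that draws many samples and records each mean:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n = 16, 5, 25     # hypothetical population mean, SD, sample size
    # Draw 10,000 samples and record each sample mean
    means = np.array([rng.normal(mu, sigma, n).mean() for _ in range(10_000)])
    print(means.mean())          # close to mu: the sample mean is unbiased
    print(means.std())           # close to sigma/sqrt(n) = 1.0

A histogram of these means would show the roughly normal shape the theory below predicts.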


Sampling Distribution of the Mean

Draw an infinite number of samples of a given size and plot the means -> Sampling Distribution of the Mean
Obviously we can't do that.
Fortunately, we don't have to, because statistical theory gives us the properties of the sampling distribution.


Theoretical Properties of the Sampling Distribution of Means


What is the mean of the sampling distribution?
E(Ȳ) = μ
Ȳ is an unbiased estimator of μ


How Much Do Sample Means Vary?


What is the Variance of the Sampling Distribution of the Mean?

The variance of the sampling distribution depends on two things:
- The variance of the raw scores in the population
- The sample size


Variance of the Sampling Distribution


Standard Deviation of the Sampling Distribution

σ²_Ȳ = σ²/n, so the standard deviation (standard error) of the sampling distribution is σ_Ȳ = σ/√n


Sampling Distribution of the Mean


What is the Shape of the Sampling Distribution?


The Central Limit Theorem: If the sample size (n) is reasonably large, the sampling distribution of the mean will be approximately NORMAL, regardless of the distribution of the raw scores.
As n increases, the sampling distribution approaches normality. For markedly non-normal distributions we need a somewhat larger n before the sampling distribution becomes normal.


Central Limit Theorem


Summary


The Normal Distribution

-> Unimodal, symmetric, bell-shaped, asymptotic tails


The 68-95-99.7 Rule

In a normal distribution:
- 68% of scores fall within 1 SD of the mean
- 95% fall within 2 SD of the mean
- 99.7% fall within 3 SD of the mean
The rule can easily be used to compute percentile ranks if the data are normally distributed.
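For a quick check of the rule, scipy's normal CDF gives the exact areas (a minimal sketch, nothing course-specific assumed):

    from scipy.stats import norm

    for k in (1, 2, 3):
        # Area within k SDs of the mean of a standard normal
        print(k, norm.cdf(k) - norm.cdf(-k))   # ~0.683, 0.954, 0.997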


Z Scores in the Sampling Distribution


Why is the Sampling Distribution Important?

Sampling distribution of the mean is normal (CLT)
Normal tables give us the probability of obtaining values above or below any point in a standard normal distribution
If we know μ (or have a hypothesis about μ) we can convert our sample mean to a Z score
We can then refer to normal tables to determine the precise probability of obtaining a sample mean like the one we did
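A sketch of that conversion using the coffee numbers from these notes (μ = 16, σ = 5, n = 25, Ȳ = 20); scipy stands in for the normal tables:

    from math import sqrt
    from scipy.stats import norm

    mu0, sigma, n, ybar = 16, 5, 25, 20
    z = (ybar - mu0) / (sigma / sqrt(n))   # standard error = 1.0, so Z = 4.0
    # Probability of a sample mean at least this far above mu0
    print(norm.sf(z))                      # about .00003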


Classical Hypothesis Testing

Rarely know μ
But sometimes have a hypothesis about μ
Construct a sampling distribution around the hypothesized μ
Determine the likelihood of obtaining a sample mean like ours, given that the hypothesis is true


Logic of Hypothesis Testing


High vs. Low Probability Events

How do we define high vs. low probability?
- Outcomes with < 5% probability are considered rare
- Setting an alpha level (α)
If sample means like ours would occur < 5% of the time were the hypothesis true, we reject the hypothesis
If they would occur > 5% of the time, we retain it.


Why is the Sampling Distribution Important?

Unless we know something about the sampling distribution, we have no way of knowing whether our sample mean is a high vs. low probability event
Without the sampling distribution we can't test our hypothesis
With respect to hypothesis testing, the sampling distribution is the only show in town


Z Test for a Population Mean

Use what we've just learned to test the hypothesis that the population mean (μ) weekly coffee consumption is 16 cups -> Z Test
Suppose we know that the population variance σ² = 25
Obtain a sample of 25 students (n = 25)
Sample mean Ȳ = 20 cups
Given the data, should we reject or retain the hypothesis?
Depends on whether the sample mean is a high or low probability outcome, if the hypothesis is true


Z Test for a Population Mean

Z = (Ȳ - μ)/(σ/√n) = (20 - 16)/(5/√25) = 4
4 > 1.96, so reject the hypothesis that mean coffee consumption is 16, p < .05
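The same decision expressed in code (a sketch; the 1.96 cutoff is the two-tailed critical value at α = .05):

    from scipy.stats import norm

    z, alpha = 4.0, 0.05
    z_crit = norm.ppf(1 - alpha / 2)    # 1.96
    p = 2 * norm.sf(abs(z))             # two-tailed p value
    print(z > z_crit, p)                # True -> reject H0, p < .05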


Seven Steps in Hypothesis Testing

1. Research Problem
2. Statistical Hypotheses
3. Assumptions
4. Decision Rule
5. Calculate Test Statistic
6. Decision
7. Interpretation




Why Not Test H1 Directly?

Alternative hypothesis often more interesting than the null hypothesis
Why can�t we test it directly?
H1 is too vague
- H1 doesn�t specify a precise value for the population mean
- Can�t generate a unique sampling distribution
In contrast, H0 (the null hypothesis) is extremely precise
- Specifies a population mean
- Can generate a unique sampling distribution
Indirectly test H1
- Frame conclusions in terms of retaining or rejecting H0
- By implication has consequences for the tenability of H1


What Could Possibly Go Wrong?


Factors Affecting Type I Error


Factors Affecting Type II Errors

1. Effect Size
2. Sample Size
3. Alpha


Statistical vs. Theoretical Significance

Increasing n -> more powerful test
But: Large sample sizes can make a test "too" sensitive
-> May end up rejecting H0 even when the true distribution is only trivially different from the null distribution
Statistical Significance ≠ Theoretical Significance
Note: We always like large samples, but we need to be careful to distinguish statistical and theoretical significance


Measuring Effect Size

How different is our sample mean (20 cups) from the hypothesized value (16 cups)?
We could just look at the difference between the means (4 cups), but this difference depends on the scale of measurement
Instead, we express the difference relative to the raw-score standard deviation
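In code, with the coffee numbers (the raw SD of 5 follows from σ² = 25), this standardized difference is Cohen's d:

    # Cohen's d: difference between means in standard-deviation units
    ybar, mu0, sigma = 20, 16, 5
    d = (ybar - mu0) / sigma
    print(d)   # 0.8, conventionally a large effect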


Precision: Confidence Intervals


Confidence Intervals Around the Mean




Interpretation of CIs

Construct the sampling distribution of the mean
For each sample, construct a 95% CI
How many of the CIs contain the population mean?
Answer = 95%


Interpretation of CIs

In practice, we almost never know the value of the population mean (μ)
Hence, we never know for sure whether our particular CI contains μ
All we know is that 95% of CIs contain μ and 5% do not
It is in that sense that we can be 95% confident that our CI contains μ
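The thought experiment above is easy to simulate; a sketch assuming a normal population with known σ (the z-based 95% CI uses ±1.96 standard errors):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, n = 16, 5, 25
    hits = 0
    for _ in range(10_000):
        ybar = rng.normal(mu, sigma, n).mean()
        half = 1.96 * sigma / np.sqrt(n)       # margin of error
        hits += (ybar - half) < mu < (ybar + half)
    print(hits / 10_000)                       # about 0.95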


CIs and Hypothesis Testing


Big Picture: What Do We Want to Learn From Our Research?


Reporting p-values, CIs, and Effect Sizes

One sample t test


Z Tests vs. T Tests


One Sample T

Procedures for the t test are similar to those for the Z test, BUT:
Use an estimate of the population variance rather than the actual population parameter
The sample variance (s²) is an unbiased estimator of σ²
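A minimal one-sample t test in Python; the cup counts below are invented stand-ins for real data:

    import numpy as np
    from scipy.stats import ttest_1samp

    cups = np.array([18, 22, 17, 25, 20, 19, 23, 16, 21, 24])  # hypothetical
    t, p = ttest_1samp(cups, popmean=16)   # H0: mu = 16
    print(t, p)                            # reject H0 if p < .05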


Sampling Distribution of the Mean


Hypothesis Tests


T Distribution


Degrees of Freedom




Example One Sample T Test




SPSS Output

Top bar: descriptive data
Mean difference: the difference between the sample mean and the hypothesised mean


Confidence Intervals for t


Effect Size

One-sample Cohen's d: the difference between the sample mean and the hypothesized population mean, divided by the estimated population standard deviation
Lowest possible effect size is 0
No upper bound on Cohen's d


Reporting Results for One Sample T-Test


What Do We Want to Learn From Our Research?


Statistical Inference With More Than One Sample




Independent Samples T Test

Are parents more conservative than adolescents?
1. Take a random sample of 8 parents and a random sample of 8 adolescents
2. Give each of them a conservatism scale
3. Find that the parents in the sample are more conservative than the adolescents in the sample
Is that a genuine difference or just normal variability across samples?
-> Conduct an independent samples t test
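A sketch of that test in Python; the conservatism scores are invented for illustration:

    import numpy as np
    from scipy.stats import ttest_ind

    parents = np.array([72, 65, 70, 68, 74, 69, 71, 66])       # hypothetical
    adolescents = np.array([60, 58, 64, 55, 62, 59, 61, 57])   # hypothetical
    # Pooled-variance (equal variances assumed) independent samples t test
    t, p = ttest_ind(parents, adolescents, equal_var=True)
    print(t, p)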


Statistical Hypotheses


Assumptions




Sampling Distribution of Difference Between Means

1. Estimate the population variance from each sample
2. Pool the two estimates to get a better estimate
3. Estimate the variance and standard deviation of the sampling distribution using the pooled variance
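A sketch of steps 1-3 (the standard pooled-variance formula; the function names are mine):

    import numpy as np

    def pooled_variance(x, y):
        n1, n2 = len(x), len(y)
        s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)  # sample variances
        return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    def se_of_difference(x, y):
        # SD of the sampling distribution of (mean(x) - mean(y))
        sp2 = pooled_variance(x, y)
        return np.sqrt(sp2 / len(x) + sp2 / len(y))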

Estimating


One Sample T vs. Independent Samples T


Independent Samples T


IST Example

SPSS Output

The wide range of the CI indicates low precision


Confidence Intervals for IST


Effect Size


APA Format


What Do We Want to Learn From Our Research?


Statistical Inference With More Than One Sample


Dependent Samples T Test (DST)




DST Procedures




DST Example


SPSS Output

Adolescents' views are highly correlated with their parents'
- Because the samples are related/dependent on each other


Confidence Intervals and Effect Size


APA Format


What Do We Want to Learn From Our Research?




Advantage of DST

The advantage only holds when there is a decent correlation between the pairs


Disadvantages of DST


Analysis of Variance


Many Varieties of ANOVA
Between Groups Oneway ANOVA (Single Factor ANOVA)
-> e.g., Comparing 3 different therapies for depression
Between Groups Factorial ANOVA
-> e.g., Age Group (2) X Therapy (3)
Repeated Measures Oneway ANOVA
-> e.g., Effect of alcohol on performance at 3 different time delays
Mixed Factorial Designs (Between and Within Factors)
-> e.g., Treatment Group (2) X Time Delay (3)


Why Not Do Multiple T Tests?

Two problems:
1. The more tests, the greater the likelihood of making at least one Type I error
2. Not very powerful
-> Analysis of variance controls the errors and is more powerful


Analysis of Variance (ANOVA)


Case 1: Differences among means but huge variability within samples

Case 2: Same differences but much less variability within samples
Three alcohol conditions (1 = no alcohol, 2, 3)
Measure errors on a cognitive task


Conceptual Basis of ANOVA

More confident that the samples are drawn from different populations in Case 2
Compare the differences among means with the variability within groups
If the differences are great relative to the underlying variability, then it is unlikely that the population means are the same
Does the signal stand out from the underlying noise?
-> Conceptual basis of ANOVA


ANOVA


Goal of ANOVA

Estimates of the population variance (σ²)


Within Group Variability

Independent Samples T: 2 good within-group estimates of the population variance
-> Pool to get a better estimate of σ²
ANOVA: J different within-group estimates of the population variance
-> Pool these to get a better estimate of σ²


Between Group Variability


F Ratio




F Distributions

As with t, there are an infinite number of F distributions differing in shape depending on df
More complex because with F there are 2 kinds of df:
- df_between = J - 1 = Number of Groups - 1
- df_within = N - J = Total Sample Size - Number of Groups
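Critical F values for any df pair can be pulled from scipy rather than tables (a sketch; the J and N below are arbitrary):

    from scipy.stats import f

    J, N = 5, 50                               # e.g., 5 groups, 50 people total
    df_between, df_within = J - 1, N - J
    print(f.ppf(0.95, df_between, df_within))  # critical F at alpha = .05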


Effects of Depth of Processing on Memory


Depth of Processing

Analysis

One-Way ANOVA
5 levels of the independent variable (Condition)
Dependent variable: Words recalled

SPSS Output


Effect Size

Cohen's d tells you how many standard deviations apart the means are
Eta-squared (η²): the percentage of variance attributable to the differences between means
η² = SS_between / SS_total
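A hand computation of η² = SS_between / SS_total for a one-way design; the recall scores are invented:

    import numpy as np

    groups = [np.array([5, 7, 6, 8]),      # hypothetical scores per condition
              np.array([9, 11, 10, 12]),
              np.array([13, 14, 15, 16])]
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_total = sum(((g - grand) ** 2).sum() for g in groups)
    print(ss_between / ss_total)           # eta-squared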


APA Format


What Do We Want to Learn From Our Research?


F and T


Cohen�s d and Eta-Squared


Multiple Comparisons


Depth of Processing


SPSS Output: Depth of Processing


Types of Comparisons


A Priori vs. Post Hoc Contrasts


Simple vs. Complex Contrasts


Orthogonal vs. Non-Orthogonal Contrasts


Error Rates

J conditions
Per-comparison error rate: for any single comparison, the likelihood (α) of a Type I error
Familywise error rate: the likelihood of a Type I error across a family of comparisons; the more comparisons, the higher the likelihood


Familywise Error Rate

The familywise error rate is the likelihood of at least one Type I error across the family
FW ≈ number of comparisons × α; for c independent comparisons the exact rate is 1 - (1 - α)^c
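A few lines comparing the exact familywise rate with the c × α rule of thumb (independent comparisons assumed):

    alpha = 0.05
    for c in (1, 3, 6, 10):
        exact = 1 - (1 - alpha) ** c           # P(at least one Type I error)
        print(c, round(exact, 3), c * alpha)   # exact vs. approximation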


Comparisons

Planned Contrasts

A logically planned set of contrasts for the depth of processing data:
1. Is there a difference between the shallow conditions?
-> Counting vs Rhyming
2. Is there a difference between the deep conditions?
-> Adjective vs Imagery
3. Is there a difference between the shallow and deep conditions?
-> (Counting and Rhyming) vs (Adjective and Imagery)
4. Is there a difference between the Intentional and Incidental conditions?
-> Intentional vs (Counting, Rhyming, Adjective, and Imagery)
1 and 2 are simple contrasts; 3 and 4 are complex contrasts


SPSS Output: Planned Contrasts


Depth of Processing


Trend Analysis


Relation Between Digit Span and Age


SPSS: Oneway ANOVA

Relation Between Digit Span and Age


SPSS: Trend Analysis

Relation Between Digit Span and Age


Post Hoc Tests


Factorial ANOVA (Between Groups)

So far: Single Factor ANOVA
-> One factor with 2 or more levels
-> 5 depth of processing conditions
Often we want to look at multiple factors in a single experiment:
-> Add a second factor (hi vs. low IQ)
-> 5 (depth of processing conditions) X 2 (IQ) factorial ANOVA
-> Add a third factor (short vs long word lists)
-> 5 (condition) X 2 (IQ) X 2 (list length) factorial ANOVA
-> Add a fourth, fifth, sixth, etc.


A Simple 2 X 2 Factorial ANOVA

Effects of smoking (Smoker vs. Non-smoker) and gender (Male vs. Female) on pulse rate after exercise
Two factors (smoking and gender) -> Two-way ANOVA -> 2 X 2 ANOVA
Examine main effects of smoking and gender
Examine the interaction between smoking and gender


How Might the Interaction Turn Out?


Interactions

When lines are not parallel -> Interaction:
-> Effects of one factor depend on levels of the other
-> Such interactions may or may not be significant


Interactions

Example: Gender effect for non-smokers and for smokers, but the effect is greater for smokers


Interactions

Gender effect but only for non-smokers


Interactions

Gender effect but in opposite directions for smokers and non-smokers
For non-smokers, female pulse > male pulse
For smokers, male pulse > female pulse
A "crossover" interaction


Exercise Data

40 participants: 20 women, 20 men; 20 smokers, 20 non-smokers
Pulse rate measured after 15 minutes of exercise
2 X 2 ANOVA
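The notes run this analysis in SPSS; for reference, a rough equivalent in Python with statsmodels (the pulse values are randomly generated placeholders, not the course data):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(3)
    df = pd.DataFrame({
        "gender": ["male", "female"] * 20,
        "smoker": ["yes"] * 20 + ["no"] * 20,
        "pulse": rng.normal(90, 10, 40),       # placeholder pulse rates
    })
    # Main effects of gender and smoker, plus their interaction
    model = ols("pulse ~ C(gender) * C(smoker)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))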


SPSS Line Chart


2 X 2 ANOVA: SPSS Output


A 3 X 2 Factorial ANOVA


Two-Way ANOVA


Statistical Hypotheses


Assumptions


Sources of Variability: Sums of Squares


Degrees of Freedom


Variance Estimates: Mean Squares


F Tests


Teaching Method Data: SPSS Source Table

Main Effect of Method: F(2,24) = 5.30, p = .012, η_p² = .31
Main Effect of Experience: F(1,24) = 7.06, p = .014, η_p² = .23
Interaction of Method and Experience: F(2,24) = 5.93, p = .008, η_p² = .33


Two-Way ANOVA

Evidence of interaction between method and experience
-> Method affects performance but it depends on experience
-> Experience affects performance but it depends on method


Rationale for the F Test


Effect Size in Factorial ANOVA


Why Use Factorial ANOVA?


Analyzing Interactions: Two-Way ANOVA


Analyzing Interactions: Simple Effects


Simple Effects of Experience

1. No significant effect of Experience in the Lecture condition: F(1,24) = 1.51, p = .232, η_p² = .06
2. An effect of Experience in the Discussion condition: F(1,24) = 11.39, p = .003, η_p² = .32
3. An effect of Experience in the Study condition: F(1,24) = 6.02, p = .022, η_p² = .20


Simple Effects of Experience

Simple effects of experience for Discussion and Study but not Lecture


Simple Effects of Method

An effect of Method in the No Experience group: F(2,24) = 7.45, p = .033, η_p² = .38
An effect of Method in the Experience group: F(2,24) = 3.78, p = .037, η_p² = .24
Note: These are single-factor ANOVAs within each experience group
- Need to do post hoc follow-up tests to explore these effects


Simple Effects of Method

Simple effects of Method in both groups
But different effects:
For the No Experience group (blue), Lecture looks best
For the Experience group (red), Discussion looks best
Post hoc tests will show these effects


Interpreting Main Effects in Presence of Interactions


Exercise Data: Hypothetical Case 1
An interaction between the 2 factors such that the effect of smoking is present only for females
- BUT: Overall, smokers have a higher pulse than non-smokers
- Interpretation of the main effect must be qualified by the presence of the interaction


Interpreting Main Effects in Presence of Interactions


Exercise Data: Hypothetical Case 2
Interaction between the 2 factors
-> Smoking effect for both males and females
-> BUT: Stronger for females
-> Main effect not (fully) qualified by the presence of the interaction


Repeated Measures Designs (Within Subjects)

1. Dependent Samples T Test -> 2 conditions
-> Compare depression before and immediately after Cognitive-Behavioural Therapy (CBT)
2. One-Way Repeated Measures ANOVA -> 2 or more conditions
-> Compare depression before therapy, immediately after CBT, and 6 weeks later
3. Two-Way Repeated Measures ANOVA (Factorial) -> 2 repeated measures factors, each with 2 or more levels
-> Compare depression before, immediately after therapy, and 6 weeks later, for both CBT and acupuncture
-> Participants receive both CBT and acupuncture in counterbalanced order
4. Two-Way Mixed Designs -> One within factor (repeated measures) with 2 or more levels
-> One between factor with 2 or more levels
-> As in 3 above, but different participants in the CBT and acupuncture conditions


Advantage of Repeated Measures Designs

Remember the Dependent Samples T vs. the Independent Samples T
DST is very often more powerful than IST because individual differences are controlled
The signal stands out more clearly relative to the noise
The same is true for repeated measures designs more generally:
Greater power
And cheaper, more efficient!
Recommended unless there are carry-over effects


The Logic of Repeated Measures ANOVA

The F-ratio has the general structure:

F = (Variability between treatments) / (Variability expected by chance)

In repeated measures designs, individual differences are removed from both parts:

F = (Variability between treatments, with individual differences removed) / (Variability expected by chance, with individual differences removed)


The Logic of Repeated Measures ANOVA


Dependent Samples T Test

A repeated measures design with 2 levels
Can analyse as a dependent samples t test or as a one-way repeated measures ANOVA
Identical p values: F = t²
Same as the relation between the independent samples t test and one-way independent groups ANOVA
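The F = t² relation is easy to verify for the two-group between-subjects case (a sketch with invented data):

    import numpy as np
    from scipy.stats import ttest_ind, f_oneway

    rng = np.random.default_rng(4)
    a, b = rng.normal(0, 1, 12), rng.normal(0.5, 1, 12)  # two invented groups
    t, p_t = ttest_ind(a, b)
    F, p_F = f_oneway(a, b)
    print(t ** 2, F)    # identical
    print(p_t, p_F)     # identical p values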


One-Way Repeated Measures ANOVA


Effect of Sleep Deprivation on Cognitive Performance

12 hours vs. 18 hours vs. 24 hours:
One independent variable with 3 levels
20 participants
Each participant does each condition (counterbalanced)
One dependent variable (# of errors)
-> One-way repeated-measures ANOVA


One-Way Repeated-Measures ANOVA

Assumptions:
1. Independence -> OK if it's a random sample
2. Normality -> Robust unless the sample is small
3. Sphericity
- Variances of the differences between combinations of levels are equal
- Only applies if you have 3+ levels
- NOT robust to violations of sphericity
- Use Mauchly's Test to assess whether sphericity holds
- If it does not, use the Greenhouse-Geisser correction provided by SPSS
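The notes use SPSS for this design; a rough Python analogue uses statsmodels' AnovaRM (the error counts below are invented, and as far as I know AnovaRM reports only the sphericity-assumed F, without Greenhouse-Geisser corrections):

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(5)
    # 20 participants x 3 deprivation levels, long format
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(20), 3),
        "hours": np.tile(["12", "18", "24"], 20),
        "errors": rng.poisson(5, 60),          # invented error counts
    })
    print(AnovaRM(df, depvar="errors", subject="subject",
                  within=["hours"]).fit())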


One-Way Repeated Measures ANOVA


Mauchly's Test of Sphericity Assumption

Mauchly's test not significant -> No evidence of a sphericity violation
(Here, Mauchly's test is not significant, so sphericity can be assumed)


One-Way Repeated Measures ANOVA

Use the Sphericity Assumed F if Mauchly's test is NOT significant
-> Significant effect of sleep deprivation, F(2, 38) = 90.57, p < .0005, partial eta-squared = .83
If Mauchly's test had been significant, use the Greenhouse-Geisser correction
-> Significant effect of sleep deprivation, F(1.83, 34.73) = 90.57, p < .0005
SPSS output gets more complex for repeated measures:
- Gives results with sphericity assumed
- Also gives other measures that are alternative ways of handling sphericity violations
- Focus on the top two rows
- Greenhouse-Geisser adjusts the degrees of freedom downward, so it is more conservative than the sphericity-assumed test


Follow-Up Tests


Significant Linear and Quadratic Trends


Post Hoc Tests

Comparing each condition against every other condition