scatter plot
a graph of data points
line of best fit
approximates the trend in data
model
sometimes a line or an equation used to represent data
Stroop Test
correlates a person's perception of words and colors for a list
matching list
the color of ink matches the color of the word
non-matching list
the color of ink does not match the color of the word
median
the middle value of an ordered set of data
median-median line
a method for calculating the line of best fit using the medians of three groups of the data
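The median-median line can be sketched as follows. This is a minimal illustration that assumes the common three-group convention (sort by X, split into three groups with equal-sized outer groups, take the median X and median Y of each group). The function name and the sample data are hypothetical, not from the source.

```python
from statistics import median

def median_median_line(xs, ys):
    """Return (slope, intercept) of the median-median line."""
    pts = sorted(zip(xs, ys))  # order the points by X
    n = len(pts)
    base, extra = divmod(n, 3)
    # Outer groups get the same size; a remainder of 1 goes to the middle group
    outer = base + (1 if extra == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]
    mx = [median(p[0] for p in g) for g in groups]  # median X per group
    my = [median(p[1] for p in g) for g in groups]  # median Y per group
    # Slope comes from the two outer summary points
    slope = (my[2] - my[0]) / (mx[2] - mx[0])
    # Intercept averages over all three summary points
    intercept = (sum(my) - slope * sum(mx)) / 3
    return slope, intercept

# Hypothetical data lying exactly on Y = 2X + 1
xs = list(range(1, 10))
ys = [2 * x + 1 for x in xs]
slope, intercept = median_median_line(xs, ys)
print(slope, intercept)
```

Because it relies on medians rather than means, this line is less affected by outliers than the least squares line.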
least squares method
a method of calculating the line of best fit by minimizing the sum of the squared vertical distances between each point and the line
Pearson product-moment correlation coefficient
a measure of the strength and direction of the linear relationship between two variables, and so of how well the regression equation fits the data
r
the correlation coefficient, which ranges from -1 to +1
regression equation
the equation found to represent a set of data
causation
when one event causes a second event
necessary condition
what correlation is for causation: a correlation must be present for causation to exist
sufficient condition
what correlation is not for causation: a correlation alone does not show causation
quadratic regression
used to model quadratic data
If we use knowledge of a person's SAT score to predict his or her GPA, what is the predictor and what is the criterion?
SAT score is the predictor and GPA is the criterion
How do we translate S2y'?
The sample variance of the Y scores around Y'.
When r=0.0, the Y-intercept is equal to?
the mean of all the Y scores in the sample
If we can claim to account for .65 of the variance in Y scores by knowing a relationship, it means that?
We are, on average, 65% more accurate at predicting Y scores than we would be if we did not know the relationship.
In general, the greater the proportion of variance accounted for...
the more accurately we can predict the behaviour
If heteroscedasticity is present, Sy' will be?
greater than the actual average error in predictions of Y for some X scores and less than the actual average error for other X scores
The regression line can be thought of as a series of points representing?
all the possible Y' values associated with all possible X scores
Standard error of the estimate is defined as?
Average spread of actual Y scores around the predicted Y scores
Linear regression is defined as the procedure for determining?
the best-fitting straight line in a linear relationship
When we square the correlation coefficient to produce r2, the result is equal to the?
proportion of variance accounted for
The Y-intercept of a line is the?
value of Y at the point where the regression line crosses the Y axis
Suppose you have several different predictor variables and one criterion variable. All your variables are measured using interval or ratio scales. What is the appropriate statistical test to use?
Multiple regression
The absence of random assignment in any study allows for what?
potential confounding
The absolute value of a correlation coefficient indicates the?
strength of the relationship
We should always draw a scatterplot of the data when we compute a correlation because the scatterplot allows us to?
see the nature of the relationship between the two variables
The best-fitting straight line through a scatterplot is known as the?
regression line
When your scale correlates with other procedures or scales that are valid, it has ________ validity?
Convergent
When your scale does not correlate with other unrelated procedures or scales, it has ________ validity?
discriminant
When the relationship between two variables is high (for example, r=.98), the variability in the Ys at each X is ________ relative to the overall variability of Y scores in the sample.
smaller
In general, a positive linear relationship means that?
as the values of one variable increase, there is a tendency for the values of the other variable to also increase.
Suppose you find a restriction of range in your study of IQ scores and school achievement. Restricting the range is likely to _____ the correlation coefficient.
decrease the size of
The consistency of participants' responses to the same test at two different times determines?
test-retest reliability
The consistency of participant response on different versions of the same test determines?
alternate-forms (parallel-forms) reliability
If we plot a scatterplot and the data points form a shape that appears to be random dots, as far as possible from forming a slanted straight line, the correlation for the data is?
0.0: there is no relationship
The defining formula for the Pearson correlation coefficient shows that it is the?
average correspondence of paired X and Y z-scores
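The "average correspondence of paired z-scores" idea can be sketched in a few lines. This is an illustration only: it assumes z-scores are computed with the descriptive (divide-by-N) standard deviation, and the scores below are made up.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to z-scores using the divide-by-N standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson r as the mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

When high z-scores on X are paired with high z-scores on Y, the products are mostly positive and r is positive. When the pairing is reversed, r is negative.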
Predictive validity
Extent to which a procedure is correlated with future behaviour
Concurrent validity
Extent to which a procedure is correlated with present behaviour
What procedure would be used to find out whether there is a relationship between SAT scores and GPA?
The Pearson correlation coefficient
The best-fitting line through a scatterplot is known as the?
regression line.
In general, a positive relationship means that?
As one variable increases the other variable also increases
We should always draw a scatterplot of the data when we compute a correlation because it allows us to see?
the nature of the relationship between the two variables
r2
coefficient of determination
Linear regression is defined as?
the procedure for determining the best-fitting straight line in a linear relationship
In the formula Y' = (b)(X) + a, what does Y' stand for?
predicted Y score
In the formula Y' = (b)(X) + a, what does the "a" stand for?
the Y-intercept: the value of Y' where the line crosses the Y axis
Define the Standard error of the estimate
the average spread of Y scores around predicted Y scores
What value of "r" would yield the smallest Sy' (standard error of the estimate)?
the r with the largest absolute value (closest to +/-1)
As the variability--differences--in Y scores at each X become larger, the relationship does what?
becomes weaker and results in a smaller correlation coefficient
Zero association means that?
No linear relationship is present
The larger the correlation coefficient (whether pos. or neg.), the stronger the relationship. Why?
The less the Ys are spread out at each X and the closer the data come to forming a straight line
What is another word for the degree of efficiency in a relationship?
the correlation coefficient, although it does not directly measure units of consistency
Define the purpose of computing a correlation coefficient.
Statistical technique for demonstrating the reliability and the validity of a measurement procedure in any experiment or correlational design.
What are the types of reliability that a correlation coefficient is used to show?
test-retest, inter-rater, split-half
inter-rater reliability
the consistency of ratings by any two raters
test-retest reliability
The extent to which participants receive the same score when tested at different times
How high does a coefficient have to be in order to be considered reliable?
+.80 or higher
Face validity
Procedure is valid because it looks valid/Extent to which a measurement procedure appears to measure what it was intended to measure
Convergent Validity
Extent to which scores obtained from one procedure are positively correlated with scores obtained from another procedure that is already accepted
Discriminant validity
Extent to which scores obtained from one procedure are not correlated with scores from another procedure that measures OTHER variables or constructs.
Criterion validity
Extent to which a procedure correlates with a behavior.
Concurrent validity
Extent to which a procedure correlates with an individual's current behavior
Predictive validity
Extent to which a procedure correlates with an individual's future behavior
What is the range of a correlation coefficient?
0 to +/-1.0
What is the most common correlation coefficient?
Pearson correlation coefficient
Define the Pearson correlation coefficient
Correlation coefficient that describes the strength and type of a linear relationship between interval and ratio variables, symbolized by r.
Define the Spearman rank-order coefficient
The correlation coefficient that describes the linear relationship between pairs of ranked scores (ex: any two ordinal variables OR tied rank variables), symbolized by rs
Tied rank variables
occur when two participants receive the same rank in Spearman's rank-order coefficient; resolved by averaging the tied ranks and assigning that average to both participants before correlating their scores.
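The tied-rank rule above can be sketched as follows. This is a minimal illustration that assumes rs is obtained by applying the Pearson formula to the (averaged) ranks. The helper names and scores are hypothetical.

```python
from math import sqrt

def average_ranks(scores):
    """Rank scores from 1 (lowest) upward; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of scores tied with position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """Spearman rs: Pearson r computed on the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Two participants tie at 20, so they share rank (2 + 3) / 2 = 2.5
tied = average_ranks([10, 20, 20, 30])
print(tied)
```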
Point biserial correlation coefficient
Describes the linear relationship between the scores from one continuous (interval or ratio) variable and one dichotomous variable (ex: correlating male/female with interval scores from a personality test).
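One way to see what the point-biserial coefficient measures is to code the dichotomous variable as 0/1 and compare two computations. The sketch below assumes that coding and the divide-by-N standard deviation. The group codes and scores are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def point_biserial(groups, scores):
    """rpb from the difference between the two group means (groups coded 0/1)."""
    n = len(scores)
    g1 = [s for g, s in zip(groups, scores) if g == 1]
    g0 = [s for g, s in zip(groups, scores) if g == 0]
    p, q = len(g1) / n, len(g0) / n  # proportion of scores in each group
    mean_all = sum(scores) / n
    sd = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)  # divide-by-N SD
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) / sd * sqrt(p * q)

# Hypothetical data: group membership (0 or 1) and a personality-test score
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]
rpb = point_biserial(group, score)
print(round(rpb, 3), round(pearson_r(group, score), 3))
```

Both computations give the same value, which shows that rpb is the Pearson r applied to a 0/1-coded variable.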
How does a restricted range affect a correlation coefficient?
Reduces the accuracy, producing a smaller coefficient than if the range were not restricted, and leads to an underestimate of the degree of association between the two variables. Avoiding this increases power.
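The effect of a restricted range can be illustrated with a small deterministic example. The data are invented: Y rises with X but carries a fixed +3/-3 wobble, and the "restricted" sample keeps only X scores from 9 to 12.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Full range: X from 1 to 20, Y = X plus a fixed wobble (+3 for odd X, -3 for even X)
x_full = list(range(1, 21))
y_full = [x + (3 if x % 2 else -3) for x in x_full]

# Restricted range: keep only the pairs with X between 9 and 12
pairs = [(x, y) for x, y in zip(x_full, y_full) if 9 <= x <= 12]
x_restricted = [x for x, _ in pairs]
y_restricted = [y for _, y in pairs]

r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_restricted, y_restricted)
print(round(r_full, 2), round(r_restricted, 2))
```

Over the full range the upward trend dominates the wobble. Within the narrow slice the wobble is as large as the trend, so the coefficient shrinks toward zero.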
Why is the correlation coefficient important?
It is one number that allows us to envision and summarize the important information in a scatterplot, in terms of its strength and direction.
What does a horizontal scatterplot with a horizontal regression line indicate?
no relationship
The smaller the absolute value of the coefficient, the greater the ?
variability of the Ys at each X, the vertical width of the scatterplot, and the less accurately Y scores can be predicted from X
How can the power of a correlational design be increased?
Minimizing error variance and avoiding a restricted range, so that the largest possible coefficient is obtained.
If it passes through the proper inferential procedure, a sample correlation coefficient is used to estimate what?
the corresponding population correlation coefficient: r estimates ρ, rs estimates ρs, and rpb estimates ρpb.
Define linear regression
The statistical procedure for using a relationship to predict scores. It produces the line that summarizes the linear relationship.
How is Y' pronounced
Y prime
What does the symbol Y' stand for
a predicted Y score. Our best prediction of the Y score at a corresponding X
Define regression line
straight line that summarizes the linear relationship in a scatterplot by, on average, passing through the center of the Y scores at each X; it consists of the predicted Y score (the Y') for every possible X
Why is "r" computed first?
to determine if a relationship exists. If r=0, there is no linear relationship
What is the importance of linear regression?
It is used to predict an individual's unknown Y score based on his/her X score from a correlated variable. Usually gives more external validity and a more accurate description of the relationship.
Linear regression equation [Y' = (b)(X) + a]
equation that creates the straight line by producing a value of Y' at each X; it defines the line that summarizes the relationship. Describes its slope and Y intercept.
Linear regression equation to calculate regression line points for scatterplot
Y'=[(b)(x) + a]
Y intercept equation
a = (mean of Y) - (b)(mean of X)
Slope equation
b
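The slope, intercept and prediction cards above can be tied together in a short sketch. The source does not spell out the slope formula, so this assumes the usual least-squares computational form, b = [N(ΣXY) - (ΣX)(ΣY)] / [N(ΣX²) - (ΣX)²]. The function names and data are hypothetical.

```python
def regression_line(xs, ys):
    """Return (b, a) for Y' = (b)(X) + a using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: assumed least-squares computational formula
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = (mean of Y) - (b)(mean of X)
    a = sum_y / n - b * (sum_x / n)
    return b, a

def predict(x, b, a):
    """Predicted Y score (Y') for a given X."""
    return b * x + a

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = regression_line(x, y)
print(round(b, 2), round(a, 2), round(predict(3, b, a), 2))
```

Note that the prediction at the mean of X (here 3) is the mean of Y (here 4): the regression line always passes through the point (mean of X, mean of Y).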
coefficient of determination
r2
SEE (Sy') is an acronym for
Standard error of the estimate, which is the standardized difference between predicted Y' and actual Y scores
How do you calculate proportion of variance accounted for?
r2, which is also known as the "coefficient of determination"
When r=0, the standard error of the estimate is at its max, and that is equal to?
the standard deviation of all Y scores in the sample (Sy)
Stronger correlations produce what size SEE?
smaller SEE
What does r2, aka the coefficient of determination, aka the proportion of variance accounted for, indicate?
How important the relationship is, by comparing the amount of error obtained using the regression equation with the error obtained without it
What does S2y' refer to?
Describes the error variance when using regression to predict Y scores; measures error in prediction.
Sy'
Standard error of the estimate
Sy' definitional formula/average error
subtract Y' from Y and square each deviation, sum the squared deviations, divide by N, then find the square root of that to get the standard error of the estimate
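The definitional steps above can be sketched directly. This is a minimal illustration that divides by N as the card describes. The function names and data sets are made up, and the two extra data sets are chosen so that r = +1 and r = 0.

```python
from math import sqrt

def regression_line(xs, ys):
    """Return (b, a) for Y' = (b)(X) + a from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

def standard_error_of_estimate(xs, ys):
    """Sy': square root of the mean squared difference between Y and Y'."""
    b, a = regression_line(xs, ys)
    squared_errors = [(y - (b * x + a)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared_errors) / len(ys))

def sd(values):
    """Divide-by-N standard deviation (Sy when applied to the Y scores)."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / len(values))

see_typical = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
see_perfect = standard_error_of_estimate([1, 2, 3], [2, 4, 6])     # r = +1
see_no_rel = standard_error_of_estimate([1, 2, 3, 4], [1, 2, 2, 1])  # r = 0
print(round(see_typical, 3), see_perfect, see_no_rel, sd([1, 2, 2, 1]))
```

The two extreme cases match the cards in this set: with a perfect relationship the error is zero, and with r = 0 the standard error of the estimate equals Sy.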
proportion of variance
is the amount by which we reduce errors in predicting Y scores when we use the relationship, compared to if we did not. Equals r2
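The claim that the proportional reduction in error equals r2 can be checked numerically. The sketch below assumes that "error without the relationship" means predicting the mean of Y for everyone (error variance S2y) and "error with the relationship" means predicting Y' (error variance S2y'). Names and data are illustrative.

```python
def variance_accounted_for(xs, ys):
    """Proportional reduction in error variance: (S2y - S2y') / S2y."""
    n = len(ys)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    var_y = sum((y - my) ** 2 for y in ys) / n                         # error when predicting the mean of Y
    var_err = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / n  # error when predicting Y'
    return (var_y - var_err) / var_y

def r_squared(xs, ys):
    """Squared Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
proportion = variance_accounted_for(x, y)
print(round(proportion, 2), round(r_squared(x, y), 2))
```

The remainder, 1 - r2, is the proportion of variance not accounted for (the coefficient of alienation).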
a=
y-intercept
Y intercept
value of Y where the regression line crosses the Y axis
Y' is the predicted Y score for what?
the corresponding X
The differences (the errors) between Y and Y' are also summarized by what?
the variance of the Y scores around Y' (S2y')
If there is a large r, is the relationship weak or strong?
Strong. The larger the r, the stronger the relationship and the smaller the values of Sy' and S2y', because the Y scores are closer to Y' and the differences between Y and Y' are smaller
When r=0, what do Sy' and S2y' equal?
Sy' equals Sy and S2y' equals S2y: the errors in prediction are as large as the spread of all Y scores in the sample
When r = +/-1, how much error is there in predictions?
Zero error and Sy' equals zero.
another term for r
the correlation coefficient
Proportion of variance accounted for indicates what?
The importance of a relationship
heteroscedasticity
An unequal spread of Y scores around the regression line (that is around the values of Y')
Homoscedasticity
An equal spread of Y scores around the regression line (that is the values of Y')
Symbol for Pearson correlation coefficient
r symbol
Coefficient of alienation
1- r2
Sy'
standard error of the estimate symbol
Sx
sample standard deviation symbol
S2x
sample variance symbol
σx
population standard deviation symbol
rs
Spearman correlation coefficient symbol
rpb
point-biserial correlation coefficient sign