The best-fitting line has the smallest sum of
squared residuals
R
correlation coefficient
What is the correlation coefficient?
a number that gives the strength and direction of the linear relationship between two variables
T or F: when computing R, it makes no difference which variable you call Y and which variable you call X
true
T or F: both variables must be numerical
true
T or F: R will change if you change the units of X or Y
false, R has no units so it will not change
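The two properties above (R has no units, and swapping X and Y does not change it) can be checked numerically. This is a minimal sketch with made-up height and weight values; the helper name `pearson_r` and the data are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the cards).
heights_in = [60, 62, 65, 68, 71]
weights_lb = [110, 120, 140, 155, 170]

# Same data expressed in different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_original = pearson_r(heights_in, weights_lb)
r_converted = pearson_r(heights_cm, weights_kg)
r_swapped = pearson_r(weights_lb, heights_in)
print(round(r_original, 6), round(r_converted, 6), round(r_swapped, 6))
```

All three printed values should agree, because rescaling a variable rescales the numerator and denominator of R by the same factor.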
Positive R indicates
a positive association (slope) between the variables
Negative R indicates
a negative association (slope) between the variables
R is always between
-1 and 1
T or F: correlation only works for non-linear relationships
false, it only works for linear relationships. If the relationship is non-linear, R can give a misleading result
T or F: R is affected by outliers
true
Slope
R(Sy/Sx)
Univariate data
one variable
Bivariate
two variables
Direct (or increasing) relationship
an increase in one variable is associated with an increase in the other variable
R > 0
Inverse (or decreasing) relationship
an increase in one variable is associated with a decrease in the other variable
R < 0
T or F: correlation means causality
false, correlation does NOT mean causality
if R is near 0
weak association
if R is near positive or negative 1
strong association
Linear Regression Model
Y = bo + b1X
bo
Ybar - b1(Xbar)
b1
[sum of XY - n(Xbar)(Ybar)] / [sum of X^2 - n(Xbar)^2]
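A short sketch of the usual least-squares computation: the slope is [sum of XY - n(Xbar)(Ybar)] / [sum of X^2 - n(Xbar)^2] and the intercept is Ybar minus the slope times Xbar. The function name `least_squares_line` and the data are hypothetical; the points were chosen to lie exactly on Y = 1 + 2X so the result is easy to verify by eye.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Illustrative data lying exactly on Y = 1 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = least_squares_line(xs, ys)
print(intercept, slope)
```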
standard error
square root of [sum of (Y - Yhat)^2 / (n - 2)]
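As a rough illustration, the standard error can be computed from the residuals about the fitted line, dividing by n - 2 before taking the square root. The five data points below are made up for this sketch.

```python
import math

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residuals are measured from the fitted values Yhat, not from Ybar.
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
standard_error = math.sqrt(ss_residual / (n - 2))
print(round(standard_error, 4))
```

For this data the fitted line is Yhat = 2.2 + 0.6X, the residual sum of squares is 2.4, and the standard error is the square root of 2.4 / 3.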
R^2 must be between
0 and 1
SS(Total)
SS(regression) + SS(residual)
SS(total)
sum of (Y - Ybar) ^2
SS(regression)
sum of (Ybar - Yhat) ^2
SS(residual)
sum of (Y - Yhat)^2
R^2 formulas
SS(regression) / SS(total)
or
1 - SS(residual) / SS(total)
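The sums of squares and both R^2 formulas can be checked on a small example. This is a sketch on made-up data; it also shows that the regression and residual sums of squares add up to the total.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
y_hats = [intercept + slope * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_regression = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))

r2_from_regression = ss_regression / ss_total
r2_from_residual = 1 - ss_residual / ss_total
print(round(ss_total, 4), round(ss_regression, 4), round(ss_residual, 4))
print(round(r2_from_regression, 4), round(r2_from_residual, 4))
```

Here the total is 6, split into 3.6 (regression) and 2.4 (residual), so both formulas give R^2 = 0.6.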
If the slope's p-value < 0.05, we
reject the null hypothesis and conclude the population correlation between X and Y is not equal to 0, so X and Y are linearly related
The slope tells
how much Y is predicted to increase for a one-unit increase in X
elasticity
% change in Y / % change in X
for a regression line, the elasticity at the value of X is
b1 (X/Yhat)
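A small sketch of the point-elasticity formula b1 (X/Yhat). The line Yhat = 10 + 4X and the evaluation point X = 25 are assumed values for illustration, and the function name `elasticity_at` is hypothetical.

```python
def elasticity_at(x, intercept, slope):
    """Elasticity of a fitted line at x: slope * (x / predicted y)."""
    y_hat = intercept + slope * x
    return slope * (x / y_hat)

# Assumed example line Yhat = 10 + 4X, evaluated at X = 25.
elasticity = elasticity_at(25, intercept=10.0, slope=4.0)
print(round(elasticity, 4))
```

At X = 25 the prediction is 110, so the elasticity is 4 * 25 / 110, a little under 1: a 1% increase in X is associated with slightly less than a 1% increase in predicted Y at that point.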
The dependent variable is the variable that is being described, predicted, or controlled.
True
When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable.
False
If r = -1, then we can conclude that there is a perfect relationship between X and Y.
True
The slope of the simple linear regression equation represents the average change in the value of the dependent variable per unit change in the independent variable (X).
True
The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line.
True
The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line
A. Never
B. Sometimes
C. Always
C. Always
In simple regression analysis the quantity that gives the amount by which Y (dependent variable) changes for a unit change in X (independent variable) is called the __________
A. Coefficient of determination
B. Y intercept of the regression line
C. Slope
C. Slope of the regression line
A simple regression analysis with 20 observations would yield ________ degrees of freedom.
A. 18
B. 20
C. 19
D. 1
A. 18
The following results were obtained as a part of simple regression analysis:
r2 = .9162
p-value = .000
The null hypothesis of no linear relationship between the dependent variable and the independent variable
A. is not an appropriate null hypothesis for t
B. Is rejected
The following results were obtained from a simple regression analysis:
y-hat = 37.2895 - (1.2024)X
r2 = .6744
What is the y-intercept of the linear regression equation?
A. .6774
B. .2934
C. -1.2024
D. 37.2895
D. 37.2895
The strength of the relationship between two quantitative variables can be measured by:
A. The Y intercept of the simple linear regression equation
B. The slope of a simple linear regression equation
C. The coefficient of correlation
C. The coefficient of correlation
Regression analysis
r 0.873
r squared 0.762
Standard error 11.547
n 7
ANOVA
SS
Regression 2,133.3333
Residual 666.6667
Total 2,800.0000
Regression output p-value
Intercept 63.3333 .0005
Advertising 6.667 .0103
The local grocery store wants to predict the d
A. 83,333
Be sure to use the number 3 instead of 3000.
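The printout on this card can be cross-checked with a few lines of arithmetic. This sketch assumes the sums of squares are 2,133.3333 (regression), 666.6667 (residual) and 2,800 (total), and that Advertising is entered in thousands, which is what the note about using 3 rather than 3000 suggests. The question text is cut off, so the quantity being predicted is not restated here.

```python
import math

# Values read from the printout (sums of squares as assumed above).
ss_regression = 2133.3333
ss_residual = 666.6667
ss_total = 2800.0
n = 7
intercept = 63.3333
slope = 6.667

r_squared = ss_regression / ss_total
standard_error = math.sqrt(ss_residual / (n - 2))

# Prediction at Advertising = 3 (thousands), per the note on the card.
prediction = intercept + slope * 3

print(round(r_squared, 3), round(standard_error, 3), round(prediction, 3))
```

The computed R^2 and standard error match the 0.762 and 11.547 on the printout, and the prediction of about 83.33 (in the printout's units) is consistent with answer A.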
The simple linear regression (least squares method) minimizes:
A. SS(x)
B. SSE
C. The explained variation
D. SS(y)
E. Total variation
B. SSE
If the __________ of the computed regression line is 0 then we can conclude that r = 0.
a. intercept
b. slope
c. p-value
b. slope
I want to use a regression line to predict the Sales (Y) of ice cream cones at the Baskin-Robbins in Galveston on Saturdays in summer. The X variable I will use will be the High Temperature for the day. In this way I can see what portion of the sales are
a. y(hat) +- t(sub6) (std error)
In this regression equation: y(hat) = 10 + 4x
suppose X increases from 25 to 28. How much will the predicted Y change?
a. 3
b. 1/3
c. 10
d. 12
e. 22
d. 12
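The answer follows from the slope: each one-unit increase in X changes the prediction by 4, so three units change it by 12. A minimal check, with `predict` as a hypothetical helper name:

```python
def predict(x, intercept=10.0, slope=4.0):
    """Prediction from the line y(hat) = 10 + 4x given on the card."""
    return intercept + slope * x

change = predict(28) - predict(25)
print(change)
```

The intercept cancels in the subtraction, which is why it plays no role in the answer.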
If X= the month you were born in (1,...,12) and Y=your height (inches), what r would I get if I collected data and computed the correlation?
a. -1
b. exactly 0
c. very close to 0, but not necessarily exactly 0
d. 1
c. very close to 0, but not necessarily exactly 0
If the p-value (for the slope) on a regression printout = 0.00001 then....
a. p<0.05 so it looks like we have a good predictor of Y; at least we can safely conclude that it's not sampling error that is being shown in the correlation and the slope of the regression line.
a. p<0.05 so it looks like we have a good predictor of Y; at least we can safely conclude that it's not sampling error that is being shown in the correlation and the slope of the regression line.
R-Square values can range from _____ to ______.
a. 0 1
b. -1 1
c. -1 0
d. -100 100
a. 0 1
What is the difference between (beta)1 and b1 ?
a. none; exactly the same; slope of regression line.
b. (beta)1 is the unknown population value, while b1 is its estimate from the data.
c. b1 is the unknown population value, while (beta)1 is its estimate from the data.
b. (beta)1 is the unknown population value, while b1 is its estimate from the data.
Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8
b. 5.24%
Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8
a. -0.2288
Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8
c. 7.52
Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8
d. y(hat) +/- 4.933
In a regression problem, the slope = 0.40
The mean and standard deviation of the X variable are both 100.
The mean and standard deviation of the Y variable are both 200.
Find the correlation r.
a. -0.20
b. 0.20
c. 0.80
d. -0.80
e. 0.40
b. 0.20
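This answer comes from rearranging slope = r (Sy/Sx) into r = slope (Sx/Sy). A quick check with the numbers from the card:

```python
slope = 0.40
s_x = 100.0   # standard deviation of X
s_y = 200.0   # standard deviation of Y

# slope = r * (s_y / s_x), so r = slope * (s_x / s_y)
r = slope * s_x / s_y
print(round(r, 2))
```

The means are not needed for this step; they only matter for the intercept.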
A correlation of 0.02 would indicate:
a. a very strong direct relationship
b. a very weak direct relationship
c. a very strong inverse relationship
d. a very weak inverse relationship
e. a computational error had been made.
b. a very weak direct relationship
If all the points are on the regression line, then
a. the value of the slope is 0.
b. the value of the intercept is 0.
c. the correlation coefficient is 0.
d. the standard error is 0.
e. both (c) and (d)
d. the standard error is 0.
An Inverse Relationship means the trendline will have a _______ slope.
a. positive
b. negative
c. zero
b. negative
Compute the correlation r:
Given: ΣX=40, ΣY=20, ΣXY=300, ΣX^2=580, ΣY^2=400, n=10.
a. 0.0000
b. 0.2540
c. -0.5658
d. -0.2540
e. 0.5658
e. 0.5658
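One way to reach this answer is the computational formula for r in terms of the raw sums. The sketch below uses the sums given on the card; the variable names are chosen for illustration.

```python
import math

n = 10
sum_x, sum_y = 40, 20
sum_xy = 300
sum_x2, sum_y2 = 580, 400

numerator = n * sum_xy - sum_x * sum_y   # 3000 - 800 = 2200
denom_x = n * sum_x2 - sum_x ** 2        # 5800 - 1600 = 4200
denom_y = n * sum_y2 - sum_y ** 2        # 4000 - 400 = 3600

r = numerator / math.sqrt(denom_x * denom_y)
print(round(r, 4))
```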
The correlation between X=person's weight and Y=person's height is 0.70 . What is the correlation for the same data set if we had used X=person's height and Y=person's weight?
a. 0.00
b. 0.70
c. -0.70
d. Need the original data so r can be recomputed
b. 0.70
Given the following information: r = 0.60
Mean Standard Deviation
X 40 4
Y 45 6
Find the intercept and slope of the regression line to predict Y from X.
a. intercept=9.0 slope=0.9
b. intercept=0.9 slope=9.0
c. intercept=9.0 slope=9.0
d. intercept=0.9 slop
a. intercept=9.0 slope=0.9
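The answer uses slope = r (Sy/Sx) and intercept = Ybar - slope (Xbar). A short check with the values from the card:

```python
r = 0.60
x_bar, s_x = 40.0, 4.0
y_bar, s_y = 45.0, 6.0

slope = r * s_y / s_x              # 0.6 * 6 / 4
intercept = y_bar - slope * x_bar  # 45 - 0.9 * 40
print(round(intercept, 1), round(slope, 1))
```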
In a regression problem, n=52, SS(Total)=400, and r = -0.8367 Find the Standard Error.
a. 2.40
b. 0.30
c. 1.55
d. -1.55
e. -2.40
c. 1.55
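A sketch of the steps behind this answer: R^2 gives the explained share of SS(Total), the remainder is SS(Residual), and the standard error is the square root of SS(Residual) / (n - 2). The standard error is never negative, which rules out options d and e.

```python
import math

n = 52
ss_total = 400.0
r = -0.8367

r_squared = r ** 2                         # about 0.70
ss_residual = ss_total * (1 - r_squared)   # about 120
standard_error = math.sqrt(ss_residual / (n - 2))
print(round(standard_error, 2))
```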
The coefficient of correlation...
a. Has the same sign as the slope
b. Can range from -1.00 to 1.00
c. Is also called the percentage of variance accounted for.
d. (a) and (b) only
e. none of the above
d. (a) and (b) only
We obtained the following regression equation: y(hat) = 3.5 + 2.1x. Which of the following statements are correct?
a. The dependent variable is predicted to increase by 2.1 for each increase of 1 unit in X.
b. The equation crosses the y-axis at 3.5.
c. I
e. (a), (b) and (d) only
The coefficient of correlation was computed to be -0.60. This means
a. the slope and intercept of the regression line are both negative
b. as x increases, y decreases.
c. x and y are both 0.
d. the percentage of variance accounted for equals sqrt(0.6)
b. as x increases, y decreases.
The covariance, denoted cov, is computed from:
cov = [1/(n-1)] * Σ (X - X(bar))(Y - Y(bar))
What is the formula for the correlation coefficient in terms of the cov?
a. cov / s(subx)
b. cov / (s(subx) * s(suby))
c. cov / s(subx)^2
d. cov / s(suby)^2
e. (s(s
b. cov / (s(subx) * s(suby))
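The relationship r = cov / (s_x * s_y) can be verified on a small made-up data set, using the sample covariance and sample standard deviations (both with n - 1 in the denominator). `statistics.stdev` from the Python standard library computes the sample standard deviation.

```python
import statistics

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Sample covariance with the n - 1 divisor.
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Divide by both sample standard deviations to remove the units.
r = cov / (statistics.stdev(xs) * statistics.stdev(ys))
print(round(cov, 4), round(r, 4))
```

Dividing by both standard deviations is what makes r unit-free, unlike the covariance itself.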
In a regression problem, n=207 and SS(Residual)=0. Find the correlation coefficient.
a. 0.207
b. 0.00
c. 1.00
d. -1.00
e. either +/- 1.00
e. either +/- 1.00
A regression equation is used to predict women's Weight (in pounds) from their Height (in inches). The correlation between Weight (W) and Height (H) turned out to be 0.70 . The average H was 64 inches and the average W was 119.6 pounds.
The regression equ
b. 0.35
The standard error is
a. computed from squared deviations from the regression line.
b. may be negative
c. is given in squared units of the independent variable.
d. all of the above
a. computed from squared deviations from the regression line.
The percentage of variance accounted for...
a. is the square of the coefficient of correlation.
b. cannot be negative.
c. gives the percent of the variation in the dependent variable explained by the independent variable.
d. all of the above
d. all of the above
Which of the following is not based on a correlation or a regression line relating y to x?
a. The standard error
b. The percentage of variance accounted for
c. SS(Total)
d. (a) and (c) only
e. (a), (b) and (c)
c. SS(Total)
The ratio of SS(Regression) divided by the SS(Total) is also called the
a. sum of squares due to regression
b. percentage of variance accounted for
c. standard error
d. coefficient of correlation.
b. percentage of variance accounted for
The primary purpose of a regression equation is to...
a. measure the association between two variables.
b. estimate the value of the dependent variable based on the independent variable.
c. estimate the value of the independent variable based on the depen
b. estimate the value of the dependent variable based on the independent variable.
Which of the following statements is not correct regarding the correlation.
a. It can range from -1 to 1.
b. Its square is the percentage of variance accounted for.
c. It measures the percent of variation explained.
d. It is a measure of the association b
c. It measures the percent of variation explained.
What does 1 - r^2 measure?
a. The relative importance of all other possible predictor variables on y.
b. The percentage of points that are on the regression line.
c. The percentage of points that are off the regression line
a. The relative importance of all other possible predictor variables on y.