SCMT 303 Darcey Final Exam #4, 303 final review

The best-fitting line has the smallest sum of

squared residuals

R

correlation coefficient

What is the correlation coefficient?

a number that gives the strength and direction of the linear relationship between two variables

T or F: it makes no difference which variable you call Y and which variable you call X

true

T or F: both variables must be numerical

true

T or F: R will change if you change the units of X or Y

false, R has no units so it will not change
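
The preceding cards (swapping X and Y, changing units) can be checked numerically. This is a minimal sketch: the height and weight values are made up for illustration, and `corr` is a hypothetical helper name, not something from the course materials.

```python
import math

def corr(xs, ys):
    """Pearson correlation r computed from deviations about the means."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data (assumption, not course data)
height_in = [60, 62, 65, 68, 71]
weight_lb = [110, 120, 135, 150, 172]

r = corr(height_in, weight_lb)
r_swapped = corr(weight_lb, height_in)        # swap the roles of X and Y
height_cm = [h * 2.54 for h in height_in]     # change the units of X
r_rescaled = corr(height_cm, weight_lb)

print(r, r_swapped, r_rescaled)  # the three values agree up to rounding
```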

Positive R indicates

a positive association (slope) between the variables

Negative R indicates

a negative association (slope) between the variables

R is always between

-1 and 1

T or F: correlation only works for non-linear relationships

false, it only works for linear relationships. if the relationship is non-linear, R can give a misleading result

T or F: R is affected by outliers

true

Slope

R(Sy/Sx)

Univariate data

one variable

Bivariate

two variables

Direct (or increasing) relationship

an increase in one variable is associated with an increase in the other variable
R > 0

Inverse (or decreasing) relationship

an increase in one variable is associated with a decrease in the other variable
R < 0

T or F: correlation means causality

false, correlation does NOT mean causality

if R is near 0

weak association

if R is near positive or negative 1

strong association

Linear Regression Model

Y = bo + b1X

bo

Ybar - b1(Xbar)

b1

[sum of XY - n(Xbar)(Ybar)] / [sum of X^2 - n(Xbar)^2]
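
As a worked illustration of the least-squares coefficients, the sketch below computes the slope b1 = [sum of XY - n(Xbar)(Ybar)] / [sum of X^2 - n(Xbar)^2] and then the intercept bo = Ybar - b1(Xbar). The data and the function name `fit_line` are made up for the example.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line Y = b0 + b1*X."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)  # slope
    b0 = ybar - b1 * xbar                                       # intercept
    return b0, b1

# Made-up illustration data (assumption)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
print(round(b0, 4), round(b1, 4))  # roughly 2.2 and 0.6
```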

standard error

square root of [sum of (Y - Yhat)^2 / (n - 2)]

R^2 must be between

0 and 1

SS(Total)

SS(regression) + SS(residual)

SS(total)

sum of (Y - Ybar) ^2

SS(regression)

sum of (Ybar - Yhat) ^2

SS(residual)

sum of (Y - Yhat)^2

R^2 formulas

SS(regression) / SS(total)
or
1 - SS(residual) / SS(total)
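
The sums of squares and both R^2 formulas can be verified on a small data set. This is a sketch on made-up data; it also checks that SS(total) = SS(regression) + SS(residual) and computes the standard error as the square root of SS(residual) / (n - 2). All variable names are chosen for the example.

```python
import math

# Made-up illustration data (assumption)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

ss_total = sum((y - ybar) ** 2 for y in ys)
ss_regression = sum((yh - ybar) ** 2 for yh in yhat)
ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))

r2_from_regression = ss_regression / ss_total
r2_from_residual = 1 - ss_residual / ss_total
std_error = math.sqrt(ss_residual / (n - 2))

print(ss_total, ss_regression, ss_residual)
print(r2_from_regression, r2_from_residual, std_error)
```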

If the slope's p-value < 0.05, we

reject the null hypothesis and conclude the population correlation between X and Y is not equal to 0, so X and Y are really related

The slope tells

how much Y is predicted to increase for a one-unit increase in X

elasticity

% change in Y / % change in X

for a regression line, the elasticity at the value of X is

b1 (X/Yhat)
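
A short sketch of the point elasticity b1 (X/Yhat). The line Yhat = 20 + 2X and the value X = 10 are made up for illustration, and `elasticity_at` is a hypothetical helper name.

```python
def elasticity_at(x, b0, b1):
    """Elasticity of a regression line at X = x: b1 * (x / yhat)."""
    yhat = b0 + b1 * x
    return b1 * x / yhat

# Made-up line (assumption): Yhat = 20 + 2X, evaluated at X = 10
e = elasticity_at(10, b0=20, b1=2)
print(e)  # 0.5: a 1% increase in X predicts about a 0.5% increase in Y
```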

The dependent variable is the variable that is being described, predicted, or controlled.

True

When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable.

False

If r = -1, then we can conclude that there is a perfect relationship between X and Y.

True

The slope of the simple linear regression equation represents the average change in the value of the dependent variable per unit change in the independent variable (X).

True

The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line.

True

The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line
A. Never
B. Sometimes
C. Always

C. Always

In simple regression analysis the quantity that gives the amount by which Y (dependent variable) changes for a unit change in X (independent variable) is called the __________
A. Coefficient of determination
B. Y intercept of the regression line
C. Slope

C. Slope of the regression line

A simple regression analysis with 20 observations would yield ________ degrees of freedom.
A. 18
B. 20
C. 19
D. 1

A. 18

The following results were obtained as a part of simple regression analysis:
r^2 = .9162
p-value = .000
The null hypothesis of no linear relationship between the dependent variable and the independent variable
A. is not an appropriate null hypothesis for t

B. Is rejected

The following results were obtained from a simple regression analysis:
y-hat = 37.2895 - (1.2024)X
r^2 = .6744
What is the y-intercept of the linear regression equation?
A. .6774
B. .2934
C. -1.2024
D. 37.2895

D. 37.2895

The strength of the relationship between two quantitative variables can be measured by:
A. The Y intercept of the simple linear regression equation
B. The slope of a simple linear regression equation
C. The coefficient of correlation

C. The coefficient of correlation

Regression analysis
r 0.873
r squared 0.762
Standard error 11.547
n 7
ANOVA
SS
Regression 2,133.3333
Residual 666.6667
Total 2,800.0000
Regression output p-value
Intercept 63.3333 .0005
Advertising 6.667 .0103
The local grocery store wants to predict the d

A. 83.333
Be sure to use the number 3 instead of 3000.

The simple linear regression (least squares method) minimizes:
A. SS(x)
B. SSE
C. The explained variation
D. SS(y)
E. Total variation

B. SSE

If the __________ of the computed regression line is 0 then we can conclude that r = 0.
a. intercept
b. slope
c. p-value

b. slope

I want to use a regression line to predict the Sales (Y) of ice cream cones at the Baskin-Robbins in Galveston on Saturdays in summer. The X variable I will use will be the High Temperature for the day. In this way I can see what portion of the sales are

a. y(hat) +- t(sub6) (std error)

In this regression equation: y(hat) = 10 + 4x
suppose X increases from 25 to 28. How much will the predicted Y change?
a. 3
b. 1/3
c. 10
d. 12
e. 22

d. 12
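
The answer follows from slope times the change in X: 4 * (28 - 25) = 12. The intercept cancels out. A minimal sketch (the function name `predict` is chosen for the example):

```python
def predict(x, b0=10, b1=4):
    """Predicted Y from the card's equation y(hat) = 10 + 4x."""
    return b0 + b1 * x

change = predict(28) - predict(25)  # 122 - 110
print(change)  # 12
```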

If X= the month you were born in (1,...,12) and Y=your height (inches), what r would I get if I collected data and computed the correlation?
a. -1
b. exactly 0
c. very close to 0, but not necessarily exactly 0
d. 1

c. very close to 0, but not necessarily exactly 0

If the p-value (for the slope) on a regression printout = 0.00001 then....
a. p<0.05 so it looks like we have a good predictor of Y; at least we can safely conclude that it's not sampling error that is being shown in the correlation and the slope of the regression line.

a. p<0.05 so it looks like we have a good predictor of Y; at least we can safely conclude that it's not sampling error that is being shown in the correlation and the slope of the regression line.

R-Square values can range from _____ to ______.
a. 0 1
b. -1 1
c. -1 0
d. -100 100

a. 0 1

What is the difference between (beta)1 and b1 ?
a. none; exactly the same; slope of regression line.
b. (beta)1 is the unknown population value, while b1 is its estimate from the data.
c. b1 is the unknown population value, while (beta)1 is its estimate from the data.

b. (beta)1 is the unknown population value, while b1 is its estimate from the data.

Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8

b. 5.24%

Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8

a. -0.2288

Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8

c. 7.52

Here is an Excel printout of a regression problem. Use this for the following 4 questions.
Regression Statistics
Multiple R 0.2288
R Square 0.0524
Adjusted R Square 0.0415
Standard Error 2.5166
Observations 89
ANOVA
df SS MS F p
Regress. 1 30.45 30.45 4.8

d. y(hat) +/- 4.933

In a regression problem, the slope = 0.40
The mean and standard deviation of the X variable are both 100.
The mean and standard deviation of the Y variable are both 200.
Find the correlation r.
a. -0.20
b. 0.20
c. 0.80
d. -0.80
e. 0.40

b. 0.20
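
This card rearranges the slope formula b1 = R(Sy/Sx) into R = b1(Sx/Sy). A quick check with the card's numbers (variable names are chosen for the example):

```python
slope = 0.40
s_x = 100   # standard deviation of X
s_y = 200   # standard deviation of Y

# Rearranged from slope = r * (s_y / s_x)
r = slope * s_x / s_y
print(round(r, 2))  # about 0.20, same sign as the slope
```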

A correlation of 0.02 would indicate:
a. a very strong direct relationship
b. a very weak direct relationship
c. a very strong inverse relationship
d. a very weak inverse relationship
e. a computational error had been made.

b. a very weak direct relationship

If all the points are on the regression line, then
a. the value of the slope is 0.
b. the value of the intercept is 0.
c. the correlation coefficient is 0.
d. the standard error is 0.
e. both (c) and (d)

d. the standard error is 0.

An Inverse Relationship means the trendline will have a _______ slope.
a. positive
b. negative
c. zero

b. negative

Compute the correlation r:
Given: sum of X = 40, sum of Y = 20, sum of XY = 300, sum of X^2 = 580, sum of Y^2 = 400, n = 10.
a. 0.0000
b. 0.2540
c. -0.5658
d. -0.2540
e. 0.5658

e. 0.5658
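
The computation behind this answer, sketched from the summary sums on the card: r = [sum of XY - n(Xbar)(Ybar)] / square root of {[sum of X^2 - n(Xbar)^2] [sum of Y^2 - n(Ybar)^2]}. Variable names are chosen for the example.

```python
import math

n = 10
sum_x, sum_y = 40, 20
sum_xy, sum_x2, sum_y2 = 300, 580, 400

xbar, ybar = sum_x / n, sum_y / n
sxy = sum_xy - n * xbar * ybar      # 300 - 80  = 220
sxx = sum_x2 - n * xbar ** 2        # 580 - 160 = 420
syy = sum_y2 - n * ybar ** 2        # 400 - 40  = 360

r = sxy / math.sqrt(sxx * syy)
print(round(r, 4))  # 0.5658
```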

The correlation between X=person's weight and Y=person's height is 0.70. What is the correlation for the same data set if we had used X=person's height and Y=person's weight?
a. 0.00
b. 0.70
c. -0.70
d. Need the original data so r can be recomputed

b. 0.70

Given the following information: r = 0.60
     Mean   Standard Deviation
X    40     4
Y    45     6
Find the intercept and slope of the regression line to predict Y from X.
a. intercept=9.0 slope=0.9
b. intercept=0.9 slope=9.0
c. intercept=9.0 slope=9.0
d. intercept=0.9 slop

a. intercept=9.0 slope=0.9
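
The two steps behind this answer are slope = R(Sy/Sx) and intercept = Ybar - slope(Xbar). A sketch using the card's numbers (variable names are chosen for the example):

```python
r = 0.60
xbar, s_x = 40, 4
ybar, s_y = 45, 6

slope = r * s_y / s_x              # 0.60 * 6 / 4
intercept = ybar - slope * xbar    # 45 - slope * 40

print(round(slope, 2), round(intercept, 2))  # about 0.9 and 9.0
```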

In a regression problem, n=52, SS(Total)=400, and r = -0.8367 Find the Standard Error.
a. 2.40
b. 0.30
c. 1.55
d. -1.55
e. -2.40

c. 1.55
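
One way to reach this answer: SS(residual) = SS(total) * (1 - r^2), and the standard error is the square root of SS(residual) / (n - 2). A sketch with the card's numbers (variable names are chosen for the example):

```python
import math

n = 52
ss_total = 400
r = -0.8367

ss_residual = ss_total * (1 - r ** 2)          # about 120
std_error = math.sqrt(ss_residual / (n - 2))   # sqrt of about 2.4

print(round(std_error, 2))  # 1.55; the sign of r does not matter here
```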

The coefficient of correlation...
a. Has the same sign as the slope
b. Can range from -1.00 to 1.00
c. Is also called the percentage of variance accounted for.
d. (a) and (b) only
e. none of the above

d. (a) and (b) only

We obtained the following regression equation: y(hat) = 3.5 + 2.1x. Which of the following statements are correct?
a. The dependent variable is predicted to increase by 2.1 for each increase of 1 unit in X.
b. The equation crosses the y-axis at 3.5.
c. I

e. (a), (b) and (d) only

The coefficient of correlation was computed to be -0.60. This means
a. the slope and intercept of the regression line are both negative
b. as x increases, y decreases.
c. x and y are both 0.
d. the percentage of variance accounted for equals sqrt(0.6)

b. as x increases, y decreases.

The covariance, denoted cov, is computed from:
cov = 1/(n-1) * sum of (X - Xbar)(Y - Ybar)
What is the formula for the correlation coefficient in terms of the cov?
a. cov / s(subx)
b. cov / (s(subx) * s(suby))
c. cov / s(subx)^2
d. cov / s(suby)^2
e. (s(s

b. cov / (s(subx) * s(suby))
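
A sketch of the covariance route to r, using the sample covariance with the 1/(n-1) factor and the sample standard deviations from Python's standard `statistics.stdev`. The data is made up for illustration.

```python
import statistics

# Made-up illustration data (assumption)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Sample covariance: 1/(n-1) * sum of (X - Xbar)(Y - Ybar)
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

# stdev() is the sample standard deviation (also divides by n - 1)
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)

r = cov / (s_x * s_y)
print(cov, round(r, 4))
```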

In a regression problem, n=207 and SS(Residual)=0. Find the correlation coefficient.
a. 0.207
b. 0.00
c. 1.00
d. -1.00
e. either +/- 1.00

e. either +/- 1.00

A regression equation is used to predict women's Weight (in pounds) from their Height (in inches). The correlation between Weight (W) and Height (H) turned out to be 0.70 . The average H was 64 inches and the average W was 119.6 pounds.
The regression equ

b. 0.35

The standard error is
a. computed from squared deviations from the regression line.
b. may be negative
c. is given in squared units of the independent variable.
d. all of the above

a. computed from squared deviations from the regression line.

The percentage of variance accounted for...
a. is the square of the coefficient of correlation.
b. cannot be negative.
c. gives the percent of the variation in the dependent variable explained by the independent variable.
d. all of the above

d. all of the above

Which of the following is not based on a correlation or a regression line relating y to x?
a. The standard error
b. The percentage of variance accounted for
c. SS(Total)
d. (a) and (c) only
e. (a), (b) and (c)

c. SS(Total)

The ratio of SS(Regression) divided by the SS(Total) is also called the
a. sum of squares due to regression
b. percentage of variance accounted for
c. standard error
d. coefficient of correlation.

b. percentage of variance accounted for

The primary purpose of a regression equation is to...
a. measure the association between two variables.
b. estimate the value of the dependent variable based on the independent variable.
c. estimate the value of the independent variable based on the depen

b. estimate the value of the dependent variable based on the independent variable.

Which of the following statements is not correct regarding the correlation.
a. It can range from -1 to 1.
b. Its square is the percentage of variance accounted for.
c. It measures the percent of variation explained.
d. It is a measure of the association b

c. It measures the percent of variation explained.

What does 1 - r^2 measure?
a. The relative importance of all other possible predictor variables on y.
b. The percentage of points that are on the regression line.
c. The percentage of points that are off the regression line

a. The relative importance of all other possible predictor variables on y.