Stats Midterm 2

Least Squares Regression Line

The most common method of fitting a line to a scatterplot is least squares. The least-squares regression line is the straight line ŷ = a + bx that minimizes the sum of the squares of the vertical distances of the observed points from the line.
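Here is a minimal sketch in plain Python that uses a small made-up data set. It computes the least-squares slope and intercept directly from deviations about the means: b = Σ(x − x bar)(y − y bar) / Σ(x − x bar)², and a = y bar − b·(x bar). The intercept formula works because the least-squares line always passes through the point (x bar, y bar). The function name `least_squares_line` is hypothetical.

```python
from statistics import mean


def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b) for y-hat = a + b*x minimizing the sum of squared vertical distances."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-products of deviations over sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # The line passes through (x_bar, y_bar), so the intercept follows from the slope.
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```

For these five points, the sketch gives b = 0.6 and a = 2.2.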

Positive Association

Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.

Residual

A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is, a residual is the prediction error that remains after we have chosen the regression line: residual = observed y − predicted y = y − ŷ.
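Here is a short sketch of the residual calculation. It assumes the made-up points below and the line ŷ = 2.2 + 0.6x, which is the least-squares line for those points. The helper `residuals` is hypothetical. For a least-squares line with an intercept, the residuals always sum to zero, and the last line checks this up to rounding error.

```python
def residuals(xs, ys, a, b):
    """Return residual = observed y - predicted y-hat for each point, where y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.2, b=0.6)  # assumed line: y-hat = 2.2 + 0.6x

print([round(e, 2) for e in res])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(res)) < 1e-9)        # True: least-squares residuals sum to zero
```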

Scatterplot

A scatterplot displays the relationship between two quantitative variables measured on the same individuals. Mark values of one variable on the horizontal axis (x axis) and values of the other variable on the vertical axis (y axis). Plot each individual's data as a point on the graph.

Slope

The slope b of a regression line ŷ = a + bx is the rate at which the predicted response ŷ changes along the line as the explanatory variable x changes. Specifically, b is the change in ŷ when x increases by 1.

True or false: Whenever the correlation between x and y is zero, the slope of the least squares regression line is also zero.

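True. The slope of the least-squares line is b = r(s_y/s_x), where s_x and s_y are the standard deviations of x and y. So r = 0 forces b = 0, and the line is horizontal at the mean of y.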

True or false: Correlation tells us the average increase in y for every one unit increase in x.

False. Slope tells us the average increase in y for every one unit increase in x, not correlation.

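True or false: The z-score products used to compute the correlation coefficient are based on deviations from the point (x bar, y bar), the point of means.

True. Each z-score measures how far an observation lies from its mean in standard-deviation units. So every product of an x z-score and a y z-score is built from deviations from (x bar, y bar).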
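The formula behind this is r = (1/(n − 1)) Σ z_x·z_y. Each z-score standardizes a deviation from the mean: z_x = (x − x bar)/s_x and z_y = (y − y bar)/s_y. Below is a minimal sketch in plain Python; the function name `correlation_from_z` and the data are hypothetical.

```python
from statistics import mean, stdev


def correlation_from_z(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y, with z-scores measured from (x_bar, y_bar)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
r = correlation_from_z(xs, ys)
print(round(r, 3))  # 0.775
```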

What does the sum of squared residuals measure?

Unexplained variation (The variability of the y's about the regression line.)

True or false: "Total variation in y" = "Variation explained by x" + "Variation unexplained by x"

True. The "total variation in the y values" can be partitioned into "variation explained by x" and "variation unexplained by x". This partition is the basis of r^2, which is the fraction of the total variation in y that is explained by x.
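Here is a worked check of the partition on a small made-up data set. The sketch fits the least-squares line and then computes three quantities:

- total variation: Σ(y − y bar)²
- unexplained variation: Σ(y − ŷ)², the sum of squared residuals
- explained variation: Σ(ŷ − y bar)²

The identity total = explained + unexplained holds for the least-squares line, and r^2 = explained / total. Variable names are illustrative.

```python
from statistics import fmean

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]

# Least-squares line y-hat = a + b*x.
x_bar, y_bar = fmean(xs), fmean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

total_ss = sum((y - y_bar) ** 2 for y in ys)                      # total variation in y
unexplained_ss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # sum of squared residuals
explained_ss = sum((yh - y_bar) ** 2 for yh in y_hat)            # variation explained by x

r_squared = explained_ss / total_ss
print(total_ss, explained_ss + unexplained_ss)  # equal up to rounding
print(round(r_squared, 3))                      # 0.6
```

Here 60% of the variation in y is explained by the straight-line relationship with x. The remaining 40% is left in the residuals.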

A residual plot is really a scatterplot. So what does each data point in a residual plot represent?

An x value and the corresponding residual. Since a residual plot is really a scatterplot, the residuals are plotted against the x values (or sometimes against the predicted values ŷ). In a plot of residuals against x, each dot gives an x value and the corresponding residual y − ŷ, the gap between the observed y and its predicted value.

What is the purpose of a residual plot?

To check the conditions required for obtaining a valid regression model. A residual plot acts as a magnifying glass for identifying problems with those conditions, such as a curved pattern suggesting that a straight line is the wrong model.

Suppose you have played a game many, many times, winning sometimes and losing sometimes. Can you use the results of playing the game to predict with certainty whether you will win the game on the next try?

No. Playing many times reveals the long-run pattern, roughly how often you win, but the outcome of any single play cannot be predicted with certainty. It wouldn't be a random phenomenon if we could predict the outcome. Since playing the game is a random phenomenon, we cannot predict the next outcome with certainty.

In statistics, how do we define probability of an outcome?

Probability equals the fraction of times the outcome occurs in many, many trials of a random phenomenon.
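A small simulation illustrates this definition. It assumes a fair coin (probability of heads 0.5), simulated with Python's `random` module and a fixed seed. The proportion of heads wanders for small numbers of flips but settles near 0.5 as the number of flips grows. No single flip, however, can be predicted. The function name is hypothetical.

```python
import random


def proportion_of_heads(n_flips: int, seed: int = 1) -> float:
    """Simulate n_flips tosses of a fair coin and return the fraction that land heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} flips: proportion of heads = {proportion_of_heads(n):.4f}")
```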

What does the probability distribution of a random variable give us?

All possible values of the random variable and their probabilities. A distribution is defined as a list of possible values of a variable together with how often each value occurs. This definition is simply modified to define "probability distribution". A
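A discrete probability distribution can be written as a table of values and their probabilities. In the hypothetical example below, X is the number of heads in two tosses of a fair coin. The sketch first checks the two requirements of a valid distribution: each probability is between 0 and 1, and the probabilities sum to 1. It then finds the probability of an event by adding the probabilities of the values the event contains. Exact fractions avoid rounding error, and the helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical example: X = number of heads in two tosses of a fair coin.
distribution = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def is_valid_distribution(dist: dict) -> bool:
    """Every probability lies in [0, 1] and the probabilities add up to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1


def prob(dist: dict, event) -> Fraction:
    """P(event) = sum of the probabilities of the values for which event(value) is true."""
    return sum((p for x, p in dist.items() if event(x)), Fraction(0))


print(is_valid_distribution(distribution))   # True
print(prob(distribution, lambda x: x >= 1))  # 3/4
```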

Why do we simulate many, many SRSs and obtain the value of (x bar) from each sample?

To answer questions about the (x bar)'s from these SRSs. Since statistical inference uses the value of a statistic to estimate the value of a parameter, we need to simulate the value of a statistic using many, many SRSs. These simulations will give
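Here is a minimal simulation sketch. It assumes a made-up population of 10,000 values, drawn once from a normal distribution with mean 100 and standard deviation 15. The sketch draws many SRSs of size 25 without replacement (via `random.Random.sample`) and records each sample's (x bar). It then compares the collection of (x bar) values with the population. The helper `simulate_x_bars` and all numbers are illustrative.

```python
import random
from statistics import fmean, pstdev, stdev

rng = random.Random(42)  # fixed seed for reproducibility

# Made-up population (assumption): 10,000 values with mean about 100 and SD about 15.
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = fmean(population)  # the parameter we pretend not to know


def simulate_x_bars(pop, n, reps, rng):
    """Draw `reps` simple random samples of size n and return each sample mean."""
    return [fmean(rng.sample(pop, n)) for _ in range(reps)]


x_bars = simulate_x_bars(population, n=25, reps=2_000, rng=rng)

print(f"population mean mu       = {mu:.2f}")
print(f"mean of the x-bar values = {fmean(x_bars):.2f}")
print(f"SD of population         = {pstdev(population):.2f}")
print(f"SD of the x-bar values   = {stdev(x_bars):.2f}")
```

The simulated (x bar) values center on the population mean but are much less spread out than the individual population values.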

True or False: The Law of Large Numbers says that the sample mean, (x bar), equals the population mean, μ, for large random samples.

False. The Law of Large Numbers says that the sample mean, (x bar), gets closer and closer to the population mean, μ, as the sample size increases, provided that the sample is random.
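Here is a sketch of the Law of Large Numbers. It assumes a fair six-sided die (so μ = 3.5), simulated with a fixed seed, and records the running sample mean at a few sample sizes. Individual checkpoints can bounce around, but the distance from μ tends to shrink as n grows. The function name is hypothetical.

```python
import random


def running_means(n_rolls: int, checkpoints: set[int], seed: int = 7) -> dict[int, float]:
    """Roll a fair die n_rolls times; return the sample mean x-bar at each checkpoint."""
    rng = random.Random(seed)
    total = 0
    means = {}
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        if i in checkpoints:
            means[i] = total / i
    return means


MU = 3.5  # population mean of one roll of a fair die: (1+2+3+4+5+6)/6
for n, x_bar in running_means(100_000, {10, 100, 1_000, 100_000}).items():
    print(f"n={n:>7}  x-bar={x_bar:.4f}  |x-bar - mu|={abs(x_bar - MU):.4f}")
```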

Voter registration records show that 68% of all voters in Indianapolis are registered as Republicans. To test a random digit dialing device, you use the device to call 150 randomly chosen residential telephones in Indianapolis. Of the registered voters co

68% is a parameter and 73% is a statistic. Since 68% is the percent of "all" voters in Indianapolis, it is a parameter. And since 73% is the percent of registered voters in the 150 randomly chosen residential telephones, it is a statistic.

What is the purpose of obtaining a statistic value from sample data?

To estimate a parameter. In statistical inference, we use the value of a statistic to estimate the value of a parameter.

Fill in the blank: The sampling distribution of (x bar) gives ____ from all possible samples of the same size from the same population.

All possible (x bar)-values. Each sample yields its own value for (x bar). The sampling distribution of (x bar) consists of the collection of all these possible (x bar)-values.
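For a very small population, the sampling distribution of (x bar) can be built exactly by listing every possible sample. The sketch below assumes a made-up population of five values and SRSs of size 2 drawn without replacement, so `itertools.combinations` lists all 10 equally likely samples. Because this is the complete collection of (x bar) values, its standard deviation (computed with `pstdev`) is the standard deviation of the sampling distribution. That standard deviation is the quantity that measures the variability of the statistic.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean, pstdev

population = [2, 4, 6, 8, 10]  # made-up population
n = 2                           # sample size

# Every possible SRS of size n, and the x-bar for each one.
samples = list(combinations(population, n))
x_bars = [fmean(s) for s in samples]

# Sampling distribution: each possible x-bar value and how often it occurs.
for value, count in sorted(Counter(x_bars).items()):
    print(f"x-bar = {value:4.1f}   probability = {count}/{len(samples)}")

print("mean of x-bar values:", fmean(x_bars))          # 6.0, same as the population mean
print("SD of sampling distribution:", pstdev(x_bars))  # about 1.732
print("SD of population:", pstdev(population))         # about 2.828
```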

Which one of the following measures the variability of a statistic?
A. the standard deviation of the data.
B. the standard deviation of the sampling distribution for the statistic.
C. the total sum of squares of deviations of the observations about the mean.

B. the standard deviation of the sampling distribution for the statistic.

A theoretical sampling distribution of a statistic consists of:
A. the results of a sample.
B. the values of a statistic from all possible samples.
C. the range of the values in a sample.
D. a set of sample data that has the same shape as the original population.

B. the values of a statistic from all possible samples.

Correlation

The correlation r measures the direction and strength of the linear association between two quantitative variables x and y. Although you can calculate a correlation for any scatterplot, r measures only straight-line relationships. Correlation indicates th

Regression Line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.