AP Statistics Chapter 3

response variable

measures an outcome of a study. also called the dependent variable.

explanatory variable

attempts to explain the observed outcomes. also called the independent variable.

how to examine data

plot the data. use numerical summaries. look for overall patterns and striking deviations (outliers). if overall pattern is regular, use a compact mathematical model to describe it.

scatterplot

shows the relationship between two quantitative variables measured on the same individuals. explanatory variable on x-axis. response variable on y-axis.

explanatory/response variables

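x helps explain or influence changes in y, and x is often used to predict the values of y. an explanatory-response relationship does not by itself prove that x causes y.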

how to make a scatterplot

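put the explanatory variable on the x-axis and the response variable on the y-axis. scale and label both axes. plot one point for each individual. then look for an overall pattern and striking deviations (outliers) and describe the form.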

how to describe a scatterplot

form is the pattern (linear or curved or clusters). direction is the association (positive or negative). strength is how closely the points follow a clear form such as a line (strong or moderately strong or weak).

outlier

an individual value that falls outside the overall pattern of the relationship.

positively associated

when above-average values of one tend to accompany above-average values of the other and below-average values also tend to occur together.

negatively associated

when above-average values of one tend to accompany below-average values of the other, and vice versa.

how to display a categorical variable in a scatterplot

use a different plotting symbol or color for each category so the groups can be told apart.

correlation

measures the direction and strength of the linear relationship between two quantitative variables. a numerical measure to supplement the graph; it does not by itself prove that the relationship is linear, so always plot the data first. computed from standardized values, so it has no units. written r.

r

1) positive r = positive association between the variables. negative r = negative association between the variables.
2) makes no distinction between explanatory and response variables. swapping x and y does not change r.
3) requires that both variables be quantitative.
4) always between -1 and 1.
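
the facts above can be checked numerically. a minimal sketch, assuming the usual textbook formula r = (1/(n-1)) Σ z_x z_y with sample standard deviations; the function name `correlation` and the data values are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all individuals, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # standardize each value, multiply the pairs, add them up, divide by n - 1
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

# made-up example data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = correlation(x, y)
r_yx = correlation(y, x)  # swapping x and y gives the same r (fact 2)
print(round(r_xy, 4), round(r_yx, 4))
```

both printed values are the same positive number between -1 and 1, matching facts 1, 2, and 4.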

r=0

no linear relationship. the points show no straight-line trend (a curved relationship is still possible).

r=.99

strong, positive linear relationship.

r=-.99

strong, negative linear relationship.

how to use correlation

correlation is not a complete description of two-variable data, even when the relationship is linear. give the means and standard deviations of both x and y along with the correlation. conclusions based on correlation alone can mislead, so describe the data more fully, starting with a scatterplot.

r=1, r=-1

points lie exactly on a straight line.
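
a quick check of this card: a small sketch, with made-up points placed exactly on two lines, using the same standardized-product formula for r (the helper name `correlation` is hypothetical).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    # r = sum of z_x * z_y divided by n - 1, with sample standard deviations
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

x = [0, 1, 2, 3, 4]
up = [3 + 2 * xi for xi in x]       # exact line with positive slope
down = [10 - 0.5 * xi for xi in x]  # exact line with negative slope

r_up = correlation(x, up)
r_down = correlation(x, down)
print(round(r_up, 10), round(r_down, 10))
```

any exact line with positive slope gives r = 1 and any exact line with negative slope gives r = -1, regardless of how steep the line is.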

least-squares regression

a straight line that describes how a response variable y changes as an explanatory variable x changes. often used to predict the value of y for a given value of x. unlike correlation, requires an explanatory variable and a response variable.

least-squares regression line

the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

LSRL

ŷ = a + bx
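
a minimal sketch of finding a and b, assuming the standard formulas b = r(sy/sx) for the slope and a = y bar - b(x bar) for the intercept. the function name `lsrl` and the data are made up for illustration.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # correlation r from the sum of cross-products
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x    # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x bar, y bar)
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)
y_hat = a + b * 3.5  # predicted y for x = 3.5 (inside the range of the data)
print(f"y-hat = {a:.2f} + {b:.2f}x, prediction at x = 3.5: {y_hat:.2f}")
```

the prediction uses an x inside the observed range; predicting far outside that range (extrapolation) is much less reliable.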

ŷ ("y-hat")

predicted value.

y

observed value.

r²

_____% of the variation in the response variable (y) is accounted for by the regression line. a measure of how successful the regression was in explaining the response.
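
a small sketch of where the percent comes from, assuming r² is computed as 1 - (sum of squared residuals)/(sum of squared deviations of y from y bar); the made-up data show that this equals the square of r.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# least-squares line from sums of squares (equivalent to b = r * s_y / s_x)
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
b = s_xy / s_xx
a = y_bar - b * x_bar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left over after the line
r_squared = 1 - sse / s_yy  # fraction of the variation in y accounted for by the line
r = s_xy / (s_xx * s_yy) ** 0.5

print(f"{r_squared:.1%} of the variation in y is accounted for by the regression line")
```

for this data the line accounts for 60% of the variation in y, and the same number comes out of squaring r.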

correlation and slope of LSRL

a change of one standard deviation in x corresponds to a change of r standard deviations in y. so the slope of the LSRL is b = r(sy/sx), where sx and sy are the standard deviations of x and y.
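
a short numerical check of this card, using made-up data and the slope formula b = r(sy/sx): moving x up one standard deviation from x bar moves ŷ by exactly r standard deviations of y. the helper `predict` is hypothetical.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((len(x) - 1) * s_x * s_y)

b = r * s_y / s_x      # slope of the LSRL
a = y_bar - b * x_bar  # the LSRL passes through (x bar, y bar)

def predict(x_value):
    """y-hat for a given x on the fitted line."""
    return a + b * x_value

# move x up by one standard deviation from its mean
change_in_y_hat = predict(x_bar + s_x) - predict(x_bar)
print(change_in_y_hat / s_y, r)  # change measured in SDs of y, next to r
```

the two printed numbers agree because b times sx equals r times sy.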

residual

the difference between an observed value of the response variable and the value predicted by the regression line: residual = y - ŷ. the mean of the least-squares residuals of an LSRL is always zero; a computed mean slightly different from zero is caused by roundoff error.
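
a minimal sketch of computing residuals for a least-squares line, using made-up data. the mean comes out as zero apart from floating-point roundoff, which is the computer version of the roundoff error mentioned above.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

# residual = observed y - predicted y-hat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(mean(residuals))  # essentially 0; any tiny nonzero value is floating-point roundoff
```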

residual plot

a scatterplot of the regression residuals against the explanatory variable. helps us assess the fit of a regression line.

how to make a residual plot

plot the x values on the x-axis and the residuals on the y-axis. draw a line at zero. label the axes.

how to examine a residual plot

1) a curved pattern shows the relationship is not linear. thus, a straight line is an inappropriate model.
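2) spread about the line that increases as x increases shows that predictions of y will be less accurate for larger x (decreasing spread means less accurate predictions for smaller x).
3) individual points with large residuals are outliers in the y direction, because they lie far from the line.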
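
an illustration of rule 1: a sketch that fits a straight line to made-up curved data (y = x², chosen only for illustration) and lists the (x, residual) pairs that would go on a residual plot. the helper name `fit_lsrl` is hypothetical.

```python
from statistics import mean

def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# made-up curved data: y = x squared exactly
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]
a, b = fit_lsrl(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# points for a residual plot: x on the horizontal axis, residual on the vertical axis
for xi, e in zip(x, residuals):
    print(xi, round(e, 2))
```

the residuals run positive, then negative, then positive again (a U shape), so a straight line is the wrong model for this data even though r is high.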

outlier

observation that lies outside the overall pattern of the other observations.

influential observation

an observation is influential if removing it would markedly change the result of the calculation. points that are outliers in the x direction of a scatterplot are often influential observations for the LSRL. an influential point often has a small residual, because it pulls the regression line toward itself.
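
a sketch of influence, using made-up data: five points lie on y = x and one extra point sits far out in the x direction. the helper name `slope` is hypothetical.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)

# made-up data: five points on y = x, plus one point far out in the x direction
x = [1, 2, 3, 4, 5, 20]
y = [1, 2, 3, 4, 5, 5]

b_with = slope(x, y)
b_without = slope(x[:-1], y[:-1])  # remove the far-out point

a_with = mean(y) - b_with * mean(x)
residuals = [yi - (a_with + b_with * xi) for xi, yi in zip(x, y)]

print(round(b_with, 3), round(b_without, 3))
print([round(e, 2) for e in residuals])
```

removing one point changes the slope from about 0.15 to 1, yet that point's residual is smaller than several others, because the line was pulled toward it.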

how to analyze data for two variables

1) plot your data in a scatterplot.
2) interpret what you see: direction, form, strength. is it linear?
3) numerical summary: x bar, y bar, SD x, SD y, and r.
4) mathematical model: if the pattern is linear, fit the regression line.