Stat ch. 14-15

Bivariate Data

Consists of the values of two different variables

Three combinations of variable types

Both categorical
One categorical and one quantitative
Both quantitative

scatterplots

show the relationship between two quantitative variables measured on the same individuals.
Each point represents the values observed for both variables on one individual.

Interpreting Scatterplots

1. Look for the overall pattern and any striking deviations from that pattern.
2. Describe the overall pattern of the relationship by form (linear, curved, scattered, etc.), direction (positive or negative), and strength (weak, moderate, strong).
3. An outlier is an individual value that falls outside the overall pattern of the relationship.

Patterns of Association

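1. Two variables are positively associated when larger values of one variable tend to be accompanied by larger values of the other (the scatterplot slopes upward as we move from left to right).
2. Two variables are negatively associated when larger values of one variable tend to be accompanied by smaller values of the other (the scatterplot slopes downward as we move from left to right).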

When describing scatterplots that take a linear (straight-line) form, we can use a

numerical measure to describe the relationship

correlation coefficient

describes the direction and strength of the linear relationship between two quantitative variables.
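
A minimal sketch of how r can be computed, using the standard textbook formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `correlation` and the sample data are illustrative assumptions, not part of these notes.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = correlation(hours, scores)
print(round(r, 3))  # positive and close to +1: a strong positive linear association
```

The sign of the result gives the direction of the association, and its distance from 0 gives the strength.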

Interpretation of Correlation Coefficient

r

If r is negative

then the association between the two variables is negative

If r is positive

then the association between the two variables is positive

The value of r is always between

-1 and +1 (inclusive)

r = 1 or r = -1 occurs only when the points in a scatterplot lie exactly

along a straight line

The closer r is to -1 or +1

the stronger the relationship

The closer r is to 0

the weaker the relationship

Correlation measures the strength of

only the linear association between two variables

The correlation does not change when

the units of measurement change.
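
The unit-invariance claim can be checked numerically. The sketch below converts made-up heights and weights from inches and pounds to centimeters and kilograms and compares the two correlations; the data and the helper name `correlation` are assumptions for illustration only.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up measurements on the same five individuals.
heights_in = [60, 62, 65, 68, 71]
weights_lb = [115, 120, 140, 155, 170]

# Same individuals, different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)

# The two values agree up to floating-point rounding.
print(abs(r_imperial - r_metric) < 1e-9)
```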

The correlation is strongly influenced by outliers

Use r with caution when outliers appear in the scatterplot.
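
To see how much a single outlier can move r, the sketch below computes the correlation for five nearly collinear points and again after adding one point far from the pattern. All data values and the helper name `correlation` are made up for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Five points that lie almost exactly on a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
r_without = correlation(xs, ys)

# Add one outlier far below the overall pattern.
r_with = correlation(xs + [6], ys + [1.0])

print(round(r_without, 2), round(r_with, 2))
```

With these numbers r falls from nearly 1 to well below 0.5, even though only one of the six points departs from the line.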

If two variables have a strong linear relationship, then we can use a

linear regression line to predict values of
y (response) from x (explanatory).

regression line

is a straight line that describes how
a response variable y changes as an explanatory variable x changes

The least-squares regression line

of y on x is the line that makes the sum of the squared vertical distances of the data points from the line as small as possible.

The equation of a line

y = a + bx

b:

the slope of the line
-The amount of change in y when x increases by 1 unit

a:

the intercept
-The value of y when x = 0
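
A minimal sketch of fitting the least-squares line y = a + bx by hand. It uses the standard closed-form solution (slope = sum of cross-products of deviations divided by sum of squared x deviations; intercept = mean of y minus slope times mean of x), which is not stated in these notes. The function names and data are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: change in y per 1-unit increase in x
    a = mean_y - b * mean_x  # intercept: value of y when x = 0
    return a, b

def sum_squared_distances(a, b, xs, ys):
    """Sum of squared vertical distances of the points from y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
a, b = least_squares_line(hours, scores)
print(f"y = {a:.1f} + {b:.1f}x")  # y = 46.0 + 6.0x

# Any other line, such as one with a slightly different slope, fits worse.
print(sum_squared_distances(a, b, hours, scores)
      < sum_squared_distances(a, b + 0.5, hours, scores))  # True
```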

To Use Linear Regression

1. Make a scatterplot to determine if a linear relationship is reasonable.
2. Fit the least-squares line, the straight line with the smallest sum of squared vertical deviations (use a computer program).
3. Predict values of y for given values of x by substituting the value of x into the equation and solving for y.
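
The prediction step can be sketched as a one-line substitution. The coefficients below are hypothetical values for an already fitted line, and the function name `predict` is an assumption for illustration.

```python
def predict(a, b, x):
    """Predicted response y for explanatory value x on the line y = a + b*x."""
    return a + b * x

# Hypothetical fitted line: score = 46 + 6 * hours.
a, b = 46.0, 6.0

# Substitute x = 3.5 hours into the equation and solve for y.
predicted = predict(a, b, 3.5)
print(predicted)  # 67.0
```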

square of the correlation, r^2

is the proportion of the variation in the values of y that is explained by the least-squares regression of y on x
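
As a rough check of this interpretation, the sketch below computes the proportion of variation explained directly (1 minus leftover variation divided by total variation) and compares it with the squared correlation. The slope and intercept formulas are the standard closed-form ones, and all names and data are illustrative assumptions.

```python
def explained_proportion(xs, ys):
    """Proportion of the variation in y explained by the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Total variation of y around its mean.
    total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over after fitting the line (squared residuals).
    leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - leftover / total

def r_squared(xs, ys):
    """Square of the correlation, computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
proportion = explained_proportion(hours, scores)
r2 = r_squared(hours, scores)
print(abs(proportion - r2) < 1e-9)  # the two calculations agree
```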

r^2 is always between

0 and 1
-The closer r^2 is to 1, the more confident we are in our prediction

If r = +0.7 or -0.7, then r^2 =

0.49 (about half of the variation is accounted for by the straight-line relationship with the other variable).

A strong relationship between two variables does not always mean that

changes in one variable cause changes in the other.

The relationship between two variables is often influenced by

The best evidence for causation comes from

properly designed randomized comparative experiments.

An observed relationship can be used for

prediction without worrying about causation, as long as past patterns continue.

The observed relationship between two variables may be due to

causation, common response, or confounding
-Any or all of these may be present together

Evidence for Causation

The case for the claim that variable x causes changes in variable y is strengthened if
- The association between x and y is strong.
- The association is consistent.
- Higher doses are associated with stronger responses.
- The alleged cause precedes the effect in time.