Bivariate Data
Consists of the values of two different variables
Three Combinations of variable types
Both categorical
One categorical and one quantitative
Both quantitative
scatterplots
show the relationship between two quantitative variables measured on the same individuals.
Each point represents the values observed for both
variables on one individual
Interpreting Scatterplots
1. Look for the overall pattern and any striking
deviations from that pattern
2. Describe the overall pattern of the relationship by form (linear, curved, scattered, etc.), direction (positive or negative), and strength (weak, moderate, strong)
3. An outlier is
Patterns of Association
1. Two variables are positively associated when larger values of one variable tend to be accompanied by larger values of the other (scatterplot slopes upward
as we move from left to right)
2. Two variables are negatively associated when larger values of one variable tend to be accompanied by smaller values of the other
When describing scatterplots that take a linear (straight-line) form, we can use a
numerical measure to describe the relationship
correlation coefficient
describes the direction and strength of the linear relationship between two quantitative variables.
Interpretation of Correlation Coefficient
r
If r is negative
then the association between the two variables is negative
If r is positive
then the association between the two variables is positive
The value of r is always between
-1 and +1 (inclusive)
r = 1 or r = -1 occurs only when the points in a scatterplot lie exactly
along a straight line
The closer r is to -1 or +1
the stronger the linear relationship
The closer r is to 0
the weaker the linear relationship
Correlation measures the strength of
only the linear association between two
variables
The correlation does not change when
the units of measurement change.
The correlation is strongly influenced by outliers
Use r with caution when outliers appear in the scatterplot.
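The properties of r listed above can be checked numerically. The sketch below is a minimal illustration in plain Python, not part of the original notes: the helper name `pearson_r` and every data value are made up for the example. It computes r from deviations about the means, then checks the behavior on exact lines, on a curved pattern, under a change of units, and with one outlier added.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Made-up data: heights in inches and weights in pounds.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 135, 150, 160, 175]
r_original = pearson_r(height_in, weight_lb)

# Changing units (inches to cm, pounds to kg) should leave r unchanged.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]
r_converted = pearson_r(height_cm, weight_kg)

# Points lying exactly on a straight line give r = +1 or r = -1.
r_line_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
r_line_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

# A perfect but curved relationship (y = x^2) gives r = 0 here,
# because r only measures linear association.
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Adding a single outlier (tall but very light) pulls r down sharply.
r_with_outlier = pearson_r(height_in + [74], weight_lb + [100])

print(r_original, r_converted, r_line_up, r_line_down, r_curve, r_with_outlier)
```

Comparing `r_original` with `r_with_outlier` shows why a scatterplot should be inspected before r is trusted.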
If two variables have a strong linear relationship, then we can use a
linear regression line to predict values of
y (response) from x (explanatory).
regression line
is a straight line that describes how
a response variable y changes as an explanatory variable x changes
The least-squares regression line
of y on x is the line that makes the sum of the squared vertical distances of the data points from the line as small as possible.
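Written as a formula, with the line expressed as y = a + bx (intercept a, slope b), the least-squares criterion chooses a and b to make the sum below as small as possible. The index i and the sample size n are notation assumed here for illustration; they do not appear in the notes.

```latex
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
```

Each term is the squared vertical distance from the data point (x_i, y_i) to the line.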
The equation of a line
y = a + bx
b:
the slope of the line
-The amount of change in y when x increases by 1 unit
a:
the intercept
-The value of y when x = 0
To Use Linear Regression
1. Make a scatterplot to determine if a linear relationship is reasonable.
2. Fit the least-squares straight line (use a computer program).
3. Predict values of y given values of x by substituting the value of x into the equation and solving for y.
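Steps 2 and 3 can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names `fit_least_squares` and `predict` and the data are invented for the example, and the scatterplot check from step 1 is left out. The slope and intercept use the standard least-squares formulas for a line y = a + bx.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: change in predicted y per 1-unit increase in x
    a = mean_y - b * mean_x  # intercept: predicted y when x = 0
    return a, b

def predict(a, b, x):
    """Substitute x into y = a + b*x."""
    return a + b * x

# Made-up data: hours studied (explanatory x) and exam score (response y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

a, b = fit_least_squares(hours, score)
predicted = predict(a, b, 3.5)  # an x value inside the observed range

print(a, b, predicted)
```

The prediction is made for an x value inside the range of the observed data; predicting far outside that range is riskier because the linear pattern may not continue there.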
square of the correlation, r^2
is the proportion of the variation in the values of y that is explained by the least-squares regression of y on x
r^2 is always between
0 and 1
-The closer r^2 is to 1, the more
confident we are in our prediction
If r = +0.7 or -0.7, then r^2 =
0.49 (about 1/2 of the variation is accounted for
by the straight-line relationship with the other variable).
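To see why r^2 is read as a proportion of variation explained, the sketch below fits a least-squares line to a small made-up data set and compares r^2 with 1 minus the ratio of leftover (residual) variation to total variation in y. All variable names and data values are assumptions chosen for illustration.

```python
import math

# Made-up paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)             # correlation coefficient

# Least-squares line y = a + b*x.
b = sxy / sxx
a = mean_y - b * mean_x

# Variation in y left over after using the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - sse / syy                  # proportion of variation explained

# The sign of r does not matter once it is squared.
r_squared_pos = round(0.7 ** 2, 2)
r_squared_neg = round((-0.7) ** 2, 2)

print(r ** 2, explained, r_squared_pos, r_squared_neg)
```

For this data the two quantities `r ** 2` and `explained` agree up to floating-point rounding.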
A strong relationship between two variables does not always mean that
changes in one variable cause changes in the other.
The relationship between two variables is often influenced by
The best evidence for causation comes from
properly designed randomized
comparative experiments.
An observed relationship can be used for
prediction without worrying about
causation as long as past patterns continue.
The observed relationship between two variables may be due to
causation, common response, or confounding
-All or some may be present together
Evidence for Causation
The case for the claim that variable x causes changes in variable y is strengthened if
- The association between x and y is strong.
- The association is consistent.
- Higher doses are associated with stronger responses.
- The alleged cause precedes the effect.