F test
In ANOVA, we perform an ________ to test the hypotheses.
H0: μ1 = μ2 = ... = μt
HA: at least one μi differs from the others.
have strong evidence that at least one pair of means differs.
In ANOVA, if we reject H0, we _____________
Linear contrasts
_________ are a method of setting up tests to compare linear combinations of the μi.
comparisons of multiple means at the same time.
A linear contrast is a linear combination of means that allows __________
The coefficients sum to zero...some have to be negative
In order to be considered a linear contrast, a combination of means must have _______
I error
When making multiple comparisons, the probability of type _____ can become very high.
Multiple comparisons
__________ refers to using the same data to perform multiple hypothesis tests or to create multiple confidence intervals.
Individual error rate
The _________ is the probability of making a type I error in a single test
Experimentwise error rate
The _____ is the probability of making at least one type I error in an analysis with m tests
Multiple comparison procedures
_________ are statistical methods that allow us to make several comparisons while controlling the experimentwise error rate at a pre-specified value αE.
The Bonferroni adjustment
This procedure is very conservative. For moderate values of m, the per-comparison level αE/m becomes very small, making it harder to reject H0 for each individual comparison. (Power is low using this method.)
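To see why the adjustment matters, here is a minimal sketch of the arithmetic. It assumes the m tests are independent, which is an assumption made only so the experimentwise rate has a simple closed form; the function names are illustrative.

```python
def experimentwise_error_rate(alpha_individual, m):
    """P(at least one type I error) over m tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha_individual) ** m


def bonferroni_level(alpha_experimentwise, m):
    """Per-comparison level used by the Bonferroni adjustment."""
    return alpha_experimentwise / m


m = 10
unadjusted = experimentwise_error_rate(0.05, m)      # each test run at 0.05
per_test = bonferroni_level(0.05, m)                 # much smaller per-test level
adjusted = experimentwise_error_rate(per_test, m)    # stays at or below 0.05

print(round(unadjusted, 3), per_test, round(adjusted, 3))
```

With ten unadjusted tests the chance of at least one false rejection is roughly 40%, while the Bonferroni level keeps it just under 5% at the cost of a stricter threshold for every comparison.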
Tukey's honest significant difference (HSD).
_________ requires that all groups have the same sample size; i.e., n1 = n2 = ... = nt.
Tukey's honest significant difference (HSD).
_________ is used for inference on "all pairwise comparisons," the contrasts μi − μj for every combination of i and j.
Dunnett's method
________: allows for testing/estimation of a subset of pairwise comparisons that can be seen as treatment vs. control comparisons. If μ1 is the mean response for the control group, this method allows confidence intervals for μ2 − μ1, μ3 − μ1, . . . , μt −
Scheffe's method
__________: can be used to test/estimate all linear contrasts. This procedure is more conservative than the others.
Scatterplot
_________: used to display the relationship between two quantitative variables
(1) Strength (2) Pattern (3) Direction
Three main features of a Scatterplot:
The Sample correlation (r):
___________: Describes the strength and direction of the linear relationship between X and Y.
-1, 1
Properties of r:
_____ ≤ r ≤ ______
positive linear relationship
Properties of r:
Positive r ⇒ _________
negative linear relationship.
Properties of r:
Negative r ⇒ __________
strong
Properties of r:
Large |r| ⇒ the relationship is...
weak
Properties of r:
Small |r| ⇒ the relationship is...
Correlation is not causation
_________: r describes a tendency of variables to move together, but does not identify causal relationships.
outliers
r is sensitive to ________.
linear relationships
r is only accurate when measuring the strength of __________.
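The properties of r listed above can be checked with a short sketch. The data values are made up for illustration, and the helper name `sample_correlation` is hypothetical.

```python
import math


def sample_correlation(x, y):
    """Pearson sample correlation r between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data with a strong positive linear relationship
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = sample_correlation(x, y)

# Adding a single outlying point changes r dramatically
r_outlier = sample_correlation(x + [6], y + [0.0])

print(round(r, 3), round(r_outlier, 3))
```

The first value is close to 1, while one outlier pulls r down to a weak positive value, which is the sensitivity to outliers noted above.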
Linear regression
________ is a method for predicting one quantitative variable (Y) using another quantitative variable (X), assuming that there is a linear relationship between the two variables.
the expected value of the response variable when the explanatory variable is equal to zero.
It is the Y-intercept.
Regression coefficients:
β0 = __________
β0
Regression coefficients:
____ = the expected value of the response variable when the explanatory variable is equal to zero.
It is the Y-intercept.
the expected increase in the response variable when the explanatory variable increases by 1 unit.
It is the slope.
Regression coefficients:
β1 = _________
β1
Regression coefficients:
____ = the expected increase in the response variable when the explanatory variable increases by 1 unit.
It is the slope.
no linear relationship between X and Y
Regression coefficients:
If β1 = 0, there is ________
point prediction of the response variable
The equation for the regression line is
Yi = β0 + β1xi. Given the coefficients β0 and β1, this equation gives a ________ for a fixed value of xi.
observed y - predicted y
In regression, the residual is defined as ________
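A minimal sketch of how the coefficients, point predictions, and residuals fit together. It assumes the coefficients are estimated by ordinary least squares, which these cards do not state explicitly; the data and the helper name `fit_line` are illustrative.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)

# Point prediction for each observed x, then residual = observed y - predicted y
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(b0, b1)
print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the fit.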
(1) The εi are Normal.
(2) The εi have equal variances.
(3) x and y have a linear relationship.
We can use residual analysis to assess the validity of the modeling assumptions that....
Normality
Create a Normal quantile plot of residuals to assess ____
variances and/or linearity does not hold.
Systematic patterns in the residuals as xi or ŷi changes are indications that equality of __________
1. Residual by Predicted plot
2. Residual by X plot
Assess equality of variances and linearity with two residual plots:
Extrapolation
______ is predicting Y for values of X beyond the range of the explanatory values we collected. It is risky since we don't have information about the relationship between the explanatory and response variables there.
R^2
In regression, the coefficient of determination is ____
higher
The ______ R^2 is, the more accurately X can be used to predict Y
R^2
________ is the proportion of total variability in Y that our SLR model accounts for
total variability in Y
R^2 is the proportion of ________ that our SLR model accounts for
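As a sketch of the "proportion of total variability" idea, R^2 can be computed as 1 − SSE/SST. The least-squares fit, the data, and the function name `r_squared` are assumptions for illustration.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for a least-squares simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # SSE: variability left over after fitting the line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: total variability in y around its mean
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3))
```

Here SSE is 2.4 and SST is 6, so the line accounts for about 60% of the variability in Y.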
an unusual Y value compared to X
A regression line outlier has _______
an unusual x value
A regression line influential point has ______
outliers and influential points
Both ________ can affect the fit and estimates of the regression line.
If there is a non-linear relationship, transforming one or more variables might produce a linear relationship.
What if x and y do not have a linear relationship?
partial slopes
Interpreting MLR coefficients:
The βj coefficients can be viewed as _______.
increase when we increase xj by one unit
Interpreting MLR coefficients:
βj represents the amount we expect y to ___________, holding the other variables constant.
collinear
When x1 and x2 are highly correlated, they have similar relationships with y and we say that they are ________. It is difficult to use an MLR model to interpret the predictive value of the variables separately.
large standard errors and point estimates
With collinearity in MLR, the coefficients associated with x1 and x2 might have __________ that might not make much sense.
interact
an interaction
In MLR, two variables x1 and x2 are said to ________ if the effect of x1 differs for different values/levels of x2. For example, if the relationship between age and expenses is different between smokers and non-smokers, then age and smoker have _______.
proportion of variability
R^2 is the ________ in the response (Y) explained by the regression model
increases
R^2 always _______ (or at least never decreases) when you add a new predictor
(1) 2 outcomes
(2) fixed probability of success
(3) n independent trials
What are the characteristics of a binomial experiment?
States that, given a sufficiently large sample size from a population with finite variance, the sampling distribution of the sample mean is approximately Normal and centered at the mean of the population.
What is the central limit theorem?
binary data
Data from a binomial experiment, or _______, can appear in two equivalent formats: either a spreadsheet-style data table, or a frequency table that contains counts.
number of "successes"
Binary data contain X, the ________ out of n trials.
probability of success for each trial.
For inference for one population proportion:
The parameter of interest is π, the ________
sampling distribution
A ______ is a probability distribution of an estimator (statistic).
Wald confidence interval
The _________ for π uses the approximately Normal sampling distribution of π̂.
Wald hypothesis test
For a _________, the p-value is the two-tail (or left- or right-tail) probability of the calculated Z statistic, found using the Normal(0,1) distribution.
n is small and/or π is close to zero or one.
The Normal approximation of the sampling distribution of π̂ is less accurate when __________
Wilson score interval:
There are several alternatives to the Wald C.I. that can be shown to have coverage closer to the nominal (1 − α) × 100%. One is the...
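The two intervals can be compared side by side in a short sketch. It uses the standard textbook forms of the Wald and Wilson score intervals; the function names and the example counts (2 successes in 20 trials) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, conf=0.95):
    """Wald interval: estimate +/- z * sqrt(estimate * (1 - estimate) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(x, n, conf=0.95):
    """Wilson score interval, obtained by inverting the score test."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Small n with an estimate near zero: the case where the Normal approximation struggles
wald = wald_interval(2, 20)
wilson = wilson_interval(2, 20)
print(wald)
print(wilson)
```

With these counts the Wald lower limit falls below zero, which is impossible for a probability, while the Wilson limits stay inside (0, 1).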
greater
In a Wald C.I. with a fixed margin of error, the closer π is to 0.5, the ______ the sample size that will be required. Therefore, when there is no prior information about π, it is common practice to use π = 0.5 in sample size calculations to avoid underestimating
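A rough sketch of the sample size calculation implied by the Wald margin of error, solving z * sqrt(π(1 − π)/n) ≤ margin for n. The function name and the example margin of 0.03 are assumptions for illustration.

```python
import math
from statistics import NormalDist


def wald_sample_size(pi_guess, margin, conf=0.95):
    """Smallest n whose Wald margin of error is at most `margin` for a planning value pi_guess."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * pi_guess * (1 - pi_guess) / margin ** 2)


n_half = wald_sample_size(0.5, 0.03)    # planning value 0.5
n_fifth = wald_sample_size(0.2, 0.03)   # planning value farther from 0.5
print(n_half, n_fifth)
```

Because π(1 − π) is largest at 0.5, the planning value 0.5 gives the largest required n of any choice.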
ratio of two proportions.
The relative risk is the _______
equal to or close to 1
When relative risk is ________, the two groups have the same (or nearly the same) probability of success
the same (or nearly the same) probability of success.
When the odds ratio is equal to or close to 1, the two groups have.....
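A small sketch of both measures, using the usual definitions (relative risk as a ratio of proportions, odds ratio as a ratio of odds). The counts and function names are made up for illustration.

```python
def relative_risk(x1, n1, x2, n2):
    """Ratio of the two sample proportions of success."""
    return (x1 / n1) / (x2 / n2)


def odds_ratio(x1, n1, x2, n2):
    """Ratio of the two sample odds, where odds = p / (1 - p)."""
    p1 = x1 / n1
    p2 = x2 / n2
    return (p1 / (1 - p1)) / (p2 / (1 - p2))


# Illustrative counts: 30 of 100 successes in group 1, 15 of 100 in group 2
rr = relative_risk(30, 100, 15, 100)
oratio = odds_ratio(30, 100, 15, 100)

# Two groups with identical proportions give 1 for both measures
rr_same = relative_risk(20, 100, 40, 200)
or_same = odds_ratio(20, 100, 40, 200)

print(rr, round(oratio, 3), rr_same, or_same)
```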
(1) A fixed number (n) of independent trials.
(2) Each trial results in one of k outcomes.
(3) There is a fixed probability πi of a single trial resulting in outcome i.
(4) The expected count for outcome i is nπi.
The multinomial experiment satisfies the following conditions: (4)
multinomial experiments
These are examples of _________:
1. Randomly select 100 graduating seniors for an exit survey. Record if they have a job in their field, a job outside of their field, plans to attend graduate school, or none of the above.
2. Randomly sample 50 of your cus
Chi-square test for association
If H0 is rejected in a _________, we conclude that there is some association between the two variables. We do not necessarily know whether the association is strong or weak.
at least 5
The chi-square test is most accurate when the expected count in each cell is _______
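The expected-count check can be sketched for a two-way table as follows. It assumes the usual formula, expected count = (row total × column total) / grand total, and the Pearson chi-square statistic; the table values and function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts under no association: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


observed = [[20, 30], [30, 20]]   # illustrative 2x2 table of counts
expected = expected_counts(observed)
statistic = chi_square_statistic(observed)

# Rule of thumb from the card above: every expected count should be at least 5
counts_ok = all(e >= 5 for row in expected for e in row)
print(expected, statistic, counts_ok)
```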