Test 3

F test

In ANOVA, we perform an ________ to test the hypotheses.
H0: μ1 = μ2 = ... = μt
HA: at least one μi differs from the others.

have strong evidence that at least one pair of means differs.

In ANOVA, if we reject H0, we _____________
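As a rough illustration of the F test in the cards above, the sketch below computes the one-way ANOVA F statistic by hand. The function name `anova_f` and the three small samples are hypothetical, made up only for this example; a p-value would additionally need the F distribution, which is not shown here.

```python
from statistics import mean

def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    t = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand = mean(x for g in groups for x in g)   # grand mean of all observations
    # Between-group sum of squares: group means vs. the grand mean
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations vs. their own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # F = mean square between / mean square within
    return (ssb / (t - 1)) / (ssw / (n_total - t))

# Made-up data for three treatment groups (illustration only)
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
f_stat = anova_f(groups)
print(f_stat)
```

A large F statistic means the group means vary much more than the within-group noise would suggest, which is evidence against H0.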

Linear contrasts

_________ are a method of setting up tests to compare linear combinations of the μi.

comparisons of multiple means at the same time.

A linear contrast is a linear combination of means that allows __________

coefficients that sum to zero (so some have to be negative)

In order to be considered a linear contrast, a combination of means must have _______
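The zero-sum requirement can be checked mechanically. This is a minimal sketch; the helper names `is_contrast` and `contrast_estimate` and the group means are hypothetical, chosen only for illustration.

```python
def is_contrast(coefs, tol=1e-12):
    """A linear combination of means is a contrast if its coefficients sum to zero."""
    return abs(sum(coefs)) < tol

def contrast_estimate(coefs, means):
    """Point estimate of the contrast: sum of c_i times the i-th group mean."""
    return sum(c * m for c, m in zip(coefs, means))

# Hypothetical sample means for three treatment groups
means = [5.0, 7.0, 10.0]
# Compare group 3 with the average of groups 1 and 2
coefs = [-0.5, -0.5, 1.0]

ok = is_contrast(coefs)
estimate = contrast_estimate(coefs, means)
print(ok, estimate)
```

A set of weights such as (1, 1, 1) fails the check, so it describes a linear combination of means but not a contrast.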

I error

When making multiple comparisons, the probability of a type _____ can become very high.

Multiple comparisons

__________ refers to using the same data to perform multiple hypothesis tests or to create multiple confidence intervals.

Individual error rate

The _________ is the probability of making a type I error in a single test

Experimentwise error rate

The _____ is the probability of making at least one type I error in an analysis with m tests
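To see how quickly the experimentwise error rate grows, the sketch below uses the formula 1 - (1 - α)^m. This assumes the m tests are independent, which is an assumption added here and is not stated in the cards; the function name is hypothetical.

```python
def experimentwise_rate(alpha_individual, m):
    """P(at least one type I error) in m tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha_individual) ** m

# Individual error rate of 0.05 for several numbers of tests
rates = {m: experimentwise_rate(0.05, m) for m in (1, 5, 10, 20)}
for m, rate in rates.items():
    print(m, round(rate, 3))
```

Under this assumption, ten tests at the 0.05 level already give roughly a 40% chance of at least one false rejection.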

Multiple comparison procedures

_________ are statistical methods that allow us to make several comparisons while controlling the experimentwise error rate at a pre-specified value αE.

The Bonferroni adjustment

This procedure is very conservative. For moderate values of m, α/m becomes very small, making it harder to reject H0 for each individual comparison. (Power is low using this method.)
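A minimal sketch of the adjustment: each of the m comparisons is tested at the experimentwise level divided by m. The function names and the p-values are hypothetical, invented for illustration.

```python
def bonferroni_level(alpha_experimentwise, m):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha_experimentwise / m

def bonferroni_reject(p_values, alpha_experimentwise=0.05):
    """Reject H0 for each comparison whose p-value is below the adjusted level."""
    level = bonferroni_level(alpha_experimentwise, len(p_values))
    return [p < level for p in p_values]

# Made-up p-values from m = 4 comparisons
p_values = [0.001, 0.012, 0.030, 0.200]
decisions = bonferroni_reject(p_values)
print(bonferroni_level(0.05, len(p_values)), decisions)
```

Note that the third p-value (0.030) would be rejected at an unadjusted 0.05 level but not after the adjustment, which is the loss of power the card describes.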

Tukey's honest significant difference (HSD).

_________ requires that all groups have the same sample size; i.e., n1 = n2 = ... = nt.

Tukey's honest significant difference (HSD).

_________ is used for inference on "all pairwise comparisons," the contrasts μi − μj for every combination of i and j.

Dunnett's method

________: allows for testing/estimation of a subset of pairwise comparisons that can be seen as treatment vs. control comparisons. If μ1 is the mean response for the control group, this method allows confidence intervals for μ2 − μ1, μ3 − μ1, ..., μt −

Scheffe's method

__________: can be used to test/estimate all linear contrasts. This procedure is more conservative than the others.

Scatterplot

_________: used to display the relationship between two quantitative variables

(1) Strength (2) Pattern (3) Direction

Three main features of a Scatterplot:

The Sample correlation (r):

___________: Describes the strength and direction of the linear relationship between X and Y.

-1, 1

Properties of r:
_____ ≤ r ≤ ______

positive linear relationship

Properties of r:
Positive r ⇒ _________

negative linear relationship.

Properties of r:
Negative r ⇒ __________

strong

Properties of r:
Large |r| ⇒ the relationship is...

weak

Properties of r:
Small |r| ⇒ the relationship is...

Correlation is not causation

_________: r describes a tendency of variables to move together, but does not identify causal relationships.

outliers

r is sensitive to ________.

linear relationships

r is only accurate when measuring the strength of __________.
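The properties of r in the cards above can be checked with a short hand computation. This is a sketch with made-up data; `sample_correlation` is a hypothetical helper name. The second data set follows y = x^2 exactly, yet r is zero, which illustrates why r only measures linear relationships.

```python
from math import sqrt
from statistics import mean

def sample_correlation(x, y):
    """Sample correlation r between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Points lying exactly on an increasing line
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_line = sample_correlation(x, [2 * a + 1 for a in x])

# A perfect but non-linear (quadratic) relationship
r_curve = sample_correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(r_line, r_curve)
```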

Linear regression

________ is a method for predicting one quantitative variable (Y) using another quantitative variable (X), assuming that there is a linear relationship between the two variables.

the expected value of the response variable when the explanatory variable is equal to zero.
Is the Y intercept

Regression coefficients:
β0 = __________

β0

Regression coefficients:
____ = the expected value of the response variable when the explanatory variable is equal to zero.
Is the Y intercept

the expected increase in the response variable when the explanatory variable increases by 1 unit.
SLOPE

Regression coefficients:
β1 = _________

β1

Regression coefficients:
____ = the expected increase in the response variable when the explanatory variable increases by 1 unit.
SLOPE

no linear relationship between X and Y

Regression coefficients:
If β1 = 0, there is ________

point prediction of the response variable

The equation for the regression line is
Yi = β0 + β1xi. Given the coefficients β0 and β1, this equation gives a ________ for a fixed value of xi.

observed y - predicted y

In regression, the residual is defined as ________
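The intercept, slope, point prediction, and residual from the cards above fit together as in the sketch below. It uses the ordinary least-squares formulas for simple linear regression; the helper name `fit_line` and the data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

# Made-up data (illustration only)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * a for a in x]                         # point predictions
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # observed y - predicted y
print(b0, b1, residuals)
```

With a least-squares fit that includes an intercept, the residuals always sum to zero (up to rounding), which is a quick sanity check on the computation.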

(1) The εi are Normal.
(2) The εi have equal variances.
(3) x and y have a linear relationship.

We can use residual analysis to assess the validity of the modeling assumptions that....

Normality

Create a Normal quantile plot of residuals to assess ____

variances and/or linearity does not hold.

Systematic patterns in the residuals as xi or ŷi changes are indications that equality of __________

1. Residual by Predicted plot
2. Residual by X plot

Assess equality of variances and linearity with two residual plots:

Extrapolation

______ is predicting Y for values of X beyond the range of our collected explanatory values. It is risky since we don't have information about the relationship between the explanatory and response variables there.

R^2

In regression, the coefficient of determination is ____

higher

The ______ R^2 is, the more accurately X can be used to predict Y

R^2

________ is the proportion of total variability in Y that our SLR model accounts for

total variability in Y

R^2 is the proportion of ________ that our SLR model accounts for
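One way to make the definition concrete is R^2 = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variability of Y around its mean. The sketch below assumes a simple linear regression fit by least squares; the function name and data are hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for a least-squares line fit of y on x."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variability
    sst = sum((b - my) ** 2 for b in y)                        # total variability in Y
    return 1 - sse / sst

# Made-up data that lie close to a line
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]
r2 = r_squared(x, y)
print(r2)
```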

an unusual Y value compared to X

A Regression line Outlier has _______

an unusual x value

A Regression line Influential point has ______

outliers and influential points

Both ________ can affect the fit and estimates of the regression line.

If there is a non-linear relationship, transforming one or more variables might produce a linear relationship.

What if x and y do not have a linear relationship?

partial slopes

Interpreting MLR coefficients:
The βj coefficients can be viewed as _______.

increase when we increase xj by one unit

Interpreting MLR coefficients:
βj represents the amount we expect y to ___________, holding the other variables constant.

collinear

When x1 and x2 are highly correlated, they have similar relationships with y and we say that they are ________. It is difficult to use an MLR model to interpret the predictive value of the variables separately.

large standard errors and point estimates

With collinearity in MLR, the coefficients associated with x1 and x2 might have __________ that might not make much sense.

interact
an interaction

In MLR, two variables x1 and x2 are said to ________ if the effect of x1 differs for different values/levels of x2. For example, if the relationship between age and expenses is different between smokers and non-smokers, then age and smoker have _______.

proportion of variability

R^2 is the ________ in the response (Y) explained by the regression model

increases (more precisely, it never decreases)

R^2 always _______ when you add a new predictor

(1) 2 outcomes
(2) fixed probability of success
(3) n independent trials

What are the characteristics of a binomial experiment?

States that, given a sufficiently large sample size from a population with finite variance, the sampling distribution of the sample mean is approximately Normal and centered at the population mean.

What is the central limit theorem?

binary data

Data from a binomial experiment, or _______, can appear in two equivalent formats: either a spreadsheet-style data table, or a frequency table that contains counts.

number of "successes"

Binary data contain X, the ________ out of n trials.

probability of success for each trial.

For inference for one population proportion:
The parameter of interest is π, the ________

sampling distribution

a ______ is a probability distribution of an estimator (statistic).

Wald confidence interval

The _________ for π uses the approximately Normal sampling distribution of π̂.
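As a sketch of how this interval is computed: the point estimate is the sample proportion, and the margin of error is a Normal critical value times the estimated standard error. The function name `wald_interval` and the counts (40 successes in 100 trials) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Wald confidence interval for a population proportion."""
    p_hat = successes / n                               # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Normal critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - margin, p_hat + margin

# Made-up binary data: 40 successes out of 100 trials
low, high = wald_interval(40, 100)
print(round(low, 3), round(high, 3))
```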

Wald hypothesis test

For a _________, the p-value is the two-tail (or left- or right-tail) probability of the calculated Z statistic, found using the Normal(0,1) distribution.

n is small and/or π is close to zero or one.

The Normal approximation of the sampling distribution of π̂ is less accurate when __________

Wilson score interval:

There are several alternatives to the Wald C.I. that can be shown to have coverage closer to the nominal (1 − α) × 100%. One is the...
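For comparison with the Wald interval, here is a sketch of the standard Wilson score formula. The function name and the counts are hypothetical. Unlike the Wald interval, this one does not collapse to a single point when the sample contains zero successes.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

# Made-up data: 40 successes out of 100 trials
low, high = wilson_interval(40, 100)
# Edge case: no successes in 20 trials
low0, high0 = wilson_interval(0, 20)
print(round(low, 4), round(high, 4), high0 > 0)
```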

greater

In a Wald C.I. with a fixed margin of error, the closer π is to 0.5, the ______ the sample size required. Therefore, when there is no prior information about π, it is common practice to use π = 0.5 in sample size calculations to avoid underestimating
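The effect of π on the required sample size can be sketched by solving the Wald margin-of-error formula for n, which gives n = z^2 π(1 - π) / E^2, rounded up. The function name `sample_size`, the margin of 0.03, and the 95% confidence level are hypothetical choices for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, pi_guess=0.5, confidence=0.95):
    """Smallest n whose Wald margin of error is at most `margin` for a guessed pi."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * pi_guess * (1 - pi_guess) / margin ** 2)

n_half = sample_size(0.03, pi_guess=0.5)   # most cautious guess
n_small = sample_size(0.03, pi_guess=0.2)  # guess further from 0.5
print(n_half, n_small)
```

Because π(1 - π) is largest at π = 0.5, that guess yields the largest n of any value of π.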

ratio of two proportions.

The relative risk is the _______

equal to or close to 1

When relative risk is ________, the two groups have the same (or nearly the same) probability of success

the same (or nearly the same) probability of success.

When the odds ratio is equal to or close to 1, the two groups have.....
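A small sketch of both measures for two groups, using the usual definitions: relative risk is the ratio of the two success proportions, and the odds ratio is the ratio of the two odds p/(1 - p). The function names and the counts are hypothetical.

```python
def relative_risk(x1, n1, x2, n2):
    """Ratio of the success proportions of group 1 and group 2."""
    return (x1 / n1) / (x2 / n2)

def odds_ratio(x1, n1, x2, n2):
    """Ratio of the odds of success, p / (1 - p), of group 1 and group 2."""
    p1, p2 = x1 / n1, x2 / n2
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Made-up counts: 30 of 100 successes in group 1, 15 of 100 in group 2
rr = relative_risk(30, 100, 15, 100)
oratio = odds_ratio(30, 100, 15, 100)
print(rr, oratio)
```

When the two groups have the same proportion of successes, both functions return 1.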

(1) A fixed number (n) of independent trials.
(2) Each trial results in one of k outcomes.
(3) There is a fixed probability πi of a single trial resulting in outcome i.
(4) The expected count for outcome i is nπi.

The multinomial experiment satisfies the following conditions: (4)

multinomial experiments

These are examples of _________:
1. Randomly select 100 graduating seniors for an exit survey. Record if they have a job in their field, a job outside of their field, plans to attend graduate school, or none of the above.
2. Randomly sample 50 of your cus

Chi-square test for association

If H0 is rejected in a _________, we conclude that there is some association between the two variables. We do not necessarily know whether the association is strong or weak.

at least 5

The chi-square test is most accurate when the expected count in each cell is _______
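The expected-count condition can be checked before running the test. The sketch below assumes a two-way table of observed counts, with each expected count computed as (row total × column total) / grand total under the hypothesis of no association. The function names and the 2×2 table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under no association for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Made-up 2x2 table of observed counts
table = [[20, 30], [30, 20]]
expected = expected_counts(table)
stat = chi_square_statistic(table)
all_large = all(e >= 5 for row in expected for e in row)  # rule-of-thumb check
print(expected, stat, all_large)
```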