correlation coefficient
A measure of the strength of the linear relationship between two variables.
bivariate quantitative data
The type of data required for regression analysis.
response variable
A variable that gives the value (may not be a number) of the outcome of a study
two variables - quantitative and their relationship must be linear
The two requirements for computing correlation coefficient.
scatterplot
A two-dimensional plot used to examine the strength, direction, and type of relationship between two variables.
outlier
An observation that substantially alters the correlation coefficient value.
positive association
Type of association where high values of one variable tend to associate with high values of another variable.
plus one and minus one
The maximum and minimum possible values of correlation coefficient.
no unit of measure
The unit of measure for the correlation coefficient.
zero; non-linear relationship and no relationship can have an r of zero
The value of the correlation coefficient when there is no linear association between two quantitative variables.
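A minimal sketch of the card above, assuming made-up data and a hand-written helper named pearson_r (not part of the deck): a perfect curved relationship can still give r = 0, while a perfect straight line gives r = 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired quantitative data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative values only, chosen to be symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]     # perfect non-linear (parabolic) relationship
ys_line = [2 * x + 1 for x in xs]   # perfect linear relationship

r_curve = pearson_r(xs, ys_curve)
r_line = pearson_r(xs, ys_line)
print(r_curve, r_line)
```

Here y depends completely on x in both cases, so r = 0 for the parabola signals only the absence of a linear association, not the absence of a relationship.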
conditional distributions
If you want to examine the relationship (if any) between two categorical variables, look at the
marginal distribution
The ______ ______ of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
conditional distribution
A ______ ______ of a variable is the distribution of values of that variable among only individuals who have a given value of the other variable. There is a separate conditional distribution for each value of the other variable.
simpson's paradox
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group
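The reversal described above can be sketched with a small example. The counts below are hypothetical, invented only to show the arithmetic; "A" and "B" are two treatments and the group is the lurking variable.

```python
# Hypothetical (successes, trials) for treatments A and B in two groups.
counts = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

# Within each group, A has the higher success rate (0.8 vs 0.7, 0.3 vs 0.2).
a_wins_each_group = all(rate(*g["A"]) > rate(*g["B"]) for g in counts.values())

# Combine the groups by adding the counts for each treatment.
combined = {
    t: (sum(g[t][0] for g in counts.values()),
        sum(g[t][1] for g in counts.values()))
    for t in ("A", "B")
}

# Combined, A is 38/110 and B is 72/110, so the comparison reverses.
a_wins_combined = rate(*combined["A"]) > rate(*combined["B"])
print(a_wins_each_group, a_wins_combined)
```

The reversal happens because treatment A was mostly used in the group with low success rates, which is exactly the kind of lurking-variable effect the paradox names.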
marginal distributions
In a two-way table, the row totals and column totals give the marginal distributions of the two individual variables. It is clearer to present these distributions as percents of the table total. _______ ________ tell us nothing about the relationship betw
conditional distribution
To find the ______ _______ of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total.
bar graphs
______ _______ are a flexible means of presenting categorical data. There is no single best way to describe an association between two categorical variables.
lurking variables
Simpson's paradox is an example of the effect of ________ _______ on an observed association
true
True or false: Slope, b, and correlation coefficient, r, have the same sign (positive or negative).
true
True or false: The least squares regression line always intersects the point: x bar, y bar.
correlation
Which of the following allows you to interchange the roles of x and y? correlation or regression?
y hat equals a plus bx
the statistical model for a straight line
true
True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).
unexplained variation; the variability of the observed y's about the predicted y's
what does the sum of squared residuals measure?
true
True or false: "Total variation in y" measures the variability of the y's about their mean y bar.
r squared
"the percentage of total variation in y that can be explained by the regression on x". When a question asks about "the percentage of total variation in y that can be explained by x", it's asking for ____.
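As a rough check of this interpretation, the sketch below fits a least squares line to made-up data and compares the explained share of the total variation in y with r squared. The data values and variable names are assumptions for illustration.

```python
from math import sqrt

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y about y bar

b = sxy / sxx               # least squares slope
a = y_bar - b * x_bar       # least squares y-intercept
r = sxy / sqrt(sxx * syy)   # correlation coefficient

y_hats = [a + b * x for x in xs]
# Sum of squared residuals: the variation the line leaves unexplained.
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))

explained_fraction = (syy - sse) / syy
r_squared = r ** 2
print(round(explained_fraction, 3), round(r_squared, 3))
```

For these numbers both quantities come out to about 0.6, so roughly 60% of the total variation in y is explained by x.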
True, because if X tells us nothing about the value of Y, then none of the variation in Y is explained by X so r squared = zero %.
True or false: If the value of X tells us nothing about the value of Y, then r squared = 0%.
influential observation
a data value whose removal from the data set results in drastically changed estimates for slope and/or y-intercept.
True. An outlier in the x direction affects the mean and standard deviation of the x's. An outlier in the y direction affects the mean and standard deviation of the y's. These means and standard deviations are used in the computation of the correlation an
True or false: An outlier in either the x or y direction affects the value of the correlation, r.
False. High correlation does not imply causation unless the data are from an experiment.
True or false: If the correlation between x and y is r = 0.99, we can say that changes in x cause changes in y.
correlation; the purpose of correlation is to measure the strength of linear relationships
If you want to measure the strength of the linear relationship between two quantitative variables, which of the following should you use?
r squared
we interpret ______ as the percentage of total variation in our y variable that can be explained by our x variable
True. A lurking variable is a variable not included in a study but that might explain the relationship between the two variables that you did study. For example, length of feet and scores on a national reading exam are highly correlated for children ages 6 to 1
True or false: Two quantitative variables, x and y, may be strongly correlated because both are consequences of a third lurking variable
In a two-way table. Since counting is the only appropriate operation for categorical data, these counts need to be displayed. We display them with two way tables
How do we display a set of bivariate categorical data?
Counting. Since categorical data are words not numbers that have value, mathematical operations such as adding, subtracting, and multiplying do not make sense. Only counting is appropriate for categorical data. That is why it is sometimes called "count" d
what operation can be performed on categorical data?
The totals for each row and each column. The margins show the totals in each row and each column. These totals are sometimes called marginals because they are in the margins of the table.
In a two-way table, what is given in the "margins"?
An association exists if the conditional distributions DO NOT equal the marginal distribution.
Association
True. Often the row variable is the explanatory variable and the response variable is the column variable.
True or false: rows and columns are used to display the explanatory and response variables for bivariate categorical data.
random variable
a variable whose value is a numerical outcome of a random phenomenon.
what values x can take and how to assign probabilities to those values.
probability distribution of a random variable x tells us....
probability model
shows all the possible outcomes of a random phenomenon and the probability of each outcome occurring
random phenomenon
an event whose outcome we cannot know in advance with complete certainty but that has a regular pattern of occurrences over many repetitions
probability
the proportion of times it would occur in a long series of repetitions or trials
one, zero
any event, like a coin toss, has a probability of occurring that lies between zero and one; the closer the probability of an event lies to _____, the more likely it is to occur. The closer the probability of an event lies to _____, the less likely it is t
1.0
Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is certain. It will occur on every trial."
0.3. An event that fails to occur more often than it occurs has a probability below 0.5, so its probability is 0.3.
Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is somewhat likely. But it won't occur more often than it will occur."
No; When a random phenomenon occurs only once, you cannot predict the outcome. It wouldn't be a random phenomenon if we could predict the outcome. Since playing the game is a random phenomenon, we cannot predict with certainty the outcome.
Suppose you have played a game many, many times---winning sometimes and losing sometimes. Can you use the results of playing the game to predict with certainty whether you will win the game on the next try?
yes; If you repeat a random phenomenon many, many times, you can estimate the probability of winning by dividing the number of successes by the total number of trials. Since playing the game is a random phenomenon, we can estimate the probability of winni
Suppose you have played a game many, many times---winning sometimes and losing sometimes. Can you use the results of playing the game to estimate the probability of winning the game?
The fraction of times the outcome occurs in many, many trials of a random phenomenon.
In statistics, how do we define probability of an outcome?
all possible values of the random variable together with their probabilities
What does the probability distribution of a random variable give us?
The fraction of "1's" gets closer to 1/6 as more and more tosses are made. Probability is the fraction of times the outcome occurs in many, many trials of a random phenomenon. On the basis of this definition, the probability is one-sixth because the frac
The statistical definition of probability says that probability is based on the fraction of how many times an outcome occurs divided by the number of trials. On the basis of this, why is the probability of obtaining a "1" when tossing a die 1/6?
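The long-run idea in this card can be sketched with a simulation. This is a minimal sketch assuming a fair die modeled by Python's random module with a fixed seed; the checkpoint sizes are arbitrary choices.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n_tosses = 60_000
ones = 0
fraction_at = {}         # running fraction of "1's" at a few checkpoints

for i in range(1, n_tosses + 1):
    if rng.randint(1, 6) == 1:   # randint includes both endpoints
        ones += 1
    if i in (60, 600, 6_000, 60_000):
        fraction_at[i] = ones / i

for tosses, fraction in fraction_at.items():
    print(tosses, round(fraction, 4))
```

The early fractions can wander noticeably, but the fraction after many tosses should sit close to 1/6 (about 0.1667), which is the sense in which probability is a long-run proportion.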
To answer questions about how much the x bar's vary from one SRS to the next. Since statistical inference is using the value of a statistic to estimate the value of a parameter, we need to simulate the value of a statistic using many, many SRS's. These si
Why do we simulate many, many SRS's and obtain the value of x bar from each sample?
False. In statistics random is NOT synonymous with haphazard. In statistics, a phenomenon is random if the outcome of one play is unpredictable, but the outcomes of many, many plays form a distribution from which we can estimate probabilities. The term ra
True or False: In statistics random is synonymous with haphazard.
True. All probabilities must be greater than or equal to zero and less than or equal to one.
True or False: Probabilities cannot ever be negative or greater than one.
True. This is an important and often used probability rule.
True or false: The probability that an event does not occur equals one minus the probability that the event does occur.
Association
________ exists if the conditionals differ from each other and from the marginal
To measure the strength of the linear relationship between X and Y.
Since it is difficult to describe the strength of the linear relationship between X and Y in words, the correlation coefficient gives us a numerical measure for the strength of the linear
What is the purpose of the correlation coefficient, r?
Rise in y over run in x
Mathematically, slope equals...
True. While the traditional mathematical model is y = mx + b where m = slope and b = y-intercept, statisticians switch slope times x with the y-intercept in the equation so that additional explanatory variables, x's, may be added.
True or false: The statistical model for a straight line is y hat =a+bx
True. The formula for slope is b equals r times s sub-y over s sub-x. Since both s sub-y and s sub-x are positive, slope and r have to have the same sign.
True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).
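A short sketch of the slope formula b = r times s sub-y over s sub-x, using made-up data. It also checks that the resulting line passes through the point (x bar, y bar); the variable names are illustrative.

```python
from statistics import mean, stdev

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations, always positive here

# Correlation coefficient from its definition.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x        # slope: same sign as r because s_y / s_x > 0
a = y_bar - b * x_bar    # y-intercept of the least squares line

predicted_at_x_bar = a + b * x_bar
print(round(r, 4), round(b, 4), round(a, 4), predicted_at_x_bar)
```

Because the intercept is defined as y bar minus b times x bar, the prediction at x bar is y bar, so the least squares line always passes through (x bar, y bar).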
Residual equals observed y minus predicted y, or y minus y hat.
What is the equation to determine the value of a residual?
The least squares line minimizes the sum of squared residuals.
The vertical distances of the data points from the line are measured by residuals. To find the least-squares line we square these residuals and add them. The least-squares line minimizes this
What is unique about a least squares line?
scatterplot
a graph for displaying bivariate quantitative data
lowercase r
the symbol for sample correlation coefficient
correlation coefficient
a measure of the strength of the linear relationship between X and Y
bivariate quantitative data with a linear relationship
Requirements for computing correlation coefficient
residual
the name of the value computed from observed Y minus predicted Y.
r squared
A measure of the percentage of variation in Y explained by X.
least squares regression line
the line with the smallest sum of squared residuals
slope
a measure of the average change in Y for every one unit increase in X
residual
a measure of how far a data point is vertically from the regression line
none, no measurement
the unit of measure for correlation coefficient
they both have the same sign
the commonality between slope and correlation coefficient
conditional distribution
a distribution computed from only one row or one column of a two-way table
marginal distribution
a distribution computed from the row totals or the column totals of a two-way table
they are different
how the conditional distributions compare when an association exists between the explanatory and response variables
a conditional distribution
percentages found by dividing the counts in a row by the row total, or counts in a column by the column total
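The calculations on the cards above can be sketched for a small two-way table. The counts and the labels ("row 1", "yes", and so on) are hypothetical.

```python
# Hypothetical two-way table of counts: rows = explanatory, columns = response.
table = {
    "row 1": {"yes": 30, "no": 20},
    "row 2": {"yes": 10, "no": 40},
}
columns = ["yes", "no"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the column variable: column totals over the table total.
marginal = {
    c: sum(row[c] for row in table.values()) / grand_total for c in columns
}

# Conditional distribution of the column variable within each row:
# each count divided by its row total.
conditional = {
    name: {c: row[c] / sum(row.values()) for c in columns}
    for name, row in table.items()
}

print(marginal)
print(conditional)
```

Here the conditional distributions (0.6/0.4 and 0.2/0.8) differ from each other and from the marginal distribution (0.4/0.6), which is the pattern that indicates an association.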
Central Limit Theorem
the name of the statement telling us that the sampling distribution of x bar is approximately normal whenever the sample is large and random
sampling distribution
a list of the possible values of a statistic together with the frequency or probability of each value
approximately Normal
the shape of the sampling distribution of x bar when the sample is random from a non-normal population and the sample size is large
sigma over the square root of n
the symbol for the standard deviation of the theoretical sampling distribution of x bar
mu
The symbol for the mean of the theoretical sampling distribution of x bar
standard deviation of the sampling distribution of x bar
a measure of the variability of the values of the statistic x bar about mu.
Normal
shape of the sampling distribution of x bar when the sample is small and randomly selected from a Normal population
standard deviation of the sampling distribution of x bar
a measure of the variability of the sampling distribution of x bar
mean of the sampling distribution of x bar, namely, mu
a measure of the center of the sampling distribution of x bar
Law of Large Numbers
the name of the fact that the average of the data in a sample will get closer and closer to the population mean as we increase the sample size
parameter
A characteristic of a population that is usually unknown.
sample
A subset of the population.
statistic symbols
x bar, s, p hat, r.
parameter symbols
mu, sigma, and p.
inference
using results from a sample to draw conclusions about the entire population
statistic
A number computed from sample data used to estimate a parameter.
population
a collection of all the individuals about which we wish information
probability samples
type of samples required for valid inference
similar to population histogram
shape of the histogram of a sample when the sample is large and random
error in the estimate
the difference between the value of a statistic from a sample and the parameter it estimates
distribution
a list of the possible values of a variable together with the frequencies of each value
one
the sum of the probabilities of all possible outcomes
simulation
using random numbers to imitate chance behavior
probability of event A plus probability of event B
the probability of event A or event B where events A and B are disjoint
probability of an outcome
a measure of the proportion of times an outcome occurs in the long run that gives us an indication of the likelihood of the outcome
mu
value of the center line
process control
A procedure used to check a process at regular intervals to detect problems and correct them before they become serious.
control chart or quality control chart
A chart plotting the means, x bar's, of regular samples of size n against time; this chart is used to assess whether the process is in control.
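A minimal sketch of an x bar control chart check. It assumes the common three-sigma convention for control limits (mu plus or minus 3 sigma over the square root of n), which the cards here do not state, and every number below is hypothetical.

```python
from math import sqrt

# Hypothetical in-control process values and sample size.
mu, sigma, n = 10.0, 0.5, 4

center_line = mu
# Assumed convention: limits three standard deviations of x bar from the center.
half_width = 3 * sigma / sqrt(n)
upper_limit = center_line + half_width
lower_limit = center_line - half_width

# Hypothetical sample means taken at regular intervals.
sample_means = [10.1, 9.8, 10.3, 10.9, 9.9]

# Positions (0-based) of sample means that fall outside the control limits.
out_of_control = [
    i for i, x_bar in enumerate(sample_means)
    if not (lower_limit <= x_bar <= upper_limit)
]
print(lower_limit, upper_limit, out_of_control)
```

With these values the limits are 9.25 and 10.75, so only the fourth sample mean (10.9) would be flagged as a signal to look for a disturbance.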
blocking
The grouping of experimental units according to some similar characteristic where the random allocation is carried out separately within each group.
bias
The condition eliminated by randomly allocating individuals to treatments.
statistically significant
Results of a study that differ too much from what we expect due to just randomization to attribute to chance.
replication
The condition of having more than one individual in each treatment combination.
"randomized block experiment" over "completely randomized experiment"
The advantage of _______________ over _____________ is to remove variation associated with the blocking variable from experimental error.
false. the mean of the sampling distribution of x bar equals mu
True or false. x bar is the value of the mean of the sampling distribution of x bar.
false. The standard deviation of the population is greater than the standard deviation of the sampling distribution of x bar.
True or false. The standard deviation of the population is less than the standard deviation of the sampling distribution of x bar.
true
True or false. The sampling distribution of x bar is always taller and skinnier than the population.
false. the mean of the sampling distribution of x bar exactly equals mu.
True or false. The mean of the sampling distribution of x bar gets closer and closer to x bar as the sample size increases.
false. The shape of the sampling distribution of x bar gets closer and closer to Normal.
True or false. The shape of the sampling distribution of x bar gets closer and closer to the shape of the population as sample size increases
true
True or false. The shape of the histogram of data in a sample gets closer and closer to the shape of the population as sample size increases.
false. The shape of the sampling distribution of x bar is Normal regardless of sample size if the population is Normal.
True or false. The shape of the sampling distribution of x bar gets closer and closer to Normal as the sample size increases when the population is normal.
true
True or false. The shape of the sampling distribution of x bar gets closer and closer to Normal as the sample size increases when the population is non-normal.
true
True or false. The shape of the sampling distribution of x bar is always Normal when the population shape is Normal.
false. The standard deviation of the sampling distribution of x bar equals sigma over the square root of n.
True or false. The standard deviation of the sampling distribution of x bar gets closer and closer to sigma as n increases.
true
True or false. The standard deviation of the sampling distribution of x bar gets smaller and smaller as n increases.
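The shrinking variability of x bar can be sketched by simulation. This assumes a Uniform(0, 1) population (an arbitrary non-normal choice with sigma equal to the square root of 1/12), a fixed seed, and a hypothetical helper named sd_of_sample_means.

```python
import random
from math import sqrt
from statistics import stdev

rng = random.Random(1)   # fixed seed so the run is repeatable
sigma = sqrt(1 / 12)     # population standard deviation of Uniform(0, 1)

def sd_of_sample_means(n, reps=2000):
    """Standard deviation of x bar over many simulated random samples of size n."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    return stdev(means)

simulated = {n: sd_of_sample_means(n) for n in (25, 100)}
theoretical = {n: sigma / sqrt(n) for n in (25, 100)}

for n in (25, 100):
    print(n, round(simulated[n], 4), round(theoretical[n], 4))
```

The simulated values should land near sigma over the square root of n for each sample size, and quadrupling n from 25 to 100 should roughly halve the standard deviation of x bar.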
false. The standard deviation of the sampling distribution of x bar is equal to sigma over the square root of n regardless of sample size.
True or false. The standard deviation of the sampling distribution of x bar gets closer and closer to sigma over the square root of n as n increases.
true
True or false. We measure the variability of the sampling distribution of x bar with sigma over the square root of n.
true
True or false. The sampling distribution of x bar tells us the possible values for x bar together with how often each occurs.
false. The sampling distribution of x bar tells us possible values we could get for the sample mean. That's because the sampling distribution gives us all possible values for x bar together with their frequencies, or probabilities.
True or false. The sampling distribution of x bar tells us all possible values we could get in our sample of size n.
sampling distribution
The distribution of values taken by a statistic in all possible samples of the same size from the same population.
population
The entire group of individuals that we want information about.
distribution
Tells us what values a variable takes and how often it takes these values.
mean
The ______ of the statistic x bar is always equal to the mean mu of the population.
parameter
A number that describes the population. In statistical practice, the value is not known because we cannot examine the entire population.
large
The results of ______ samples are less variable than the results of small samples, so we can trust the sample mean from a _____ random sample to estimate the population mean accurately.
Normal
if the population distribution is _______ , then so is the sampling distribution of the sample mean. if the population distribution is ______, then the sampling distribution of the sample mean is _______ regardless of the sample size n.
four
the standard deviation of the sampling distribution gets smaller only at the rate square root of n . To cut the standard deviation of x bar in half, we must take ____ times as many observations, not just twice as many. So very precise estimates (estimates
increases
It is a remarkable fact that as the sample size _______, the distribution of x bar changes shape: it looks less like that of the population and more like a Normal distribution.
central limit theorem
States that for large n, the sampling distribution of the sample mean x bar is approximately Normal for any population with mean mu and finite standard deviation sigma. That is, averages are more Normal than individual observations.
small
Any variable that is a sum of many _______ influences will have approximately a Normal distribution
variable, normal
Means of random samples are less ________ than individual observations. Means of random samples are more ________ than individual observations.
stability, stable
The statistical description of _______ over time requires that the pattern of variation remain _______, not that there be no variation in the variable measured.
control charts
statistical tools that monitor a process and alert us when the process has been disturbed so that it is now out of control. This is a signal to find and correct the cause of the disturbance.
process monitoring conditions
Measure a quantitative variable x that has a Normal distribution. The process has been operating in control for a long period, so that we know the process mean mu and the process standard deviation sigma that describe the distribution of x as long as the proce
correlation coefficient
A measure of the strength of the linear relationship between two variables.
bivariate quantitative data
The type of data required for regression analysis.
explanatory variable
A variable that gives the value (may not be a number) of the outcome of a study
two variables - quantitative and their relationship must be linear
The two requirements for computing correlation coefficient.
scatterplot
A two dimensional plot used to examine strength of relationship between two variables as well as direction
and type of relationship.
outlier
An observation that substantially alters the correlation coefficient value.
positive association
Type of association where high values of one variable tend to associate with high values of another variable.
plus one and minus one
The maximum and minimum possible values of correlation coefficient.
no unit of measure
The unit of measure for the correlation coefficient.
zero; non-linear relationship and no relationship can have an r of zero
The value of the correlation coefficient when there is no linear association between two quantitative variables.
conditional distributions
If you want to examine the relationship (if any) between two categorical variables, look at the
marginal distribution
one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
conditional distribution
the distribution of values of that variable among only individuals who have a given value of the other variable. There is a separate conditional distribution for each value of the other variable
simpson's paradox
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group
marginal distributions
In a two-way table, the row totals and column totals give the marginal distributions of the two individual variables. It is clearer to present these distributions as percents of the table total. _______ ________ tell us nothing about the relationship betw
conditional distribution
To find the ______ _______ of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total.
bar graphs
______ _______ are a flexible means of presenting categorical data. There is no single best way to describe an association between two categorical variables.
lurking variables
Simpson's paradox is an example of the effect of ________ _______ on an observed association
TRUE
True or false: Slope, b, and correlation coefficient, r, have the same sign (positive or negative).
TRUE
True or false: The least squares regression line always intersects the point: x bar, y bar.
correlation
Which of the following allows you to interchange the roles of x and y? correlation or regression?
y hat equals A plus BX
the statistical model for a straight line
TRUE
True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).
explained variation; the variability of the predicted y's
what does the sum of squared residuals measure?
TRUE
True or false: "Total variation in y" measures the variability of the y's about their mean y bar.
r squared
the percentage of total variation in y that can be explained by the regression by x". When a question asks about "the percentage of total variation in y that can be explained by x", it's asking for ____.
True, because if X tells us nothing about the value of Y, then none of the variation in Y is explained by X so r squared = zero %.
True or false: If the value of X tells us nothing about the value of Y, then r2 = 0%.
influential observation
a data value whose removal from the data set results in drastically changed estimates for slope and/or y-intercept.
True. An outlier in the x direction affects the mean and standard deviation of the x's. An outlier in the y direction affects the mean and standard deviation of the y's. These means and standard deviations are used in the computation of the correlation an
True or false: An outlier in either the x or y direction affects the value of the correlation, r.
False. High correlation does not imply causation unless the data are from an experiment.
True or false: If the correlation between x and y is r = 0.99, we can say that changes in x cause changes in y.
correlation; the purpose of correlation is to measure the strength of linear relationships
If you want to measure the strength of the linear relationship between two quantitative variables, which of the following should you use?
r squared
we interpret ______ as the percentage of total variation in our y variable that can be explained by our x variable
True. A lurking variable is a variable not included in a study but that might explain the relationship between the two variables that you did. For example, length of feet and scores on a national reading exam are highly correlated for children ages 6 to 1
True or false: Two quantitative variables, x and y, may be strongly correlated because both are consequences of a third lurking variable
In a two-way table. Since counting is the only appropriate operation for categorical data, these counts need to be displayed. We display them with two way tables
How do we display a set of bivariate categorical data?
Counting. Since categorical data are words not numbers that have value, mathematical operations such as adding, subtracting, and multiplying do not make sense. Only counting is appropriate for categorical data. That is why it is sometimes called "count" d
what operation can be performed on categorical data?
The totals for each row and each column. The margins show the totals in each row and each column. These totals are sometimes called marginals because they are in the margins of the table.
In a two-way table, what is given in the "margins"?
An association exists if the conditional distributions DO NOT equal the marginal distribution.
Association
True. Often the row variable is the explanatory variable and the response variable is the column variable.
True or false: rows and columns are used to display the explanatory and response variables for bivariate categorical data.
random variable
a variable whose value is a numerical outcome of a random phenomenon.
what values x can take and and how to assign probabilities to those values.
probability distribution of a random variable x tells us....
probability model
shows all the possible outcomes of a random phenomenon and the probability of each outcome occurring
random phenomenon
an event whose outcome we cannot know in advance with complete certainty but that has a regular pattern of occurrences over many repetitions
probability
the proportion of times it would occur in a long series of repetitions or trials
one, zero
any event, like a coin toss, has a probability of occurring that lies between zero and one; the closer the probability of an event lies to _____, the more likely it is to occur. The closer the probability of an event lies to _____, the less likely it is t
1
Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is certain. It will occur on every trial.
0.3. If an event won't occur more often than it will occur, it's probability is 0.3.
Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is somewhat likely. But it won't occur more often than it will occur.''
No; When a random phenomenon occurs only once, you cannot predict the outcome. It wouldn't be a random phenomenon if we could predict the outcome. Since playing the game is a random phenomenon, we cannot predict with certainty the outcome.
Suppose you have played a game many, many times---winning sometimes and losing sometimes. Can you use the results of playing the game to predict with certainty whether you will win the game on the next try?
yes; If you repeat a random phenomenon many, many times, you can estimate the probability of winning by dividing the number of successes by the total number of trials. Since playing the game is a random phenomenon, we can estimate the probability of winni
Suppose you have played a game many, many times---winning sometimes and losing sometimes. Can you use the results of playing the game to estimate the probability of winning the game?
The fraction of times the outcome occurs in many, many trials of a random phenomenon.
In statistics, how do we define probability of an outcome?
all possible values of the random variable together with their probabilities
What does the probability distribution of a random variable give us?
The fraction of "1's'' gets closer to 1/6 as more and more tosses are made. Probability is the fraction of times the outcome occurs in many, many trials of a random phenomenon. On the basis of this definition, the probability is one-sixth because the frac
The statistical definition of probability says that probability is based on the fraction of how many times an outcome occurs divided by the number of trials. On the basis of this, why is the probability of obtaining a "1'' when tossing a die is 1/6?
To answer questions about how much the x bar's vary from one SRS to the next. Since statistical inference is using the value of a statistic to estimate the value of a parameter, we need to simulate the value of a statistic using many, many SRS's. These si
Why do we simulate many, many SRS's and obtain the value of x bar from each sample?
False. In statistics random is NOT synonymous with haphazard. In statistics, a phenomenon is random if the outcome of one play is unpredictable, but the outcomes of many, many plays form a distribution from which we can estimate probabilities. The term ra
True or False: In statistics random is synonymous with haphazard.
True. All probabilities must be greater than or equal to zero and less than or equal to one.
True or False: Probabilities cannot ever be negative or greater than one.
True. This is an important and often used probability rule.
True or false: The probability that an event does not occur equals one minus the probability that the event does occur.
Association
________ exists if the conditionals differ from each other and from the marginal
To measure the strength of the linear relationship between X and Y.
Since it is difficult to describe the strength of the linear relationship between X and Y in words, the correlation coefficient gives us a numerical measure for the strength of the linear relationship.
What is the purpose of the correlation coefficient, r?
Rise in y over run in x
Mathematically, slope equals...
True. While the traditional mathematical model is y = mx + b where m = slope and b = y-intercept, statisticians switch slope times x with the y-intercept in the equation so that additional explanatory variables, x's, may be added.
True or false: The statistical model for a straight line is y hat = a + bx.
True. The formula for slope is b equals r times s sub-y over s sub-x. Since both s sub-y and s sub-x are positive, slope and r have to have the same sign.
True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).
Residual equals observed y minus predicted y, or y minus y hat.
What is the equation to determine the value of a residual?
The least squares line minimizes the sum of squared residuals.
The vertical distances of the data points from the line are measured by residuals. To find the least-squares line we square these residuals and add them. The least-squares line minimizes this sum.
What is unique about a least squares line?
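The slope, residual, and least-squares cards above can be tied together in one small computation. This is a hedged sketch: the data values are made up for illustration, the helper names are hypothetical, and the intercept formula a = y bar minus b times x bar is the standard one rather than something stated on these cards.

```python
import statistics as st

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values over n - 1."""
    mx, my = st.mean(xs), st.mean(ys)
    sx, sy = st.stdev(xs), st.stdev(ys)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys)) / (len(xs) - 1)

def least_squares(xs, ys):
    """Return (a, b) for y hat = a + b*x, using b = r * s_y / s_x."""
    b = correlation(xs, ys) * st.stdev(ys) / st.stdev(xs)
    a = st.mean(ys) - b * st.mean(xs)  # standard intercept formula (assumed, not on the cards)
    return a, b

def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared residuals, where residual = observed y - predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

r = correlation(x, y)
a, b = least_squares(x, y)
best = sum_sq_residuals(x, y, a, b)

print("r =", round(r, 4), " slope =", round(b, 4), " intercept =", round(a, 4))
# Any other line, such as one with a shifted intercept or slope, has a larger sum.
print(best, sum_sq_residuals(x, y, a + 0.1, b), sum_sq_residuals(x, y, a, b + 0.05))
```

The slope and r share a sign here because both standard deviations are positive, matching the true/false card on the sign of the slope.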
scatterplot
a graph for displaying bivariate quantitative data
lowercase r
the symbol for sample correlation coefficient
correlation coefficient
a measure of the strength of the linear relationship between X and Y
bivariate quantitative data with a linear relationship
Requirements for computing correlation coefficient
residual
the name of the value computed from observed Y minus predicted Y.
r squared
A measure of the percentage of variation in Y explained by X.
least squares regression line
the line with the smallest sum of squared residuals
slope
a measure of the average change in Y for every one unit increase in X
residual
a measure of how far a data point is vertically from the regression line
none, no measurement
the unit of measure for correlation coefficient
they both have the same sign
the commonality between slope and correlation coefficient
conditional distribution
a distribution computed from only one row or one column of a two-way table
marginal distribution
a distribution computed from the row totals or the column totals of a two-way table
they are different
how the conditional distributions compare when an association exists between the explanatory and response variables
a conditional distribution
percentages found by dividing the counts in a row by the row total, or counts in a column by the column total
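The marginal and conditional distribution cards above can be illustrated with a tiny two-way table. This is a sketch under stated assumptions: the counts, the row and column labels, and the function names are all hypothetical.

```python
# Hypothetical two-way table of counts (rows: group, columns: outcome).
table = {
    "treatment": {"improved": 30, "not improved": 20},
    "control": {"improved": 15, "not improved": 35},
}

def marginal_of_columns(tbl):
    """Column totals divided by the table total."""
    total = sum(sum(row.values()) for row in tbl.values())
    col_totals = {}
    for row in tbl.values():
        for col, count in row.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {col: count / total for col, count in col_totals.items()}

def conditional_given_row(tbl, row_name):
    """Counts in one row divided by that row's total."""
    row = tbl[row_name]
    row_total = sum(row.values())
    return {col: count / row_total for col, count in row.items()}

print("marginal:", marginal_of_columns(table))
for name in table:
    print("conditional given", name, ":", conditional_given_row(table, name))
```

In this made-up table the two conditional distributions differ from each other and from the marginal, which is the pattern the cards describe as an association.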
Central Limit Theorem
the name of the statement telling us that the sampling distribution of x bar is approximately normal whenever the sample is large and random
sampling distribution
a list of the possible values of a statistic together with the frequency or probability of each value
approximately Normal
the shape of the sampling distribution of x bar when the sample is random from a non-Normal population and the sample size is large
sigma over the square root of n
the symbol for the standard deviation of the theoretical sampling distribution of x bar
mu
The symbol for the mean of the theoretical sampling distribution of x bar
standard deviation of the sampling distribution of x bar
a measure of the variability of the values of the statistic x bar about mu.
Normal
shape of the sampling distribution of x bar when the sample is small and randomly selected from a Normal population
standard deviation of the sampling distribution of x bar
a measure of the variability of the sampling distribution of x bar
mean of the sampling distribution of x bar, namely, mu
a measure of the center of the sampling distribution of x bar
Law of Large Numbers
the name of the fact that the average of the data in a sample will get closer and closer to the population mean as we increase the sample size
parameter
A characteristic of a population that is usually unknown.
sample
A subset of the population.
statistic symbols
x bar, s, p hat, r.
parameter symbols
mu, sigma, and p.
inference
using results from a sample to draw conclusions about the entire population
statistic
A number computed from sample data used to estimate a parameter.
population
a collection of all the individuals about which we wish information
probability samples
type of samples required for valid inference
similar to population histogram
shape of the histogram of a sample when the sample is large and random
error in the estimate
the difference between the value of a statistic from a sample and the parameter it estimates
distribution
a list of the possible values of a variable together with the frequencies of each value
one
the sum of the probabilities of all possible outcomes
simulation
using random numbers to imitate chance behavior
probability of event A plus probability of event B
the probability of event A or event B where events A and B are disjoint
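The probability rules on these cards (probabilities sum to one, the addition rule for disjoint events, and the complement rule) can be checked exactly on a small model. This is a minimal sketch assuming a fair six-sided die; the names `model` and `prob` and the chosen events are illustrative.

```python
from fractions import Fraction

# Probability model for one toss of a fair six-sided die (illustrative assumption).
model = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return sum(model[outcome] for outcome in event)

A = {1, 2}  # "toss a 1 or a 2"
B = {6}     # "toss a 6"; A and B share no outcomes, so they are disjoint

print("sum of all probabilities:", prob(set(model)))
print("P(A or B):", prob(A | B), " P(A) + P(B):", prob(A) + prob(B))
print("P(not A):", prob(set(model) - A), " 1 - P(A):", 1 - prob(A))
```

Exact fractions are used instead of floats so the equalities hold without rounding.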
probability of an outcome
a measure of the proportion of times an outcome occurs in the long run that gives us an indication of the likelihood of the outcome
mu
value of the center line
process control
A procedure used to check a process at regular intervals to detect problems and correct them before they become serious.
control chart or quality control chart
A chart plotting the means, x bars, of regular samples of size n against time; this chart is used to assess whether the process is in control.
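A short sketch of how an x bar control chart might be checked in code. The cards here only name the center line (mu); the control limits of mu plus or minus three standard deviations of x bar are the common textbook convention and are an assumption, as are the function names and the sample values.

```python
import math

def control_limits(mu, sigma, n):
    """Return (lower limit, center line, upper limit) for an x bar chart.

    Assumes the conventional limits mu +/- 3 * sigma / sqrt(n).
    """
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu, mu + margin

def out_of_control(sample_means, mu, sigma, n):
    """Indices of sample means that fall outside the control limits."""
    lower, _, upper = control_limits(mu, sigma, n)
    return [i for i, xbar in enumerate(sample_means) if xbar < lower or xbar > upper]

# Hypothetical process: mu = 100, sigma = 8, samples of size n = 4.
print(control_limits(100, 8, 4))
print(out_of_control([99.0, 101.0, 113.5, 95.0, 87.0], 100, 8, 4))
```

A flagged index is a signal to look for a cause of the disturbance, not proof that one exists.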
blocking
The grouping of experimental units according to some similar characteristic where the random allocation is carried out separately within each group.
bias
The condition eliminated by randomly allocating individuals to treatments.
statistically significant
Results of a study that differ too much from what we expect due to just randomization to attribute to chance.
replication
The condition of having more than one individual in each treatment combination.
"randomized block experiment" over "completely randomized experiment"
The advantage of _______________ over _____________ is to remove variation associated with the blocking variable from experimental error.
false. the mean of the sampling distribution of x bar equals mu
True or false. x bar is the value of the mean of the sampling distribution of x bar.
false. The standard deviation of the population is greater than the standard deviation of the sampling distribution of x bar.
True or false. The standard deviation of the population is less than the standard deviation of the sampling distribution of x bar.
TRUE
True or false. The sampling distribution of x bar is always taller and skinnier than the population.
false. the mean of the sampling distribution of x bar exactly equals mu.
True or false. The mean of the sampling distribution of x bar gets closer and closer to x bar as the sample size increases.
false. The shape of the sampling distribution of x bar gets closer and closer to Normal.
True or false. The shape of the sampling distribution of x bar gets closer and closer to the shape of the population as sample size increases
TRUE
True or false. The shape of the histogram of data in a sample gets closer and closer to the shape of the population as sample size increases.
false. The shape of the sampling distribution of x bar is Normal regardless of sample size if the population is Normal.
True or false. The shape of the sampling distribution of x bar gets closer and closer to Normal as the sample size increases when the population is normal.
TRUE
True or false. The shape of the sampling distribution of x bar gets closer and closer to Normal as the sample size increases when the population is non-normal.
TRUE
True or false. The shape of the sampling distribution of x bar is always Normal when the population shape is Normal.
false. The standard deviation of the sampling distribution of x bar equals sigma over the square root of n.
True or false. The standard deviation of the sampling distribution of x bar gets closer and closer to sigma as n increases.
TRUE
True or false. The standard deviation of the sampling distribution of x bar gets smaller and smaller as n increases.
false. The standard deviation of the sampling distribution of x bar is equal to sigma over the square root of n regardless of sample size.
True or false. The standard deviation of the sampling distribution of x bar gets closer and closer to sigma over the square root of n as n increases.
TRUE
True or false. We measure the variability of the sampling distribution of x bar with sigma over the square root of n.
TRUE
true or false. The sampling distribution of x bar tells us the possible values for x bar together with how often each occurs.
false. The sampling distribution of x bar tells us possible values we could get for the sample mean. That's because the sampling distribution gives us all possible values for x bar together with their frequencies, or probabilities.
True or false. The sampling distribution of x bar tells us all possible values we could get in our sample of size n.
sampling distribution
The distribution of values taken by a statistic in all possible samples of the same size from the same population.
population
The entire group of individuals that we want information about.
distribution
Tells us what values a variable takes and how often it takes these values.
mean
The ______ of the statistic is always equal to the mean μ of the population.
parameter
A number that describes the population. In statistical practice, the value is not known because we cannot examine the entire population.
large
The results of ______ samples are less variable than the results of small samples, so we can trust the sample mean from a _____ random sample to estimate the population mean accurately.
Normal
If the population distribution is _______, then so is the sampling distribution of the sample mean. If the population distribution is ______, then the sampling distribution of the sample mean is _______ regardless of the sample size n.
four
The standard deviation of the sampling distribution gets smaller only at the rate of the square root of n. To cut the standard deviation of x bar in half, we must take ____ times as many observations, not just twice as many. So very precise estimates (estimates
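The "four times as many observations" answer follows directly from sigma over the square root of n. A minimal sketch, with a hypothetical sigma and sample sizes chosen only for illustration:

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of x bar."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 50, 100):
    print(n, round(sd_of_sample_mean(sigma, n), 4))
```

Doubling n from 25 to 50 only shrinks the standard deviation by a factor of about 0.71, while quadrupling it from 25 to 100 cuts it in half.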
increases
It is a remarkable fact that as the sample size _______, the distribution of x bar changes shape: it looks less like that of the population and more like a Normal distribution.
central limit theorem
States that for large n, the sampling distribution of the sample mean x bar is approximately Normal for any population with mean μ and finite standard deviation σ. That is, averages are more Normal than individual observations.
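The central limit theorem card can be explored by simulating many random samples from a clearly non-Normal population. This is a sketch under stated assumptions: the exponential population (mean 1, standard deviation 1), the sample sizes, the seed, and the helper names are all illustrative choices, and skewness is used here only as a rough indicator of how far the shape is from Normal.

```python
import random
import statistics as st

def sample_means(n, num_samples=2000, seed=1):
    """Means of num_samples random samples of size n from an exponential population."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for repeatability
    return [
        st.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

def skewness(values):
    """Average cubed standardized value; close to 0 for a symmetric shape."""
    m = st.fmean(values)
    s = st.pstdev(values)
    return st.fmean([((v - m) / s) ** 3 for v in values])

# Columns: n, mean of the x bars, sd of the x bars, skewness of the x bars.
for n in (1, 5, 40):
    means = sample_means(n)
    print(n, round(st.fmean(means), 3), round(st.stdev(means), 3), round(skewness(means), 3))
```

The mean of the simulated x bars should stay near 1 for every n, their standard deviation should track 1 over the square root of n, and the skewness should shrink toward 0 as n grows.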
small
Any variable that is a sum of many _______ influences will have approximately a Normal distribution
variable, normal
Means of random samples are less ________ than individual observations. Means of random samples are more ________ than individual observations.
stability, stable
The statistical description of _______ over time requires that the pattern of variation remain _______, not that there be no variation in the variable measured.
control charts
statistical tools that monitor a process and alert us when the process has been disturbed so that it is now out of control. This is a signal to find and correct the cause of the disturbance.
process monitoring conditions
Measure a quantitative variable x that has a Normal distribution. The process has been operating in control for a long period, so that we know the process mean μ and the process standard deviation σ that describe the distribution of x as long as the proce