Stats Exam 2 - jeopardy | AP Statistics

correlation coefficient

A measure of the strength of the linear relationship between two variables.

bivariate quantitative data

The type of data required for regression analysis.

response variable

A variable that gives the value (may not be a number) of the outcome of a study

two variables - quantitative and their relationship must be linear

The two requirements for computing correlation coefficient.

scatterplot

A two-dimensional plot used to examine the strength of the relationship between two variables, as well as its direction and type.

outlier

An observation that substantially alters the correlation coefficient value.

positive association

Type of association where high values of one variable tend to associate with high values of another variable.

plus one and minus one

The maximum and minimum possible values of correlation coefficient.

no unit of measure

The unit of measure for the correlation coefficient.

zero; non-linear relationship and no relationship can have an r of zero

The value of the correlation coefficient when there is no linear association between two quantitative variables.
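The card above notes that a non-linear relationship can still give r = 0. A minimal Python sketch of that point, using made-up data and a hypothetical helper named `correlation`:

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                 # illustrative values, not real data
curved = [x ** 2 for x in xs]          # perfect curved (non-linear) pattern
straight = [3 * x + 1 for x in xs]     # perfect straight-line pattern

r_curved = correlation(xs, curved)     # no linear association
r_straight = correlation(xs, straight) # perfect positive linear association
print(r_curved, r_straight)
```

Here y is completely determined by x in both cases, yet only the straight-line pattern produces a correlation near 1.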

conditional distributions

If you want to examine the relationship (if any) between two categorical variables, look at the

marginal distribution

The ______ ______ of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

conditional distribution

the distribution of values of that variable among only individuals who have a given value of the other variable. There is a separate conditional distribution for each value of the other variable

simpson's paradox

An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group
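A small numeric sketch can make the reversal concrete. The counts below are invented purely for illustration (they are not from any study), and the labels "A", "B", "easy cases", and "hard cases" are hypothetical:

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
counts = {
    "easy cases": {"A": (90, 100), "B": (800, 1000)},
    "hard cases": {"A": (300, 1000), "B": (20, 100)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

# Success rates of A and B within each group.
within = {group: (rate(*c["A"]), rate(*c["B"])) for group, c in counts.items()}

# Combine the groups into a single table for each treatment.
total = {
    t: (sum(c[t][0] for c in counts.values()),
        sum(c[t][1] for c in counts.values()))
    for t in ("A", "B")
}
overall_a = rate(*total["A"])
overall_b = rate(*total["B"])

print(within)                # A is ahead of B inside every group
print(overall_a, overall_b)  # B is ahead of A once the groups are combined
```

The reversal happens because A handled mostly hard cases while B handled mostly easy ones, so the type of case acts as a lurking variable.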

marginal distributions

In a two-way table, the row totals and column totals give the marginal distributions of the two individual variables. It is clearer to present these distributions as percents of the table total. _______ ________ tell us nothing about the relationship betw

conditional distribution

To find the ______ _______ of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total.
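The column-by-column recipe above can be sketched in Python. The two-way table here is hypothetical, and the names `table` and `conditional_given_column` are assumptions made for this example:

```python
# Hypothetical two-way table of counts: rows are the row variable,
# inner keys are the values of the column variable.
table = {
    "yes": {"group 1": 30, "group 2": 10},
    "no":  {"group 1": 20, "group 2": 40},
}

grand_total = sum(sum(cells.values()) for cells in table.values())

# Marginal distribution of the row variable: row totals as percents of the table total.
marginal_row = {row: 100 * sum(cells.values()) / grand_total
                for row, cells in table.items()}

def conditional_given_column(col):
    """Conditional distribution of the row variable for one value of the column variable."""
    col_total = sum(table[row][col] for row in table)
    return {row: 100 * table[row][col] / col_total for row in table}

cond_group1 = conditional_given_column("group 1")
cond_group2 = conditional_given_column("group 2")
print(marginal_row, cond_group1, cond_group2)
```

Because the two conditional distributions differ from each other and from the marginal distribution, these made-up counts show an association between the two variables.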

bar graphs

______ _______ are a flexible means of presenting categorical data. There is no single best way to describe an association between two categorical variables.

lurking variables

Simpson's paradox is an example of the effect of ________ _______ on an observed association

true

True or false: Slope, b, and correlation coefficient, r, have the same sign (positive or negative).

true

True or false: The least squares regression line always intersects the point: x bar, y bar.

correlation

Which of the following allows you to interchange the roles of x and y? correlation or regression?

y hat equals A plus BX

the statistical model for a straight line

true

True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).

unexplained variation; the variability of the observed y's about the predicted y's

what does the sum of squared residuals measure?

true

True or false: "Total variation in y" measures the variability of the y's about their mean y bar.

r squared

When a question asks about "the percentage of total variation in y that can be explained by x", it's asking for ____.

True, because if X tells us nothing about the value of Y, then none of the variation in Y is explained by X so r squared = zero %.

True or false: If the value of X tells us nothing about the value of Y, then r squared = 0%.

influential observation

a data value whose removal from the data set results in drastically changed estimates for slope and/or y-intercept.

True. An outlier in the x direction affects the mean and standard deviation of the x's. An outlier in the y direction affects the mean and standard deviation of the y's. These means and standard deviations are used in the computation of the correlation an

True or false: An outlier in either the x or y direction affects the value of the correlation, r.

False. High correlation does not imply causation unless the data are from an experiment.

True or false: If the correlation between x and y is r = 0.99, we can say that changes in x cause changes in y.

correlation; the purpose of correlation is to measure the strength of linear relationships

If you want to measure the strength of the linear relationship between two quantitative variables, which of the following should you use?

r squared

we interpret ______ as the percentage of total variation in our y variable that can be explained by our x variable

True. A lurking variable is a variable not included in a study but that might explain the relationship between the two variables that you did study. For example, length of feet and scores on a national reading exam are highly correlated for children ages 6 to 1

True or false: Two quantitative variables, x and y, may be strongly correlated because both are consequences of a third lurking variable

In a two-way table. Since counting is the only appropriate operation for categorical data, these counts need to be displayed. We display them with two way tables

How do we display a set of bivariate categorical data?

Counting. Since categorical data are words not numbers that have value, mathematical operations such as adding, subtracting, and multiplying do not make sense. Only counting is appropriate for categorical data. That is why it is sometimes called "count" d

what operation can be performed on categorical data?

The totals for each row and each column. The margins show the totals in each row and each column. These totals are sometimes called marginals because they are in the margins of the table.

In a two-way table, what is given in the "margins"?

An association exists if the conditional distributions DO NOT equal the marginal distribution.

Association

True. Often the row variable is the explanatory variable and the response variable is the column variable.

True or false: rows and columns are used to display the explanatory and response variables for bivariate categorical data.

random variable

a variable whose value is a numerical outcome of a random phenomenon.

what values x can take and how to assign probabilities to those values.

probability distribution of a random variable x tells us....

probability model

shows all the possible outcomes of a random phenomenon and the probability of each outcome occurring

random phenomenon

an event whose outcome we cannot know in advance with complete certainty but that has a regular pattern of occurrences over many repetitions

probability

the proportion of times it would occur in a long series of repetitions or trials

one, zero

any event, like a coin toss, has a probability of occurring that lies between zero and one; the closer the probability of an event lies to _____, the more likely it is to occur. The closer the probability of an event lies to _____, the less likely it is t

1.0

Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is certain. It will occur on every trial."

0.3. An event that won't occur more often than it will occur has a probability below 0.5, so here its probability is 0.3.

Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is somewhat likely. But it won't occur more often than it will occur."

No; When a random phenomenon occurs only once, you cannot predict the outcome. It wouldn't be a random phenomenon if we could predict the outcome. Since playing the game is a random phenomenon, we cannot predict with certainty the outcome.

Suppose you have played a game many, many times---winning sometimes and losing sometimes. Can you use the results of playing the game to predict with certainty whether you will win the game on the next try?

yes; If you repeat a random phenomenon many, many times, you can estimate the probability of winning by dividing the number of successes by the total number of trials. Since playing the game is a random phenomenon, we can estimate the probability of winni

Suppose you have played a game many, many times---winning sometimes and losing sometimes. Can you use the results of playing the game to estimate the probability of winning the game?

The fraction of times the outcome occurs in many, many trials of a random phenomenon.

In statistics, how do we define probability of an outcome?

all possible values of the random variable together with their probabilities

What does the probability distribution of a random variable give us?

The fraction of "1's" gets closer to 1/6 as more and more tosses are made. Probability is the fraction of times the outcome occurs in many, many trials of a random phenomenon. On the basis of this definition, the probability is one-sixth because the frac

The statistical definition of probability says that probability is the number of times an outcome occurs divided by the number of trials, over many, many trials. On the basis of this, why is the probability of obtaining a "1" when tossing a die equal to 1/6?
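The long-run idea in this card can be imitated with a short simulation. This is only a sketch: the function name `fraction_of_ones`, the toss counts, and the seed are assumptions chosen for illustration, and the simulated die is assumed to be fair.

```python
import random

def fraction_of_ones(num_tosses, seed=0):
    """Simulate tosses of a fair six-sided die and return the fraction that show a 1."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ones = sum(1 for _ in range(num_tosses) if rng.randint(1, 6) == 1)
    return ones / num_tosses

# Fraction of 1's for increasingly long runs of tosses.
fractions = {n: fraction_of_ones(n) for n in (10, 100, 10_000, 100_000)}
for n, frac in fractions.items():
    print(n, frac, abs(frac - 1 / 6))
```

With only a few tosses the fraction can be far from 1/6; with many tosses it should settle close to 1/6, which is the sense in which probability is a long-run proportion.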

To answer questions about how much the x bar's vary from one SRS to the next. Since statistical inference is using the value of a statistic to estimate the value of a parameter, we need to simulate the value of a statistic using many, many SRS's. These si

Why do we simulate many, many SRS's and obtain the value of x bar from each sample?

False. In statistics random is NOT synonymous with haphazard. In statistics, a phenomenon is random if the outcome of one play is unpredictable, but the outcomes of many, many plays form a distribution from which we can estimate probabilities. The term ra

True or False: In statistics random is synonymous with haphazard.

True. All probabilities must be greater than or equal to zero and less than or equal to one.

True or False: Probabilities cannot ever be negative or greater than one.

True. This is an important and often used probability rule.

True or false: The probability that an event does not occur equals one minus the probability that the event does occur.

Association

________ exists if the conditionals differ from each other and from the marginal

To measure the strength of the linear relationship between X and Y.
Since it is difficult to describe the strength of the linear relationship between X and Y in words, the correlation coefficient gives us a numerical measure for the strength of the linear

What is the purpose of the correlation coefficient, r?

Rise in y over run in x

Mathematically, slope equals...

True. While the traditional mathematical model is y = mx + b where m = slope and b = y-intercept, statisticians switch slope times x with the y-intercept in the equation so that additional explanatory variables, x's, may be added.

True or false: The statistical model for a straight line is y hat = a + bx.

True. The formula for slope is b equals r times s sub-y divided by s sub-x. Since both s sub-y and s sub-x are positive, slope and r have to have the same sign.

True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).
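As a rough check of the slope formula, here is a Python sketch on made-up data. The data values and variable names are assumptions for illustration; the intercept uses the standard relation a = y bar minus b times x bar.

```python
import math

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up response values
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

# Correlation coefficient from standardized products.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x       # slope: r times s sub-y divided by s sub-x
a = y_bar - b * x_bar   # intercept, so the line passes through (x bar, y bar)

print(r, b, a)
print(a + b * x_bar, y_bar)  # prediction at x bar compared with y bar
```

Since s sub-x and s sub-y are both positive, b can only be negative when r is negative, which is why the two always share a sign.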

Residual equals observed y minus predicted y, or y minus y hat.

What is the equation to determine the value of a residual?

The least squares line minimizes the sum of squared residuals.
The vertical distances of the data points from the line are measured by residuals. To find the least-squares line we square these residuals and add them. The least-squares line minimizes this

What is unique about a least squares line?
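To illustrate the least-squares property, the sketch below fits a line to made-up data, then compares its sum of squared residuals with that of a few nearby lines. The data, the helper `sum_squared_residuals`, and the size of the nudges are all assumptions for this example.

```python
xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up response values
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

def sum_squared_residuals(a, b):
    """Sum of squared residuals (observed y minus predicted y) for the line y hat = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

sse = sum_squared_residuals(a, b)          # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)    # total variation in y
r_squared = 1 - sse / sst                  # fraction of variation explained by x

# Nudge the intercept and slope slightly; every nearby line should fit worse.
nearby = [sum_squared_residuals(a + da, b + db)
          for da in (-0.1, 0, 0.1) for db in (-0.1, 0, 0.1)
          if (da, db) != (0, 0)]

print(sse, min(nearby), r_squared)
```

Checking a handful of nearby lines is not a proof, but it shows what "smallest sum of squared residuals" means in practice.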

scatterplot

a graph for displaying bivariate quantitative data

lower case R

the symbol for sample correlation coefficient

correlation coefficient

a measure of the strength of the linear relationship between X and Y

bivariate quantitative data with a linear relationship

Requirements for computing correlation coefficient

residual

the name of the value computed from observed Y minus predicted Y.

r squared

A measure of the percentage of variation in Y explained by X.

least squares regression line

the line with the smallest sum of squared residuals

slope

a measure of the average change in Y for every one unit increase in X

residual

a measure of how far a data point is vertically from the regression line

none, no measurement

the unit of measure for correlation coefficient

they both have the same sign

the commonality between slope and correlation coefficient

conditional distribution

a distribution computed from only one row or one column of a two-way table

marginal distribution

a distribution computed from the row totals or the column totals of a two-way table

they are different

how the conditional distributions compare when an association exists between the explanatory and response variables

a conditional distribution

percentages found by dividing the counts in a row by the row total, or counts in a column by the column total

Central Limit Theorem

the name of the statement telling us that the sampling distribution of x bar is approximately normal whenever the sample is large and random

sampling distribution

a list of the possible values of a statistic together with the frequency or probability of each value

approximately Normal

the shape of the sampling distribution of x bar when the sample is random from a non-Normal population and the sample size is large

sigma over the square root of n

the symbol for the standard deviation of the theoretical sampling distribution of x bar

mu

The symbol for the mean of the theoretical sampling distribution of x bar

standard deviation of the sampling distribution of x bar

a measure of the variability of the values of the statistic x bar about mu.

Normal

shape of the sampling distribution of x bar when the sample is small and randomly selected from a Normal population

standard deviation of the sampling distribution of x bar

a measure of the variability of the sampling distribution of x bar

mean of the sampling distribution of x bar, namely, mu

a measure of the center of the sampling distribution of x bar

Law of Large Numbers

the name of the fact that the average of the data in a sample will get closer and closer to the population mean as we increase the sample size

parameter

A characteristic of a population that is usually unknown.

sample

A subset of the population.

statistic symbols

x bar, s, p hat, r.

parameter symbols

mu, sigma, and p.

inference

using results from a sample to draw conclusions about the entire population

statistic

A number computed from sample data used to estimate a parameter.

population

a collection of all the individuals about which we wish information

probability samples

type of samples required for valid inference

similar to population histogram

shape of the histogram of a sample when the sample is large and random

error in the estimate

the difference between the value of a statistic from a sample and the parameter it estimates

distribution

a list of the possible values of a variable together with the frequencies of each value

one

the sum of the probabilities of all possible outcomes

simulation

using random numbers to imitate chance behavior

probability of event A plus probability of event B

the probability of event A or event B where events A and B are disjoint
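A quick sketch of the addition rule for disjoint events, assuming a fair six-sided die as the probability model. The events A and B and the helper `prob` are hypothetical choices for illustration.

```python
from fractions import Fraction

# Probability model for one toss of a fair die (assumed for illustration).
model = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return sum(model[outcome] for outcome in event)

A = {1, 2}   # toss shows 1 or 2
B = {6}      # toss shows 6; A and B share no outcomes, so they are disjoint

p_a_or_b = prob(A | B)
print(p_a_or_b, prob(A) + prob(B))
print(sum(model.values()))   # probabilities of all outcomes add to one
```

The rule only works this simply because A and B cannot happen together; overlapping events would be double-counted.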

probability of an outcome

a measure of the proportion of times an outcome occurs in the long run that gives us an indication of the likelihood of the outcome

mu

value of the center line of a control chart

process control

A procedure used to check a process at regular intervals to detect problems and correct them before they become serious.

control chart or quality control chart

A chart plotting the means, x bar's, of regular samples of size n against time; this chart is used to assess whether the process is in control.

blocking

The grouping of experimental units according to some similar characteristic where the random allocation is carried out separately within each group.

bias

The condition eliminated by randomly allocating individuals to treatments.

statistically significant

Results of a study that differ too much from what we expect due to just randomization to attribute to chance.

replication

The condition of having more than one individual in each treatment combination.

randomized block experiment, completely randomized experiment

The advantage of _______________ over _____________ is to remove variation associated with the blocking variable from experimental error.

false. the mean of the sampling distribution of x bar equals mu

True or false. x bar is the value of the mean of the sampling distribution of x bar.

false. The standard deviation of the population is greater than the standard deviation of the sampling distribution of x bar.

True or false. The standard deviation of the population is less than the standard deviation of the sampling distribution of x bar.

true

True or false. The sampling distribution of x bar is always taller and skinnier than the population.

false. the mean of the sampling distribution of x bar exactly equals mu.

True or false. The mean of the sampling distribution of x bar gets closer and closer to x bar as the sample size increases.

false. The shape of the sampling distribution of x bar gets closer and closer to Normal.

True or false. The shape of the sampling distribution of x bar gets closer and closer to the shape of the population as sample size increases

true

True or false. The shape of the histogram of data in a sample gets closer and closer to the shape of the population as sample size increases.

false. The shape of the sampling distribution of x bar is Normal regardless of sample size if the population is Normal.

True or false. The shape of the sampling distribution of x bar gets closer and closer to Normal as the sample size increases when the population is normal.

true

True or false. The shape of the sampling distribution of x bar gets closer and closer to Normal as the sample size increases when the population is non-normal.

true

True or false. The shape of the sampling distribution of x bar is always Normal when the population shape is Normal.

false. The standard deviation of the sampling distribution of x bar equals sigma over the square root of n.

True or false. The standard deviation of the sampling distribution of x bar gets closer and closer to sigma as n increases.

true

True or false. The standard deviation of the sampling distribution of x bar gets smaller and smaller as n increases.

false. The standard deviation of the sampling distribution of x bar is equal to sigma over the square root of n regardless of sample size.

True or false. The standard deviation of the sampling distribution of x bar gets closer and closer to sigma over the square root of n as n increases.

true

True or false. We measure the variability of the sampling distribution of x bar with sigma over the square root of n.

true

True or false. The sampling distribution of x bar tells us the possible values for x bar together with how often each occurs.

false. The sampling distribution of x bar tells us the possible values we could get for the sample mean, not the individual values in a sample. That's because the sampling distribution gives us all possible values for x bar together with their frequencies, or probabilities.

True or false. The sampling distribution of x bar tells us all possible values we could get in our sample of size n.

sampling distribution

The distribution of values taken by a statistic in all possible samples of the same size from the same population.

population

The entire group of individuals that we want information about.

distribution

Tells us what values a variable takes and how often it takes these values.

mean

The ______ of the statistic is always equal to the mean mu of the population.

parameter

A number that describes the population. In statistical practice, the value is not known because we cannot examine the entire population.

large

The results of ______ samples are less variable than the results of small samples, so we can trust the sample mean from a _____ random sample to estimate the population mean accurately.

Normal

if the population distribution is _______ , then so is the sampling distribution of the sample mean. if the population distribution is ______, then the sampling distribution of the sample mean is _______ regardless of the sample size n.

four

the standard deviation of the sampling distribution gets smaller only at the rate square root of n . To cut the standard deviation of x bar in half, we must take ____ times as many observations, not just twice as many. So very precise estimates (estimates
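The square-root-of-n rate can be checked with a simulation sketch. The population here is assumed to be uniform on the interval from 0 to 1, which is non-Normal and has sigma equal to the square root of 1/12; the function name, the number of samples, and the seed are likewise assumptions for illustration.

```python
import math
import random
import statistics

def sd_of_sample_means(n, num_samples=5000, seed=1):
    """Draw many random samples of size n and return the standard deviation of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(num_samples)]
    return statistics.stdev(means)

sigma = math.sqrt(1 / 12)   # standard deviation of the uniform(0, 1) population

sd_25 = sd_of_sample_means(25)
sd_100 = sd_of_sample_means(100)   # four times as many observations per sample

print(sd_25, sigma / math.sqrt(25))
print(sd_100, sigma / math.sqrt(100))
print(sd_25 / sd_100)   # should be close to 2
```

Going from 25 to 100 observations per sample should roughly halve the spread of x bar, while doubling the sample size would shrink it only by a factor of about 1.4.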

increases

It is a remarkable fact that as the sample size _______, the distribution of x bar changes shape: it looks less like that of the population and more like a Normal distribution.

central limit theorem

States that for large n, the sampling distribution of the sample mean x bar is approximately Normal for any population with mean mu and finite standard deviation sigma. That is, averages are more Normal than individual observations.

small

Any variable that is a sum of many _______ influences will have approximately a Normal distribution

variable, normal

Means of random samples are less ________ than individual observations. Means of random samples are more ________ than individual observations.

stability, stable

The statistical description of _______ over time requires that the pattern of variation remain _______, not that there be no variation in the variable measured.

control charts

statistical tools that monitor a process and alert us when the process has been disturbed so that it is now out of control. This is a signal to find and correct the cause of the disturbance.

process monitoring conditions

Measure a quantitative variable x that has a Normal distribution. The process has been operating in control for a long period, so that we know the process mean mu and the process standard deviation sigma that describe the distribution of x as long as the proce
