# AP STATS TERMS and INTERPRETATIONS

Interpret Standard Deviation

Standard deviation measures spread by giving the typical or average distance that the observations (context) are away from their mean (context)
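
As a rough illustration of "typical distance from the mean", here is a minimal Python sketch of the sample standard deviation. The function name and the quiz scores are made up for this example.

```python
import math

def sample_standard_deviation(values):
    """Return the sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical quiz scores, invented for illustration only.
quiz_scores = [2, 4, 4, 4, 5, 5, 7, 9]
s_x = sample_standard_deviation(quiz_scores)
print(round(s_x, 3))  # about 2.138
```

In context: these quiz scores typically vary by about 2.1 points from their mean of 5 points.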

Interpreting a confidence level

Intervals produced with this method will capture the true population _________ in about 95% of all possible samples of this same size from this same population
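
The "about 95% of all possible samples" idea can be checked with a small simulation. This is only a sketch: the population mean, standard deviation, sample size, and number of samples are hypothetical, and it uses a z interval with the population standard deviation treated as known to keep the code short.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0    # hypothetical population mean
SIGMA = 10.0        # hypothetical population standard deviation (treated as known)
SAMPLE_SIZE = 40
NUM_SAMPLES = 2000

z_star = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
margin = z_star * SIGMA / math.sqrt(SAMPLE_SIZE)

captured = 0
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(SAMPLE_SIZE)]
    x_bar = mean(sample)
    # Count the interval if it contains the true mean.
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        captured += 1

capture_rate = captured / NUM_SAMPLES
print(capture_rate)  # should land close to 0.95
```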

CLT

If the population distribution is normal, the sampling distribution of the sample mean will also be normal, with the same mean as the population. As n increases, the standard deviation of the sampling distribution DECREASES
If the population is not normal, the sampling distribution will be approximately normal when the sample size is large (n at least 30)
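
A short simulation can show the standard deviation of the sampling distribution shrinking as n grows. This sketch assumes a right-skewed (exponential) population with standard deviation 1, chosen only for illustration. The simulated values should sit near sigma / sqrt(n).

```python
import random
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is repeatable

def sd_of_sample_means(sample_size, num_samples=3000):
    """Draw many samples from a skewed population; return the SD of their means."""
    sample_means = []
    for _ in range(num_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        sample_means.append(mean(sample))
    return stdev(sample_means)

sd_small_n = sd_of_sample_means(5)    # theory: 1 / sqrt(5), about 0.447
sd_large_n = sd_of_sample_means(50)   # theory: 1 / sqrt(50), about 0.141
print(sd_small_n, sd_large_n)
```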

Interpreting Probability

The probability of any outcome of a random phenomenon is the proportion of times the outcome will occur in a very long series of repetitions. Probability is a long-term relative frequency.

Experimental designs

CRD- Completely randomized design- All experimental units are allocated at random among all treatments
RBD- Randomized Block Design- Experimental units are put into homogeneous blocks. The random assignment of the units to the treatments is carried out separately within each block

Interpret LSRL y-intercept "a"

When the x variable (context) is zero, the y variable (context) is estimated to be ________

Type 1 Error

Rejecting the null when the null is actually true

Type 2 error

Failing to reject the null when the null is actually false

Power

Probability of rejecting the null when the null is false (the probability of avoiding a Type 2 error)
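
To connect Type 1 error, Type 2 error, and power, here is a sketch for an upper-tailed one-sample z test. Power is taken in its standard sense: the probability of rejecting the null when a specific alternative is true. The function name and all the numbers (null mean 100, true mean 106, sigma 15, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu_null, mu_true, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu_null in favor of mu > mu_null) when the true mean is mu_true."""
    standard_error = sigma / math.sqrt(n)
    # Smallest sample mean that leads to rejecting H0 at level alpha.
    critical_mean = mu_null + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Chance the sample mean lands in the rejection region under the true mean.
    return 1 - NormalDist(mu_true, standard_error).cdf(critical_mean)

power = power_one_sided_z(mu_null=100, mu_true=106, sigma=15, n=25)
type_2_error_probability = 1 - power
# When the null is actually true, the rejection probability is the Type 1 error rate, alpha.
type_1_error_probability = power_one_sided_z(mu_null=100, mu_true=100, sigma=15, n=25)
print(round(power, 3), round(type_2_error_probability, 3))
```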

Explain a P value

Assuming that the null is true, the P-value measures the chance of observing a statistic as extreme as or more extreme than the one actually observed

Experiment vs Observational Study

A study is an experiment only if the researchers impose a treatment on the subjects

Linear Transformation

Multiplying every value by a constant changes the mean and the spread, but does not change the shape
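
A quick numeric check, using hypothetical temperatures converted from Celsius to Fahrenheit (multiply by 1.8, then add 32). The multiplication scales both the mean and the standard deviation. The added constant shifts only the mean.

```python
from statistics import mean, stdev

celsius = [10, 20, 30]  # hypothetical temperatures, for illustration only
fahrenheit = [1.8 * c + 32 for c in celsius]

mean_c, sd_c = mean(celsius), stdev(celsius)
mean_f, sd_f = mean(fahrenheit), stdev(fahrenheit)

# Mean follows the whole transformation; SD is only multiplied by 1.8.
print(mean_c, sd_c)
print(mean_f, sd_f)
```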

Two Sample T test, phrasing hints, null and alternative and conclusion

KEY PHRASE:DIFFERENCE IN THE MEANS
Null: μ1 = μ2
Alternative: μ1 - μ2 < 0, > 0, or not equal to 0
μ1 - μ2 = the difference between the mean ____ for all ___ and the mean ____ for all ___
We do/do not have enough evidence at the .05 significance level to conclude that the difference between the mean ____ for all ___ and the mean ____ for all ___ is ___

Paired T test

Key phrase: MEAN DIFFERENCE
Same as 2 sample t test
μdiff = the mean difference in ___ for all ___
We do/don't have enough evidence at the .05 significance level to conclude that the mean difference in ____ for all ___ is _____
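
The contrast between "difference in the means" and "mean difference" shows up in the test statistics. The sketch below computes both from the same made-up numbers. The function names and data are hypothetical, and the two-sample version uses the unpooled standard error. Finding a P-value would still need a t distribution (for example, the calculator's tcdf), which is not shown here.

```python
import math
from statistics import mean, stdev

def two_sample_t_statistic(group_1, group_2):
    """t statistic for the difference in the means of two independent groups (unpooled)."""
    standard_error = math.sqrt(
        stdev(group_1) ** 2 / len(group_1) + stdev(group_2) ** 2 / len(group_2)
    )
    return (mean(group_1) - mean(group_2)) / standard_error

def paired_t_statistic(first, second):
    """t statistic for the mean difference (second minus first) in paired data."""
    differences = [b - a for a, b in zip(first, second)]
    standard_error = stdev(differences) / math.sqrt(len(differences))
    return mean(differences) / standard_error

# Hypothetical before/after scores for the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t_two_sample = two_sample_t_statistic(after, before)  # treats the lists as independent groups
t_paired = paired_t_statistic(before, after)          # uses the pairing
print(round(t_two_sample, 3), round(t_paired, 3))
```

With these numbers the paired statistic is much larger, because pairing removes the subject-to-subject variation.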

Outlier Rule

Upper Bound: Q3+1.5(IQR)
Lower Bound: Q1-1.5(IQR)
IQR=Q3-Q1
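
The rule can be sketched in Python as follows. This assumes quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd). Other quartile conventions can give slightly different bounds. The data set is invented, and the function expects at least two values.

```python
from statistics import median

def outlier_bounds(values):
    """Return (lower bound, upper bound) from the 1.5 x IQR rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
lower_bound, upper_bound = outlier_bounds(data)
outliers = [x for x in data if x < lower_bound or x > upper_bound]
print(lower_bound, upper_bound, outliers)  # -4.5 15.5 [50]
```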

Interpret r

Correlation measures the strength and direction of the linear relationship between x and y
r is between -1 and 1
Close to zero = very weak linear relationship
Positive r is positive correlation
Negative r is negative correlation
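
For reference, r can be computed as the sum of the products of the z-scores for x and y, divided by n - 1. The sketch below does that by hand. The function name and data are hypothetical.

```python
from statistics import mean, stdev

def correlation(x_values, y_values):
    """Correlation r: sum of z-score products divided by (n - 1)."""
    n = len(x_values)
    x_bar, y_bar = mean(x_values), mean(y_values)
    s_x, s_y = stdev(x_values), stdev(y_values)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(x_values, y_values)
    ]
    return sum(z_products) / (n - 1)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))  # about 0.7746: a fairly strong, positive linear relationship
```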

Advantage of using Stratified Random Sample over an SRS

Stratified random sampling guarantees that each of the strata will be represented. When strata are chosen properly, a stratified random sample will produce better (less variable and more precise) information than an SRS of the same size

Bias

The systematic favoring of certain outcomes due to flawed sample selection, poor question wording, undercoverage or non response.

Interpret s

s is the standard deviation of the residuals
It measures the typical distance between the actual y values and their predicted y values
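
A sketch of where s comes from: fit the LSRL, take each residual (actual minus predicted), and compute the square root of the sum of squared residuals divided by n - 2. The data and variable names below are hypothetical.

```python
import math
from statistics import mean

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar  # the LSRL passes through (x_bar, y_bar)

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]  # actual minus predicted

# Standard deviation of the residuals uses n - 2 in the denominator.
s = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
print(round(slope, 2), round(intercept, 2), round(s, 4))  # 0.6 2.2 0.8944
```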

Describe or Compare the distributions

S: Shape
O: Outliers
C: Center
S: Spread
If it says compare, use comparison words like greater than or less than for center and spread

Does ___ cause ____?

Association is NOT causation
An observed association, no matter how strong, is not evidence of causation. Only a well designed, controlled experiment can lead to conclusions of cause and effect

SOCS

Shape: Skewness
Outliers: Are there any?
Center: Mean and Median
Spread: Range, IQR, or standard deviation

Can we generalize the results to the population of interest?

Yes, if a large random sample was taken from the same population we want to draw conclusions about

Binomial Distribution Conditions

B-Binary, success or failure
I- Trials must be independent
N- Number of trials must be fixed in advance
S- Probability of success must be the same for each trial
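
When all four conditions hold, binomial probabilities follow the formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch, with a hypothetical fair-coin example:

```python
import math

def binomial_probability(n, p, k):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 5 heads in 10 flips of a fair coin.
prob_five_heads = binomial_probability(n=10, p=0.5, k=5)
print(prob_five_heads)  # 252/1024, about 0.246

# The probabilities over all possible counts add up to 1.
total = sum(binomial_probability(10, 0.5, k) for k in range(11))
```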

SRS

An SRS is a sample taken in such a way that every set of n individuals has an equal chance to be the sample actually selected

Interpret Y predicted

Y predicted is the estimated or predicted y value for a given x value

Inference for Means Conditions

Random
Normal: Population is normal, or the sample size is at least 30 (CLT)
Independent: Independent observations and independent samples/groups; check the 10% condition if sampling without replacement

Normal CDF

normalcdf(min,max,mean,std dev)
invNorm(area to the left as a decimal, mean, std dev)
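
For checking calculator answers, the same two computations can be sketched in Python with the standard library's `statistics.NormalDist`. The wrapper names `normalcdf` and `inv_norm` are just labels chosen here to mirror the calculator commands. The test-score numbers are hypothetical.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, std_dev=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, std_dev)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0.0, std_dev=1.0):
    """Value with the given area to its left under the normal curve."""
    return NormalDist(mean, std_dev).inv_cdf(area_left)

# Hypothetical test scores: mean 70, standard deviation 10.
area_60_to_80 = normalcdf(60, 80, 70, 10)   # within 1 SD of the mean, about 0.68
score_90th_percentile = inv_norm(0.90, 70, 10)
print(round(area_60_to_80, 4), round(score_90th_percentile, 2))
```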

Extrapolation

Using an LSRL to predict for x values outside the range of the explanatory variable in the data

Inference for proportions conditions

Random
Normal: At least 10 successes and 10 failures (in both groups, for a 2 sample problem)
Independent: Independent observations and independent samples/groups; check the 10% condition if sampling without replacement

Interpreting a Confidence Interval

I am __% confident that the interval from _ to __ captures the true _____
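
To fill in the blanks, an interval has to be computed first. Here is a sketch for one common case, a one-sample z interval for a proportion (p-hat plus or minus z* times the standard error). The function name and the survey counts are hypothetical, and the conditions for inference are assumed to be met.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for a one-sample z interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical survey: 520 of 1000 respondents say yes.
low, high = one_proportion_z_interval(520, 1000)
print(round(low, 3), round(high, 3))  # about 0.489 to 0.551
```

In context: I am 95% confident that the interval from 0.489 to 0.551 captures the true proportion of all ___ who would say yes.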

Unbiased Estimator

The data are collected in such a way that there is no systematic tendency to overestimate or underestimate the true value of the population parameter
The mean of the sampling distribution equals the true value of the parameter being estimated

Interpreting Expected Value/Mean

The mean/expected value of a random variable is the long run average outcome of a random phenomenon carried out many times
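
A sketch using a fair six-sided die (a hypothetical example): the expected value is the sum of each outcome times its probability, and the average of many simulated rolls should settle near it.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Expected value: each outcome weighted by its probability.
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))

# Long-run average from many simulated rolls.
rolls = [random.choice(outcomes) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)

print(round(expected_value, 2), round(long_run_average, 2))
```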

Goal of blocking

to create groups of homogeneous experimental units
Benefit: reduction of the effect of variation within the experimental units (context)

Interpret r^2

_% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context)

Large samples

When collected appropriately, large samples yield more precise results than small samples because in a large sample the values of the sample statistic tend to be closer to the true population parameter

4-step Significance tests

State: Hypotheses, significance level, parameters defined
Plan: Check method and conditions
Do: Compute the test statistic and P-value
Conclude: Interpret result of your test in the context of the problem

Chi squared df and expected counts

Goodness of fit:
Df = # of categories - 1
Expected counts: Sample size times the hypothesized proportion in each category
Homogeneity:
Df = (# of rows - 1)(# of columns - 1)
Expected counts: (row total) (column total)/table total
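
These formulas can be sketched directly in Python. The function names, the hypothesized proportions, and the two-way table of counts are all made up for illustration.

```python
def goodness_of_fit_expected(sample_size, hypothesized_proportions):
    """Expected counts: sample size times each hypothesized proportion."""
    return [sample_size * p for p in hypothesized_proportions]

def two_way_expected(table):
    """Expected counts: (row total)(column total) / table total for each cell."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]

def two_way_df(table):
    """Degrees of freedom: (# of rows - 1)(# of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Goodness of fit: 100 observations, three categories (hypothetical proportions).
gof_expected = goodness_of_fit_expected(100, [0.5, 0.3, 0.2])
gof_df = 3 - 1

# Two-way table of observed counts (hypothetical).
observed = [[10, 20], [30, 40]]
expected = two_way_expected(observed)
df = two_way_df(observed)
print(expected, df)  # [[12.0, 18.0], [28.0, 42.0]] 1
```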

P(at least 1)

1 - P(none)
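
The complement rule, P(at least 1) = 1 - P(none), is easy to check numerically. This sketch assumes independent trials with the same success probability on each one. The function name and the numbers (a 10% chance per trial, 5 trials) are hypothetical.

```python
def probability_at_least_one(p_success, num_trials):
    """P(at least one success) = 1 - P(no successes), for independent trials."""
    probability_none = (1 - p_success) ** num_trials
    return 1 - probability_none

# Hypothetical example: 10% chance of success on each of 5 independent trials.
p_at_least_one = probability_at_least_one(0.10, 5)
print(round(p_at_least_one, 5))  # 1 - 0.9**5 = 0.40951
```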

Residual

Actual minus Predicted

...

...