AP Statistics Terms and Interpretations

Categorical or Quantitative

Categorical - a variable that places individuals in a group/category (green eyes)
Quantitative - a variable that takes numerical values for which it makes sense to find an average (test scores)

describing and comparing distribution(s) - SOCS

One distribution - Shape, outliers/unusual features, center, spread - in context.
If comparing two or more distributions - same as one, except use comparison words!

Z-Score

z= (observation - mean)/sd
Used to standardize a score
Used to compare different individuals relative to each other (a man's relative height to a woman's relative height)
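A minimal sketch of the z-score formula above. The heights, means, and standard deviations are made-up numbers for illustration only.

```python
def z_score(observation, mean, sd):
    """Standardize a value: how many SDs it lies above (+) or below (-) the mean."""
    return (observation - mean) / sd

# Hypothetical values (inches), not taken from any real data set.
man_z = z_score(74, mean=69, sd=3)
woman_z = z_score(69, mean=64, sd=2.5)

# The larger z-score is the taller person relative to their own group.
print(man_z, woman_z)
```

Both people are 5 inches above their group mean, but the z-scores differ because the groups have different standard deviations.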

Mean/SD OR Median/IQR

Use mean/sd to describe when the data set is approximately normal or roughly symmetrical
Use median/IQR to describe when the data set is skewed. (These values are less affected by outliers - resistant to outliers)

standard deviation and variance

SD - The typical distance of each observation from the mean.
Variance is the standard deviation squared.
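A short check that the variance is the standard deviation squared, using Python's standard `statistics` module. The data values are hypothetical, and the sample versions (dividing by n - 1) are assumed here.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

sd = statistics.stdev(data)        # sample standard deviation
var = statistics.variance(data)    # sample variance

print(sd, var, sd ** 2)  # sd squared matches the variance (up to rounding)
```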

Five Number Summary and Outlier Rule

Min, Q1, Median, Q3, Max
1.5 x IQR Outlier Rule -
IQR = Q3 - Q1
A value is an outlier if it is greater than Q3 + (1.5 x IQR) or
less than Q1 - (1.5 x IQR)
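A sketch of the five-number summary and the 1.5 x IQR rule. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd); other software may use a slightly different quartile method. The data set is hypothetical.

```python
import statistics

def five_number_summary(data):
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                            # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def outliers(data):
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

scores = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(five_number_summary(scores))     # (1, 3.5, 6, 8.5, 30)
print(outliers(scores))                # [30]
```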

DOFS - describing a scatterplot

Direction - positive, negative
Outliers - also note high-leverage and influential points; include any unusual features
Form - linear or not
Strength - strong, moderately strong, weak, moderately weak

Parameter and statistics

Parameters come from populations
Statistics come from samples

Census versus Sample

Census - attempt to collect data from every individual in the population
Sample - a subset of the population

Good Sampling

SRS, stratified, cluster, systematic

Voluntary, convenience

Bias - explain bias and in which direction it would bias the study

Undercoverage, response, nonresponse, question wording

Empirical Rule

68-95-99.7
Used to estimate areas in a Normal distribution
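The 68-95-99.7 values can be checked against exact Normal areas. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is just for illustration.

```python
from statistics import NormalDist

def area_within(k):
    """Area under a Normal curve within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```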

Describing random sampling and random assignment

Be specific!
Explain how you are using a chance process
What are you doing with repeats
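One possible way to carry out the chance process with software instead of a random digit table. The function names and the labels 1 to 100 are assumptions for illustration. Sampling without replacement means repeated labels cannot occur.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n distinct individuals; every group of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # without replacement, so no repeats

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

labels = list(range(1, 101))  # hypothetical population labeled 1 to 100
sample = simple_random_sample(labels, 10, seed=1)
group_a, group_b = randomly_assign(range(20), seed=1)
print(sample, group_a, group_b)
```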

Experimental Designs

Completely randomized
Randomized block
Matched pairs (type of block - blocks of 2)

Good Experimental Design Components

Random - random assignment
Control (not same as control group) - trying to limit other variables (confounding) that might affect outcome
Replication - using enough subjects or experimental units
Comparison - compare two or more treatments

Simulation - SPDC

A way to imitate chance behavior.
If asked to set one up - SPDC
State - question of interest
Plan - describe how to use chance to imitate one repetition - explain thoroughly and tell what you will record
Do - perform many repetitions
Conclude - use simula

Law of large numbers

If we perform many, many repetitions of a chance process, the proportion of times an outcome occurs will approach a single number (its probability).
Example, if we roll a die, many, many times, our probability of getting a 2 is 1/6. I may not get that in 10 rolls, 20 rolls, etc, but if I did it many, many
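A small simulation sketch of the die example: the proportion of 2s is tracked for increasing numbers of rolls. The function name and the seed are arbitrary choices for illustration, and exact results depend on the random numbers drawn.

```python
import random

def proportion_of_twos(num_rolls, seed=0):
    """Roll a fair die num_rolls times and return the proportion of 2s."""
    rng = random.Random(seed)
    twos = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 2)
    return twos / num_rolls

for n in (10, 100, 10_000, 200_000):
    print(n, proportion_of_twos(n))  # drifts toward 1/6 (about 0.167) as n grows
```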

independent events

Two events that have no effect on each other.
P(N|W)=P(N)
If these two probabilities are equal, then the two events, N and W, are independent
The fact that W occurred has no effect on the probability of N.
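A sketch of the independence check P(N|W) = P(N) from a two-way table of counts. The counts are hypothetical, and exact fractions are used to avoid rounding problems when comparing.

```python
from fractions import Fraction

def is_independent(n_and_w, n_not_w, w_not_n, neither):
    """Compare P(N | W) with P(N) using counts from a two-way table."""
    total = n_and_w + n_not_w + w_not_n + neither
    p_n = Fraction(n_and_w + n_not_w, total)
    p_n_given_w = Fraction(n_and_w, n_and_w + w_not_n)
    return p_n_given_w == p_n

# Hypothetical counts: P(N) = 30/100 and P(N | W) = 12/40, both 0.3
print(is_independent(12, 18, 28, 42))  # True
# Hypothetical counts: P(N) = 30/100 but P(N | W) = 20/40
print(is_independent(20, 10, 20, 50))  # False
```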

mutually exclusive events

Two events that cannot occur at the same time. (Male and Pregnant)

Binomial and Geometric RV

B - success/failure
I - independent observations
N - fixed number of trials (binomial); continue until you get one success (geometric)
S - same probability of success on each trial
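When the conditions above hold, the standard binomial and geometric probability formulas apply. A minimal sketch, with hypothetical values of n, k, and p:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n fixed trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k)."""
    return (1 - p) ** (k - 1) * p

print(binomial_pmf(2, n=4, p=0.5))   # 0.375
print(geometric_pmf(3, p=0.5))       # 0.125
```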

Scope of Inference

If random sampling then we can make inferences about the population from which we sampled.
If random assignment occurs we can make cause and effect inferences, for those similar to the ones in the study.

Sample distribution vs Sampling distribution vs. Population Distribution

Sample - Data for one sample (10 red chips and 10 blue chips from 1 sample of 20 chips)
Sampling - One dot represents the proportion of red chips (successes). So one dot on the plot would represent all 20 chips, the proportion of red. You would need many,

Sampling Distributions

If the scenario is a sampling distribution don't forget to use the formula to find the standard deviation/standard error before using normalcdf.
Don't use the standard deviation for original distribution. Remember the sampling distribution has lower varia
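A sketch of the usual standard deviation formulas for sampling distributions, followed by a Normal calculation in the style of normalcdf. The population values (mean 100, SD 15, n = 25) are hypothetical, and the Normal and independence conditions are assumed to be met.

```python
from math import sqrt
from statistics import NormalDist

def sd_sample_proportion(p, n):
    """SD of the sampling distribution of p-hat."""
    return sqrt(p * (1 - p) / n)

def sd_sample_mean(sigma, n):
    """SD of the sampling distribution of x-bar."""
    return sigma / sqrt(n)

# Hypothetical population: mean 100, SD 15; samples of size 25
sd_xbar = sd_sample_mean(15, 25)               # 3.0, not 15
sampling_dist = NormalDist(mu=100, sigma=sd_xbar)
p_above_106 = 1 - sampling_dist.cdf(106)       # P(x-bar > 106)
print(sd_xbar, round(p_above_106, 4))
```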

Large Counts for Normal condition

Only used in proportions

CLT - Central Limit Theorem for Normal condition

Only used in means

Use Confidence Interval

when estimating a value

Point estimator/Point estimate

estimator - A statistic that estimates a population parameter.
Sample proportions and sample means are unbiased estimators of the population proportion and population mean.
estimate - is the value from your sample.
The confidence interval is the point estimate +/- margin of error.
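A sketch of "point estimate +/- margin of error" for a one-sample z interval for a proportion. The counts are hypothetical, and the Random, 10%, and Large Counts conditions are assumed to have been checked.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    p_hat = successes / n                                    # point estimate
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(60, 100)  # hypothetical: 60 successes in 100
print(round(low, 3), round(high, 3))      # roughly 0.504 to 0.696
```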

SPDC - confidence intervals

STATE- What are you estimating, define parameter and CI level
PLAN - name and conditions
DO- calculator name and CI
CONCLUDE - We are ____% confident.........CONTEXT

Use Hypothesis/Significance Test

when testing a claim, evaluating evidence, asked if we have convincing evidence

Hypotheses and Conclusions are always talking about ______

parameters

SPDC - hypotheses(significance) test

STATE - Null and Alternative Hypotheses (in terms of parameter) define parameter, and alpha level
PLAN - name and conditions
DO- calculator name, test statistic, p-value and df(means)
CONCLUDE - Since our p-value......CONTEXT

test statistic

The number of standard deviations the sample statistic lies away from a hypothesized population parameter.
z for proportions, t for means
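A sketch of a test statistic and two-sided p-value for a one-sample z test for a proportion. The sample counts and the hypothesized value p0 are hypothetical, and conditions are assumed to be met.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    p_hat = successes / n
    # standard deviation of p-hat, computed assuming the null value p0 is true
    sd_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative
    return z, p_value

z, p_value = one_prop_z_test(60, 100, p0=0.5)
print(round(z, 2), round(p_value, 4))  # 2.0 0.0455
```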

Pooled proportion

Use in a 2-sample z test (not a CI) for the difference of proportions - it is used in both the Normal/Large Counts condition and the test statistic (found on calculator)
Never Pool for means
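A sketch of how the pooled proportion enters the two-sample z test statistic. The counts are hypothetical; the null hypothesis assumed here is that the two population proportions are equal.

```python
from math import sqrt

def two_prop_z_statistic(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # combined successes over combined sample size
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 60 of 100 successes in group 1, 40 of 100 in group 2
z_two_prop = two_prop_z_statistic(60, 100, 40, 100)
print(round(z_two_prop, 3))  # 2.828
```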

Statistically significant

When the p-value is smaller than alpha we say the results are statistically significant at the alpha level we used.
Meaning we do have evidence for the alternative, so we reject the null.

power

The probability of rejecting the null given that the alternative is true (Power + Beta = 1)

Type 1 Error

Rejecting the null when null was true (alpha)

Type II Error

Failing to reject null when alternative was true (beta)
(Power + Beta = 1)

Residuals

R = A(actual) - P(predicted)
The difference between the actual data value and the prediction line.
If the data value is above the prediction line, the residual is positive.
If the data value is below the prediction line, the residual is negative.
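A small sketch of residual = actual - predicted. The prediction line (intercept 2, slope 3) and the data points are hypothetical.

```python
def predict(x, intercept, slope):
    """Predicted y from a line of the form y-hat = intercept + slope * x."""
    return intercept + slope * x

def residual(actual, predicted):
    return actual - predicted

# Hypothetical line: y-hat = 2 + 3x
above = residual(15, predict(4, intercept=2, slope=3))  # point above the line
below = residual(16, predict(5, intercept=2, slope=3))  # point below the line
print(above, below)  # 1 -1
```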

Scatterplots - prediction line

y-hat (response) = intercept + slope(x, explanatory)
Always define variables - better yet, write them within the equation. Such as (price of truck)-hat = intercept + slope(miles on truck)

Unusual points in scatterplots

Outlier (high residual)
High Leverage (x-value is outside of the bulk of the data)
Influential (changes slope, intercept, correlation values)

correlation coefficient(r)

The strength and direction of a linear association between two quantitative variables that are related.
Does not tell if a line is a good fit, only tells strength and direction once you've determined a line is a good fit. Use scatterplot and residual plot

Correlation/Association is NOT Causation

You need an experiment with random assignment to make inferences about cause and effect

coefficient of determination (r^2) - interpretation

Approximately ______% of the variation in the y (response) can be explained by the linear relationship, or LSRL, with x (explanatory).
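A sketch that computes r as the average product of z-scores (dividing by n - 1) and then squares it to get r^2. The data are hypothetical, and the function name is for illustration only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
r = correlation(xs, ys)
print(round(r, 4), round(r ** 2, 2))  # 0.7746 0.6
```

With these numbers, about 60% of the variation in y can be explained by the linear relationship with x.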

confidence interval interpretation

We are ____% confident the interval between _____ and ____ captures the true ____________________________.

confidence level interpretation

If the study was repeated many times, about ____% of the resulting confidence intervals would contain the true population ______________________________________(Context)

p-value interpretation

Assuming the null is true (in context), there is a _______ probability of getting a sample result as extreme as, or more extreme than, the one observed (sample data), just by chance.

Y-Intercept (a) interpretation

The PREDICTED value of the y (response) variable when the x (explanatory) variable is 0.
Sometimes this value makes sense in context and sometimes it doesn't - we always interpret the same way though.

Slope (b) interpretation

The amount which the y (response) is PREDICTED to change when x increases by 1 unit.

Z-Score Interpretation

The number of standard deviations an individual is above or below the mean

s - on computer output for prediction lines and Interpretation

Standard deviation of the residuals
Interpretation - measures the typical size of the residuals (prediction errors) when using the LSRL
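A sketch of where s comes from: fit the LSRL, find the residuals, then take the square root of the sum of squared residuals divided by n - 2. The data and helper names are hypothetical.

```python
from math import sqrt
from statistics import mean

def lsrl(xs, ys):
    """Least-squares intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residual_sd(xs, ys):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    intercept, slope = lsrl(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))

x_vals = [1, 2, 3, 4, 5]  # hypothetical data
y_vals = [2, 4, 5, 4, 5]
print(lsrl(x_vals, y_vals), round(residual_sd(x_vals, y_vals), 3))
```

Here the predictions are typically off by about 0.89 units of y.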

Percentile and interpretation

Describe the location of an individual within a distribution.
Your percentile is the percent of observations that are below or equal to your value.
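A minimal sketch of the percentile definition above (percent of observations below or equal to a value). The scores are hypothetical.

```python
def percentile(value, data):
    """Percent of observations that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

test_scores = [60, 70, 80, 90, 100]  # hypothetical distribution
print(percentile(80, test_scores))   # 60.0
```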