Stats Final

Population (1.1)

the entire group of individuals about which we want information in a statistical study.

Sample (1.1)

a part of the population from which we actually collect information, which is then used to draw conclusions about the whole.

Experiment (1.1)

deliberately imposes some treatment on individuals in order to observe their responses. The purpose of it is to study whether the treatment causes a change in the response.

Observational Study (1.1)

considers individuals and measures variables of interest but does not attempt to influence the responses. The purpose of it is to describe some group or situation.

Individuals (1.1)

are the objects described by a set of data. They may be people, animals, or things.

Variable (1.1)

is any characteristic of an individual. It can take different values for different individuals.

What are the steps of the statistical problem solving process? (1.1) (4)

I. Ask a question of interest. II. Produce data. III. Analyze data. IV. Interpret results.

Counts (2.1)

numbers

Rates(2.1)

percents or proportions

What is more useful/helpful in an experiment? Counts or Rates (and why)? (2.1)

Rates are often clearer than counts—it is more helpful to hear that 12.9% of young adults did not finish high school than to hear that there are 5,126,000 such people.

Distribution (of a variable) (2.1)

It tells us what values the variable takes and how often it takes these values.

When to use a pie chart? (2.1)

It would be correct when categories represent distinct parts making up a whole.

Categorical Variable(2.1)

It places an individual into one of several groups or categories.

Quantitative Variable (2.1)

It takes numerical values for which arithmetic operations such as adding and averaging make sense.

Outlier (2.1.2)

an individual observation that falls outside the overall pattern of the graph. Outliers can appear in any graph of data.

Spread/IQR (interquartile range) (2.1.2)

a measure of how spread out the data are. The IQR (interquartile range) is Q3 − Q1, the spread of the middle half of the data. A small spread implies a small SD; a large spread implies a large SD.

Center(2.1.2) Median (2.2.1)

the midpoint of the data set, often noted as 'M'

Shape(2.1.2)

described by the number of peaks and by symmetry or skewness, as seen in a stemplot or histogram

Skewed right (2.1.3)

the 'long tail' of the distribution extends to the right of the peak, pulling the mean to the right of the median; implies mean > median

skewed left (2.1.3)

the 'long tail' of the distribution extends to the left of the peak, pulling the mean to the left of the median; implies mean < median

no skew/symmetric (2.1.3)

the right and left sides of the graph are approximately mirror images of each other; implies that the mean is close to the median

reasons why dotplot and stemplot are more helpful than a histogram (2.1.3)

to display the distribution of quantitative variables that have relatively few values. (distribution by age group in a city)

reasons why a histogram is more helpful than a dotplot and stemplot (2.1.3)

so many values of a quantitative variable exist that a graph of the distribution is clearer if nearby values are grouped together (SAT scores, family income)

to find flaws in histogram (2.1.3)

make your own and check accuracy (class intervals, shape, spread)

how to make a histogram (2.1.3)

Press STAT, then ENTER to choose option 1 (Edit), and input the data into L1. Press 2nd Y= (STAT PLOT), choose option 1, press ENTER, scroll to turn Plot 1 ON, and scroll to select the histogram type. Press ZOOM, scroll to 9 (ZoomStat), and press ENTER. Press TRACE if class intervals are needed.

how to find a median (2.2.1)

Put the data in numerical order. If the number of observations (n) is odd, the middle number is the median; if n is even, average the two middle numbers.
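Those steps can be sketched in Python (hypothetical data; `median` is an illustrative helper, not a course-defined function):

```python
def median(xs):
    """Midpoint of the data: order it, then take the middle value."""
    s = sorted(xs)                     # put the data in numerical order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the middle number
    return (s[mid - 1] + s[mid]) / 2   # even n: average the two middle numbers
```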

quartile (2.2.2)

a way to describe the spread of the data. The lower quartile (Q1) marks the one-quarter point of the ordered data; the upper quartile (Q3) marks the three-quarter point.

how to calculate Q1 and Q3 (2.2.2) (3 steps)

find the median of the entire data set, find the median of the first half of the data set (Q1), find the median of the second half of the data set (Q3)
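A sketch of those three steps in Python, following the convention of excluding the overall median from both halves when n is odd (`quartiles` is a hypothetical helper):

```python
def quartiles(xs):
    """Q1 and Q3: medians of the lower and upper halves of the ordered data."""
    def med(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]    # first half of the data
    upper = s[-half:]   # second half (overall median excluded when n is odd)
    return med(lower), med(upper)
```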

five number summary (2.2.3)

Minimum Q1 M(edian) Q3 Maximum

boxplot (2.2.3)

is a graph of the five-number summary. A central box is drawn from the first quartile (Q1) to the third quartile (Q3). A line in the box marks the median. Lines extend from the box out to the smallest and largest observations that are not outliers.

outliers (2.2.3)

Any observation that is more than 1.5 times the interquartile range (IQR) above Q3 or below Q1 is considered an outlier.

outlier formula (2.2.3)

Q3 + 1.5(IQR) = upper cutoff: observations above it are outliers on the right of the median. Q1 − 1.5(IQR) = lower cutoff: observations below it are outliers on the left of the median.
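A worked check of the 1.5 × IQR rule on made-up data (Q1 and Q3 found by hand as the medians of the two halves):

```python
data = sorted([1, 2, 3, 4, 5, 6, 7, 25])   # made-up data; 25 looks suspect
q1 = (2 + 3) / 2                            # median of the lower half [1, 2, 3, 4]
q3 = (6 + 7) / 2                            # median of the upper half [5, 6, 7, 25]
iqr = q3 - q1                               # 6.5 - 2.5 = 4.0
low_fence = q1 - 1.5 * iqr                  # anything below -3.5 is an outlier
high_fence = q3 + 1.5 * iqr                 # anything above 12.5 is an outlier
outliers = [x for x in data if x < low_fence or x > high_fence]
```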

standard deviation (s) (2.2.5)

measures the average distance of the observations from their mean

how to calculate standard deviation/ any statistical data that contributes (2.2.5)

Enter the data into L1 (STAT, EDIT). Press STAT, arrow right to CALC, and choose 1:1-Var Stats. Complete the command 1-Var Stats L1 and press ENTER (press 2nd 1 to get L1). Notice the down arrow on the left side of the display; press the down-arrow key several times to see more.
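The same computation without a calculator, sketched in Python on hypothetical data (the sample standard deviation divides by n − 1, matching the Sx value from 1-Var Stats):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # hypothetical data
mean = sum(data) / len(data)                      # x-bar = 5.0
# s: roughly the average distance of the observations from their mean
s = (sum((x - mean) ** 2 for x in data) / (len(data) - 1)) ** 0.5
assert abs(s - statistics.stdev(data)) < 1e-12    # agrees with the library routine
```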

standardized value/ z score purpose(3.1.2)

to measure where an observation (x) falls relative to the mean, in units of standard deviations, so that observations from different distributions can be compared

how to find standardized value/ z score (3.1.2)

z = (x − mean)/standard deviation. Letting x be the observation point: z = (x − μ)/σ
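The formula as a one-line Python function (illustrative; `z_score` is not a standard name):

```python
def z_score(x, mu, sigma):
    """Standardized value: how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma
```

For example, a score of 130 on a test with mean 100 and standard deviation 15 standardizes to z = 2.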

normal curves (3.2.1)

symmetric, bell-shaped curves that describe Normal distributions. • A Normal curve is completely described by giving its mean and its standard deviation. • The mean determines the center of the distribution; it is located at the center of symmetry of the curve. • The standard deviation determines the shape of the curve; it is the distance from the mean to the change-of-curvature points on either side. • The curve has a single peak, and the total area under it always equals 1.

normal distribution (3.2.1)

a distribution described by a special family of bell-shaped, symmetric density curves, called Normal curves. The mean μ and standard deviation σ completely specify a Normal distribution N(μ, σ). For a Normal distribution, mean = median.

point of curvature (3.2.1)

the point, one standard deviation from the mean in each direction, where the curve changes curvature (the straightest point on the curve)

difference between median and mean of a density curve (3.1.4)

the median divides the area under the curve in half; the mean is the balance point of the curve. For a symmetric curve they coincide; for a skewed curve the mean is pulled toward the long tail.

The 68-95-99.7 rule (3.2.2)

In any Normal distribution, approximately: 68% of the observations fall within one standard deviation of the mean; 95% of the observations fall within two standard deviations of the mean; 99.7% of the observations fall within three standard deviations of the mean.
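The rule can be checked against an exact Normal distribution using Python's standard library (the exact areas are about 68.27%, 95.45%, and 99.73%):

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)      # any Normal works; N(0, 1) is simplest

def within(k):
    """Area under the curve within k standard deviations of the mean."""
    return nd.cdf(k) - nd.cdf(-k)
```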

scatterplot (4.1)

shows the relationship between two quantitative variables measured on the same individuals.

response variable (4.1)

measures an outcome or result of a study.Should be on the Y axis.

explanatory variable (4.1)

explains or causes changes in the response variable.Should be the X axis.

What to look for in a scatterplot (4.1)

the overall pattern (direction, form, and strength) and striking deviations from that pattern, such as outliers

correlation r (4.1.3)

measures the direction and strength of a straight-line relationship between two quantitative variables. It's a number between -1 and 1. The sign of r shows whether the association is positive or negative. The value of r gets closer to −1 or 1 as the points cluster more tightly about a straight line. The extreme values −1 and 1 occur only when the scatterplot shows a perfectly straight line.
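A from-scratch sketch of r in Python (hypothetical helper; in practice a calculator's LinReg or a stats library does this):

```python
def correlation(xs, ys):
    """Pearson correlation r: direction and strength of a straight-line relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Points exactly on an upward-sloping line give r = 1; exactly on a downward-sloping line, r = −1.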

when you feel baffled by a word problem,

First make sense of the givens (values that might fit into a formula or graph). Follow the directions, checking in with the givens as you proceed. Do NOT move forward unless you are clear on your present step.

regression line (4.2.1)

a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use one to predict the value of y for a given value of x; also called the line of best fit.

least squares regression line (4.2.2)

the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

formula for the least squares regression line (4.2.2)

ŷ = a + bx, where b is the slope and a is the y-intercept

how to calculate the linear regression of a scatterplot (4.2.5)

Input the x data points into L1 and the y data points into L2. Press STAT, arrow to CALC, and choose option 4 (LinReg). Note a and b to calculate the least squares prediction at any given point.
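The a and b the calculator reports can be reproduced directly; a sketch on made-up data (the least-squares line always passes through the point (x̄, ȳ)):

```python
def least_squares(xs, ys):
    """Slope b and intercept a of the least-squares line y-hat = a + bx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx                    # forces the line through (x-bar, y-bar)
    return a, b

a, b = least_squares([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])   # made-up points
predict = lambda x: a + b * x          # prediction at any given x
```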

residual (4.2.5)

the difference between an observed value of the response variable and the value predicted by the regression line.

residual formula (4.2.5)

residual = observed y − predicted ŷ. The residuals from the least squares line always have mean 0.
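A tiny numeric example (values made up; the fitted line ŷ = 0.15 + 1.94x is hypothetical):

```python
a, b = 0.15, 1.94              # hypothetical fitted line y-hat = a + bx
x, y_observed = 2, 3.9
y_hat = a + b * x              # predicted value: 4.03
residual = y_observed - y_hat  # observed minus predicted: -0.13
```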

r2 in regression (4.2.6)

the square of the correlation; it gives the fraction of the variation in the values of y that is explained by the least squares regression line

how to make residual plot on calc (4.2.7)

Press 2nd Y= (STAT PLOT). Change the Y list from L2 to RESID: press 2nd STAT (LIST) and scroll down to choose RESID from the menu, then press ENTER. Press ZOOM 9 (ZoomStat) to display the graph.

residual plot (4.2.7)

a graph that plots the given x values against the residuals as the y values, to visually assess how well the least squares line fits the data

Lurking variable (4.2)

a variable that has an important effect on the relationship among the variables in a study but is not one of the explanatory variables studied.

what are the elements of biased sampling methods (5.1)

voluntary response sampling and convenience sampling

voluntary response sample (5.1)

chooses itself by responding to a general appeal. Write-in or call-in opinion polls are examples of voluntary response samples.

convenience sampling (5.1)

Selection of whichever individuals are easiest to reach

simple random sample (SRS) (5.1.2)

a sample of a set size n chosen in such a way that every set of n individuals in the population has an equal chance to be the sample actually selected

parameter (5.2.1)

is a number that describes the population. It is a fixed number, but in practice we don't know its value; we use a statistic to estimate it.

statistic (5.2.1)

is a number that describes a sample. Its value is known once we have taken a sample, but it can change from sample to sample. We often use a statistic to estimate an unknown parameter.

bias (5.2.2)

is consistent, repeated deviation of the sample statistic from the population parameter in the same direction when we take many samples.

variability (5.2.2)

describes how spread out the values of the sample statistic are when we take many samples. Large variability means that the result of sampling is not repeatable.

what are the two types of errors in estimation (5.2.2)

bias and variability

2 parts of a confidence statement (5.2.4)

a margin of error and a level of confidence. The conclusion of the statement always describes the population.

margin of error (5.2.4)

says how close the sample statistic lies to the population parameter

level of confidence (5.2.4)

says what percent of all possible samples satisfy the margin of error.

quick method to calculate margin of error (5.2.3)

1 divided by the square root of the sample size n: margin of error ≈ 1/√n
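For example, with the quick method a sample of 1,600 people gives a margin of error of about 2.5 percentage points:

```python
n = 1600               # sample size (made-up)
margin = 1 / n ** 0.5  # quick method: 1/sqrt(n) = 0.025, about +/- 2.5 points
```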

formula for percentage change

Work out the difference (increase) between the two numbers you are comparing. Then divide the increase by the original number and multiply the answer by 100.
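The recipe as a Python function (illustrative name):

```python
def percent_change(original, new):
    """Difference divided by the original number, times 100."""
    return (new - original) / original * 100
```

For example, going from 50 to 60 is a 20% increase.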