Upper Fence Formula
UF = Q3+1.5(IQR)
Lower Fence Formula
LF = Q1-1.5(IQR)
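A minimal sketch of the two fence formulas above, assuming Python's `statistics.quantiles` with the "inclusive" method for the quartiles. Quartile conventions vary, so a calculator or textbook may give slightly different fences. The data values and the helper name `fences` are made up for illustration.

```python
import statistics

def fences(data):
    """Return (lower_fence, upper_fence) using the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns [Q1, median, Q3]
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 7, 8, 9, 10, 12, 30]  # made-up values
lf, uf = fences(data)

# Any value outside the fences is flagged as a possible outlier
outliers = [x for x in data if x < lf or x > uf]
print(lf, uf, outliers)  # -2.5 17.5 [30]
```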
Relative Frequency
The counts turned into percentages
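As a rough illustration, counts can be turned into relative frequencies by dividing each count by the total. The eye-color data below is invented.

```python
from collections import Counter

# Made-up responses for a categorical variable
colors = ["brown", "blue", "brown", "green", "brown",
          "blue", "brown", "brown", "blue", "green"]

counts = Counter(colors)  # raw counts per category
# Relative frequency = count / total, expressed as a percentage
rel = {color: 100 * n / len(colors) for color, n in counts.items()}
print(rel)  # {'brown': 50.0, 'blue': 30.0, 'green': 20.0}
```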
What will be the shape of the histogram when the mean is bigger than the median?
right skew
What will be the shape of the histogram when the mean is smaller than the median?
left skew
Z-Score Equation
z = (x - x̄)/s, where x̄ is the mean and s is the standard deviation
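A short sketch of the z-score equation above, assuming the mean and the sample standard deviation (n - 1 in the denominator) are computed from the data itself. The helper name `z_score` and the scores are hypothetical.

```python
import statistics

def z_score(x, data):
    """How many standard deviations x lies from the mean of data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return (x - mean) / s

scores = [70, 80, 90]  # made-up values: mean 80, s = 10
print(z_score(90, scores))  # 1.0
```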
What happens to the mean and standard deviation when adding or subtracting a constant amount?
The mean changes by the same constant that is being added or subtracted, but the standard deviation stays the same
What happens to the mean and standard deviation when multiplying or dividing a constant amount?
Both the mean and standard deviation are multiplied/divided by the same constant
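The two rules above can be checked numerically. This is only a sketch with made-up data: adding 10 should move the mean by 10 and leave the standard deviation alone, while multiplying by 3 should triple both.

```python
import statistics

data = [2, 4, 6, 8]                  # made-up values
shifted = [x + 10 for x in data]     # add a constant
scaled = [x * 3 for x in data]       # multiply by a constant

for label, values in [("original", data), ("plus 10", shifted), ("times 3", scaled)]:
    # Print the mean and the sample standard deviation of each version
    print(label, statistics.mean(values), round(statistics.stdev(values), 3))
```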
What is a percentile?
A number that indicates what percentage of data is at or below a given value. It compares one data point to the rest.
What is the 68-95-99.7 Rule?
In a normal distribution, about 68% of the values fall within one standard deviation of the mean, about 95% fall within two standard deviations of the mean, and about 99.7% fall within three standard deviations of the mean.
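The percentages in the rule are rounded areas under the normal curve. A quick check, assuming the standard normal model from Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, f"{area:.4f}")  # roughly 0.68, 0.95, 0.997
```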
When the mean and median are the same
Symmetrical
Categorical
a variable is categorical if the responses fall into categories, such as eye color
Quantitative
a variable is quantitative if the variable is numerical, such as height, and usually includes units to tell how it is measured
Categorical data can be presented in a..
1. Pie Chart
2. Bar Chart
3. Side-by-Side Bar Chart
4. Segmented Bar Chart
Quantitative data can be presented in a..
1. Boxplot
2. Histogram
3. Dot plot
4. Stem-and-Leaf Plot
Who?
Who was measured?
Case- each individual being measured (row)
Observation- data values recorded about each individual (actual number)
What?
What was measured?
Variables- characteristics recorded about each individual; they may change from one individual to another (categorical vs. quantitative)
Where?
Where was the study conducted? The place where the data was collected
When?
When was the study performed?
Why?
Why was the study performed?
How?
How was the data collected?
Ex. sample surveys, observational studies, experiments
How to draw a "quick" boxplot
1. Draw a scale number line.
2. Draw rectangular box with ends at quartiles.
3. Draw line through box at median.
4. Draw two "whiskers" from corresponding ends of box to extreme values (min and max).
5. Find the fences to see if there are outliers
What is a histogram?
Displays a group of data at a glance through counts.
What is a relative frequency histogram?
Shows a group of data/information through percentages.
When describing a distribution, what four things should you always mention?
SOCS: Shape, outliers, center, and spread.
When describing the shape of a distribution what should you look for?
1. Mode (unimodal, bimodal, or multimodal).
2. If it is symmetric.
3. If there are any outliers.
What is the center of distribution?
The typical value. In a symmetric, unimodal histogram, it sits directly in the middle.
What is the spread of distribution?
It answers the question, "How much do the data values vary around the center?" It describes the distribution numerically and can be measured by the range or the IQR.
When is it appropriate to use a time plot?
When the data are measured over time and you are looking for a pattern.
What is re-expressing or transforming data?
Applying a simple function (such as a logarithm or square root) to make a skewed distribution more symmetric, so the data are easier to understand and the center is easier to find.
Why is IQR a better indication of spread than range?
Range can be skewed by outliers, but since IQR only accounts for the middle 50%, it isn't skewed.
What does standard deviation demonstrate?
It measures how far the values in a data set typically are from the mean.
How are stem and leaf plots and histograms similar?
They show individual COUNTS.
What is the formula used to find the position of the median in a data set?
(n+1)/2 = the position (count) of the median. Put the data in order and count this many terms in to find the actual VALUE of the median; if n is even, average the two middle values.
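A sketch of the counting rule above. It assumes the usual convention that when (n+1)/2 lands halfway between two positions (n is even), the median is the average of the two neighboring values. The function name `median_by_position` is made up.

```python
def median_by_position(data):
    """Find the median by counting (n + 1) / 2 terms into the sorted data."""
    values = sorted(data)
    n = len(values)
    position = (n + 1) / 2          # 1-based position of the median
    if position.is_integer():
        return values[int(position) - 1]
    # Position ends in .5: average the values on either side of it
    lower = values[int(position) - 1]
    upper = values[int(position)]
    return (lower + upper) / 2

print(median_by_position([7, 1, 3, 9, 5]))  # 5   (position 3 of 5)
print(median_by_position([1, 3, 5, 7]))     # 4.0 (position 2.5 of 4)
```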
How does standardizing a variable affect the shape, center, and spread of its distribution?
Shape: does not change
Center: changes; the mean now equals 0
Spread: changes; the standard deviation now equals 1
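To see the center and spread change while the ordering of the values stays put, here is a small sketch that standardizes a made-up list of heights. The helper name `standardize` is hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def standardize(data):
    """Convert every value to a z-score using the data's own mean and s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

heights = [60, 62, 65, 68, 70]  # made-up values
z_values = standardize(heights)

# Up to floating-point rounding, the new mean is 0 and the new s is 1
print(statistics.mean(z_values), statistics.stdev(z_values))
```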
What does a box plot show?
Minimum, Q1, Median, Q3, Maximum.
5 NUMBER SUMMARY!
What are disadvantages of box plots?
You can't see the shape in a box plot.
Exact values are not recorded in a box plot
What is SOCS?
Shape, Outliers, Center, Spread
Always define SOCS when writing sentences about graphs
What are advantages of box plots?
Clear picture of middle half of data
box height= IQR
Clear display of outliers
5-number summary is...
min, Q1, med, Q3, max
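A possible way to compute the five values min, Q1, median, Q3, and max, assuming the "inclusive" quartile method of `statistics.quantiles`. Other quartile conventions can give slightly different Q1 and Q3. The data is invented.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    values = sorted(data)
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, med, q3, values[-1]

data = [1, 3, 4, 6, 7, 8, 10, 13, 15]  # made-up values
print(five_number_summary(data))  # (1, 4.0, 7.0, 10.0, 15)
```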
conditions of a normal distribution
The distribution is unimodal and roughly symmetric (bell-shaped)
How do you find the percentile of a value?
([Number of scores lower + 0.5] ÷ Total number of scores) x 100
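The percentile formula above, written out as a small function. The name `percentile_rank` and the scores are made up for illustration.

```python
def percentile_rank(value, scores):
    """([number of scores lower + 0.5] / total number of scores) x 100."""
    lower = sum(1 for s in scores if s < value)
    return 100 * (lower + 0.5) / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up values
# Five scores are lower than 80, so (5 + 0.5) / 10 x 100
print(percentile_rank(80, scores))  # 55.0
```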
IQR is...
Q3-Q1
How many steps must you have when normalizing?
4
What does normalizing do?
Normalizing makes the data comparable to each other through a constant unit
How do you normalize?
Z-score
What is a z-score?
It tells you how many standard deviations a value is from the mean
How do you make a list on your calculator?
STAT, edit, enter
How do you find the mean, median, min, max, Q's, and Standard deviation on your calculator?
STAT, CALC, 1-Var Stats, (Pick List), ENTER
How do you make a dot plot?
Draw a horizontal line and mark it with an appropriate measurement scale. Locate each value in the data set along the measurement scale, and represent it by a dot.
What does truncating mean?
Truncating is cutting off digits without rounding. ex: 3.09 -----> 3.0
Standardizing
converting scores from their original values to standard deviation units
-a standardized value is AKA z score
Marginal Distribution
In the contingency table, the distribution of either variable alone
Conditional Distribution
The distribution of a variable restricting the Who to consider only a smaller group of individuals
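A sketch of marginal and conditional distributions using a tiny, invented contingency table (class year by preferred pet). All names and counts here are hypothetical.

```python
# Hypothetical counts: rows are class year, columns are preferred pet
table = {
    "junior": {"cat": 10, "dog": 30},
    "senior": {"cat": 20, "dog": 40},
}
total = sum(sum(row.values()) for row in table.values())  # 100 individuals

# Marginal distribution of class year: row totals over the grand total
marginal_year = {year: sum(row.values()) / total for year, row in table.items()}

def conditional(row):
    """Distribution of pet preference within one row (one class year only)."""
    row_total = sum(row.values())
    return {pet: count / row_total for pet, count in row.items()}

print(marginal_year)                   # {'junior': 0.4, 'senior': 0.6}
print(conditional(table["junior"]))    # {'cat': 0.25, 'dog': 0.75}
print(conditional(table["senior"]))    # differs from juniors in this made-up table
```

Because the conditional distributions differ between juniors and seniors here, these made-up variables would not count as independent.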
Variance
sum of squared deviations from the mean, divided by the count minus one
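The definition above, step by step, with the standard deviation taken as the square root of the result. The data and the helper name `sample_variance` are made up; `statistics.variance` is shown only as a cross-check.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (len(data) - 1)

data = [2, 4, 4, 6, 9]  # made-up values, mean 5
var = sample_variance(data)
print(var)                          # 7.0
print(math.sqrt(var))               # the standard deviation
print(statistics.variance(data))    # library cross-check
```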
Parameter
Numerically valued attribute of a model
Statistic
Value calculated from data to summarize the aspects of the data
Contingency Table
Displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables.
Independence
Two variables are independent if the conditional distribution of one variable is the same for each category of the other
Standard Deviation of the Sample Mean (Standard Error)
standard deviation divided by the square root of n
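The quantity described above, the standard deviation divided by the square root of n, is commonly called the standard error of the mean. A minimal sketch with made-up data, assuming the sample standard deviation is used:

```python
import math
import statistics

def standard_error(data):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))

data = [10, 12, 14, 16]  # made-up values
print(round(standard_error(data), 3))  # about 1.291
```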
Skewed Right
A distribution whose longer, thinner tail (fewer observations) stretches to the right, toward higher values, is said to be skewed right.
Skewed Left
A distribution whose longer, thinner tail (fewer observations) stretches to the left, toward lower values, is said to be skewed left.
Zip codes
Categorical
How do you make quantitative data categorical?
By grouping it
Where do the first standard deviations on a normal curve go?
At the inflection points
What are inflection points?
Where the graph changes concavity
If the data is skewed right, the mean will be ________ than the median.
Greater
4 steps to do a normal distribution problem
1. State problem
2. Draw/shade a normal curve
3. Solve- use Table A
4. conclude- answer in the context of the problem
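The four steps above can be sketched in code, with `statistics.NormalDist` standing in for the Table A lookup in step 3. The scenario (scores with mean 500 and standard deviation 100) is invented.

```python
from statistics import NormalDist

# 1. State: hypothetical scores follow N(500, 100).
#    What proportion of scores fall below 650?
model = NormalDist(mu=500, sigma=100)

# 2. Draw: the shaded region is everything left of 650 on the curve,
#    which sits at this z-score
z = (650 - model.mean) / model.stdev

# 3. Solve: cdf gives the area to the left, replacing the Table A lookup
p = model.cdf(650)

# 4. Conclude: answer in the context of the problem
print(f"About {p:.1%} of scores are below 650 (z = {z:.2f}).")
```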
Segmented Bar Chart
displays the conditional distribution of a categorical variable within each category of another variable (always adds up to 100%)
Uniform
A distribution that is relatively flat
Outliers
Extreme values that do not appear to belong with the rest of the data
A stem and leaf always needs a _____ with it
key/legend
When is it appropriate to use a pie chart?
1. When the data add up to 100%
2. When there are 5 categories or fewer
3. When data can apply to one category only
When is it more appropriate to use a histogram rather than a stem-and-leaf display?
When you have a lot of values in your data
What does sigma represent?
The standard deviation
What does Mu (μ) represent?
The mean
Where on the normal curve are the inflection points located?
Where the bell shape changes from curving downward to curving back up.
Three methods for assessing whether or not a distribution is approximately normal:
1. Probability Plot
2. Histogram/Stem and leaf plot
3. 68-95-99.7 rule
Why are standardized units used to compare values with different scales, units or populations?
Because standardized values have no units; they simply measure the distance of a value from the mean in standard deviations
What is the relationship between variance and standard deviation?
Variance is the sum of squared deviations from the mean divided by the count minus one, while standard deviation is the square root of the variance.
Equation for finding the position (count) of the median?
(n+1)/2