Name the two branches of Statistics
Descriptive
Inferential
Branch of statistics that organizes, summarizes, and displays data; anecdotes cannot be used.
Descriptive Statistics
Branch of Statistics that draws conclusions
Inferential
Objects described by the set of data. Can be people, animals, things.
Individuals
Characteristics of an individual; can take on different values for different individuals.
Variable
Is Individuals or Variable LESS intuitive?
Variable
Another name for Qualitative Variable
Categorical
(Think: Category)
Variable that places an individual into one of several groups or categories (e.g. small, midsize, or large cars)
Qualitative/Categorical variable
Variable that has numerical values for which arithmetic operations make sense (e.g. average distance between stars). NOT only an average
Quantitative Variable
What kind of measure is the Median?
Center
Measure of Central Tendency
x with a line above it means:
Sample Mean
What is a "Mean"?
Add up the numbers and divide by sample size
Mo
MODE
Define 'Mode'
Most frequent
Middle number
Median
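A minimal sketch of the three measures of center from the cards above, using Python's standard `statistics` module. The `scores` list is made-up illustration data, not part of the original cards.

```python
import statistics

# Hypothetical sample data (not from the flashcards).
scores = [70, 85, 85, 90, 100]

sample_mean = statistics.mean(scores)      # add up the numbers, divide by sample size
sample_median = statistics.median(scores)  # middle number of the sorted data
sample_mode = statistics.mode(scores)      # most frequent value

print(sample_mean, sample_median, sample_mode)
```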
What type of rounding is generally used?
Round to one more decimal place than the original data
What does an "outlier" do?
Skews data to the left or right
If outlier is to the left, what happens?
Pulls mean TOWARDS outlier
mean<median
If outlier is to the left, how is data skewed?
Skewed to the LEFT
Mean < Median... Where is the outlier?
To the LEFT (data is skewed to the LEFT)
If outlier is to the right, what happens?
Pulls mean TOWARDS outlier
Mean>Median
If outlier is to the right, how is data skewed?
Skewed to the right
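The outlier cards above can be checked with a small sketch. The data sets are invented for illustration: one roughly symmetric list, then the same list with a single low or high outlier added.

```python
import statistics

# Hypothetical data: a roughly symmetric set, then the same set with one outlier.
base = [48, 50, 52, 54, 56]
low_outlier = [2] + base      # outlier on the left
high_outlier = base + [102]   # outlier on the right

def mean_and_median(data):
    """Return (mean, median) so the two can be compared."""
    return statistics.mean(data), statistics.median(data)

left_mean, left_median = mean_and_median(low_outlier)
right_mean, right_median = mean_and_median(high_outlier)

print(left_mean < left_median)     # outlier on the left pulls the mean below the median
print(right_mean > right_median)   # outlier on the right pulls the mean above the median
```

The median barely moves in either case, while the mean is pulled toward the outlier.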
Complete group of subjects of interest
(i.e. all persons in US)
Population
Number that describes a population characteristic
( of total number of persons in US)
Parameter
Part of population of Interest, a subcollection
(All women in US)
Sample
Describes a sample characteristic
Statistic
6 Steps to designing a study
1. IDENTIFY variables
2. DEVELOP plan to collect data
3. COLLECT data
4. DESCRIBE data with descriptive statistical techniques
5. INTERPRET data and MAKE Predictions
6. IDENTIFY possible errors.
1. Divide population into groups (strata) whose members share similar characteristics
2. Random sample from each group
Stratified Sample
1. Divide population by naturally occurring subgroups (clusters)
2. Choose an SRS (simple random sample) of the groups
3. Study all members of each selected group
Cluster Sample
1. Choose a random starting value
2. Then choose kth member to test
(i.e. every 5th bottle of Ragu)
Systematic Sample
Use only available members of a population
Convenience or Voluntary Sample
What kind of sampling invokes responses primarily from people who are really pleased or really NOT pleased?
Convenience or Voluntary Sample
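A rough sketch of the random sampling methods from the cards above, using only Python's `random` module. The population, the group labels "A" and "B", and all function names are hypothetical, and the cluster version is simplified to picking a single whole group.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 people, each tagged with a group label.
population = [(i, "A" if i < 10 else "B") for i in range(20)]

def simple_random_sample(items, n):
    """SRS: every possible sample of size n is equally likely."""
    return rng.sample(items, n)

def stratified_sample(items, n_per_group):
    """Take a random sample from every group (stratum)."""
    groups = {}
    for item in items:
        groups.setdefault(item[1], []).append(item)
    return [pick for members in groups.values()
            for pick in rng.sample(members, n_per_group)]

def cluster_sample(items):
    """Pick one whole group (cluster) at random and keep all its members."""
    labels = sorted({item[1] for item in items})
    chosen = rng.choice(labels)
    return [item for item in items if item[1] == chosen]

def systematic_sample(items, k):
    """Random starting point, then every k-th member."""
    start = rng.randrange(k)
    return items[start::k]
```

A convenience or voluntary sample has no random step at all, which is why it does not appear here.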
4 types of Data Collection
Experiment
Simulation
Observational Study
Survey
(Extend an S.O.S)
Data Collection that does not change conditions
Observational Study
Data Collection where treatment is applied to part of a population (the treatment group) and no treatment is applied to the control group. Placebos may be used
Experiment
Data Collection that makes use of reproducing conditions
Simulation
Data collection that investigates one or more characteristics
Survey
The source of differing effects in a study is unclear
Confounding Variable
Favorable reaction to placebos
Placebo effect
Subject does not know if treatment is given
Blinding
Neither experimenter nor subject knows if treatment is received
Double-blinding
It is important to have members from each stratum
Stratified sample
Sample from naturally occurring subgroups
Cluster Sample
4 Levels of Measurement
Nominal Level
Ordinal Level
Interval Level
Ratio Level
Measurement level that is categorical only (i.e. names, labels, qualities)
It is NOT put in any specific order
(e.g. SSN, jersey #s, phone numbers)
Nominal Level (this is Categorical/Qualitative)
Measurement level that is ordered or ranked
(i.e. Final scores of a tournament)
Ordinal Level
(this is Categorical or Quantitative)
Measurement level that can be ordered AND has a meaningful difference.
0 is a position on the scale; it does NOT mean "none".
Interval Level
Measurement level that can be ordered AND has a meaningful difference.
0 means "none" (a true zero), NOT just a position
Ratio Level
Measurement that cannot be ordered
Nominal Level
List the ordered measurements:
Ordinal level
Interval Level
Ratio Level
Measurement level that can be ordered and subtracted:
Interval Level
Measurement level that can be ordered, subtracted, and multiplied or divided (ratios are meaningful)
Ratio level
Parameter goes with______.
Population (describes the population)
Statistic goes with____.
Sample (describes the Sample)
What is an example of a bad sample group?
Voluntary / Convenience Sample
What does a bad sample create?
Bad data
Why does a bad sample (i.e. voluntary response) group produce bad data?
It is a biased group.
Why are voluntary groups used?
Convenience for the RESEARCHER.
What produces unrepresentative and biased data?
Bad Samples
Biased Sample Groups
What is necessary for an experiment to be an experiment?
"Treatment" is applied and observed
Treatment is applied and response is observed
Experiment
Observes and measures characteristics of interest.
Observational Study
Example of Observational Study
Surveys
Mathematical or physical model to reproduce the conditions of a process.
Simulation
What is a good example of Simulation?
Weather Channel
3 Elements of Experimental Design?
Control
Randomization
Replication
What accounts for effects other than the one being measured?
Control
Cannot tell the difference between effects of different factors on a variable
Confounding variables
Subject reacts favorably to a placebo
Placebo effect
Process of assigning subjects to different treatment groups
Randomization
Subject is assigned to different treatment groups through random selection
Complete Randomization
1) Divide Subjects with similar characteristics into blocks
2) Within each block, randomly assign subjects to treatment groups
WHAT IS THIS CALLED?
Randomized Block Design
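The two steps of a randomized block design can be sketched as follows. The subject names, the "young"/"old" blocks, and the even split into two groups are assumptions made only for this illustration.

```python
import random

rng = random.Random(1)  # fixed seed, only so the sketch is repeatable

# Hypothetical subjects as (name, block) pairs; the block is the shared characteristic.
subjects = [("s1", "young"), ("s2", "young"), ("s3", "young"), ("s4", "young"),
            ("s5", "old"), ("s6", "old"), ("s7", "old"), ("s8", "old")]

def randomized_block_assignment(subjects):
    """Within each block, shuffle and split subjects between two groups."""
    blocks = {}
    for name, block in subjects:
        blocks.setdefault(block, []).append(name)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for name in members[:half]:
            assignment[name] = "treatment"
        for name in members[half:]:
            assignment[name] = "control"
    return assignment

assignment = randomized_block_assignment(subjects)
```

Because the shuffle happens inside each block, every block ends up represented in both groups.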
1) Subjects paired according to similarities
2) One person is randomly selected to receive one treatment while the other receives a different treatment
WHAT IS THIS CALLED?
Matched Pair Design
Repetition of an experiment using a large group?
Replication
EVERY possible sample of same size has same chance of being selected.
Simple Random Sample (SRS)
What to look for in Distribution of Quantitative Data?
Pattern
Deviations of Pattern
Shape of Data
Center of Data
Spread of Data
Outliers
Stem-Plots are used for what kind of variables?
Quantitative
What represents the "LEAF" in a Stem-Plot?
One's place
Most common graph shape?
Normal (bell shaped curve)
Extreme values that fall outside overall pattern.
Outliers
What can outliers be a result of?
Natural occurrence.
Recording error.
Measuring error.
Unit differences
Are Histograms Quantitative or Categorical?
Quantitative
How many variables are shown in a histogram?
One
Histograms divide values into ____ _____
Class Intervals
In Histograms, frequency is found on the ____ axis.
Y
In Histograms, variable is found on the _____ axis.
X
Histograms: How to calculate how many intervals?
√n, round up. Take the square root of the sample size and round up.
How to determine WIDTH of intervals (class)?
(Max - Min) / # of intervals, round up
# of times a # or range of #'s occurs.
Frequency
Eq: Relative Frequency
Class Frequency/ Sample Size
f/n EXAMPLE: 32/90
Eq: MIDPOINT of class
(Lower class limit + Upper Class Limit) / 2
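A hedged sketch of the histogram arithmetic from the cards above. It assumes the common square-root rule of thumb for the number of classes, whole-number data, and class widths rounded up; the data and the `class_summary` helper are invented for illustration.

```python
import math

# Hypothetical sample of 16 whole-number values.
data = [3, 5, 7, 8, 10, 12, 13, 15, 18, 20, 21, 24, 25, 27, 29, 30]

n = len(data)
num_classes = math.ceil(math.sqrt(n))                           # square-root rule, rounded up
class_width = math.ceil((max(data) - min(data)) / num_classes)  # (max - min) / # of classes, rounded up

def class_summary(lower, width, values):
    """Frequency, relative frequency, and midpoint for the class lower..lower + width - 1."""
    upper = lower + width - 1
    freq = sum(lower <= v <= upper for v in values)
    return {"limits": (lower, upper),
            "frequency": freq,
            "relative_frequency": freq / len(values),   # f / n
            "midpoint": (lower + upper) / 2}            # (lower limit + upper limit) / 2

classes = [class_summary(min(data) + i * class_width, class_width, data)
           for i in range(num_classes)]

for c in classes:
    print(c)
```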
Time plots: X axis is for ____.
Time
Time Plots: Y axis is for ____.
Variable
Sample that is not representative of the population.
Biased Sample
Selecting or Encouraging one outcome or answer.
Systematic Error
What does systematic error lead to?
Overstating and Understating
Results that are higher than the population actually believes?
Overstate
Results that are LOWER than the population actually believes?
Understate
When the question limits the answer or when the question is ambiguous.
Unintentional Bias
Asks 2 Questions in one, resulting in a yes to one means yes to both. Does not allow for a yes and no answer.
Unnecessary Complexity (compound question)
Strong association between questions, leading people to understate or overstate.
Ordering of questions (leading questions)
Leading questions are asked in order to push the respondents towards a certain answer.
Push Poll
These 4 things affect the Center of Data:
Mean
Median
Mode
Sample Size
These 4 things measure the variation:
Range
Quartile (IQR- Inter Quartile Range)
Variance
Standard Deviation
Shape of:
Mean = Median = Mode
Normal Distribution
Shape of:
Mean > Median
Skewed to the Right
Shape of:
Mean < Median
Skewed to the Left
In a skewed graph, where is the Median found?
Between the Mode and Mean.
On a skewed graph, where is the Mode found?
Highest point.
On a skewed graph, where is the Mean?
Lowest point (on tail side)
If all values are the same, then they all equal the mean.
There is no variability
When some values are different (above/below) the mean.
Variability
Range
Max - Min
Will an outlier change the range?
Yes.
Is Range resistant to outliers?
No.
What is the total area of the density curve? (purple page)
1
What does the median divide in half?
Area (1)
At what point does any curve balance?
Mean
What symbol is used for population mean?
mu (μ)
What symbol is used for sample mean?
x-bar
Give 3 examples of things that are normally distributed.
Children's heights
Arrival times to class
Manufactured things
Cat sizes
Test scores
Car weight
Size of things that grow.
Give examples of things that are not normally distributed
Leaving a theater
Leaving a subway
Test Scores for class that has mostly mastered the material
What symbol is used for standard deviation?
sigma (σ), which looks like an o with a line
What three percentages are important with the normal curve? What is it called?
68% - 95% - 99.7%
Empirical Rule
What are the percentages within the Area of a normal curve for each standard deviation starting at the Mean?
34%
13.5%
2.35%
.15%
What percent entails 1 standard deviation?
68%
What percent entails 2 standard deviations?
95%
What percent entails 3 standard deviations?
99.7%
What percent entails 4 standard deviations?
Nearly 100% (about 99.99%)
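The percentages in the cards above can be checked against the standard normal curve with `statistics.NormalDist` from the Python standard library. The helper name `area_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3, 4):
    print(k, round(area_within(k) * 100, 2))
```

The computed areas differ slightly from the rounded 68% - 95% - 99.7% figures, and the area within 4 standard deviations is just under 100%.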
Why is it important to standardize a normal distribution, or, how would it help?
To be able to compare observations from different normal distributions
Formula that is used to standardize a normal distribution? What is the result called?
z = (x - μ) / σ
Z-score
What does the "x" represent in the Z-score formula?
The data value for which the z-score is being sought (such as a test grade)
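A small sketch of the z-score formula, z = (x - μ) / σ, used to compare observations from two different normal distributions. The two tests and their means and standard deviations are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical example: two test scores from different normal distributions.
z_test_a = z_score(85, mu=75, sigma=5)       # 10 points above a mean of 75
z_test_b = z_score(620, mu=500, sigma=100)   # 120 points above a mean of 500

print(z_test_a, z_test_b)
```

Even though 620 is the larger raw score, the first score has the larger z-score, so it is further above its own mean in standard-deviation terms.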
The shape of the graph representing the distribution of annual incomes for full-time American workers would be:
Skewed to the Right
The shape of a graph representing the distribution of the percentage of senior citizens over the decades would be:
Skewed to the Left
Regarding Gas, Price and Gallons, which is the variable and which is the count/frequency?
Variable: Price
Count/Frequency: # of Gallons
Typical deviation from mean?
Standard Deviation
Small values of Standard Deviation indicate _____ variability in the data.
Small
Is Mean resistant to outliers?
No
What is Sample Standard Deviation in TI?
Sx
How to get sample variance out of Sx?
Square it: (Sx)^2 = s^2
Are the mean and Standard Deviation resistant to outliers?
No
Inter Quartile Range (IQR)
Q3 - Q1
What should be written out along with the IQR if outliers are present or the distribution is skewed?
Five-number summary: Min - Q1 - Median - Q3 - Max
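A sketch of the Min - Q1 - Median - Q3 - Max summary and the IQR. Quartile conventions vary between textbooks and calculators; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, and it needs at least two values. The sample is invented.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]     # skips the middle value when n is odd
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical sample.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
minimum, q1, median, q3, maximum = five_number_summary(sample)
iqr = q3 - q1   # Q3 - Q1

print(minimum, q1, median, q3, maximum, iqr)
```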
When can Mean and Standard Deviation be used?
Normal Distributions
Uniform Distributions
What is a Uniform Distribution?
All values are equally likely, so the frequencies stay the same (straight line/rectangle shape graph)
What measures the spread about the mean and should only be used when the mean is chosen as the measure of center?
s
(Sample Standard Deviation)
S is always ______ or s = ___.
Positive
0
The more the data is spread out, the ______ the s.
larger
S has the ______ units as the original data.
same
The closer data is to the mean, the _____ the s.
smaller
The further the data is from the mean, the ____ the s.
larger
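A brief sketch of the properties of s listed in the cards above, using `statistics.stdev` and `statistics.variance` (both use the sample formulas). The three data sets are made up and share the same mean of 5.

```python
import statistics

# Hypothetical samples with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]
constant = [5, 5, 5, 5, 5]

s_tight = statistics.stdev(tight)        # sample standard deviation (Sx on a TI calculator)
s_wide = statistics.stdev(wide)
s_constant = statistics.stdev(constant)  # no variability at all

variance_wide = statistics.variance(wide)   # sample variance, equal to s squared

print(s_tight, s_wide, s_constant, variance_wide)
```

Data close to the mean gives the smaller s, data far from the mean gives the larger s, and identical values give s = 0.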
Density curve: is always ____ or _____ the x-axis
on
above
Density curve: the area under the curve = exactly ___.
1
What does a Density Curve do?
Describes overall pattern of distribution.
Variable
Characteristics of Individuals
2 Types of Variables:
Quantitative
Categorical (qualitative)
3 kinds of Histograms:
Normal
Skewed to Left
Skewed to Right
How many variables in a histogram?
1
S^2 = ____
The variance (Sx squared)
For a Normal Distribution, what is used to find spread?
Mean
Standard Deviation (σ or s)
In a normal distribution, how do you write that the mean is 64 and σ is 2.7?
N(64, 2.7)
N(mean, σ)
# of standard deviations that a data value, x, falls from the mean (μ) is called a ____. Give formula
Z-score
z = (x - μ) / σ
A standard normal curve will have ____ as its mean. Give an example of how this is written using N( , ) with a normal curve and σ = 1.
0
N (0,1)
Z-scores are used to compare _____.
Normal Distributions
Z- scores are what kind of #'s?
+ or -
What kind of distributions are Z-scores used with?
ONLY Normal Distributions
What is considered a rare or unusual Z-score?
z < -2 or z > 2
Is there a Q4?
No
Percentiles, how many are there?
99
What are the equivalent %iles for Q1, Q2, Q3?
Q1 = P25
Q2 = P50
Q3 = P75
How to find %ile of x?
(# of values < x / total # of values) x 100
Equation to find %ile of the value 20 if 19 of the 30 total values are below it?
(19/30) x 100 = 63rd %ile
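The percentile formula above, (# of values below x / total # of values) x 100, as a short sketch. The function name and the data (the integers 1 through 30) are assumptions chosen so that 19 values fall below 20.

```python
def percentile_of(x, values):
    """Percent of values strictly less than x."""
    below = sum(v < x for v in values)
    return below / len(values) * 100

# Hypothetical data: the integers 1 through 30, so 19 values fall below 20.
values = list(range(1, 31))
p = percentile_of(20, values)

print(round(p))
```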
If the median of a boxplot is near the center of the box and each whisker is approximately equal length, the distribution is roughly ____.
Symmetrical
If the median of a boxplot is to the left of the center of the box or the right whisker is substantially longer than the left, the distribution is skewed _____.
Right
If the median of a boxplot is to the right of the center of the box or the left whisker is substantially longer than the right whisker, the distribution is skewed ______.
Left
The highest point in the histogram or density curve is the ____.
Mode
The _____ is pulled toward the tail.
Mean
The ______ is between the ____ and the _____.
Median
Mode
Mean
Tell how to find a weighted mean.
Multiply each value (or group mean) by its weight.
Add all the products together.
Divide by the sum of the weights.
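A sketch of a weighted mean using the usual definition: each value is multiplied by its weight, the products are added, and the total is divided by the sum of the weights. The course grades and the 2 : 3 : 5 weighting are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical course: homework, quizzes, and the exam weighted 2 : 3 : 5.
grades = [90, 80, 70]
weights = [2, 3, 5]
course_grade = weighted_mean(grades, weights)

print(course_grade)
```

With equal weights this reduces to the ordinary mean.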
How to calculate the median of a stemplot?
(Upper Middle Value + Lower Middle Value) / 2
Is the Median Resistant to Outliers?
Yes (# of values doesn't change)
Example of a Probability experiment?
Rolling die
What would {3} be in a rolling die experiment?
Outcome
What would {1,2,3,4,5,6} be in a rolling die probability experiment?
Sample Space
What would "the die rolled is even", {2,4,6}, be?
Event
Event with 1 outcome.
Simple Event
Fundamental Counting Principle
Event 1 occurs in "m" ways.
Event 2 occurs in "n" ways.
Therefore, # of ways that 2 events can occur =
m * n = Total Ways
How many phone number combinations are there for a 10-digit number?
8 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10
(8 is used because the first digit cannot be a 1 or a 0)
8 billion combinations (8 * 10^9)
If the last digit of a phone number must be even, how many options would there be?
5
(0,2,4,6,8)
If the last digit of a phone number must be odd, how many options would there be?
5
(1,3,5,7,9)
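The counting principle from the phone number cards can be sketched with `math.prod`. This assumes a 10-digit number where only the first digit is restricted (it cannot be 0 or 1), which is the rule the cards state; real numbering plans have further restrictions. The function name is made up.

```python
from math import prod

def count_numbers(choices_per_position):
    """Fundamental counting principle: multiply the number of choices at each position."""
    return prod(choices_per_position)

# Assumption: a 10-digit number whose first digit cannot be 0 or 1 (8 choices),
# with 10 choices for each of the other nine digits.
any_last_digit = count_numbers([8] + [10] * 9)

# Same rule, but the last digit must be even (5 choices: 0, 2, 4, 6, 8).
even_last_digit = count_numbers([8] + [10] * 8 + [5])

print(any_last_digit, even_last_digit)
```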
How to calculate probability (how likely something will happen)?
P(E) = (# of outcomes in the event) / (# of total outcomes in the sample space)
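Applied to the die-rolling experiment from the earlier cards, the probability formula can be sketched with sets and `fractions.Fraction`. This assumes a fair die, so every outcome in the sample space is equally likely; the variable and function names are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # all outcomes of one roll
even_event = {2, 4, 6}              # event: the roll is even
simple_event = {3}                  # simple event: exactly one outcome

def probability(event, space):
    """Outcomes in the event divided by outcomes in the sample space."""
    return Fraction(len(event & space), len(space))

p_even = probability(even_event, sample_space)
p_three = probability(simple_event, sample_space)

print(p_even, p_three)
```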
Give an example of how to calculate probability.
P(E) = (made basket 5 times) / (shot basket 10 times) = 5/10 = .5
Face cards are:
J, Q, K
Ace counts as a ___.
1
How many cards in a deck?
52
Probability that you will pick a face card out of a deck?
12/52 = .23
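The face card probability can be reproduced by building a standard 52-card deck and counting. The rank and suit labels are just one way to represent the cards.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]   # 13 ranks * 4 suits

face_cards = [card for card in deck if card[0] in {"J", "Q", "K"}]
p_face = Fraction(len(face_cards), len(deck))   # 12/52, stored in lowest terms

print(len(deck), len(face_cards), round(float(p_face), 2))
```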
Formula for relative frequency
f/n = (how often something happens) / (# of draws)
Empirical Probabilities Of Serious Traffic Problems:
Serious Problem: 123
Moderate Problem: 115
Not a Problem: 82
------
320
P(serious problem): /
P(moderate problem): /
P(problem at all): /
P(serious problem) = 123/320 = .38
P(moderate problem) = 115/320 = .36
P(problem at all) = (123 + 115) / 320 = .74
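The traffic survey arithmetic above, as a short sketch. The counts are the ones given in the card; the dictionary keys and variable names are illustrative.

```python
counts = {"serious": 123, "moderate": 115, "not a problem": 82}
total = sum(counts.values())   # total number of responses

# Empirical probability = frequency / total number of observations
p = {answer: freq / total for answer, freq in counts.items()}
p_any_problem = p["serious"] + p["moderate"]

print(total, round(p["serious"], 2), round(p["moderate"], 2), round(p_any_problem, 2))
```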
Probability numbers can be ___ <= ____ <= _____
0 <= P(E) <= 1
(Probabilities range from 0 to 1, or equivalently from 0% to 100%)
On a graph, how do you figure probability for bar "f"?
Add up frequency for each bar/point.
Divide frequency of F by the total Frequency of all bars.
Box plot: % of each whisker?
25%
Box plot: % of entire box?
50%
Box Plot: Middle of box to any edge?
25%
Box Plot: Middle of Box to end of any given whisker?
50%
Box plots Quartiles are divided by _____, not ______.
% not length
What is the probability that a person is between Q3 and the far right whisker?
25%