Statistics STA 220 - Test 1 (CHP. 1,2,3,5)

Name the two branches of Statistics

Descriptive
Inferential

Branch of statistics that organizes, summarizes, and displays data; anecdotes cannot be used.

Descriptive Statistics

Branch of Statistics that draws conclusions

Inferential

Objects described by the set of data. Can be people, animals, things.

Individuals

Characteristics of an individual; can take on different values for different individuals.

Variable

Is Individuals or Variable LESS intuitive?

Variable

Another name for Qualitative Variable

Categorical
(Think: Category)

Variable that places an individual into one of several groups or categories (e.g. small, midsize, or large cars)

Qualitative/Categorical variable

Variable that has numerical values for which arithmetic operations make sense (e.g. average distance between stars). NOT only an average

Quantitative Variable

Other name for Median.

Center
Measure of Central Tendency

x with a line above it means:

Sample Mean

What is a "Mean"?

Add up the numbers and divide by sample size

Mo

MODE

Define "Mode"

Most frequent

Middle number

Median
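
A minimal sketch of the three measures of center, using Python's standard `statistics` module. The quiz scores are made up for illustration and are not from the course.

```python
import statistics

# Hypothetical sample of quiz scores (not from the notes).
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # add up the numbers, divide by sample size
median = statistics.median(scores)  # middle number once the data is sorted
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```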

What type of rounding is generally used?

Round to one more decimal place than the original data

What does an "outlier" do?

Skews data to the left or right

If outlier is to the left, what happens?

Pulls mean TOWARDS outlier
mean<median

If outlier is to the left, how is data skewed?

Skewed to the LEFT

Mean < Median: where is the outlier?

To the LEFT (data is skewed to the LEFT)

If outlier is to the right, what happens?

Pulls mean TOWARDS outlier
Mean>Median

If outlier is to the right, how is data skewed?

Skewed to the right
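
A rough illustration of how one outlier on the right pulls the mean toward it while the median barely moves. The numbers are hypothetical.

```python
import statistics

data = [10, 12, 13, 14, 16]      # hypothetical, symmetric: mean = median = 13
with_outlier = data + [90]       # add one extreme value on the right

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean jumps toward the outlier; the median shifts only slightly.
print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```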

Complete group of subjects of interest
(i.e. all persons in US)

Population

Number that describes a population characteristic
( of total number of persons in US)

Parameter

Part of population of Interest, a subcollection
(All women in US)

Sample

Describes a sample characteristic

Statistic

6 Steps to designing a study

1. IDENTIFY variables
2. DEVELOP plan to collect data
3. COLLECT data
4. DESCRIBE data with descriptive statistical techniques
5. INTERPRET data and MAKE Predictions
6. IDENTIFY possible errors.

1. Group within a group that shares similar characteristics
2. Random sample from each group

Stratified Sample

1. Divide population by naturally occurring subgroups (clusters)
2. Choose an SRS (simple random sample) of the groups
3. Study all members in each selected group

Cluster Sample

1. Choose a random starting value
2. Then choose kth member to test
(i.e. every 5th bottle of Ragu)

Systematic Sample
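
The sampling methods can be sketched side by side. This is only an illustration: the population of 100 numbered members, the group sizes, and the fixed seed are all assumptions made for the example.

```python
import random

rng = random.Random(0)             # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical population: members 1..100

# Simple random sample: every sample of this size is equally likely.
srs = rng.sample(population, 10)

# Systematic sample: random starting value, then every k-th member.
k = 5
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sample: split into groups, then sample randomly from EACH group.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

# Cluster sample: split into groups, randomly pick whole groups,
# and keep ALL members of the chosen groups.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen for member in cluster]
```

Note the difference the notes stress: a stratified sample has members from every group, while a cluster sample has every member from a few groups.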

Use only available members of a population

Convenience or Voluntary Sample

What kind of sampling invokes responses primarily from people who are really pleased or really NOT pleased?

Convenience or Voluntary Sample

4 types of Data Collection

Experiment
Simulation
Observational Study
Survey
(Extend an S.O.S)

Data Collection that does not change conditions

Observational Study

Data Collection where treatment is applied to part of a population (the treatment group) and no treatment is applied to the control group. Placebos may be used

Experiment

Data Collection that makes use of reproducing conditions

Simulation

Data collection that investigates one or more characteristics

Survey

The source of differing effects in a study is unclear

Confounding Variable

Favorable reaction to placebos

Placebo effect

Subject does not know if treatment is given

Blinding

Neither experimenter nor subject knows if treatment is received

Double-blinding

It is important to have members from each stratum

Stratified sample

Sample from naturally occurring subgroups

Cluster Sample

4 Levels of Measurement

Nominal Level
Ordinal Level
Interval Level
Ratio Level

Measurement level that is categorical only (i.e. names, labels, qualities)
It is NOT put in any specific order
(e.g. SSN, Jersey #s, Phone numbers)

Nominal Level (this is Categorical/Qualitative)

Measurement level that is ordered or ranked
(i.e. Final scores of a tournament)

Ordinal Level
(this is Categorical or Quantitative)

Measurement level that can be ordered AND has a meaningful difference.
0 is a position on the scale; it does NOT mean "none".

Interval Level

Measurement level that can be ordered AND has a meaningful difference.
0 means "none" (a true zero), NOT just a position

Ratio Level

Measurement that cannot be ordered

Nominal Level

List the ordered measurements:

Ordinal level
Interval Level
Ratio Level

Measurement level that can be ordered and subtracted:

Interval Level

Measurement level that can be ordered, subtracted and has a multiplier

Ratio level

Parameter goes with______.

Population (describes the population)

Statistic goes with____.

Sample (describes the Sample)

What is an example of a bad sample group?

Voluntary / Convenience Sample

What does a bad sample create?

Bad data

Why does a bad sample (i.e. voluntary response) group produce bad data?

It is a biased group.

Why are voluntary groups used?

Convenience for the RESEARCHER.

What produces unrepresentative data and biased data?

Bad Samples
Biased Sample Groups

What is necessary for an experiment to be an experiment?

"Treatment" is applied and observed

Treatment is applied and response is observed

Experiment

Observes and measures characteristics of interest.

Observational Study

Example of Observational Study

Surveys

Mathematical or physical model to reproduce the conditions of a process.

Simulation

What is a good example of Simulation?

Weather Channel

3 Elements of Experimental Design?

Control
Randomization
Replication

What accounts for effects other than the one being measured?

Control

Cannot tell the difference between the effects of different factors on a variable

Confounding variables

Subject reacts favorably to a placebo

Placebo effect

Process of assigning subjects to different treatment groups

Randomization

Subject is assigned to different treatment groups through random selection

Complete Randomization

1) Divide Subjects with similar characteristics into blocks
2) Within each block, randomly assign subjects to treatment groups
WHAT IS THIS CALLED?

Randomized Block Design

1) Subjects paired according to similarities
2) One person is randomly selected to receive one treatment while the other receives a different treatment
WHAT IS THIS CALLED?

Matched Pair Design

Repetition of an experiment using a large group?

Replication

EVERY possible sample of same size has same chance of being selected.

Simple Random Sample (SRS)

What to look for in Distribution of Quantitative Data?

Pattern
Deviations from Pattern
Shape of Data
Center of Data
Spread of Data
Outliers

Stem-Plots are used for what kind of variables?

Quantitative

What represents the "LEAF" in a Stem-Plot?

Ones place

Most common graph shape?

Normal (bell shaped curve)

Extreme values that fall outside overall pattern.

Outliers

What can outliers be a result of?

Natural occurrence.
Recording error.
Measuring error.
Unit differences

Are Histograms Quantitative or Categorical?

Quantitative

How many variables in a histogram?

One

Histograms divide values into ____ _____

Class Intervals

In Histograms, frequency is found on the ____ axis.

Y

In Histograms, variable is found on the _____ axis.

X

Histograms: How to calculate how many intervals?

√n, rounded up. Take the square root of the sample size and round up.

How to determine WIDTH of intervals (class)?

(Max - Min) / # of intervals, rounded up
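
A small sketch of setting up histogram classes. It assumes the square-root rule for the number of intervals (square root of the sample size, rounded up) and rounds the class width up as well; the data values are hypothetical.

```python
import math

# Hypothetical data, n = 16.
data = [12, 15, 21, 22, 25, 28, 30, 31, 35, 38, 40, 42, 47, 51, 55, 58]

n = len(data)
num_classes = math.ceil(math.sqrt(n))                            # sqrt(16) = 4
class_width = math.ceil((max(data) - min(data)) / num_classes)   # 46 / 4 = 11.5, so 12

# Lower limit of each class, starting at the minimum.
lower_limits = [min(data) + i * class_width for i in range(num_classes)]

# Frequency of each class: count of values from the lower limit up to
# (but not including) the next lower limit.
frequencies = [sum(1 for x in data if lo <= x < lo + class_width)
               for lo in lower_limits]

print(lower_limits, frequencies)
```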

# of times a # or range of #'s occurs.

Frequency

Eq: Relative Frequency

Class Frequency/ Sample Size
f/n EXAMPLE: 32/90

Eq: MIDPOINT of class

(Lower class limit + Upper Class Limit) / 2

Time plots: X axis is for ____.

Time

Time Plots: Y axis is for ____.

Variable

Sample that is not representative of the population.

Biased Sample

Selecting or Encouraging one outcome or answer.

Systematic Error

What do systematic errors lead to?

Overstating and Understating

Results that are higher than the population actually believes?

Overstate

Results that are LOWER than the population actually believes?

Understate

When the question limits the answer or when the question is ambiguous.

Unintentional Bias

Asks 2 Questions in one, resulting in a yes to one means yes to both. Does not allow for a yes and no answer.

Unnecessary Complexity (compound question)

Strong association between questions, leading people to understate or overstate.

Ordering of questions (leading questions)

Leading questions are asked in order to push the respondents towards a certain answer.

Push Poll

These 4 things affect the Center of Data:

Mean
Median
Mode
Sample Size

These 4 things affect the variation:

Range
Quartile (IQR- Inter Quartile Range)
Variance
Standard Deviation

Shape of:
Mean = Median = Mode

Normal Distribution

Shape of:
Mean > Median

Skewed to the Right

Shape of:
Mean < Median

Skewed to the Left

In a skewed graph, where is the Median found?

Between the Mode and Mean.

On a skewed graph, where is the Mode found?

Highest point.

On a skewed graph, where is the Mean?

Lowest point (on tail side)

If all values are the same, then they all equal the mean.

There is no variability

When some values are different (above/below) the mean.

Variability

Range

Max - Min

Will an outlier change the range?

Yes.

Is Range resistant to outliers?

No.

What is the total area of the density curve? (purple page)

1

What does the median divide in half?

Area (1)

At what point does any curve balance?

Mean

What symbol is used for population mean?

mu (μ)

What symbol is used for sample mean?

x-bar

Give 3 examples of things that are normally distributed.

Children's heights
Arrival times to class
Manufactured things
Cat sizes
Test scores
Car weight
Size of things that grow.

Give examples of things that are not normally distributed

Leaving a theater
Leaving a subway
Test Scores for class that has mostly mastered the material

What symbol is used for standard deviation?

σ (sigma, looks like an o with a line) for a population; s for a sample

What three percentages are important with the normal curve? What is it called?

68% - 95% - 99.7%
Empirical Rule

What are the percentages within the Area of a normal curve for each standard deviation starting at the Mean?

34%
13.5%
2.35%
.15%

What percent entails 1 standard deviation?

68%

What percent entails 2 standard deviations?

95%

What percent entails 3 standard deviations?

99.7%

What percent entails 4 standard deviations?

Approximately 100%
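
The 68% - 95% - 99.7% figures can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3, 4):
    print(k, round(area_within(k) * 100, 2))
```

The area within 4 standard deviations comes out just under 100%, which is why it is usually rounded to 100%.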

Why is it important to standardize a normal distribution, or, how would it help?

To be able to compare observations from different normal distributions

Formula that is used to standardize a normal distribution? What is the result called?

Z = (x - μ) / σ
Z-score

What does the "x" represent in the Z-score formula?

The variable for which the z-score is being sought (such as a test grade)
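
A minimal sketch of using z-scores to compare observations from different normal distributions. The test scores, means, and standard deviations are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean mu."""
    return (x - mu) / sigma

# Hypothetical scores on two tests with different means and spreads.
z_math = z_score(85, mu=75, sigma=5)
z_history = z_score(90, mu=84, sigma=6)

# The math score is lower as a raw number but further above its own mean.
print(z_math, z_history)
```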

The shape of the graph representing the distribution of annual incomes for full-time American workers would be:

Skewed to the Right

The shape of a graph representing the distribution of the percentage of senior citizens over the decades would be:

Skewed to the Left

Regarding Gas, Price and Gallons, which is the variable and which is the count/frequency?

Variable: Price
Count/Frequency: # of Gallons

Typical deviation from mean?

Standard Deviation

Small values of Standard Deviation indicate _____ variability in the data.

Small

Is Mean resistant to outliers?

No

What is Sample Standard Deviation in TI?

Sx

How to get sample variance out of Sx?

Square it: (Sx)^2 = s^2
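
As a quick check of the relationship between sample standard deviation and sample variance, here is a sketch with Python's `statistics` module (standing in for the TI calculator's Sx). The data set is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical sample, mean = 5

s = statistics.stdev(data)             # sample standard deviation (divides by n - 1)
variance = statistics.variance(data)   # sample variance

# Squaring the sample standard deviation gives the sample variance.
print(round(s, 3), round(s ** 2, 3), round(variance, 3))
```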

Are the mean and Standard Deviation resistant to outliers?

No

Inter Quartile Range (IQR)

Q3 - Q1

How should the data be summarized if outliers are present or the distribution is skewed?

Five-number summary: Min - Q1 - Median - Q3 - Max
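
A sketch of finding Min, Q1, Median, Q3, Max and the IQR. It assumes one common textbook convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. Other software may use a different quartile rule. The data and the function name are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # hypothetical
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1                            # Q3 - Q1

print(minimum, q1, median, q3, maximum, iqr)
```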

When can Mean and Standard Deviation be used?

Normal Distributions
Uniform Distributions

What is a Uniform Distribution

All values occur with the same frequency (straight line/rectangle shape graph)

What measures the spread about the mean and should only be used when the mean is chosen as the measure of center?

s
(Sample Standard Deviation)

S is always ______ or s = ___.

Positive
0

The more the data is spread out, the ______ the s.

larger

S has the ______ units as the original data.

same

The closer data is to the mean, the _____ the s.

smaller

The further the data is from the mean, the ____ the s.

larger

Density curve: is always ____ or _____ the x-axis

on
above

Density curve: the area under the curve = exactly ___.

1

What does a Density Curve do?

Describes overall pattern of distribution.

Variable

Characteristics of Individuals

2 Types of Variables:

Quantitative
Categorical (qualitative)

3 kinds of Histograms:

Normal
Skewed to Left
Skewed to Right

How many variables in a histogram?

1

S^2 = ____

Sample variance (the square of Sx)

For a Normal Distribution, what is used to find spread?

Mean
Standard Deviation (σ or s)

In a normal distribution, how do you write that the mean is 64 and σ is 2.7?

N(64, 2.7)
N(mean, σ)

# of standard deviations that a data value, x, falls from the mean (μ) is called a ____. Give formula

Z-score
Z = (x - μ) / σ

A standard normal curve will have ____ as its mean. Give an example of how this is written using N ( , ) with a normal curve and σ = 1.

0
N (0,1)

Z-scores are used to compare _____.

Normal Distributions

Z- scores are what kind of #'s?

+ or -

What kind of distributions are Z-scores used with?

ONLY Normal Distributions

What is considered a rare or unusual Z-score?

Z < -2 or Z > 2

Is there a Q4?

No

Percentiles, how many are there?

99

What are the equivalent %iles for Q1, Q2, Q3?

Q1 = P25
Q2 = P50
Q3 = P75

How to find %ile of x?

(# of values < x / total # of values) x 100

Equation to find the %ile of the 20th value (in order) if there are 30 values total?

(19/30) x 100 ≈ 63rd %ile
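
The percentile calculation can be sketched as a short function. The list of 30 distinct values is hypothetical and chosen so that 19 values fall below 20.

```python
def percentile_of(x, values):
    """Percent of values strictly less than x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

values = list(range(1, 31))        # hypothetical: 30 distinct values, 1..30
result = percentile_of(20, values) # 19 values are below 20

print(round(result))
```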

If the median of a boxplot is near the center of the box and each whisker is approximately equal length, the distribution is roughly ____.

Symmetrical

If the median of a boxplot is to the left of the center of the box or the right whisker is substantially longer than the left whisker, the distribution is skewed _____.

Right

If the median of a boxplot is to the right of the center of the box or the left whisker is substantially longer than the right whisker, the distribution is skewed ______.

Left

The highest point in the histogram or density curve is the ____.

Mode

The _____ is pulled toward the tail.

Mean

The ______ is between the ____ and the _____.

Median
Mode
Mean

Tell how to find a weighted mean.

Multiply each value by its weight.
Add all the products together.
Divide by the sum of the weights.
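
For comparison, the usual weighted mean formula is the sum of (value x weight) divided by the sum of the weights. A minimal sketch, with a made-up grading scheme as the example:

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical course grade: homework 20%, tests 50%, final exam 30%.
scores = [90, 80, 70]
weights = [20, 50, 30]

grade = weighted_mean(scores, weights)
print(grade)
```

The plain (unweighted) mean of the same three scores is 80, so the weights change the answer.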

How to calculate the median of a stemplot with an even number of values?

(Upper Middle Value + Lower Middle Value) / 2

Is the Median Resistant to Outliers?

Yes (# of values doesn't change)

Example of a Probability experiment?

Rolling a die

What would {3} be in a rolling die experiment?

Outcome

What would {1,2,3,4,5,6} be in a rolling die probability experiment?

Sample Space

What would the fact that the die rolled is even, {2,4,6}, be?

Event

Event with 1 outcome.

Simple Event

Fundamental Counting Principle

Event 1 occurs in "m" ways.
Event 2 occurs in "n" ways.
Therefore, # of ways that the 2 events can occur =
m x n = Total Ways
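
The counting principle can be checked by listing every combined outcome. This sketch pairs a die roll with a coin flip (a hypothetical pairing, not from the notes) and uses `itertools.product` to enumerate them.

```python
from itertools import product

die = [1, 2, 3, 4, 5, 6]   # event 1 can occur in m = 6 ways
coin = ["H", "T"]          # event 2 can occur in n = 2 ways

# Every (die, coin) pair is one way the two events can occur together.
outcomes = list(product(die, coin))

print(len(outcomes), len(die) * len(coin))
```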

How many phone number combinations are there for a 10 digit number?

8 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10
(8 is used because first number cannot be a 1 nor a 0)
8 billion combinations

If last digit of a phone number must be even, how many options would there be?

5
(0,2,4,6,8)

If last digit of a phone number must be odd, how many options would there be?

5
(1,3,5,7,9)

How to calculate probability (how likely something will happen)?

P(E) = (# of outcomes in the event) / (# of total outcomes in the Sample Space)

Give an example of how to calculate probability.

P(E) = (Made basket 5 times) / (Shot basket 10 times) = 5/10 = .5

Face cards are:

J, Q, K

Ace counts as a ___.

1

How many cards in a deck?

52

Probability that you will pick a face card out of a deck?

12/52 = .23
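
The face card probability can be confirmed by building the deck and counting. A small sketch; the rank and suit labels are just one way to write a standard 52-card deck.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

deck = [(rank, suit) for rank in ranks for suit in suits]   # sample space
face_cards = [card for card in deck if card[0] in {"J", "Q", "K"}]  # the event

# P(E) = # of outcomes in the event / # of total outcomes
p_face = Fraction(len(face_cards), len(deck))

print(len(face_cards), len(deck), round(float(p_face), 2))
```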

Formula for relative frequency

f/n = (How often something happens) / (# of draws)

Empirical Probabilities Of Serious Traffic Problems:
Serious Problem: 123
Moderate Problem: 115
Not a Problem: 82
------
320
P(serious problem): /
P(moderate problem): /
P(problem at all): /

123/320 = .38
115/320 = .36
(123 + 115) / 320 = .74
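
The same arithmetic as a short sketch, using the survey counts from this card. The dictionary keys are only labels chosen for the example.

```python
counts = {"serious": 123, "moderate": 115, "not a problem": 82}
total = sum(counts.values())   # total number of responses

# Empirical probability = frequency of the event / total frequency
p_serious = counts["serious"] / total
p_moderate = counts["moderate"] / total
p_any_problem = (counts["serious"] + counts["moderate"]) / total

print(round(p_serious, 2), round(p_moderate, 2), round(p_any_problem, 2))
```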

Probability numbers can be ___ <= ____ <= _____

0 <= P(E) <= 1
(Any number from 0 to 1 can be a probability, including the equivalent 0% to 100%)

On a graph, how do you figure probability for bar "f"?

Add up frequency for each bar/point.
Divide frequency of F by the total Frequency of all bars.

Box plot: % of each whisker?

25%

Box plot: % of entire box?

50%

Box Plot: Middle of box to any edge?

25%

Box Plot: Middle of Box to end of any given whisker?

50%

Box plots Quartiles are divided by _____, not ______.

% not length

What is the probability that a person is between Q3 and the far right whisker?

25%