Statistics

Statistics

- science of collecting, organizing, summarizing and analyzing data to draw conclusions or answer questions, with a given amount of confidence concerning the answer
- method or process used in finding an answer to a question, with a specific amount of confidence

Descriptive

Type of statistics that organizes and summarizes data. Describes data through numerical summaries, tables, and graphs.

Inferential

Type of statistics that uses methods that take a result from a sample, extend it to the population, and measure the reliability of the results.

Population

The entire group of individuals being investigated (size is N). Must be precisely defined.

Sample

A subset of individuals of a given size (n) taken from the population.

Variable

A given aspect of an individual. This would be the definition of the aspect. For example, if a population or sample consists of people, then the variable could be weight, height, color of eyes, gender, etc. What aspects could be found for an individual country?

Data

The possible observations or outcomes for a variable concerning individuals. This is a label, a count, or a measurement.

Characteristic

A summary of a numerical variable of a population or sample, such as the mean, max, range, or standard deviation. Label data cannot be summarized numerically.

parameter

Characteristics (numerical) that come from the POPULATION

Statistics

Characteristics (numerical) that come from a SAMPLE

Parameter vs statistic if data comes from labels

If the data comes from labels, then the summary is called neither a parameter nor a statistic (no numerical summary is possible for qualitative data).

Qualitative data

comes from LABELS

Quantitative data

comes from NUMERICAL data

Discrete data

comes from counts and takes whole-number values (no decimal parts)

Continuous data

comes from measurements and are real numbers (may have decimal parts)

Observation

Data gathered by observing only, without interfering with the process in any way.

Experimentation

Data from controlling some factors of a process. Often involves comparing the results of two or more values of a control factor. Involves interference by the investigator.

Simple Random Sampling

Every individual in the population has an equal chance of being selected. The best sampling method and the goal of sampling.

Stratified Sampling

Separating the population into non-overlapping groups (strata) and then selecting a simple random sample from each group.

Systematic Sampling

Obtained by selecting every kth member of the population.

Cluster Sampling

Dividing the population into groups (clusters), randomly selecting one or more clusters, and then selecting all the individuals in the chosen clusters.

Convenience Sampling

Individuals are selected based on how easy they are to obtain, not randomly from the population.
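The sampling methods above can be sketched as small functions. This is a minimal illustration, assuming the population is a Python list and strata/clusters are dicts of lists; all function names are hypothetical, and the random starting point for systematic sampling is an assumption (a common textbook convention), not something stated above.

```python
import random

rng = random.Random(42)  # seeded so the sketch gives reproducible results

def simple_random_sample(population, n):
    """Every possible sample of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum):
    """Take a simple random sample from each non-overlapping group (stratum)."""
    return [x for group in strata.values() for x in rng.sample(group, n_per_stratum)]

def systematic_sample(population, k):
    """Pick a random start among the first k members, then take every kth member."""
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, num_clusters):
    """Randomly choose whole clusters and keep every individual in them."""
    chosen = rng.sample(list(clusters), num_clusters)
    return [x for name in chosen for x in clusters[name]]

def convenience_sample(population, n):
    """Take whoever is easiest to reach (here, simply the first n): not random."""
    return population[:n]

# Hypothetical population of 100 numbered individuals
people = list(range(1, 101))
print(simple_random_sample(people, 5))
print(systematic_sample(people, 20))
```

Only the convenience sample ignores randomness entirely, which is why it is the one most prone to bias.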

Non-response Bias

Arises from voluntary sampling, where individuals can refuse to take part.

Response Bias

Answers do not reflect the true feelings of the respondent. Caused by:
- Interviewer error or the way the question is framed
- The choices and wording offered in the survey
- The order of questions or responses
- Plain old data-entry error

BLIND STUDIES

- Experiments in which the participants do not know whether or not they are part of the control group.
- Can lead to bias.

DOUBLE BLIND STUDIES

The best.
Experiments in which neither the participants nor the people analyzing the results know who is in the control group

Frequency Distribution

Lists each category (label) of data and the number of occurrences.

Relative Frequency

The proportion of occurrences for each category, calculated as: frequency / (sum of all frequencies)
- The relative frequencies sum to 1.

Cumulative Frequency Distributions

Each class is listed as before (lowest to highest), but each frequency is the total for that class and all the lower classes, found by adding down the frequency column. The last class's cumulative frequency equals the total number of observations.

Relative Cumulative Frequency Distribution

Each cumulative frequency divided by the total of all frequencies. The last class will have a relative cumulative frequency of 1.0. To find it, first find the relative frequencies from the original distribution and then keep adding down the list.
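All four tables (frequency, relative frequency, cumulative, relative cumulative) can be built in one pass. A minimal sketch, assuming the data is a list of raw observations and the classes are given in order; `frequency_tables` and the pet-count data are hypothetical.

```python
from collections import Counter

def frequency_tables(data, classes):
    """Return rows of (class, frequency, relative freq, cumulative freq, relative cumulative freq).

    `classes` lists the categories/classes in order, lowest to highest.
    """
    counts = Counter(data)
    total = len(data)
    rows = []
    running = 0
    for c in classes:
        freq = counts[c]
        running += freq  # cumulative: this class plus all lower classes
        rows.append((c, freq, freq / total, running, running / total))
    return rows

# Hypothetical data: number of pets reported by 10 households
pets = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]
for row in frequency_tables(pets, classes=[0, 1, 2, 3]):
    print(row)
```

The last row's cumulative frequency is the total count (10) and its relative cumulative frequency is 1.0, matching the rules above.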

Bar Graph

Vertical or horizontal. The x-axis contains the categories or labels. For frequency distributions the y-axis is the number of occurrences. For relative frequency distributions the y-axis is the proportion (values between 0 and 1). Bars do not need to touch.

pie chart

A chart that shows the relationship of each part to the whole, displaying percentages as slices of a circle.

Histogram

A vertical bar graph where the x-axis is the number line and each bar represents a class. All bars must touch side to side. The lower class limits are marked on the x-axis.

class

An interval of numbers along the number line.

Lower Class Limit (LCL)

The beginning number of the class.

Upper Class Limit (UCL)

The last number of the class.

Class Width

The difference between consecutive lower class limits (or upper class limits). To choose it, use the data set's maximum and minimum and calculate
(Max - Min) / (# of classes), then round up to a convenient number.

Midpoint of Each Class

The point in the middle of the class, found by averaging the class's lower class limit and the next class's lower class limit.
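The class width, class limits, and midpoints can be computed together. This is a sketch under stated assumptions: the data are whole numbers, the first lower class limit is the minimum, and the width is rounded up to the next whole number so the maximum always lands in the last class. `build_classes` and the scores are hypothetical.

```python
def build_classes(data, num_classes):
    """Return the class width and a list of (LCL, UCL, midpoint) for whole-number data."""
    lo, hi = min(data), max(data)
    # (Max - Min) / # of classes, rounded up to the next whole number
    width = (hi - lo) // num_classes + 1
    classes = []
    for i in range(num_classes):
        lcl = lo + i * width
        ucl = lcl + width - 1                  # last whole number in the class
        midpoint = (lcl + (lcl + width)) / 2   # average of this LCL and the next LCL
        classes.append((lcl, ucl, midpoint))
    return width, classes

scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 98]  # hypothetical exam scores
width, classes = build_classes(scores, num_classes=5)
print(width)       # 10, since (98 - 52) / 5 = 9.2 rounds up to 10
print(classes[0])  # (52, 61, 57.0)
```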

Stem Leaf Plot

Used for recording and showing dispersion of data. Stem can be the integer portion of a number and the leaves the decimal portion. Or the stem could be the tens digit and the leaves the ones digit.
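A stem-and-leaf plot using the tens digit as the stem and the ones digit as the leaf can be sketched as below. This assumes non-negative whole numbers; `stem_and_leaf` is a hypothetical helper, and empty stems are kept so gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem = tens digit(s), leaf = ones digit (non-negative integers only)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    lines = []
    for stem in range(min(plot), max(plot) + 1):  # include empty stems so gaps show
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

for line in stem_and_leaf([52, 61, 67, 70, 74, 78, 81, 85, 90, 98]):
    print(line)
```

For the sample list this prints `5 | 2`, `6 | 17`, `7 | 048`, `8 | 15`, `9 | 08`.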

Uniform graph

A graph where all the classes have the same frequency.

Normal Graph

Skewed Right Graph

Skewed Left Graph

MEAN OF POPULATION

μ (mu)

mean of sample

x̄ (x with a line over it)

Population standard deviation

σ (sigma)

sample standard deviation

s

Sum of individuals

Σ (capital sigma)

LAW OF LARGE NUMBERS:

As n → N, x̄ → μ.
In other words, as the sample size gets closer to the population size the sample mean gets closer to the real population mean.
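A small simulation can illustrate the statement above. This sketch assumes a hypothetical population of 1,000 random whole numbers and draws samples without replacement; the error usually shrinks as n grows (not necessarily at every step), and at n = N the sample is the whole population, so the sample mean equals μ exactly.

```python
import random
import statistics

rng = random.Random(1)  # seeded for reproducibility
population = [rng.randint(0, 100) for _ in range(1_000)]  # hypothetical population, N = 1000
mu = statistics.mean(population)

def sample_mean(n):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.mean(rng.sample(population, n))

for n in (10, 100, 500, 1_000):
    print(n, round(abs(sample_mean(n) - mu), 3))  # distance between x̄ and μ
```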

N

size of population

n

size of sample

Mean

A measure of center in a set of numerical data, computed by adding the values in a list and then dividing by the number of values in the list.

Median

Middle value (if n is odd) or the average of the two middle values (if n is even).

Mode

most frequent values

Range

Max - Min

STANDARD DEVIATION

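Can be thought of as roughly the typical distance of the values from the mean. Formally, it is the square root of the sum of squared deviations from the mean, divided by N for a population or by n - 1 for a sample.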

Variance

The square of the Standard Deviation.
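Python's standard `statistics` module covers the measures above; note the separate population (`pstdev`, divides by N) and sample (`stdev`, divides by n - 1) versions. The data list is hypothetical, and `multimode` requires Python 3.8 or later.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean_value = statistics.mean(data)       # sum of values / number of values
median_value = statistics.median(data)   # n is even, so average of the two middle values
modes = statistics.multimode(data)       # most frequent value(s)
data_range = max(data) - min(data)       # Max - Min
pop_sd = statistics.pstdev(data)         # σ: divides by N
sample_sd = statistics.stdev(data)       # s: divides by n - 1
sample_var = statistics.variance(data)   # s squared

print(mean_value, median_value, modes, data_range, pop_sd, round(sample_sd, 3))
```

Here the mean is 5, the median 4.5, the mode 4, the range 7, and σ = 2.0; s is slightly larger because it divides by n - 1.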

EMPIRICAL RULE

For bell-shaped (approximately normal) distributions:
- Approximately 68% of the population is within the range μ ± 1σ
- Approximately 95% of the population is within the range μ ± 2σ
- Approximately 99.7% of the population is within the range μ ± 3σ
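The 68/95/99.7 figures are rounded versions of exact normal-distribution probabilities. A minimal check, assuming a perfectly normal distribution, using `statistics.NormalDist` (Python 3.8+); `within_k_sd` is a hypothetical helper.

```python
from statistics import NormalDist

def within_k_sd(k):
    """Proportion of a normal distribution lying within μ ± kσ."""
    z = NormalDist()  # standard normal: μ = 0, σ = 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {within_k_sd(k):.4f}")
```

This prints 0.6827, 0.9545, and 0.9973. Real data only follow these percentages to the extent they are bell-shaped.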

MEAN AND STANDARD DEVIATION FROM FREQUENCY DISTRIBUTIONS

TO DO ON CALCULATOR: enter the data values (or class midpoints) in L1 and their frequencies in L2. Then do
STAT → CALC → 1: 1-Var Stats, and enter L1, L2
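The same computation can be done by hand (or in code) by weighting each value by its frequency. This sketch assumes the values play the role of L1 and the frequencies the role of L2; `grouped_stats` and the table are hypothetical, and the results should correspond to the calculator's x̄, σx, and Sx.

```python
import math

def grouped_stats(values, freqs):
    """Mean and standard deviations from a frequency table (values ~ L1, frequencies ~ L2)."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))  # sum of squared deviations
    pop_sd = math.sqrt(ss / n)            # σ: treat the table as a population
    sample_sd = math.sqrt(ss / (n - 1))   # s: treat the table as a sample
    return n, mean, pop_sd, sample_sd

# Hypothetical table: class midpoints and their frequencies
n, mean_x, sigma, s = grouped_stats([57, 67, 77, 87, 97], [1, 2, 3, 2, 2])
print(n, mean_x, round(sigma, 3), round(s, 3))
```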

Weighted Average
(look on ppt and notes)

Wtd Avg. = [(weight)(data point) + (weight)(data point) + ...] / (sum of weights)
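The formula above translates directly into code. A minimal sketch with a hypothetical `weighted_average` helper and hypothetical course-grade weights (tests 50%, homework 30%, final 20%).

```python
def weighted_average(values, weights):
    """Sum of (weight x data point) divided by the sum of the weights."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need matching lists and a nonzero total weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical grades: tests 82 (weight 50), homework 95 (weight 30), final 76 (weight 20)
grade = weighted_average([82, 95, 76], [50, 30, 20])
print(grade)  # (4100 + 2850 + 1520) / 100
```

Dividing by the sum of the weights means the weights do not have to add up to 1 (or 100).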

Z-SCORE

How far a data point is from the mean, in terms of standard deviations
z = (x - x̄) / s for a sample, or z = (x - μ) / σ for a population
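A one-line helper for the z-score; `z_score` and the numbers are hypothetical. A positive result means the value is above the mean, a negative one below it.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: a score of 85 in a sample with mean 79 and s = 12
print(z_score(85, 79, 12))  # 0.5
print(z_score(67, 79, 12))  # -1.0
```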

Percentile

The kth percentile, Pk, is the value for which k% of the data set is less than or equal to Pk.
- For instance, if P18 = 7.6, then 18% of the sample or population is less than or equal to 7.6 and 82% is greater than 7.6.
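Several conventions exist for computing percentiles, so textbooks, calculators, and software can disagree slightly. This sketch assumes the nearest-rank convention (take the value at position ceil(k·n/100) of the sorted data), which guarantees at least k% of the data is at or below Pk; both helper names are hypothetical.

```python
import math

def percentile_nearest_rank(data, k):
    """P_k using the nearest-rank convention (one of several in use)."""
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    ordered = sorted(data)
    rank = math.ceil(k * len(ordered) / 100)  # 1-based position in the sorted data
    return ordered[rank - 1]

def percent_at_or_below(data, value):
    """Percentage of observations less than or equal to value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 98]  # hypothetical data
p30 = percentile_nearest_rank(scores, 30)
print(p30, percent_at_or_below(scores, p30))  # 67 30.0
```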

FIVE NUMBER SUMMARY

MIN, Q1, MEDIAN, Q3, MAX

BOX PLOT

A graph of the five-number summary. Good for comparing distributions.

Inter Quartile Range

IQR = Q3 - Q1.
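The five-number summary and IQR can be computed as follows. Quartile conventions vary; this sketch assumes one common method in which Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd. `five_number_summary` is hypothetical and needs at least two values.

```python
from statistics import median

def five_number_summary(data):
    """Return (Min, Q1, Median, Q3, Max) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median position
    upper = ordered[half + (n % 2):]    # skip the middle value when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

summary = five_number_summary([52, 61, 67, 70, 74, 78, 81, 85, 90, 98])
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print(summary, iqr)  # (52, 67, 76.0, 85, 98) 18
```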

Probability

measure of the likelihood of a random phenomenon or chance behavior.

The Law of Large Numbers

As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome.
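A quick simulation shows the idea: roll a fair die many times and watch the proportion of sixes. The sketch uses a seeded generator for reproducibility; `proportion_of_sixes` is hypothetical. The proportion tends to get closer to 1/6 ≈ 0.1667 as the number of rolls grows, though not necessarily at every step.

```python
import random

def proportion_of_sixes(num_rolls, seed=0):
    """Roll a fair die num_rolls times and return the observed proportion of sixes."""
    rng = random.Random(seed)  # seeded so results are reproducible
    sixes = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 6)
    return sixes / num_rolls

for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_sixes(n))
```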

experiment

is any process that can be repeated in which the results are uncertain.

simple event

Any single outcome from a probability experiment. Each simple event is denoted eᵢ.

sample space, S

of a probability experiment is the collection of all possible simple events. In other words, the sample space is a list of all possible outcomes of a probability experiment.

event

is any collection of outcomes from a probability experiment. An event may consist of one or more simple events. Events are denoted using capital letters such as E.

probability of an event

denoted P(E), is the likelihood of that event occurring.

Computing probability using classical method

- The classical method of computing probabilities requires equally likely outcomes.
- An experiment is said to have equally likely outcomes when each simple event has the same probability of occurring.
- P(E) = (# of ways that E can occur) / (# of possible outcomes)
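The classical method amounts to counting. A minimal sketch for rolling two fair dice: list all 36 equally likely outcomes, count those in the event, and divide. Exact fractions avoid rounding; `classical_probability` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely simple events
sample_space = list(product(range(1, 7), repeat=2))

def classical_probability(event):
    """P(E) = (# of ways E can occur) / (# of possible outcomes)."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_sum_7 = classical_probability(lambda roll: sum(roll) == 7)  # 6 ways out of 36
print(p_sum_7)  # 1/6
```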

Computing Probability Using the Empirical Method

- The probability of an event E is approximately the number of times event E is observed divided by the number of repetitions of the experiment.
P(E) ≈ (frequency of E) / (# of trials of the experiment)
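With the empirical method, the probability comes from recorded results rather than from counting equally likely outcomes. The log below is hypothetical (13 made and 7 missed free throws), as is the helper name.

```python
from collections import Counter

def empirical_probability(observations, outcome):
    """P(E) ≈ (frequency of E) / (number of trials)."""
    if not observations:
        raise ValueError("need at least one trial")
    return Counter(observations)[outcome] / len(observations)

# Hypothetical log of 20 free-throw attempts: 13 made, 7 missed
attempts = ["made"] * 13 + ["missed"] * 7
p_made = empirical_probability(attempts, "made")
print(p_made)  # 13 / 20
```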

Addition Rule for not mutually exclusive

For any two events E and F,
P(E or F) = P(E) + P(F) - P(E and F)

Venn diagrams

represent events as circles enclosed in a rectangle. The rectangle represents the sample space and each circle represents an event.

mutually exclusive

Events E and F are mutually exclusive if they have no simple events in common, that is, they cannot occur simultaneously.

Addition Rule for Mutually Exclusive Events

If E and F are mutually exclusive events, then
P(E or F) = P(E) + P(F)
In general, if E, F, G, ... are mutually exclusive events, then
P(E or F or G or ...) = P(E) + P(F) + P(G) + ...
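Both addition rules can be checked by treating events as sets of outcomes for one roll of a fair die. The events are hypothetical examples: E is "even", F is "greater than 3", and G is "a one", which shares no outcomes with E.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space: one roll of a fair die

def P(event):
    """Classical probability of an event given as a subset of S."""
    assert event <= S
    return Fraction(len(event), len(S))

E = {2, 4, 6}  # roll is even
F = {4, 5, 6}  # roll is greater than 3
G = {1}        # roll is a one

# General addition rule: outcomes in both E and F would otherwise be counted twice
assert P(E.union(F)) == P(E) + P(F) - P(E.intersection(F))
# Mutually exclusive: E and G share no outcomes, so P(E and G) = 0
assert not E.intersection(G)
assert P(E.union(G)) == P(E) + P(G)
```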

Complement Rule

The probability of an event occurring is 1 minus the probability that it doesn't occur.
P(Ē) = 1 - P(E), where Ē (E with a line over it) is the complement of E, meaning "E does not occur".
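The complement rule is handy for "at least one" questions. In this hypothetical example, E is "at least one six in four rolls of a fair die". Its complement, "no sixes", is easy to compute because the rolls are independent (each has a 5/6 chance of not being a six). The result is then checked by listing all 6⁴ = 1296 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Complement of E: no sixes in four independent rolls
p_no_six = Fraction(5, 6) ** 4
p_at_least_one = 1 - p_no_six  # complement rule

# Check by listing all 6**4 equally likely outcomes
outcomes = list(product(range(1, 7), repeat=4))
count = sum(1 for o in outcomes if 6 in o)
assert Fraction(count, len(outcomes)) == p_at_least_one
print(p_at_least_one, float(p_at_least_one))
```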

Conditional Probability

The notation P(F | E) is read "the probability of event F given event E". It is the probability of an event F given the occurrence of the event E.

Multiplication rule

- The probability that two events E and F both occur is
P(E and F) = P(E) x P(F | E)
- The probability of E and F is the probability of event E occurring times the probability of event F occurring given the occurrence of event E.
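A standard example is drawing two cards without replacement and asking for two aces. Here E is "first card is an ace" and F is "second card is an ace". Once an ace is removed, only 3 aces remain among 51 cards, so P(F | E) = 3/51. The deck model (cards 0 to 51, with card % 13 == 0 meaning ace) is a hypothetical encoding. The enumeration confirms the product rule.

```python
from fractions import Fraction
from itertools import permutations

def is_ace(card):
    """Hypothetical deck encoding: cards 0..51, rank = card % 13, rank 0 = ace."""
    return card % 13 == 0

p_first_ace = Fraction(4, 52)               # P(E)
p_second_ace_given_first = Fraction(3, 51)  # P(F | E): one ace already removed
p_both = p_first_ace * p_second_ace_given_first

draws = list(permutations(range(52), 2))    # ordered draws without replacement
both = sum(1 for a, b in draws if is_ace(a) and is_ace(b))
assert Fraction(both, len(draws)) == p_both
print(p_both)  # 1/221
```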

independent

Two events are independent if the occurrence of event E in a probability experiment does not affect the probability of event F.
If two events A and B are independent, then P(A and B) = P(A) x P(B).
Equivalently, P(F | E) = P(F) or P(E | F) = P(E).

dependent

Two events are dependent if the occurrence of event E in a probability experiment affects the probability of event F.

Multiplication rule for independent events

If E and F are independent events, the probability that E and F both occur is P(E and F) = P(E) x P(F).
- The probability of E and F is the probability of event E times the probability of event F.
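Independence can be tested by checking whether P(E and F) equals P(E) x P(F). In this sketch with two fair dice, "first die is 6" and "second die is even" pass the test. "First die is 6" and "sum is at least 10" fail it, so they are dependent. The event functions are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def prob(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def sum_at_least_10(o):
    return o[0] + o[1] >= 10

def both(e, f):
    return lambda o: e(o) and f(o)

# Independent: P(E and F) equals P(E) x P(F)
assert prob(both(first_is_six, second_is_even)) == prob(first_is_six) * prob(second_is_even)
# Dependent: knowing the first die is 6 changes the chance that the sum is at least 10
assert prob(both(first_is_six, sum_at_least_10)) != prob(first_is_six) * prob(sum_at_least_10)
```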

Permutations

An ORDERED arrangement of r objects chosen from n distinct objects without repetition, denoted nPr = n! / (n - r)!

Combinations

An arrangement of r objects chosen from n distinct objects without repetition and WITHOUT regard to order, denoted nCr = n! / (r!(n - r)!)
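Python's `math.perm` and `math.comb` (available from Python 3.8) compute nPr and nCr directly. This sketch cross-checks them by listing arrangements of five letters with `itertools`. It also shows the link between the two: each unordered group of r objects can be ordered in r! ways.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3
n_p_r = math.perm(n, r)  # ordered: n! / (n - r)! = 5 x 4 x 3
n_c_r = math.comb(n, r)  # unordered: n! / (r! (n - r)!)

# Cross-check by listing arrangements of the letters A-E
letters = "ABCDE"
assert n_p_r == len(list(permutations(letters, r))) == 60
assert n_c_r == len(list(combinations(letters, r))) == 10
# Each unordered group of r objects can be ordered in r! ways
assert n_p_r == n_c_r * math.factorial(r)
```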

Not mutually exclusive

P(A or B) = P(A) + P(B) - P(A and B)

Factorial

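The product of all positive integers from 1 to n: n! = n x (n - 1) x ... x 2 x 1, with 0! = 1 by definition. (Not to be confused with a factorial design, a statistical experimental design used to measure the effects of two or more independent variables at various levels and to allow for interactions between variables.)
ex:
0! = 1
1! = 1
2! = 1 x 2 = 2
3! = 1 x 2 x 3 = 6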
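A direct implementation of n! as a running product. The loop never runs for n = 0, so the result stays 1, which matches the convention 0! = 1 (the empty product). The helper name is hypothetical. It is checked against the standard library's `math.factorial`.

```python
import math

def factorial(n):
    """n! = 1 x 2 x ... x n, with 0! = 1 (the empty product)."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

assert [factorial(n) for n in range(4)] == [1, 1, 2, 6]
assert all(factorial(n) == math.factorial(n) for n in range(20))
```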

attributes

non-numerical (qualitative) data

frequency distribution

lists the NUMBER of occurrences of each category of data

relative frequency distribution

lists the PROPORTION of occurrences of each category of data