Statistics
- science of collecting, organizing, summarizing, and analyzing data to draw conclusions or answer questions, with a given amount of confidence concerning the answer
- method or process used in finding an answer to a question, with a specific amount of confidence
Descriptive
Type of Statistics: organizes and summarizes data. Describes data through numerical summaries, tables, and graphs.
Inferential
Type of Statistics: uses methods that take a result from a sample, extend it to the population, and measure the reliability of the results.
Population
The entire group of individuals being investigated (size is N). Must be precisely defined.
Sample
A subset of individuals of a given size (n) taken from the population.
Variable
A given aspect of an individual. The variable is the aspect itself (for example, height), not its observed values. For example, if a population or sample consists of people, then the _____ could be weight, height, eye color, gender, etc. What aspects could be found for an individual country?
Data
The possible observations or outcomes for a variable concerning individuals. Each observation is a label, a count, or a measurement.
Characteristic
A numerical summary of a variable for a population or sample, such as the mean, maximum, range, or standard deviation. Label data cannot be summarized this way.
parameter
Characteristics (numerical) that come from the POPULATION
Statistics
Characteristics (numerical) that come from a SAMPLE
Parameter vs statistic if data comes from label
If the data comes from labels, then the summary is called neither, because no numerical summary is possible for qualitative data.
Qualitative data
comes from LABELS
Quantitative data
comes from NUMERICAL data
Discrete data
comes from counts; the values are whole numbers (no decimal parts)
Continuous data
comes from measurements; the values are real numbers (may have decimal parts)
Observation
Data from observing only, without interfering with the process in any way.
Experimentation
Data from controlling some factors of a process. Often involves comparing results for two or more values of a control factor. Involves interference by the investigator.
Simple Random Sampling
Every individual in the population has an equal chance of being selected, and every sample of size n is equally likely. It is the best sampling method and the goal the other methods aim for.
Stratified Sampling
Separating the population into non-overlapping groups (strata) and then selecting a simple random sample from each group.
Systematic Sampling
Obtained by selecting every kth member of the population, starting from a randomly chosen member among the first k.
Cluster Sampling
Dividing the population into groups (clusters), randomly selecting some of the clusters, and including all the individuals in each selected cluster.
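The four random sampling methods above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the hypothetical population of 100 IDs, the strata and clusters, and the fixed seed are all made up, and the systematic sample assumes a random starting point among the first k members.

```python
import random

def simple_random_sample(population, n, rng):
    # Every group of n individuals is equally likely; no one is picked twice.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a group name to its members; the groups must not overlap.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # SRS inside each stratum
    return sample

def systematic_sample(population, k, rng):
    # Assumed: random start among the first k members, then every kth member.
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every individual in them.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)  # fixed seed so runs are reproducible
population = list(range(1, 101))  # hypothetical population of 100 IDs
strata = {"A": population[:50], "B": population[50:]}
clusters = {f"block{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

print("SRS:       ", simple_random_sample(population, 5, rng))
print("Stratified:", stratified_sample(strata, 3, rng))
print("Systematic:", systematic_sample(population, 10, rng))
print("Cluster:   ", cluster_sample(clusters, 2, rng))
```

Convenience sampling has no random step, so it is left out of the sketch.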
Convenience Sampling
Individuals are selected based on how easy they are to obtain, not randomly from the population.
Non-response Bias
Occurs when individuals selected for a sample do not respond, as in voluntary sampling where individuals can refuse to take part.
Response Bias
Answers do not reflect the true feelings of the respondent. Caused by:
-Interviewer error or the way the question is framed
-The choices and wording offered in the survey
-The order of questions or responses
-Plain old data-entry error
BLIND STUDIES
-Can still lead to bias, because the researchers know who is in the control group
-Experiments in which the participants do not know whether or not they are part of the control group.
DOUBLE BLIND STUDIES
The best.
Experiments in which neither the participants nor the people analyzing the results know who is in the control group
Frequency Distribution
Lists each category (label) of data and the number of occurrences.
Relative Frequency
The proportion of occurrences for each category, calculated as: frequency / sum of all frequencies
- The relative frequencies always sum to 1
Cumulative Frequency Distributions
Each class is listed as before (lowest to largest), but each frequency is the total for that class and all lower classes. Add down the frequency column to get each running total.
Relative Cumulative Frequency Distribution
Each cumulative frequency divided by the total of all frequencies. The last class will have a cumulative value of 1.0. To find it, first find the relative frequencies from the original distribution and then keep adding down the list.
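The four tables described above (frequency, relative frequency, cumulative, and relative cumulative) can be built in one pass. This is a sketch assuming numeric classes that can be sorted lowest to largest; the function name and the pet-count data are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def frequency_tables(data):
    counts = Counter(data)
    classes = sorted(counts)                 # list classes from lowest to largest
    freq = [counts[c] for c in classes]
    total = sum(freq)
    rel = [f / total for f in freq]          # frequency / sum of all frequencies
    cum = list(accumulate(freq))             # running total down the list
    rel_cum = [c / total for c in cum]       # last entry is 1.0
    return classes, freq, rel, cum, rel_cum

# Hypothetical data: number of pets owned by 10 people.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]
classes, freq, rel, cum, rel_cum = frequency_tables(data)
print("class freq  rel cum relcum")
for row in zip(classes, freq, rel, cum, rel_cum):
    print("{:>5} {:>4} {:.2f} {:>3} {:.2f}".format(*row))
```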
Bar Graph
Vertical or horizontal. The x-axis contains the categories or labels. For Frequency Distributions the y-axis is the number of occurrences; for Relative Frequency Distributions the y-axis is the proportion (values between 0 and 1). Bars do not need to touch.
pie chart
A chart that shows the relationship of each part to the whole, displaying percentages as slices of a circle.
Histogram
Vertical bar graphs where the x-axis is the number line and each bar is one class. All bars must touch side to side. Lower class limits are marked on the x-axis.
class
An interval of numbers along the number line.
Lower Class Limit (LCL)
The beginning number of the class.
Upper Class Limit (UCL)
The last number of the class.
Class Width
The difference between consecutive lower class limits (or consecutive upper class limits). To choose it, use the data set's maximum and minimum and calculate
(Max - Min) / (# of classes), rounded up to a convenient number
Midpoint of Each Class
The point in the middle of the class, found by averaging the lower class limit of the class and the lower class limit of the next class.
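Class width, class limits, midpoints, and class frequencies can be computed together. This sketch assumes integer data and uses one possible rounding rule (integer division plus 1, which rounds up and also bumps an exact quotient so the maximum lands in the last class); the function name and data are hypothetical.

```python
def build_classes(data, num_classes):
    lo, hi = min(data), max(data)
    # (Max - Min) / classes, rounded up; "+ 1" after integer division also
    # bumps an exact quotient so the maximum still falls in the last class.
    width = (hi - lo) // num_classes + 1
    lcls = [lo + i * width for i in range(num_classes)]
    ucls = [lcl + width - 1 for lcl in lcls]  # integer data assumed
    # Midpoint: average of this class's LCL and the next class's LCL.
    midpoints = [(lcl + (lcl + width)) / 2 for lcl in lcls]
    freqs = [sum(lcl <= x <= ucl for x in data) for lcl, ucl in zip(lcls, ucls)]
    return width, lcls, ucls, midpoints, freqs

data = [12, 15, 22, 27, 31, 34, 38, 41, 45, 49]  # hypothetical integer data
width, lcls, ucls, mids, freqs = build_classes(data, 5)
print("class width:", width)
for lcl, ucl, mid, f in zip(lcls, ucls, mids, freqs):
    print(f"{lcl}-{ucl}  midpoint {mid}  frequency {f}")
```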
Stem Leaf Plot
Used for recording and showing dispersion of data. Stem can be the integer portion of a number and the leaves the decimal portion. Or the stem could be the tens digit and the leaves the ones digit.
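A stem-and-leaf plot using the tens digit as the stem and the ones digit as the leaf can be sketched like this. It assumes non-negative integers; the function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    # Stem = tens digit, leaf = ones digit (non-negative integers assumed).
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems visible
        rows.append(f"{stem} | " + "".join(str(d) for d in leaves.get(stem, [])))
    return rows

for row in stem_and_leaf([12, 15, 22, 27, 23, 41, 38, 34]):
    print(row)
```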
Uniform graph
A graph where all the classes have the same frequency.
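Normal Graph
A bell-shaped, symmetric graph with most values near the center.
Skewed Right Graph
A graph whose right tail is longer than its left tail (most values cluster on the left).
Skewed Left Graph
A graph whose left tail is longer than its right tail (most values cluster on the right).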
$\mu$
MEAN OF POPULATION
$\bar{x}$
mean of sample
$\sigma$
Population standard deviation
$s$
sample standard deviation
$\Sigma$
Sum of individuals
LAW OF LARGE NUMBERS:
As $n \to N$, $\bar{x} \to \mu$.
In other words, as the sample size gets closer to the population size the sample mean gets closer to the real population mean.
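A small simulation can illustrate this idea. The population below is hypothetical (10,000 values drawn around 50 with a fixed seed), and the helper name is made up. Individual runs are random, so the error need not shrink at every step, but when n = N the sample is the whole population and the error is zero.

```python
import random
import statistics

def sample_mean_error(population, n, rng):
    # Distance between a sample mean (x-bar) and the population mean (mu).
    mu = statistics.fmean(population)
    xbar = statistics.fmean(rng.sample(population, n))
    return abs(xbar - mu)

rng = random.Random(0)
# Hypothetical population of N = 10,000 values centered near 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: |x-bar - mu| = {sample_mean_error(population, n, rng):.4f}")
```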
N
size of population
n
size of sample
Mean
A measure of center in a set of numerical data, computed by adding the values in a list and then dividing by the number of values in the list.
Median
Middle value (if n is odd) or the average of the two middle values (if n is even).
Mode
the most frequent value(s)
Range
Max - Min
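Python's standard `statistics` module covers the measures of center above (`multimode` needs Python 3.8 or later). The data set is hypothetical.

```python
import statistics

data = [4, 8, 15, 16, 23, 42, 8]  # hypothetical data set (n = 7, odd)
mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted list
modes = statistics.multimode(data)    # every most-frequent value
data_range = max(data) - min(data)    # Max - Min
print(mean, median, modes, data_range)
print(statistics.median([1, 2, 3, 4]))  # n even: average of 2 and 3 -> 2.5
```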
STANDARD DEVIATION
can be thought of as the average distance of the values from the mean.
Variance
The square of the Standard Deviation.
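The population and sample versions differ only in the divisor: N for a population, n - 1 for a sample. This sketch computes both by hand and cross-checks them against the `statistics` module; the data set is hypothetical.

```python
import math
import statistics

def population_variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)          # divide by N

def sample_variance(values):
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)  # divide by n - 1

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(population_variance(data), math.sqrt(population_variance(data)))  # 4.0 2.0
print(sample_variance(data), math.sqrt(sample_variance(data)))
print(statistics.pvariance(data), statistics.pstdev(data))  # library cross-check
print(statistics.variance(data), statistics.stdev(data))
```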
EMPIRICAL RULE (for bell-shaped distributions)
-Approximately 68% of the population are within the range of $\mu \pm 1\sigma$
-Approximately 95% of the population are within the range of $\mu \pm 2\sigma$
-Approximately 99.7% of the population are within the range of $\mu \pm 3\sigma$
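For an exactly normal distribution, the three percentages can be checked with `statistics.NormalDist` (Python 3.8 or later). The helper name is hypothetical; the rule's 68/95/99.7 figures are rounded versions of these values.

```python
from statistics import NormalDist

def fraction_within(k):
    # Share of a normal distribution lying within k standard deviations of the mean.
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within mu +/- {k} sigma: {fraction_within(k):.4f}")
```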
MEAN AND STANDARD DEVIATION FROM FREQUENCY DISTRIBUTIONS
To do this on the calculator, enter the table in lists L1 (data values, or class midpoints for grouped data) and L2 (frequencies). Then do
STAT > CALC > 1: 1-Var Stats and enter L1, L2
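The same computation can be sketched in Python: each value is weighted by its frequency. The table below is hypothetical, and the helper name is made up. The results should correspond to the calculator's mean, sample SD (Sx), and population SD (σx), and they are cross-checked by expanding the table into a raw list.

```python
import math
import statistics

def grouped_stats(values, freqs):
    n = sum(freqs)                                           # total count
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    s = math.sqrt(ss / (n - 1))      # sample SD
    sigma = math.sqrt(ss / n)        # population SD
    return n, mean, s, sigma

# Hypothetical table: class midpoints and their frequencies.
midpoints = [16, 24, 32, 40, 48]
freqs = [2, 3, 5, 3, 2]
print(grouped_stats(midpoints, freqs))

# Cross-check: repeat each value by its frequency and use the raw-data functions.
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
print(statistics.mean(expanded), statistics.stdev(expanded), statistics.pstdev(expanded))
```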
Weighted Average
(look on ppt and notes)
Wtd Avg. = [(weight)(data point) + (weight)(data point) + ...] / (sum of weights)
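The formula above translates directly into a short function. The grade weights and scores are hypothetical, as is the function name.

```python
def weighted_average(points, weights):
    if len(points) != len(weights) or sum(weights) == 0:
        raise ValueError("need one weight per data point and a non-zero total weight")
    # Sum of (weight)(data point), divided by the sum of the weights.
    return sum(w * x for w, x in zip(weights, points)) / sum(weights)

# Hypothetical grade: tests 40%, homework 20%, final exam 40%.
scores = [85, 95, 78]
weights = [40, 20, 40]
print(weighted_average(scores, weights))
```

With equal weights the result reduces to the ordinary mean.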
Z-SCORE
How many standard deviations a data point is from the mean
$z = \frac{x - \bar{x}}{s}$ (sample) or $z = \frac{x - \mu}{\sigma}$ (population)
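The z-score is a one-line computation; the sign shows whether the value is above or below the mean. The values below are hypothetical.

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

# Hypothetical data set with mean 5 and standard deviation 2.
print(z_score(9, 5, 2))   # 2.0: two SDs above the mean
print(z_score(3, 5, 2))   # -1.0: one SD below the mean
```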
Percentile
the value for which k% of the data set is less than or equal to it, written $P_k$
-For instance, if $P_{18} = 7.6$, then 18% of the sample or population is less than or equal to 7.6 and 82% is greater than 7.6
FIVE NUMBER SUMMARY
MIN, Q1, MEDIAN, Q3, MAX
BOX PLOT
Good for comparing distributions
Inter Quartile Range
IQR = Q3 - Q1.
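The five-number summary and IQR can be sketched with the median-of-halves method for quartiles, which leaves the median out of both halves when n is odd. This is an assumption: textbooks, calculators, and software use slightly different quartile rules, so other tools may give slightly different Q1 and Q3. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    # Median-of-halves quartiles; the median is left out of both halves when n is odd.
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical data
mn, q1, med, q3, mx = five_number_summary(data)
print("Min, Q1, Median, Q3, Max:", mn, q1, med, q3, mx)
print("IQR:", q3 - q1)
```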
Probability
measure of the likelihood of a random phenomenon or chance behavior.
The Law of Large Numbers
As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome.
experiment
is any process that can be repeated in which the results are uncertain.
simple event
any single outcome from a probability experiment. Each ____ is denoted $e_i$.
sample space, S
of a probability experiment is the collection of all possible simple events. In other words, the ______ is a list of all possible outcomes of a probability experiment.
event
is any collection of outcomes from a probability experiment. An ____ may consist of one or more simple _____. _____ are denoted using capital letters such as E.
probability of an event
denoted P(E), is the likelihood of that event occurring.
Computing probability using classical method
-Computing probabilities with the classical method requires equally likely outcomes.
-An experiment is said to have equally likely outcomes when each simple event has the same probability of occurring.
-P(E) = (# of ways that E can occur) / (# of possible outcomes)
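With two fair dice the 36 outcomes are equally likely, so the classical formula is just a count. This sketch uses exact fractions; the helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def classical_probability(event, space):
    # (# of ways E can occur) / (# of possible outcomes)
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

print(classical_probability(lambda o: o[0] + o[1] == 7, sample_space))  # 1/6
print(classical_probability(lambda o: o[0] == o[1], sample_space))      # 1/6 (doubles)
```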
Computing Probability Using the Empirical Method
-The probability of an event E is approximately the number of times event E is observed divided by the number of repetitions of the experiment.
P(E) ≈ (frequency of E) / (# of trials of the experiment)
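A simulated die shows the empirical method and the Law of Large Numbers together: as the number of trials grows, the observed proportion of sixes tends toward the classical value 1/6. The seed, trial counts, and helper names are hypothetical.

```python
import random

def empirical_probability(event, trial, n_trials, rng):
    # frequency of E / number of trials
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials

def roll_die(rng):
    return rng.randint(1, 6)  # one roll of a fair six-sided die

def is_six(x):
    return x == 6

rng = random.Random(7)
for n in (10, 100, 10_000, 100_000):
    print(n, round(empirical_probability(is_six, roll_die, n, rng), 4))
# Classical answer for comparison: 1/6 is about 0.1667
```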
Addition Rule for Events That Are Not Mutually Exclusive (General Addition Rule)
For any two events E and F,
P(E or F) = P(E) + P(F) - P(E and F)
Venn diagrams
represent events as circles enclosed in a rectangle. The rectangle represents the sample space and each circle represents an event.
mutually exclusive
Events E and F are mutually exclusive if they have no simple events in common, that is, they cannot occur simultaneously.
Addition Rule for Mutually Exclusive Events
If E and F are mutually exclusive events, then
P(E or F) = P(E) + P(F)
In general, if E, F, G, ... are mutually exclusive events, then
P(E or F or G or ...) = P(E) + P(F) + P(G) + ...
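Both addition rules can be checked on a standard 52-card deck by treating events as sets of cards. The deck encoding and helper name are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

def prob(event):
    return Fraction(len(event), len(deck))

hearts = {card for card in deck if card[1] == "hearts"}
kings = {card for card in deck if card[0] == "K"}
queens = {card for card in deck if card[0] == "Q"}

# Not mutually exclusive: the king of hearts is in both events.
general = prob(hearts) + prob(kings) - prob(hearts & kings)
print(prob(hearts | kings), general)  # 4/13 4/13 (16/52 reduced)

# Mutually exclusive: no card is both a king and a queen.
print(len(kings & queens) == 0, prob(kings | queens), prob(kings) + prob(queens))
```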
Complement Rule
the probability of an event occurring is 1 minus the probability that it doesn't occur
$P(\bar{E}) = 1 - P(E)$
Conditional Probability
The notation P(F | E) is read "the probability of event F given event E". It is the probability of an event F given the occurrence of the event E.
Multiplication rule
- The probability that two events E and F both occur is
P(E and F) = P(E) x P(F|E)
- The probability of E and F is the probability of event E occurring times the probability of event F occurring given the occurrence of event E
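Drawing two cards without replacement is a standard case where the second probability depends on the first. This sketch computes P(two aces) with the multiplication rule and confirms it by listing every ordered two-card draw; the deck encoding is a hypothetical choice.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = list(product(ranks, "HDCS"))  # 52 cards as (rank, suit)

# E: first card is an ace.  F: second card is an ace (drawn without replacement).
p_e = Fraction(4, 52)
p_f_given_e = Fraction(3, 51)  # one ace is gone and 51 cards remain
p_rule = p_e * p_f_given_e

# Brute-force check over all 52 * 51 ordered two-card draws.
draws = list(permutations(deck, 2))
both_aces = sum(1 for first, second in draws if first[0] == "A" and second[0] == "A")
p_enum = Fraction(both_aces, len(draws))
print(p_rule, p_enum)  # 1/221 1/221
```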
independent
Two events are ____ if the occurrence of event E in a probability experiment does not affect the probability of event F.
If two events A and B are independent, then $P(A \cap B) = P(A) \times P(B)$
Equivalently, P(F | E) = P(F) or P(E | F) = P(E)
dependent
Two events are ____ if the occurrence of event E in a probability experiment affects the probability of event F.
Multiplication rule for independent events
If E and F are independent events, the probability that E and F both occur is P(E and F) = P(E) x P(F)
- The probability of E and F is the probability of event E times the probability of event F
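Independence can be tested directly by checking whether P(E and F) equals P(E) x P(F). With two fair dice, "first die is 6" and "second die is even" pass the test, while "first die is 6" and "sum is at least 10" fail it. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def p(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(e, f):
    # E and F are independent exactly when P(E and F) = P(E) x P(F).
    return p(lambda o: e(o) and f(o)) == p(e) * p(f)

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def sum_at_least_10(o):
    return o[0] + o[1] >= 10

print(independent(first_is_six, second_is_even))   # True
print(independent(first_is_six, sum_at_least_10))  # False: dependent
```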
Permutations
An ORDERED arrangement of r objects chosen from n distinct objects without repetition, denoted nPr = n! / (n - r)!
Combinations
An arrangement of r objects chosen from n distinct objects without repetition and WITHOUT regard to order, denoted nCr = n! / (r!(n - r)!)
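Python's `math.perm` and `math.comb` (Python 3.8 or later) compute nPr and nCr directly; the factorial formulas and an explicit listing from `itertools` give the same counts. The five letters are a hypothetical set of distinct objects.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3
letters = "ABCDE"  # n = 5 distinct objects (hypothetical)

print(math.perm(n, r), math.factorial(n) // math.factorial(n - r))  # 60 60
print(math.comb(n, r), math.factorial(n) // (math.factorial(r) * math.factorial(n - r)))  # 10 10

# Listing them confirms the counts: order matters for permutations only.
print(len(list(permutations(letters, r))), len(list(combinations(letters, r))))  # 60 10
```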
Not mutually exclusive
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Factorial
The product of all positive integers from 1 up to n, written n! = 1 x 2 x ... x n. By definition, 0! = 1.
ex:
0! = 1
1! = 1
2! = 1 x 2 = 2
3! = 1 x 2 x 3 = 6
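An iterative factorial makes the 0! = 1 convention visible: with no factors to multiply, the result stays at its starting value of 1 (the empty product). The function name is hypothetical; Python's built-in `math.factorial` is used only as a cross-check.

```python
import math

def factorial(n):
    if n < 0:
        raise ValueError("factorial is defined for non-negative integers")
    result = 1  # empty product, which is why 0! = 1
    for k in range(2, n + 1):
        result *= k
    return result

for n in range(6):
    print(f"{n}! = {factorial(n)}")
```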
attributes
non-numerical data
frequency distribution
lists the NUMBER of occurrences of each category of data
relative frequency distribution
lists the PROPORTION of occurrences of each category of data