Statistics Chapter 4: Probability

4.1 randomness

...

4.2 probability models

...

Random

a phenomenon is random if any individual outcome is unpredictable, but the distribution of outcomes over many repetitions is known
example: toss a coin. no flip is predictable, but many flips will result in approximately half heads and half tails
-remembe

probability

the probability of an outcome is the proportion of times that it would occur over many repetitions.
-often, people expect the outcomes to settle into some regularity much sooner than they actually do.

sample space

the sample space is the set of all possible outcomes, denoted S
example: toss a coin three times. The sample space is ... S={HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
-the size of S is denoted |S|.
-example: toss a die twice. The sample space is... S={(1,1)

CLASS PROBLEM: Toss a coin, if it lands heads, roll a die once. If it lands tails, flip the coin one more time. What is the sample space, and what is the size of the sample space?

S={(H,1), (H,2),...,(H,6), (T,H), (T,T)} |S|=8
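Small sample spaces like the two above can be listed by machine as a check. This is only a sketch; the variable names are made up for illustration.

```python
from itertools import product

# Toss a coin three times: every ordered triple of H/T.
three_tosses = ["".join(t) for t in product("HT", repeat=3)]

# Class problem: heads -> roll a die once; tails -> flip the coin once more.
coin_then = [("H", face) for face in range(1, 7)] + [("T", side) for side in "HT"]

print(three_tosses, len(three_tosses))
print(coin_then, len(coin_then))
```

Both lists have 8 outcomes, matching |S|=8 in each case.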

record the number of people that walk into a post office each day.
a) what is the sample space?
b) How do you think the outcomes will be distributed (what shape)

a) S={0,1,2,3,...} |S|= infinity
b) skewed-right

Events

-An event is some set of outcomes from the sample space. events are denoted by capital letters A,B,C....
-The complement of an event A is the event that A doesn't happen
-It is denoted A^c and may be thought of as the event "not A"
-Two events are Indepen

Events

-The event A and B is the set of outcomes that belong to both sets (their overlap)
-The event A or B is both sets taken together.
-the empty set, denoted ∅, is the set containing no elements at all
-two events are disjoint if they cannot both occur
-disjoi

Example:
S = {0,1,2,3,4,5,6,7,8}
A = {2,3,6,7} B = {0,3,6,8}

A and B = {3,6}
A or B = {0,2,3,6,7,8}
A^c and B = {0,8}
A^c or B^c = {0,1,2,4,5,7,8}
(A and B)^c = {0,1,2,4,5,7,8}
(A or B)^c = {1,4,5}
A and A^c = ∅
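The event operations in this example map directly onto Python's built-in set operators ("and" is &, "or" is |, complement is S minus the event). A minimal sketch; the helper name complement is an assumption, not notation from the notes.

```python
S = set(range(9))          # {0, 1, ..., 8}
A = {2, 3, 6, 7}
B = {0, 3, 6, 8}

def complement(event, space=S):
    """Outcomes in the sample space that are not in the event."""
    return space - event

print(A & B)                             # A and B
print(A | B)                             # A or B
print(complement(A) & B)                 # A^c and B
print(complement(A) | complement(B))     # A^c or B^c
print(complement(A & B))                 # (A and B)^c
print(complement(A | B))                 # (A or B)^c
print(A & complement(A))                 # A and A^c: the empty set
```

Note that A^c or B^c and (A and B)^c come out as the same set, as they do in the example.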

at a hospital, the probability of a patient having surgery is 12%, and obstetric treatment 16% and the probability of both is 2%. What is the probability that a patient will have neither treatment?

.74

CLASS PROBLEM: In real estate ads it is found that 64% of homes have garages, 9% have pools, and 28% have a finished basement. 5% have a garage and a pool, 19% have a garage and a basement, 4% have a basement and a pool, and 2% have all three. What percen

All three=2
G&P only=5-2=3
G&B only=19-2=17
B&P only=4-2=2
G only=64-2-3-17=42
P only=9-2-3-2=2
B only=28-2-17-2=7
None=100-42-2-7-3-17-2-2=25
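The same total can be cross-checked with inclusion-exclusion instead of filling in the Venn diagram region by region. A sketch only; the variable names are assumptions, and the percentages are the ones given in the problem.

```python
garage, pool, basement = 64, 9, 28
g_and_p, g_and_b, b_and_p = 5, 19, 4
all_three = 2

# Inclusion-exclusion: add the singles, subtract the pairwise overlaps,
# then add back the triple overlap that was subtracted once too often.
at_least_one = (garage + pool + basement
                - g_and_p - g_and_b - b_and_p
                + all_three)
none_of_them = 100 - at_least_one

print(at_least_one, none_of_them)
```

This gives 75% with at least one feature and 25% with none, agreeing with the region-by-region count.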

The Addition Rule:
P(A or B)=P(A) + P(B)-P(A and B)
Overlap counted twice, subtract out once.

In an office building of 80 people, 28 work on Saturday, 11 work on Sunday, and 3 people work on both Sunday and Saturday. What is the probability that a person in this office works at least one of these days?
P(Sat or Sun)= P(Sat) + P(Sun) - P(Both) = 28/80 + 11/80 - 3/80 = 36/80 = .45
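The addition rule for the office example can be worked with exact fractions so no rounding creeps in. A minimal sketch using the counts from the problem; variable names are illustrative.

```python
from fractions import Fraction

total = 80
p_sat = Fraction(28, total)
p_sun = Fraction(11, total)
p_both = Fraction(3, total)

# Addition rule: the overlap is counted twice, so subtract it once.
p_either = p_sat + p_sun - p_both

print(p_either, float(p_either))
```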

Class Problem: Out of 125 students surveyed, 12 were accounting majors, 24 were business majors, and 34 were either an accounting major or business major (or both). Draw and label a Venn Diagram

Acc 10 Both 2 Bus 22

Calculating probability: Roll a die twice, what is the probability that the sum of the faces will be 8?

P(Sum=8)=5/36

Class problem: Roll a die twice, what is the probability that the number on the second cast is greater than the one on the first cast?

P(2nd>1st) = 15/36=5/12
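Both dice answers can be checked by listing all 36 equally likely ordered rolls and counting the favorable ones. A sketch; the helper prob is a made-up name.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first cast, second cast) of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Proportion of the 36 rolls for which event(roll) is true."""
    favorable = sum(1 for r in rolls if event(r))
    return Fraction(favorable, len(rolls))

p_sum_8 = prob(lambda r: r[0] + r[1] == 8)
p_second_greater = prob(lambda r: r[1] > r[0])

print(p_sum_8, p_second_greater)
```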

Class Problem: A card is drawn from a deck of 52 cards.
-what is the probability that it is neither a diamond nor an ace?
-What is the probability that it is either not a diamond or it is not an ace?

-13 cards are diamonds and 3 more are aces, that leaves 36 cards, so 36/52= .6923
-there is only one card that doesn't fit either category-the ace of diamonds, so 51/52= .9808
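The two card questions differ only in "and" versus "or", which is easy to see by filtering a full deck. A sketch assuming a standard 52-card deck; the rank and suit labels are arbitrary.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# Neither a diamond nor an ace: not a diamond AND not an ace.
neither = [c for c in deck if c[1] != "diamonds" and c[0] != "A"]

# Not a diamond OR not an ace: everything except the ace of diamonds.
not_both = [c for c in deck if c[1] != "diamonds" or c[0] != "A"]

p_neither = Fraction(len(neither), len(deck))
p_not_both = Fraction(len(not_both), len(deck))
print(float(p_neither), float(p_not_both))
```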

INDEPENDENCE:
two events A and B are independent if P(A and B)=P(A)*P(B)

Example: P(A)= .3 P(B)= .5 P(A and B)= .10
P(A)*P(B)=(.3)(.5)=.15, and .15 does not equal .10, so A and B are not independent
Example: P(A)=.2 P(B)= .6 P(A or B)= .68
are A and B independent? (first use addition rule)
By the addition rule, P(A and B)=.2+.6-.68=.12, and (.2)(.6)=.12, so A and B are independent
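The independence test in both examples is a single comparison. A sketch; is_independent is a hypothetical helper, and the tolerance is only there because decimals like .68 are not stored exactly in floating point.

```python
import math

def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """True when P(A and B) equals P(A)*P(B), up to rounding error."""
    return math.isclose(p_a * p_b, p_a_and_b, abs_tol=tol)

# First example: P(A and B) is given directly.
first = is_independent(0.3, 0.5, 0.10)

# Second example: recover P(A and B) from the addition rule first.
p_a, p_b, p_a_or_b = 0.2, 0.6, 0.68
p_a_and_b = p_a + p_b - p_a_or_b
second = is_independent(p_a, p_b, p_a_and_b)

print(first, second)
```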

Rules of Thumb (1)

1) "and" means multiply when the events are independent
-toss a coin three times. what is the probability of all three being tails?
-that is, tails first AND tails second AND tails third
-since coin flips are independent we multiply, .5x.5x.5=.125

Rules of Thumb (2)

2) "or" means add when the events are disjoint
-roll two dice. What is the probability that the sum of the faces is 5 or 11?
-since the sum cannot be 5 and 11 at the same time, these are disjoint outcomes, so we add: P(sum=5)+P(sum=11)=4/36+2/36=1/6

Rules of Thumb (3)

3) for any probability question, first decide whether it is easier to calculate it directly, or easier to calculate the opposite and subtract from 1.
-a coin is tossed 7 times, what is the probability of tails occurring at least once?
-easier to answer th

Rules of Thumb (3 continued)

-if there are 23 people in a room, what is the probability that at least two of them have the same b-day?
-P(at least 2) = 1-P(all different) = 1 - (# different bdays for 23 people)/(# possible bdays for 23 people) = 1 - (365*364*363*...*343)/(365*365*365*...*365) = 1 - .4927 = .5073
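The long product in the birthday problem is tedious by hand but short in code. A sketch under the usual simplifying assumptions (365 equally likely birthdays, no leap years); the function name is made up.

```python
def p_shared_birthday(people, days=365):
    """P(at least two share a birthday) = 1 - P(all different)."""
    p_all_different = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

print(round(p_shared_birthday(23), 4))
```

With 23 people the probability is already just over one half, which is why this result surprises most people.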

CLASS PROBLEM: The probability of encountering heavy traffic on a Monday is 0.8, and the probability of encountering heavy traffic on a Tuesday is 0.6
1. someone claims the probability of heavy traffic occurring both days is .3, why is this impossible?
2.

1. 0.8+0.6-0.3=1.1, so P(M or T) would be greater than 1, which is impossible
2. P(M or T)= P(M)+P(T)-P(M and T)= 0.8+0.6-(0.8)(0.6)=0.92
3. P(equal to or greater than 1)=1-P(none)=1-(0.4)^4=0.9744

Law of Large Numbers

states that as an experiment is repeated over and over, the observed frequency of an outcome gets closer to its expected frequency.
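A quick simulation shows the law of large numbers at work for a fair coin. This is an illustrative sketch, not part of the notes: the seed, the checkpoints, and the function name are all arbitrary choices.

```python
import random

def running_proportion(flips, seed=0):
    """Proportion of heads after 10, 100, ..., flips of a simulated fair coin."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    heads = 0
    checkpoints = {}
    for n in range(1, flips + 1):
        heads += rng.random() < 0.5
        if n in (10, 100, 1000, 10000, 100000):
            checkpoints[n] = heads / n
    return checkpoints

for n, prop in running_proportion(100000).items():
    print(n, prop)
```

Early proportions can sit well away from .5; only the long run settles down, which is the point of the gambler's fallacy discussion.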

Gambler's fallacy, or "law of averages"

psychological prejudice that assumes observations will settle into their expected pattern much sooner than they actually do.
In other words, thinking an event is "due" or "not due"
-playing a different lottery number than last week's winning number because the chances it would

Prosecutor's fallacy

-a man is on trial for a crime, and forensic evidence is found at the scene which implicates him.
-a prosecutor has an expert witness testify that the probability of finding this forensic evidence is 1 in 20,000 if the person is innocent
-by itself, this

Better example

-woman visits her doc and gets tested for rare disease
-doc indicates that the test is 99% accurate (false positive=1%)
-woman tests positive, she concludes there is a 99% chance she has the disease
-this is a rare disease, suppose the incidence in the po

4.3 random variables

-random variable is a variable that assigns a number to each outcome of an experiment. This is not to be confused with an algebraic variable.
-the probability distribution of a random variable is a listing of each possible outcome of a random variable tog

benford's law, also called the first-digit law

states that for certain kinds of data, the first digit in each data value has a curious frequency
-this can be used to assess the legitimacy of certain data
-for appropriate data, first digits have the following distribution (with the last value missing)
-

4.4 properties of random variables

definitions:
expected value (or mean) of a random variable: this is denoted E(X)
Variance of a random variable: this is denoted V(X)

Examples:

calculate the mean and standard deviation of the following random variable:
-X: -2 3 7
-P(X): .3 .1 .6
-E(X)= (-2)(.3)+(3)(.1)+7(.6)=3.9
-V(X)=(-2-3.9)^2(.3)+(3-3.9)^2(.1)+(7-3.9)^2(.6)=16.29
-standard deviation = square root of 16.29 = 4.036
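The mean and variance formulas used above are weighted sums, so they fit in a short function. A sketch; mean_and_variance is a hypothetical helper name.

```python
import math

def mean_and_variance(values, probs):
    """E(X) = sum of x*p; V(X) = sum of (x - E(X))^2 * p."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, variance

mean, variance = mean_and_variance([-2, 3, 7], [0.3, 0.1, 0.6])
sd = math.sqrt(variance)   # standard deviation is the square root of V(X)

print(round(mean, 2), round(variance, 2), round(sd, 3))
```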

Examples:

In a game, a die is thrown. Alan pays Sally $1 if the die falls 1,2, or 3, and $3 if the die falls 4 or 5. If the die falls 6, Sally has to pay Alan $8. What is the expected value and standard deviation of the amount Sally wins?
Winnings X: 1 3 -8
P(X): 0

Class problem: John is suing his landlord. If he wins, he will be awarded $6000 and will not have to pay any court costs. If he loses, he will have to pay court fees totaling $200.
-john has found a lawyer that will represent him for $1200. If he hires th

With lawyer: 4800 -1400
P(X): .8 .2
-E(X)= (4800)(.8)+(-1400)(.2)=3560
Without lawyer: 6000 -200
P(X): .6 .4
-E(X)=(6000)(.6)+(-200)(.4)=3520
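The lawyer decision comes down to comparing two expected values. A sketch using the payoffs and probabilities from the worked answer; the function name is an assumption.

```python
def expected_value(payoffs, probs):
    """Weighted average of the payoffs."""
    return sum(x * p for x, p in zip(payoffs, probs))

# With lawyer: the $1200 fee is already subtracted from both outcomes.
with_lawyer = expected_value([4800, -1400], [0.8, 0.2])
without_lawyer = expected_value([6000, -200], [0.6, 0.4])

print(round(with_lawyer), round(without_lawyer))
```

The two options differ by only $40 in expectation, so the comparison says nothing about which is less risky.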

Class problem: employee bonuses are awarded at the end of the year. Thomas realizes it is possible for him to get a $5000 bonus, but it is unlikely. He is twice as likely to get a $2000 bonus, seven times as likely to get a $1000 bonus, and ten times as l

Bonus: 5000 2000 1000 500
probability: p 2p 7p 10p
-sum of probabilities = 1 > 20p=1 > p=0.05
bonus: 5000 2000 1000 500
Probability: .05 .10 .35 .50
E(X)=(5000)(.05)+(2000)(.10)+(1000)(.35)+(500)(.5)=1050
V(X)=(5000-1050)^2(.05)+(2000-1050)^2(.10)+(1000-1

Properties of Mean and Variance

E(c)=c V(c)=0 E(X+/-Y)=E(X)+/-E(Y)
E(cX)=cE(X) V(cX)=c^2V(X)
if X and Y are independent: V(X+/-Y)=V(X)+V(Y)

Example:

suppose X and Y are independent, and E(X)=120 σX=12 E(Y)=300 σY=16
Find the mean and standard deviation of 2X-5Y
E(2X-5Y)=2E(X)-5E(Y)=2(120)-5(300)=-1260
V(2X-5Y)=V(2X)+V(5Y)=4V(X)+25V(Y)=4(144)+25(256)=6976 > σ(2X-5Y)=square root of 6976=83.522
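The rules E(aX+bY)=aE(X)+bE(Y) and, for independent X and Y, V(aX+bY)=a^2V(X)+b^2V(Y) can be wrapped in one function. A sketch; linear_combo is a made-up name, and independence is assumed as in the example.

```python
import math

def linear_combo(a, mean_x, sd_x, b, mean_y, sd_y):
    """Mean and SD of aX + bY for independent X and Y."""
    mean = a * mean_x + b * mean_y
    # Coefficients are squared, so the sign of b does not matter here.
    variance = a ** 2 * sd_x ** 2 + b ** 2 * sd_y ** 2
    return mean, math.sqrt(variance)

mean, sd = linear_combo(2, 120, 12, -5, 300, 16)
print(mean, round(sd, 3))
```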

Caution

a random variable does not share the same properties as an algebraic variable
-for an algebraic variable X: X+X+X=3X
-for a random variable, each X may turn out differently, so X+X+X does not equal 3X
-this distinction matters when calculating variance.
-X+X
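The difference between three separate observations and one observation tripled can be checked exactly with a fair die. This is an illustration chosen for this sketch, not an example from the notes; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(outcomes):
    """Variance of a list of equally likely outcomes."""
    outcomes = list(outcomes)
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    return sum((x - mean) ** 2 for x in outcomes) / n

var_one = variance(faces)                       # one roll
var_tripled = variance(3 * x for x in faces)    # one roll, multiplied by 3
var_three_rolls = variance(sum(r) for r in product(faces, repeat=3))

print(var_one, var_tripled, var_three_rolls)
```

Tripling one roll multiplies the variance by 9, while adding three independent rolls only multiplies it by 3.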

Example: the American Veterinary Association claims that the annual cost of medical care for dogs averages $100 with a standard deviation of $30, and the annual cost of medical care for cats averages $120 with a standard deviation of $35
a) what's the expected differen

a) E(C-D)=E(C)-E(D)=120-100=$20
b) V(C-D)=V(C)+V(D)=1225+900=2125 > σ(C-D)=square root of 2125=$46.1
c) we are told the difference is normal, and we already found the center and spread. Difference ~ N(20, 46.1)
P(difference<0)=P(Z<(0-20)/46.1)=P(Z<-.4338)=.3322
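The table lookup in part c can be reproduced with Python's standard-library NormalDist. A sketch using the figures from the worked answer (cat mean 120, dog mean 100); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_diff = 120 - 100                       # E(C - D)
sd_diff = math.sqrt(35 ** 2 + 30 ** 2)      # variances add for independent C, D

# The problem states that the difference is normally distributed.
difference = NormalDist(mu=mean_diff, sigma=sd_diff)
p_negative = difference.cdf(0)              # P(C - D < 0)

print(round(sd_diff, 1), round(p_negative, 4))
```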

Class problem: K, A, and M have completed several relay triathlons. K-swimming, A-bikes, M-runs. Their respective completion times (in hours) have means .77, 1.33, and .9, and their respective standard deviations are .05, .08, and .06.
a) what is their ex

a)E(K+A+M)=E(K)+E(A)+E(M)=.77+1.33+.9=3
b)V(K+A+M)=V(K)+V(A)+V(M)=.0025+.0064+.0036=.0125
σ(K+A+M)=square root of .0125=.1118
c) T ~ N(3, .1118) > P(T<2.75)=P(Z<(2.75-3)/.1118)=P(Z<-2.236)=0.0127
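NormalDist objects can be added, which combines means and variances the same way parts a and b do. A sketch that additionally assumes each leg time is itself normal and independent of the others; the worked answer only uses normality of the total T.

```python
from statistics import NormalDist

# Leg times in hours (mean, SD). Normality of each leg is an assumption here.
swim = NormalDist(0.77, 0.05)
bike = NormalDist(1.33, 0.08)
run = NormalDist(0.90, 0.06)

# Adding independent normals: means add, variances add.
total = swim + bike + run
p_under = total.cdf(2.75)       # P(T < 2.75)

print(round(total.mean, 2), round(total.stdev, 4), round(p_under, 4))
```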