Math 385: Probability and Statistics for MMSS

probability

numerical assignment to uncertain event

ex-ante

before the uncertainty is resolved

ex-post

after the uncertainty is resolved

experiment

any situation in which the possible outcomes aren't known with certainty

sample space

collection of ALL possible outcomes of an experiment

event

any subset of a sample space (includes elements, space itself, and empty set)

4 basic set operations

1) union
2) intersection
3) complementation
4) set difference

union

for 2 sets A and B, the union of A and B is the set of all points in A or B (or both); A∪B = {x∈S : x∈A or x∈B}

intersection

for 2 sets A and B, the set of all elements that are in both A and B; A∩B = {x∈S : x∈A and x∈B}

complementation

the complement of a set A is the set of all elements that are not in A; Aᶜ = {x∈S : x∉A}

set difference

the difference between A and B is the set of all elements that are in A but not in B; A\B = A∩Bᶜ = {x∈S : x∈A, x∉B}

disjoint

2 sets A and B are disjoint if A∩B = ∅

synonym for disjoint

mutually exclusive

subset

A is a subset of B if all the elements in A are also in B; A⊆B

subset theorem

If A⊆B and B⊆A, then A=B

4 set properties

1) associative laws
2) commutative laws
3) distributive laws
4) DeMorgan's laws

associative laws

A ∪ (B ∪ C) = (A ∪ B) ∪ C; same for intersection

commutative laws

A∪B = B∪A; same for intersection

distributive laws

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); likewise with ∪ and ∩ interchanged

DeMorgan's laws

(A∪B)ᶜ = Aᶜ ∩ Bᶜ and (A∩B)ᶜ = Aᶜ ∪ Bᶜ

probability

a probability measure is a function that assigns a number to each event; for an event E, the probability of E is written P(E)

A ∩ B = ∅

A and B are disjoint

S

universal set; the sample space

Eᶜ =

S \ E

3 axioms of probability of E

1) P(E) ≥ 0
2) P(S) = 1
3) For a sequence E1, E2, ... of mutually exclusive events (Ej ∩ Ek = ∅ for j≠k), P(∪_{j=1}^∞ Ej) = Σ_{j=1}^∞ P(Ej)

4 Properties of Probability

1) P(∅) = 0
2) P(Aᶜ) = 1 − P(A)
3) If A⊆B, then P(A)≤P(B)
4) P(A∪B) = P(A) + P(B) − P(A∩B)

If 2 sets aren't necessarily disjoint, then

must subtract intersection of the sets

Let A be an event, then (3)

1) (Aᶜ)ᶜ = A
2) ∅ᶜ = S
3) Sᶜ = ∅

For all sets A and B, (5 union)

1) A ∪ B = B ∪ A
2) A ∪ ∅ = A
3) A ∪ A = A
4) A ∪ S = S
5) A ∪ Aᶜ = S

If A⊆B, then

A ∪ B = B and A ∩ B = A

If each set of outcomes in some countable collection is an event,

we must call their union an event

For all sets A and B, (5 intersection)

1) A ∩ B = B ∩ A
2) A ∩ ∅ = ∅
3) A ∩ A = A
4) A ∩ S = A
5) A ∩ Aᶜ = ∅

simple sample space

sample space where every outcome is EQUALLY LIKELY and number of outcomes is finite

simple sample space probability

P(E) = n(E)/n, where n(E) is the number of outcomes in E and n is the total number of outcomes

finite sample space

S = {s1, s2,...,sn}; finitely many outcomes

If we let pj = P({sj}), j=1,...,n, then by the 3 axioms:

1) pj ≥ 0 for j=1,...,n
2) Σ_{j=1}^n pj = 1
3) P(A) = Σ_{sj∈A} pj

when all elements of S are equally likely,

pj = 1/n

multiplication principle

Suppose an experiment has the following properties:
1) the experiment has k parts
2) the jth part has nj possible outcomes, regardless of the outcomes of parts 1,...,j−1
Then the total number of outcomes = n1×n2×...×nk

permutation

given a set A={a1,a2,...,an}, in how many ways can we draw ordered samples of size k out of the n elements; without replacement this requires k≤n

permutation w/ replacement

n^k; allows k>n

permutation w/o replacement

Pn,k = n × (n−1) × (n−2) × ... × (n−k+1) = n!/(n−k)!

permutation corollary

the number of possible orders of n elements is Pn,n = n!

combination

given a set of n elements A={a1,...,an}, how many samples of size k are possible; unordered and w/o replacement

combination formula

Cn,k = (Pn,k) / k! = n! / ((n-k)! k!)

binomial coefficient

(n k) = Cn,k = n! / ((n−k)! k!); "n choose k"

counting summary

- ordered w/ replacement = n^k
- ordered w/o replacement = Pn,k
- unordered w/o replacement = Cn,k
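
A quick consistency check of these three formulas, as a minimal Python sketch (math.perm and math.comb need Python 3.8+; n and k are arbitrary illustrative values):

    import math
    from itertools import combinations, permutations, product

    n, k = 5, 3
    # ordered w/ replacement: n^k
    assert n**k == len(list(product(range(n), repeat=k)))
    # ordered w/o replacement: Pn,k = n!/(n-k)!
    assert math.perm(n, k) == len(list(permutations(range(n), k)))
    # unordered w/o replacement: Cn,k = n!/((n-k)! k!)
    assert math.comb(n, k) == len(list(combinations(range(n), k)))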

binomial coefficient properties

1) for all n, (n 0) = (n n) = 1
2) for all n and all k=0,1,...,n, (n k) = (n n−k)

0! =

1

multinomial coefficients

want to split n elements into k groups of sizes n1,...,nk such that n1+n2+...+nk=n; (n n1)(n−n1 n2)··· = n! / (n1!n2!···nk!) = (n n1,n2,...,nk)

Probability of Union of 3 Events

P(A1∪A2∪A3) = P(A1)+P(A2)+P(A3)−P(A1∩A2)−P(A1∩A3)−P(A2∩A3)+P(A1∩A2∩A3)
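
A small enumeration check of this formula in Python (the sample space and events are made-up examples, not from the course):

    # One roll of a fair die; A1 = even, A2 = greater than 3, A3 = prime
    S = {1, 2, 3, 4, 5, 6}
    A1, A2, A3 = {2, 4, 6}, {4, 5, 6}, {2, 3, 5}
    P = lambda E: len(E) / len(S)  # equally likely outcomes

    lhs = P(A1 | A2 | A3)
    rhs = (P(A1) + P(A2) + P(A3)
           - P(A1 & A2) - P(A1 & A3) - P(A2 & A3)
           + P(A1 & A2 & A3))
    assert abs(lhs - rhs) < 1e-12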

Simpson's Paradox

when a relationship present in each of several groups is reversed when the groups are combined (ex. basketball players' scores)

Continuity from below

Given A1⊆A2⊆A3⊆..., P(∪_{j=1}^∞ Aj) = lim_{n→∞} P(An); use complements for continuity from above

Given P,Q; z=aP+bQ; a,b>0; a+b=1; z(E)∈[0,1]; z(E)=aP(E)+bQ(E), then...

1) z(S) = aP(S)+bQ(S) = a+b = 1
2) if {Aj}_{j=1}^∞ is disjoint, z(∪_{j=1}^∞ Aj) = Σ_{j=1}^∞ z(Aj)

Probability of Union of Finite Events

1) Disjoint: P(∪_{j=1}^n Aj) = Σ_{j=1}^n P(Aj)
2) Not disjoint: P(∪_{j=1}^n Aj) = Σ_{j=1}^n P(Aj) − Σ_{j<k} P(Aj∩Ak) + ... + (−1)^{n+1} P(A1∩...∩An)

P(A∩Bᶜ) =

P(A) − P(A∩B)

conditional probability

P(A|B) = P(A∩B)/P(B), provided P(B)>0

multiplication rule for conditional probability

P(A∩B) = P(A|B)P(B)

P(Aᶜ|B) =

1 - P(A|B)

independence

If 2 events A and B are such that A gives no information about B and B gives no information about A, then A and B are independent.

test for independence

P(A|B)=P(A), P(B|A)=P(B) ⟹ P(A∩B) = P(A) × P(B)

If A and B are independent, then

A and Bᶜ are independent; also Aᶜ and Bᶜ are independent

Independence Special Cases:

1) If P(A) or P(B) is 0 or 1, then A and B are independent
2) No event is independent of itself, unless its probability is 0 or 1
3) No event is independent of its complement, unless either P(A) or P(Aᶜ) is 0 or 1
4) If A∩B=∅, then P(A∩B)=P(∅)=0, so A and B are independent iff either P(A)=0 or P(B)=0

mutually independent

a sequence of events A1,...,An is mutually independent if for any subcollection of events Aj1,...,Ajk, P(Aj1∩...∩Ajk) = P(Aj1)×...×P(Ajk)

conditionally independent

a collection of events A1,...,An is conditionally independent given an event B if for any subcollection of events Aj1,...,Ajk, P(Aj1∩...∩Ajk|B) = P(Aj1|B)×...×P(Ajk|B)

partition

a collection of events B1,...,Bk forms a partition of S iff: 1) the union of all the sets equals S 2) the sets are disjoint/mutually exclusive

law of total probability

Let A be an event and (B1,...,Bk) be a partition of S such that P(Bj)>0 for all j. Then P(A) = P(A|B1)P(B1)+...+P(A|Bk)P(Bk)

Bayes' Rule

Let A be an event and (B1,...,Bk) be a partition of S such that P(Bj)>0 for all j. Then P(Bj|A) = P(A|Bj)P(Bj) / [P(A|B1)P(B1)+...+P(A|Bk)P(Bk)]
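
A worked example of the law of total probability and Bayes' rule together, as a Python sketch (the prior and test accuracies are hypothetical numbers chosen only for illustration):

    # Partition: B1 = "has condition", B2 = "does not"; A = "test positive"
    p_B = [0.01, 0.99]          # priors P(B1), P(B2) (hypothetical)
    p_A_given_B = [0.95, 0.05]  # P(A|B1), P(A|B2) (hypothetical)

    # Law of total probability: P(A) = P(A|B1)P(B1) + P(A|B2)P(B2)
    p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

    # Bayes' rule: P(B1|A) = P(A|B1)P(B1) / P(A)
    posterior = p_A_given_B[0] * p_B[0] / p_A
    print(round(p_A, 4), round(posterior, 4))  # 0.059, about 0.161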

letters in relation to timing

ex-ante: uppercase letters, ex-post: lowercase letters

random variable

a real-valued function over the sample space; X(s): S→ℝ

assign probability to random variables by:

P(X∈A) = P({s : X(s)∈A})

discrete distributions

a distribution where X takes a finite or countably infinite number of values x1, x2, ...

probability function

pf or pmf; f(x) = P(X=x) ∀x∈ℝ

support

set of all values of x that have positive probability; 𝒳 = {x∈ℝ : f(x)>0}

3 properties of f(x)

1) f(x) ≥ 0 ∀x∈ℝ
2) Σ_{x∈ℝ} f(x) = 1; equivalently Σ_{x∈𝒳} f(x) = 1
3) P(X∈A) = Σ_{x∈A} f(x)

probability function property

2 random variables X and Y can have the same pf and not be the same random variable

Bernoulli RV

RV that only takes 2 values: 1 with probability p, 0 with probability 1-p

pf of Bernoulli RV

f(x) = {p^x (1−p)^(1−x) for x∈{0,1}, and 0 otherwise}

indicator function

of an event E is I_E(s) = {1 if s∈E, and 0 if s∉E}

binomial distributions

consider a situation in which we repeat an experiment n times (each independently) and record each success (prob p) or failure (prob 1−p); X = number of successes, X ~ B(n,p)

pf of binomial distributions

f(x) = {(n x) p^x (1−p)^(n−x) for x∈{0,1,...,n}, and 0 otherwise}
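
A minimal sketch of this pf in Python (standard library only), checking that it sums to 1 over the support:

    import math

    def binom_pf(x, n, p):
        # f(x) = (n x) p^x (1-p)^(n-x) on {0, 1, ..., n}, 0 otherwise
        if x not in range(n + 1):
            return 0.0
        return math.comb(n, x) * p**x * (1 - p)**(n - x)

    n, p = 10, 0.3
    assert abs(sum(binom_pf(x, n, p) for x in range(n + 1)) - 1) < 1e-12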

binomial distribution property

If X ~ B(n,p), then X = X1+X2+...+Xn where the Xj are independent Bernoulli(p)

joint pf

of 2 RVs X and Y is function such that: f(x,y) = P(X=x, Y=y)

3 properties of joint pf:

1) f(x,y) ≥ 0 ∀(x,y)∈ℝ²
2) Σ_x Σ_y f(x,y) = 1
3) P((X,Y)∈A) = Σ_{(x,y)∈A} f(x,y)

marginal pf of X

obtained by summing the joint pf over all possible values of y; same as the regular pf of X; f₁(x) = Σ_{j=1}^∞ f(x,yj)
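
A toy example of the summing-out step in Python (the joint pf values are made up for illustration):

    # Joint pf of (X, Y) stored as {(x, y): f(x, y)}; values sum to 1
    f = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.3}

    # Marginal pf of X: f1(x) = sum over y of f(x, y)
    f1 = {}
    for (x, y), p in f.items():
        f1[x] = f1.get(x, 0.0) + p
    print(f1)  # {0: 0.5, 1: 0.5} (up to float rounding)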

For 2 RV, saying X and Y independent means:

for ALL A and B, P(X∈A, Y∈B) = P(X∈A)×P(Y∈B)

2 discrete RV are independent iff:

f(x,y) = f₁(x)f₂(y) ∀(x,y)∈ℝ²; trivial to test outside the support b/c the probability will always be 0

joint pf independence property

If X and Y are independent, then for functions h and g, h(X) and g(Y) are also independent

joint pf conditional property

If X and Y are jointly distributed discrete RVs, the conditional probability that X=x given that Y=y, with P(Y=y)>0, is g(x|y) = f(x,y)/f₂(y); note g(x|y) ≥ 0

Σ_{x∈𝒳} g(x|y) =

Σ_{x∈𝒳} f(x,y)/f₂(y) = (1/f₂(y)) Σ_{x∈𝒳} f(x,y) = 1

f(x,y) =

g(x|y)f₂(y)

continuous random variable

X is a continuous random variable if there exists a non-negative function f such that the probability that X lies in a set is the integral of f over that set; focus: A=[a,b]; informal: X is continuous if X can take all possible numbers in its range

probability density function

pdf; a non-negative function f defined on the real line is a pdf of the RV X iff: P(a<X<b) = ∫_a^b f(x)dx

support (for pdf)

𝒳 = {x∈ℝ : f(x)>0}

If X is a continuous RV, P(X=a) =

0 ∀a∈ℝ (positive probability only over a range)

If X is a continuous RV, P(a<X<b) =

P(a≤X≤b) = P(a≤X<b) = P(a<X≤b)

0 probability

doesn't mean impossible, just that ex-ante we can't assign a positive number to the event

2 Properties of pdf

1) f(x) ≥ 0, ∀x∈ℝ
2) ∫_{−∞}^∞ f(x)dx = 1

f(x), a pdf, is...

NOT a probability

uniform distribution

on the interval [0,1], the model saying the probability of X being in a subinterval of [0,1] is proportional to the length of the subinterval

uniform distribution equation

f(x) = I(0≤x≤1)

pdf of uniform distribution on [a,b] is...

f(x) = {1/(b−a) for a≤x≤b, and 0 otherwise} OR f(x) = (1/(b−a))·I(a≤x≤b)

joint pdf

a non-negative function f(x,y) on ℝ² is the joint pdf of RVs (X,Y) iff P((X,Y)∈A) = ∬_A f(x,y)dxdy

2 Joint pdf Properties

1) f(x,y) ≥ 0, ∀(x,y)∈ℝ²
2) ∫_{−∞}^∞ ∫_{−∞}^∞ f(x,y)dxdy = 1

marginal pdf

compute by "adding"/integrating out the other variable; f₁(x) = ∫_{−∞}^∞ f(x,y)dy, and vice-versa

P(X=a, Y=b) =

0 ∀(a,b)∈ℝ²

P(Y=h(X)) =

0

independence for pdf

X and Y are independent iff: f(x,y) = f₁(x)f₂(y) ∀(x,y)∈ℝ²

conditional for pdf

Let X and Y be 2 continuous RVs with joint pdf f(x,y) and marginal pdfs f₁(x) and f₂(y). Then, for 2 points x₀ and y₀ such that f₁(x₀)>0 and f₂(y₀)>0, the conditional pdfs are: g₁(x|y₀) = f(x,y₀)/f₂(y₀) and g₂(y|x₀) = f(x₀,y)/f₁(x₀)

g₁(x|y₀) is...

NOT a probability

law of total probability (for pdf)

f₁(x) = ∫_{−∞}^∞ g₁(x|y)f₂(y)dy

Bayes' Rule (for pdf)

g₂(y|x₀) = g₁(x₀|y)f₂(y) / f₁(x₀)

cumulative distribution functions

cdf; the cdf of a RV X is a function defined for each real number x as follows: F(x) = P(X≤x), −∞<x<∞

4 Properties of cdf

1) F(x)∈[0,1], ∀x∈ℝ
2) F(·) is non-decreasing: if x₁<x₂, then F(x₁)≤F(x₂)
3) lim_{x→−∞} F(x)=0 and lim_{x→∞} F(x)=1
4) F(x) is continuous from the right

2 Facts about cdf

1) cdf of discrete RV have jumps
2) cdf of continuous RV have no jumps

Fundamental Theorem of Calculus with cdf

if the pdf f is continuous, then F′(x) = f(x)
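
A quick finite-difference illustration in Python, assuming for the example that X is Exponential(1), so F(x) = 1 − e^(−x) and f(x) = e^(−x):

    import math

    F = lambda x: 1 - math.exp(-x)  # Exponential(1) cdf (assumed example)
    f = lambda x: math.exp(-x)      # its pdf

    x, h = 0.8, 1e-6
    deriv = (F(x + h) - F(x - h)) / (2 * h)  # numerical F'(x)
    print(deriv, f(x))  # both approximately 0.4493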

bivariate cdf

F(x,y) = P(X≤x, Y≤y) ∀(x,y)∈ℝ²

2 Properties of Bivariate cdf

1) F₁(x) = lim_{y→∞} F(x,y) and F₂(y) = lim_{x→∞} F(x,y)
2) When it exists, ∂²F(x,y)/∂x∂y = f(x,y)

functions of RV

Let Y = r(X), where r(·) is some function:
if X is discrete, P(Y=y) = P(r(X)=y) = Σ_{x: r(x)=y} f(x)
if X is continuous, G(y) = P(Y≤y) = P(r(X)≤y) = ∫_{x: r(x)≤y} f(x)dx

If Y is also continuous, the pdf of Y is:

g(y) = dG(y)/dy

Special Case of functions of RVs

Suppose X and Y are both continuous; then the pdf of Y=r(X) simplifies under 2 conditions: 1) X is continuous with pdf f(x) 2) r(·) is DIFFERENTIABLE and STRICTLY MONOTONE. Letting s(·) be the inverse of r(·), the pdf of Y is: g(y) = f(s(y))·|ds(y)/dy|

2 Comments on functions of RVs:

1) r(x) has to be strictly monotone over the support of X
2) let [a,b] be the range of r(x); then the density is g(y) = the formula above for a≤y≤b, and 0 otherwise
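
A simulation sketch of the change-of-variables formula (assumptions: X ~ Uniform(0,1) and r(x) = −ln x, so s(y) = e^(−y) and g(y) = f(s(y))·|s'(y)| = e^(−y), the Exponential(1) density):

    import math, random

    random.seed(0)
    n = 200_000
    # Y = r(X) = -ln X with X uniform on (0, 1]
    ys = [-math.log(1.0 - random.random()) for _ in range(n)]

    # Empirical P(Y <= 1) vs the value implied by g(y) = e^(-y):
    # G(1) = integral of e^(-y) from 0 to 1 = 1 - e^(-1)
    emp = sum(y <= 1 for y in ys) / n
    print(emp, 1 - math.exp(-1))  # both approximately 0.632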

expectations

summary measure of central location

expected value of X for discrete

Let X be a discrete RV with pf f(x); then the expected value of X is E[X] = Σ_j xj·f(xj), provided Σ_j |xj|·f(xj) < ∞

expected value of X for continuous

Let X be a continuous RV with pdf f(x); then the expected value of X is E[X] = ∫_{−∞}^∞ x f(x)dx, provided ∫_{−∞}^∞ |x| f(x)dx < ∞

expectation exists when...

𝒳, the support of X, is bounded

expectations of functions

Let Y=r(X); then discrete: E[Y] = Σ_j r(xj)·f(xj); continuous: E[Y] = ∫_{−∞}^∞ r(x)f(x)dx

expectation rule

in general, E[g(X)] ≠ g(E[X])

Properties of Expectations

1) Let Y = a+bX; then E[Y] = a+bE[X]
2) Let X1,...,Xk be RVs and Y = Σ_{j=1}^k aj·Xj; then E[Y] = Σ_{j=1}^k aj·E[Xj]
3) If a≤X≤b, then a≤E[X]≤b
4) If X and Y are independent, g(·) is a function of X, h(·) is a function of Y, then E[h(Y)g(X)] = E[h(Y)]E[g(X)]
5) If X1,...,Xk are independent, then E[X1·X2···Xk] = E[X1]·E[X2]···E[Xk]

mean

μ; a common way to characterize a distribution; not very useful without a measure of dispersion; enough for an indicator RV taking values 1 or 0 (e.g., a shooting percentage gives the probability of scoring the next shot, and the variance is p(1−p) = 0.52×0.48)

variance

Let X be a RV with mean μ=E(X). The variance of X is V(X)=σ²=E[(X−μ)²]; V(X)≥0; a small value indicates the probability distribution is tightly concentrated around μ

standard deviation of X

square root of the variance; σ = √σ²

Variance Properties

1) V(X)=0 iff there exists a constant c such that P(X=c)=1 (only positive if there is some uncertainty)
2) V(aX+b) = a²V(X) [adding a constant to a RV doesn't affect the variance]
3) For every RV X, V(X) = E(X²)−(E(X))² = E(X²)−μ²
4) If X1,...,Xn are independent RVs, V(a1X1+...+anXn) = a1²V(X1)+...+an²V(Xn)
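
An exact check of properties 2 and 3 on a small made-up pf, as a Python sketch (nothing here is course data):

    # f(x) = P(X = x) for a toy discrete RV
    pf = {0: 0.2, 1: 0.5, 2: 0.3}
    E = lambda g: sum(g(x) * p for x, p in pf.items())

    mu = E(lambda x: x)
    var = E(lambda x: (x - mu) ** 2)
    # Property 3: V(X) = E(X^2) - (E(X))^2
    assert abs(var - (E(lambda x: x * x) - mu**2)) < 1e-12

    # Property 2: V(aX + b) = a^2 V(X)
    a, b = 3.0, 7.0
    muY = E(lambda x: a * x + b)
    varY = E(lambda x: (a * x + b - muY) ** 2)
    assert abs(varY - a * a * var) < 1e-12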

variance of a sum is

the sum of the variances ONLY when the variables are independent

covariance

Let X and Y be RVs and let E(X)=μx, E(Y)=μy, V(X)=σx²<∞ and V(Y)=σy²<∞; then Cov(X,Y) = σxy = E[(X−μx)(Y−μy)]

covariance is...

measure of LINEAR RELATIONSHIP between X and Y; positive, negative, or zero; depends on units used to measure X and Y

correlation

Let X and Y be RVs and let E(X)=μx, E(Y)=μy, V(X)=σx²<∞ and V(Y)=σy²<∞; then ρ(X,Y) = Cov(X,Y)/(σx·σy); can be positive, negative, or zero

Correlation/Covariance/Variance Properties

1) Cov(aX,bY) = ab·Cov(X,Y)
2) Cov(X,Y+Z) = Cov(X,Y)+Cov(X,Z)
3) Cov(X,Y) = E(XY)−E(X)E(Y)
4) If X and Y are independent RVs with 0<σx²,σy²<∞, then ρ(X,Y)=0, but the converse is NOT true
5) If X and Y are RVs such that σx²,σy²<∞, then V(X+Y)=V(X)+V(Y)+2Cov(X,Y) and V(X−Y)=V(X)+V(Y)−2Cov(X,Y)
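
An exact check of properties 3 and 5 on a made-up joint pf, as a Python sketch:

    # Joint pf of (X, Y); values are illustrative and sum to 1
    f = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
    E = lambda g: sum(g(x, y) * p for (x, y), p in f.items())

    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    # Property 3: Cov(X,Y) = E(XY) - E(X)E(Y)
    cov = E(lambda x, y: (x - mx) * (y - my))
    assert abs(cov - (E(lambda x, y: x * y) - mx * my)) < 1e-12

    # Property 5: V(X+Y) = V(X) + V(Y) + 2Cov(X,Y)
    vx = E(lambda x, y: (x - mx) ** 2)
    vy = E(lambda x, y: (y - my) ** 2)
    vsum = E(lambda x, y: (x + y - mx - my) ** 2)
    assert abs(vsum - (vx + vy + 2 * cov)) < 1e-12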

conditional expectation

the conditional expectation of Y given X=x is denoted E(Y|X=x), with E(Y|X=x) = Σ_y y·g₂(y|x) for discrete and E(Y|X=x) = ∫_{−∞}^∞ y·g₂(y|x)dy for continuous; these are functions of x; treat x as a fixed value

Properties of Conditional Expectation

1) Law of Total Expectation: E(Y) = E[E(Y|X)]
2) If X and Y are independent, E(Y|X)=E(Y)
3) If E(Y|X)=E(Y), then Cov(X,Y)=0

Law of Total Expectation

E(Y) = E[E(Y|X)]
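
A numeric check of this identity on a made-up two-stage experiment, as a Python sketch:

    # X takes values 0, 1; given X = x, Y has conditional pf g[x]
    pX = {0: 0.6, 1: 0.4}
    g = {0: {10: 0.5, 20: 0.5},   # g(y | X=0), illustrative numbers
         1: {10: 0.1, 20: 0.9}}   # g(y | X=1)

    # Inner step: E(Y | X=x) for each x; outer step: average over X
    E_Y_given = {x: sum(y * p for y, p in g[x].items()) for x in pX}
    lhs = sum(E_Y_given[x] * pX[x] for x in pX)

    # Direct E(Y) from the joint pf f(x, y) = g(y|x) pX(x)
    rhs = sum(y * g[x][y] * pX[x] for x in pX for y in g[x])
    assert abs(lhs - rhs) < 1e-12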

conditional variance

the conditional variance of Y given X=x is V(Y|X=x) = E{[Y − E(Y|X=x)]² | X=x}

quantile function

Suppose F is the cdf of a CONTINUOUS RV X and that F⁻¹ exists on 𝒳, so x = F⁻¹(y) if y = F(x). The αth quantile of the distribution F is defined to be the value q_α such that F(q_α) = α, or q_α = F⁻¹(α). α = 1/2 corresponds to the median

median

the median is defined to be a number m such that P(X≤m)≥1/2 and P(X≥m)≥1/2 for discrete, and such that P(X≤m) = P(X≥m) = ∫_{−∞}^m f(x)dx = 1/2 for continuous

kth moment

Let X be a RV and k a positive integer. E(X^k) is the kth moment of X, provided the expectation exists

centered moment

let X be a RV and k a positive integer. E((X−μ)^k) is the kth centered moment of X; the first centered moment of X is 0

existence of moments

if E(|X|^k)<∞ for some positive integer k, then E(|X|^j)<∞ for every positive integer j<k; also, E(|X|^j) ≤ 1 + E(|X|^k)

MGF

the moment-generating function of a RV X is M(t) = E(e^(tX))

2 properties of MGF

1) if the MGF exists for t in an open interval containing zero, it uniquely determines the probability distribution (notice that it always exists at t=0, where M(0)=1)
2) if the MGF exists for t in an open interval containing zero, then E(X^k) = M^(k)(0) = [d^k M(t)/dt^k] evaluated at t=0
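
A numeric illustration of property 2 using the Bernoulli MGF M(t) = pe^t + 1 − p (a sketch; derivatives at 0 are approximated by finite differences rather than computed symbolically):

    import math

    p = 0.3
    M = lambda t: p * math.exp(t) + 1 - p  # Bernoulli(p) MGF

    h = 1e-5
    # E(X) = M'(0) = p, via a central first difference
    d1 = (M(h) - M(-h)) / (2 * h)
    # E(X^2) = M''(0) = p for a Bernoulli RV (X^2 = X here)
    d2 = (M(h) - 2 * M(0) + M(-h)) / h**2
    print(round(d1, 4), round(d2, 4))  # both approximately 0.3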

Bernoulli distribution

X is a Bernoulli random variable if its pf is f(x|p) = {p^x (1−p)^(1−x) for x=0,1, and 0 otherwise}

Properties of Bernoulli Distribution

1) E(X) = p
2) V(X) = E(X²)−(E(X))² = p(1−p)
3) M(t) = E(e^(tX)) = pe^t + 1 − p

Binomial distribution

a binomial is a sum of independent Bernoulli RVs; f(x|n,p) = {(n x) p^x (1−p)^(n−x) for x=0,1,...,n, and 0 otherwise}

Properties of Binomial Distribution

1) E(X) = np
2) V(X) = E(X²)−(E(X))² = np(1−p)
3) M(t) = E(e^(tX)) = (pe^t + 1 − p)^n

Poisson Distribution

Let X be a discrete, nonnegative RV; then f(x|λ) = {e^(−λ) λ^x / x! for x=0,1,..., and 0 otherwise}; be explicit about the time period (λ is a rate per period)

Properties of Poisson Distribution

1) E(X) = λ
2) V(X) = λ
3) M(t) = e^(λ(e^t − 1))
4) If X1,...,Xk are independent and Xi has a Poisson distribution with mean λi (i=1,...,k), then the sum X1+...+Xk has a Poisson distribution with mean λ1+...+λk
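
A Python sketch checking property 4 by direct convolution of two Poisson pfs (the λ values are arbitrary illustrative choices):

    import math

    def pois(x, lam):
        # Poisson pf: e^(-lam) lam^x / x!
        return math.exp(-lam) * lam**x / math.factorial(x)

    l1, l2 = 2.0, 3.0
    # P(X1 + X2 = s) by convolution should match the Poisson(l1+l2) pf
    for s in range(20):
        conv = sum(pois(x, l1) * pois(s - x, l2) for x in range(s + 1))
        assert abs(conv - pois(s, l1 + l2)) < 1e-12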

Poisson process

with rate λ per unit time is a process that satisfies:
1) the number of arrivals in every FIXED INTERVAL of time of length t has a Poisson distribution with mean λt
2) the numbers of arrivals in every TWO DISJOINT time intervals are independent

Uniform Distribution

X has a uniform distribution on [a,b] if its pdf is f(x|a,b) = {1/(b−a) for a≤x≤b, and 0 otherwise}

Properties of Uniform Distribution

1) E(X) = (a+b)/2
2) V(X) = (b−a)²/12
3) M(t) = (e^(tb) − e^(ta)) / (t(b−a)) for t≠0, with M(0)=1

Gamma Distribution

X has a gamma distribution if its pdf is f(x|α,β) = {(β^α/Γ(α)) x^(α−1) e^(−βx) for x>0, and 0 otherwise}

gamma function

Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx; Γ(1)=1

Properties of Gamma Distributions

1) E(X^k) = [α(α+1)···(α+k−1)]/β^k, so E(X) = α/β
2) Var(X) = α(α+1)/β² − (α/β)² = α/β²
3) For t<β, M(t) = (β/(β−t))^α
4) If X1,...,Xn are independent gamma RVs with parameters (αj, β), then the sum X1+...+Xn has a gamma distribution with parameters α = α1+...+αn and β

Exponential Distribution

the gamma with α=1; if X is a continuous RV, f(x|β) = {βe^(−βx) for x>0, and 0 otherwise}

Properties of Exponential Distributions

1) E(X) = 1/β
2) V(X) = 1/β²
3) for t<β, M(t) = β/(β−t) = (1−t/β)^(−1)

Exponential Distribution: P(X≥t) =

e^(−βt)

memoryless property of exponential distribution

P(X≥t+h | X≥t) = P(X≥h)
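
A simulation sketch of memorylessness (assumes β = 1.5, t = 0.4, h = 0.7, all arbitrary; the exponential draws come from inverse-transform sampling):

    import math, random

    random.seed(1)
    beta, n = 1.5, 500_000
    # Inverse transform: X = -ln(U)/beta is Exponential(beta)
    xs = [-math.log(1.0 - random.random()) / beta for _ in range(n)]

    t, h = 0.4, 0.7
    given = [x for x in xs if x >= t]
    lhs = sum(x >= t + h for x in given) / len(given)  # P(X>=t+h | X>=t)
    rhs = math.exp(-beta * h)                          # P(X>=h)
    print(round(lhs, 3), round(rhs, 3))  # both approximately 0.35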

Normal Distribution

f(x|μ,σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), −∞<x<∞