The Practice of Statistics - Chapter 6

Random Variables and Probability Distribution of Random Variables

RVs: take numerical values that describe the outcomes of some chance process
Prob. Dist. of RV: gives its possible values and their probabilities
Two Main Types: discrete and continuous

Discrete Random Variables

Def: takes a fixed set of possible values w/ gaps in b/w. The prob dist of a discrete random var. X lists the values x_i and their probs p_i
To find the prob. of any event, add the probs p_i of the particular values x_i of X that make up the event.
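A minimal sketch of the "add the p_i" rule. The distribution below (number of heads in 3 fair coin tosses) and the helper name prob_event are illustrative assumptions, not from the notes.

```python
from fractions import Fraction

# Hypothetical distribution: X = number of heads in 3 fair coin tosses.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def prob_event(dist, event):
    """Add the probabilities p_i of the values x_i that make up the event."""
    return sum(p for x, p in dist.items() if event(x))

# A legitimate probability distribution has probabilities that sum to 1.
assert sum(dist.values()) == 1

p_at_least_2 = prob_event(dist, lambda x: x >= 2)
print(p_at_least_2)  # 1/2
```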
Summary: Assign prob

Mean (Expected Value) of a Discrete RV

Def:
-A long-run average we'd expect after many iterations.
-A weighted average where outcomes are weighted by prob.
μx = E(X) = Σ xi*pi

Variance of Discrete RV

σx² = Σ (xi - μx)² * pi
Take sqrt for σx
On average, each [outcome] differs from μx by σx.
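A short sketch of the mean, variance, and SD formulas as weighted sums. The values and probabilities are a made-up example (heads in 3 fair coin tosses), chosen only for illustration.

```python
import math

# Hypothetical distribution: X = number of heads in 3 fair coin tosses.
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]

# Mean: each value weighted by its probability.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(mean, variance, round(sd, 3))  # 1.5 0.75 0.866
```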

Analyzing RVs on Calculator

1. enter values of X in L1
enter corresp probs in L2
2. set up stat plot with Xlist: L1 and Freq: L2
xmin = -1, xmax = 11, xscale = 1
ymin = -.1, ymax = 0.5, yscale = 0.1
3. for μx and σx, use 1-var stats L1, L2

Continuous Random Variables

Def: a cont. RV X takes all values in an interval of #s. The probability distribution of X is described by a density curve. The prob of any event is the area under the density curve and above the values of X that make up the event.
-> assigns probs not to
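A rough sketch of "probability = area under the density curve". The density f(x) = x/2 on [0, 2] and the helper area_under are assumptions made up for this example. The area is approximated numerically with a midpoint sum.

```python
def area_under(density, lo, hi, steps=10_000):
    """Approximate the area under `density` between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical density curve: f(x) = x/2 on [0, 2], zero elsewhere.
def density(x):
    return x / 2 if 0 <= x <= 2 else 0.0

total = area_under(density, 0, 2)    # total area under a density curve is 1
p_event = area_under(density, 1, 2)  # P(1 <= X <= 2); exact area is 0.75

print(round(total, 4), round(p_event, 4))
```

Shrinking the interval toward a single point drives the area toward 0, which is why a single value of a continuous RV has probability 0.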

Effect of Linear Transformations on μx and σx

If Y = a + bX, then
- μy = a + bμx
- σy = |b|σx [a has no effect on spread]
- σy² = b²σx²
Adding/Subtracting adds a to measures of center and location (mean, median, quartiles, %iles) but does not change shape or measures of spread (σ)
Multiplying/Dividin
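A quick numerical check of the rules for Y = a + bX. The distribution and the constants a = 10, b = -2 are hypothetical, picked so that b is negative and the |b| matters.

```python
import math

def mean_sd(values, probs):
    """Return (mean, SD) of a discrete RV from its values and probabilities."""
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(var)

# Hypothetical distribution of X and hypothetical constants a, b.
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]
a, b = 10, -2

mu_x, sigma_x = mean_sd(values, probs)
# Transform every value; the probabilities stay the same.
mu_y, sigma_y = mean_sd([a + b * x for x in values], probs)

assert math.isclose(mu_y, a + b * mu_x)         # mean: shifted and scaled
assert math.isclose(sigma_y, abs(b) * sigma_x)  # SD: scaled by |b| only
print(mu_y, round(sigma_y, 3))  # 7.0 1.732
```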

Adding Random Variables

if T = X+Y, then
E(T) = μt = μx + μy
Can't calculate probability for any value of T unless X and Y are independent random variables: knowing whether X occurred tells us nothing about occurrence of Y
P(X = xi and Y = yj) = P(X = xi)*P(Y = yj); add these products over all pairs with xi + yj = t to get P(T = t)
σt² = σx² + σy² } don't add SDs
Range of T = range of X + range of Y
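A sketch of building the distribution of T = X + Y when X and Y are independent. The two distributions are invented for illustration. Independence is what justifies multiplying the probabilities of each (x, y) pair.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical independent RVs, stored as {value: probability}.
X = {1: 0.2, 2: 0.5, 3: 0.3}
Y = {0: 0.6, 1: 0.4}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

# P(T = t): add P(X = x) * P(Y = y) over every pair with x + y = t.
T = defaultdict(float)
for (x, px), (y, py) in product(X.items(), Y.items()):
    T[x + y] += px * py

assert math.isclose(sum(T.values()), 1.0)
assert math.isclose(mean(T), mean(X) + mean(Y))              # means add
assert math.isclose(variance(T), variance(X) + variance(Y))  # variances add
# SDs do not add: the SD of T is smaller than the sum of the SDs.
assert math.sqrt(variance(T)) < math.sqrt(variance(X)) + math.sqrt(variance(Y))
```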

Subtracting Random Variables

If D = X-Y, then
E(D) = μd = μx - μy
if X and Y are independent
σd² = σx² + σy² } don't add SDs
Range of D = range of X + range of Y
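Why the variances still add for a difference: a short derivation that treats D as X plus (-1) times Y and reuses the linear-transformation and addition rules from these notes. It assumes X and Y are independent.

```latex
\begin{align*}
D &= X - Y = X + (-1)\,Y \\
\sigma_D^2 &= \sigma_X^2 + \sigma_{(-1)Y}^2
            = \sigma_X^2 + (-1)^2\,\sigma_Y^2
            = \sigma_X^2 + \sigma_Y^2 \\
\sigma_D &= \sqrt{\sigma_X^2 + \sigma_Y^2}
\end{align*}
```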

Binomial Setting

Arises when we perform several independent trials of the same chance process and record the # of times that a particular outcome occurs.
Conditions:
1. binary? "success" or "failure"
2. independent? knowing result of 1 trial doesn't affect result of another
3. number? # of tr

Binomial Random Variable/Binomial Distribution

The count of successes X in a binomial setting is a binomial (discrete) RV.
The prob dist of X is a binomial dist with parameters n and p.
n = # of trials and p = prob of success in a trial
X must be whole #s from 0 to n

Binomial Coefficient

# of ways to arrange k successes among n observations
(n k) = n!/(k!(n-k)!)
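A small sketch of the factorial formula, checked against Python's built-in math.comb. The function name binom_coeff and the inputs n = 5, k = 2 are illustrative.

```python
import math

def binom_coeff(n, k):
    """Number of ways to arrange k successes among n observations."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(binom_coeff(5, 2))  # 10

# The standard library computes the same quantity directly.
assert binom_coeff(5, 2) == math.comb(5, 2)
```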

Binomial Probability

if X has a binomial distribution with n trials and prob p of success
P(X = k) = (n k) p^k * (1-p)^(n-k)
P(X = k) = BinomPDF(n, p, k)
P(X <= k) = BinomCDF(n, p, k)
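A sketch of what the two calculator commands compute, written in Python. The function names binom_pdf and binom_cdf mirror the calculator argument order (n, p, k). The inputs n = 5, p = 0.25, k = 2 are made up.

```python
import math

def binom_pdf(n, p, k):
    """P(X = k): exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k): add P(X = i) for i = 0, 1, ..., k."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

print(round(binom_pdf(5, 0.25, 2), 4))  # 0.2637
print(round(binom_cdf(5, 0.25, 2), 4))  # 0.8965
```

For "at least" questions, use the complement: P(X >= k) = 1 - binom_cdf(n, p, k - 1).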

Describing Binomial Distribution

1. shape: describe skew + interpret
ex: skewed right. higher values of X are less common.
2. center: median ~ look for where 50th %ile falls in prob dist.
calculate mean -> μx = np
*mean is pulled in direction of skew
3. spread: compute σx² and σx
σx² = np(1-p), σx = sqrt(np(1-p))
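A check that the shortcut formulas agree with the general discrete-RV definitions. It uses μx = np and the variance formula σx² = np(1-p) that appears in the normal approximation section of these notes. The values n = 10 and p = 0.2 are hypothetical.

```python
import math

n, p = 10, 0.2  # hypothetical binomial parameters

# Full probability distribution: P(X = k) for k = 0..n.
pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance from the general discrete-RV definitions.
mean_def = sum(k * pk for k, pk in enumerate(pmf))
var_def = sum((k - mean_def) ** 2 * pk for k, pk in enumerate(pmf))

# Shortcut formulas for a binomial RV.
assert math.isclose(mean_def, n * p)
assert math.isclose(var_def, n * p * (1 - p))

sd = math.sqrt(n * p * (1 - p))
print(round(mean_def, 4), round(var_def, 4), round(sd, 4))
```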

Sampling w/o Replacement Condition

-> ASK: are we sampling w/o replacement?
Def: when taking an SRS of size n from a pop of size N, we can use a binomial dist to model the count of successes in the sample as long as
n <= (1/10) * N

Normal Approximations for Binomial Distributions

Suppose that a count X has binomial distribution. If # of trials n is very large, the dist. of X is approximately normal w/
σx² = np(1-p) and μx = np
Use when
np >= 10 and n(1-p) >=10
Then, standardize and find area using table A
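A sketch of the approximation steps: check the large-counts condition, standardize, and find the area. Here math.erf stands in for Table A. The values n = 100, p = 0.3, k = 25 are assumptions for illustration, and no continuity correction is applied.

```python
import math

def binom_cdf(n, p, k):
    """Exact P(X <= k) for a binomial RV."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(z):
    """Area to the left of z under the standard normal curve (Table A lookup)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 100, 0.3, 25  # hypothetical example

# Large-counts condition from the notes.
assert n * p >= 10 and n * (1 - p) >= 10

mu = n * p
sigma = math.sqrt(n * p * (1 - p))
z = (k - mu) / sigma  # standardize

approx = normal_cdf(z)      # normal approximation to P(X <= k)
exact = binom_cdf(n, p, k)  # exact binomial probability, for comparison
print(round(z, 2), round(approx, 4), round(exact, 4))
```

The two probabilities are close but not identical. The normal model is only an approximation.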

Geometric Setting

Def: arises when we perform independent trials of the same chance process and record the # of trials up to and including when a particular outcome occurs.
-> Repeat a process until a success occurs
Conditions:
1. binary? "success" or "failure"
2. independent? knowi

Geometric RV

# of trials Y that it takes to get a success in a geometric setting

Geometric Distribution

Prob dist of Y, w/ parameter p (prob of success on trial)
Y can be any whole number 1, 2, 3, ...

Geometric Probability

If Y has the geometric dist w/ prob p of success on each trial,
P(Y=k) = p * (1-p)^(k-1)
P(Y=k) = geometPDF(p,k)
P(Y<=k) = geometCDF(p,k)
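A sketch of what the two geometric commands compute. The function names geomet_pdf and geomet_cdf follow the calculator argument order (p, k). The inputs p = 0.25, k = 3 are made up. The CDF uses the complement "no success in the first k trials".

```python
def geomet_pdf(p, k):
    """P(Y = k): k - 1 failures followed by the first success."""
    return p * (1 - p) ** (k - 1)

def geomet_cdf(p, k):
    """P(Y <= k) = 1 - P(no success in the first k trials)."""
    return 1 - (1 - p) ** k

print(geomet_pdf(0.25, 3), geomet_cdf(0.25, 3))  # 0.140625 0.578125

# The CDF matches adding up the individual probabilities.
assert abs(geomet_cdf(0.25, 3) - sum(geomet_pdf(0.25, i) for i in range(1, 4))) < 1e-12
```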

Mean (Expected Value) of Geometric RV

If Y is a geometric RV w/ prob p on ea trial, then its mean is
μy = 1/p
The expected # of trials req'd to get to 1st success is 1/p.
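A numerical check of the 1/p rule, using the weighted-average definition of the mean. The value p = 0.2 is hypothetical, and the infinite sum is cut off at 1000 terms, which is an approximation (the remaining terms are negligibly small).

```python
import math

p = 0.2  # hypothetical probability of success on each trial

# Mean = sum of k * P(Y = k), truncated after 1000 terms.
partial_mean = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 1001))

assert math.isclose(partial_mean, 1 / p)
print(round(partial_mean, 6))
```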