STAT chapter 8 notes

random variable

-a random variable assigns a number to each outcome of a random circumstance. equivalently, a random variable assigns a number to each unit in a population

family of random variables

consists of all random variables for which the same formula is used to find probabilities

classes of random variables

- a discrete random variable can take one of a countable list of distinct values
-a continuous random variable can take any value in an interval or collection of intervals
-discrete variables have a finite number of outcomes, but an infinite number of out

families within the two broad classes

-variables within a family share the same structure and general rules for finding probabilities
-binomial random variables -> for counting how often a particular event happens in a number of independent tries of a random circumstance

probabilities for discrete and continuous variables

-probabilities are specified differently for the two classes of variables
-for discrete random variables we can find probabilities for exact outcomes
-for continuous random variables we cannot find probabilities for exact outcomes instead we are limited to findi

discrete random variables

-X= the random variable
-k= a specified number the discrete random variable could assume
-P(X=k) is the probability that X equals k
ex. we are interested in the probability that there will be 2 girls born in the next 3 births (the probability is 3/8)
X= n

probability distribution function

-the probability distribution function for a discrete random variable X is a table or rule that assigns probabilities to the possible values of the random variable X
-the word function can mean either a table or a formula
-the probabilities in the table a

conditions for probabilities for discrete random variables

two conditions must always apply to the probabilities for discrete random variables
condition 1- the sum of the probabilities over all possible values must equal 1
condition 2- the probability of any specific outcome for a discrete random variable must be between 0 and 1

graphing

-the possible outcome values are placed on the horizontal axis, and their probabilities are placed on the vertical axis
-a bar is drawn centered on each possible value, with the height of the bar equal to the probability for the value

cumulative distribution

-the cumulative distribution function for a random variable X is a table or rule that provides the probabilities P(X≤k) for any real number k. generally, the term cumulative probability refers to the probability that X is less than or equal to a particular value
-is the probability

using the sample space to find probability for discrete random variables
1 possible outcomes
2 the number of girls
3 the probability of that outcome
4 P(X=0) the odds there will be no girls and there is only 1 set that has the possibility of

outcome      BBB  BBG  BGB  GBB  BGG  GBG  GGB  GGG   (8 outcomes)
X (girls)     0    1    1    1    2    2    2    3
probability  1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8
P(X=0) 1/8
P(X=1) 3/8
P(X=2) 3/8
P(X=3) 1/8
-there are 8 possible outcomes because there are 3 trials and 2 possible outcomes for each trial
2 X 2 X 2= 2^3= 8
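
The sample-space method above can be checked by listing every outcome and counting girls. This is only a minimal sketch: it assumes boys and girls are equally likely and births are independent, and the variable names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely birth sequences, e.g. ('B', 'G', 'B').
outcomes = list(product("BG", repeat=3))

# X = number of girls in each sequence; count how many sequences give each value.
counts = Counter(seq.count("G") for seq in outcomes)

# P(X=k) = (number of sequences with k girls) / (total number of sequences).
pmf = {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}

for k, p in pmf.items():
    print(f"P(X={k}) = {p}")
```

Using Fraction keeps the probabilities as exact eighths instead of rounded decimals.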

cumulative distribution function (cdf)

for a random variable X is a table or rule that provides the probabilities P(X≤k) for any real number k

cumulative probability

refers to the probability that X is less than or equal to a particular value
ex. the probability of getting 6 or fewer answers right when guessing at the answers to ten true-or-false questions
-P(X≤k) tells us how much probability X has accumu

sample space and simple event

the sample space is the list of all possible outcomes, and a simple event is a single one of those outcomes

mean value of the random variable

-long run average is called the expected value of the random variable
-it is the mean value of the random variable

expected value

-the expected value of a random variable X is the mean value of the variable in the sample space or population of possible outcomes. expected value can also be interpreted as the mean value that would be obtained from an infinite number of observations of

notation and formula for expected value

E(X) = the mean or expected value of a random variable X
the Greek letter mu can also represent the mean or expected value
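
The formula itself did not survive in these notes. For a discrete random variable, the standard definition is E(X) = sum of (value X probability) over all possible values. A small sketch, reusing the girls-in-3-births distribution from earlier in the notes (the function name is made up for illustration):

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E(X) = sum of value * probability over all possible values."""
    return sum(k * p for k, p in pmf.items())

# Distribution of X = number of girls in 3 births.
girls_pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = expected_value(girls_pmf)
print(mu)  # 3/2
```

So over many sets of 3 births, the long-run average number of girls would be 1.5, even though 1.5 is not a value X can actually take.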

standard deviation for a discrete random variable

-standard deviation is a measure of spread for a quantitative variable, and we use it in the empirical rule
-standard deviation of a discrete random variable quantifies how spread out the possible values of discrete random variable might be, weighted by h

expected net gain

(net gain X probability) for each row and then add up each value
-taking the square root of the variance provides the standard deviation of the net gain plan
[(X-μ)^2]p for each row and then add up each value
-after using this in the example we were able
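
The row-by-row recipe above (value X probability for the mean, then squared deviation X probability for the variance) can be sketched as follows. The net gain table from the example is not in these notes, so the girls-in-3-births distribution is used as stand-in data; the function name is hypothetical.

```python
import math

def mean_and_sd(pmf):
    """Return (mu, sigma) for a discrete random variable given as {value: probability}."""
    mu = sum(k * p for k, p in pmf.items())                    # sum of value * probability
    variance = sum((k - mu) ** 2 * p for k, p in pmf.items())  # sum of (value - mu)^2 * probability
    return mu, math.sqrt(variance)

# Stand-in data: X = number of girls in 3 births (not the net gain example).
girls_pmf = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

mu, sigma = mean_and_sd(girls_pmf)
print(mu, round(sigma, 3))
```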

variance

1. find the mean
2. for each data value subtract the mean
3. square each of these numbers
4. add up the values from step 3 and then divide the sum by the number of data values minus 1 (n-1)
-the square of standard deviation s^2

σ

population standard deviation

expected value (mean) and standard deviation for a population

(for a population)
we can calculate the mean and standard deviation in 2 ways
1. we can create a probability distribution for the measurements and use the definition of expected value and standard deviation
2. we can find the mean and standard deviation b

μ

population mean

binomial random variable

-a form of a discrete random variable
-a binomial random variable is a count of how many times an event occurs in a particular number of independent observations or trials that make up a random circumstance

binomial random variables

a binomial random variable is defined as X = number of times an event occurs in the n trials of a binomial experiment

binomial experiments conditions

1. there are n trials (n cannot be random; it is established in advance)
2. the outcomes are either a success or a failure (S/F)
3. the outcomes are independent from one trial to the next
4. the probability of a success remains the same from one trial to t

additional features of a binomial random variable

(keep these points in mind when trying to determine whether or not a random variable fits the binomial description)
-there may be more than two possible simple events for each trial
ex. rolling a 4 or 6 is a success
-sample surveys (not whole population)

finding probability for binomial random variables

for a binomial random variable, the probabilities for the possible values of X are given by the formula
P(X=k) = {n!/[k!(n-k)!]} p^k (1-p)^(n-k)   for k = 0, 1, ..., n
the formula for P(X=k) is made up of two parts
1. n!/[k!(n-k)!]gives the number of simple events in the sample space (number of outcomes: success and
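
A short sketch of the standard binomial formula, P(X=k) = {n!/[k!(n-k)!]} p^k (1-p)^(n-k), split into the same two parts as the notes. The function name is made up for illustration; math.comb needs Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    """P(X=k) for a binomial random variable with n trials and success probability p."""
    ways = math.comb(n, k)                    # n!/[k!(n-k)!]: sequences with exactly k successes
    prob_each = p ** k * (1 - p) ** (n - k)   # probability of any one such sequence
    return ways * prob_each

# Probabilities for the number of heads in 5 flips of a fair coin.
for k in range(6):
    print(k, binomial_pmf(k, 5, 0.5))
```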

binomial distribution example

# of H's from flipping a coin 5 times
possible outcomes 2^5=32
P(X=0) (the probability of 0 H's) is 1/32
n!/[k!(n-k)!] -> 5!/[0!(5-0)!] = 5!/5! = 1
the formula gives 1 outcome with 0 H's, which matches the 1 in 1/32
P(X=1) (the probability of 1 H's

greater than or equal to

what is the probability of getting 10 or more answers correct? =
P(X≥10)
X is the binomial random variable
n=15 (out of 15 questions)
P(X≥10) = 1 - P(X≤9): the complement of getting 10 or more right is getting 9 or fewer right
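
A sketch of the complement idea, reading the question as "10 or more correct" so that the complement is "9 or fewer". The notes give only n = 15, so p = 0.5 (pure guessing on true/false questions) is an assumption here, and the helper name is hypothetical.

```python
import math

def binomial_cdf(k, n, p):
    """P(X<=k): add up P(X=j) for j = 0, 1, ..., k."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

n, p = 15, 0.5  # p = 0.5 is an assumption, not stated in the notes

# P(X>=10) = 1 - P(X<=9)
p_at_least_10 = 1 - binomial_cdf(9, n, p)
print(round(p_at_least_10, 4))
```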

another example

there are 15 true or false questions
the likelihood of getting exactly 7 correct by guessing
P(X=7) = [15!/(7!8!)] / 2^15 = 6435/32768 = .196
the expected value (mean) is np = 15X.5 = 7.5 correct answers

expected value mean and standard deviation for a binomial random variable

the mean value for a binomial random variable is μ = np
n= number of trials
p= probability of success
ex. if you flip a coin 100 times the expected result is that you will flip heads 50 times (np = 100X.5 = 50)
the standard deviation is σ = √[np(1-p)]
ex.
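
As a quick check of the two formulas (mean = np, standard deviation = square root of np(1-p)), here is a small sketch using the 100 coin flips from the example above. The function name is made up for illustration.

```python
import math

def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) = (np, sqrt(np(1-p)))."""
    return n * p, math.sqrt(n * p * (1 - p))

mean, sd = binomial_mean_sd(100, 0.5)
print(mean, sd)  # 50.0 5.0
```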

continuous random variable

-the outcome can be any value in an interval or collection of intervals
-all numbers are rounded
-we use intervals because the probability that X exactly equals any single value (to unlimited decimal places) is 0
-unlike discrete random variables continuous random variab

the probability density function

for a continuous random variable X is a curve such that the area under the curve over an interval equals the probability that X is in that interval. in other words, the probability that X is between the values a and b is the area under the density curve ove

notation for probability in an interval

-the 2 endpoints are represented using the letters a and b
-the interval of values of X that fall between a and b, including the two endpoints, is written as (a≤X≤b)
-the probability that X has a value between a and b is written P(a≤X≤b)

example

-the bus arrives every 10 minutes
-how long will someone have to wait for the bus
-X= wait until the next bus arrives
-the value of X can be anywhere between 0 and 10
-X is a continuous random variable
-possible wait time goes along the horizontal axis an

uniform random variables

-the example above shows a straight (flat) line -> a uniformity to the density
-every interval with the same width has the same probability -> a random variable with this property is a uniform random variable
-this is the simplest example of a continuous random va

example of uniform random variable

ex. probability that you will be waiting between 5 and 7 minutes
-probability P(a≤X≤b) is the area under the density curve
-the area under the curve (area of the rectangle) has a width of 7-5, which is 2 minutes, and a height of .1
-the area is 2X.1=.2
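
The rectangle-area calculation can be sketched in a few lines. This assumes the wait time is uniform on 0 to 10 minutes, as in the bus example; the function name and its defaults are made up for illustration.

```python
def uniform_prob(a, b, low=0.0, high=10.0):
    """P(a <= X <= b) for X uniform on [low, high]: rectangle width times height."""
    height = 1.0 / (high - low)          # density is flat, and total area must be 1
    width = min(b, high) - max(a, low)   # only the part of [a, b] inside [low, high] counts
    return max(width, 0.0) * height

print(uniform_prob(5, 7))
```

The height of .1 in the notes comes from 1/(10-0): the rectangle over the whole 0 to 10 range has to have area 1.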

normal random variables

-many continuous random variables are (at least approximately) normal random variables
-normal random variables have a specific form of bell-shaped probability density curve called a normal curve
-a normal random variable is also said to have a normal distribution
-these values are ch

features of normal curves and normal random variables

-as with any continuous random variable, the probability that a normal variable falls into a specified interval is equal to an area under its density curve
-P(X=k) = 0: the probability that X exactly equals any specific value is 0
-these are properties of continuous ran

normal curve example

-the distribution of height of college women fits a normal curve
-the mean is 65 inches
-the standard deviation is 2.7 inches
-heights are on the horizontal axis
-half the data is above the mean and half below
-there is a tick mark for the mean and then 1,2

useful probability relationships for normal random variables

-cumulative probability is the probability that a random variable is less than or equal to a specific value
-the figure shows the probability for a normal random variable ****
-the probability is the area under the normal curve to the left of a specific v

cumulative probabilities
it helps to draw out the pictures and highlight the desired area

-are calculated using software such as Minitab or a table
-the following three rules are useful for using cumulative probabilities to find other types of probabilities for a normal random variable X
rule 1: P(X>a) = 1 - P(X≤a)
rule 2: P(a<X<b) = P(X≤b) - P(X≤a)

cumulative probability example

mean = 515
standard deviation = 100
-math SAT scores
question 1-> what is the probability that a randomly selected test-taker had a score less than or equal to 600? said another way, what is the cumulative probability for a score of 600?
answer-> .8023 wh
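
The notes rely on Minitab or a table for this value. As an alternative sketch, Python's standard library (statistics.NormalDist, Python 3.8 or later) gives the same cumulative probability. The variable names are made up for illustration.

```python
from statistics import NormalDist

sat = NormalDist(mu=515, sigma=100)

p_at_most_600 = sat.cdf(600)       # cumulative probability P(X<=600)
p_above_600 = 1 - p_at_most_600    # P(X>600) = 1 - P(X<=600)

print(round(p_at_most_600, 4), round(p_above_600, 4))
```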

using a table to find probabilities for a normal random variable

-a normal random variable with mean = 0 and a standard deviation =1 is said to be a standard normal random variable and to have a standard normal distribution
-converting a normal variable to a z score is the same as converting a random variable of intere

calculating a standardized score

-a standardized score is also referred to as the z-score
-this is the distance between a specified value and the mean (which is measured in the number of standard deviations)

the formula for converting any value x to a z-score

z = (value - mean)/(standard deviation) = (x-μ)/σ
-where μ and σ are the mean and the standard deviation, respectively, for the random variable X
-a z-score measures the number of standard deviations that a value falls from the mean

finding a cumulative probability P(X≤a) for any normal random variable

if you are given the mean and standard deviation of a normal random variable (X) there are two steps to finding P(X≤a) (the probability that X is less than or equal to the value a)
-step 1: calculate the z-score for the value a
-step 2: use a table (such
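
The two steps can be sketched as below, with the standard normal curve from Python's statistics.NormalDist standing in for the table lookup. The function names are hypothetical, and the numbers reuse the SAT example (mean 515, standard deviation 100).

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def normal_cumulative(a, mean, sd):
    """P(X<=a): step 1 standardize a, step 2 look up z on the standard normal curve."""
    z = z_score(a, mean, sd)
    return NormalDist().cdf(z)  # NormalDist() defaults to mean 0, standard deviation 1

print(z_score(600, 515, 100), round(normal_cumulative(600, 515, 100), 4))
```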

finding percentiles

percentile-> refers to the value of a variable
percentile rank -> corresponds to the cumulative probability (the area to the left under the density curve) for that value
-suppose the 25th percentile of pulse rates for adult males is 64 beats per minute
-t

finding a percentile or a value with specified cumulative probability

-there are 2 steps to finding a percentile for a normal random variable
-step 1: find the value z* that has the specified cumulative probability. this does not involve calculations. in the body of the table find the specified cumulative probability (or th
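
The second step is cut off in these notes. The sketch below assumes it is the usual conversion back from a z-score, x = μ + z*σ. The women's height figures (mean 65, standard deviation 2.7) are borrowed from the normal curve example earlier in the notes, and inv_cdf stands in for reading the table backwards.

```python
from statistics import NormalDist

def normal_percentile(cum_prob, mean, sd):
    """Value x with P(X<=x) = cum_prob for a normal random variable."""
    z_star = NormalDist().inv_cdf(cum_prob)  # step 1: z* with that cumulative probability
    return mean + z_star * sd                # step 2 (assumed): convert z* back to x

# 25th percentile of heights with mean 65 inches and standard deviation 2.7 inches.
print(round(normal_percentile(0.25, 65, 2.7), 2))
```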

approximating binomial distribution probabilities

-when X has a binomial distribution with a large number of trials, the binomial probability formula is difficult to use because the factorial expressions in the formula become very large

normal approximation to the binomial distribution

the normal approximation to the binomial distribution is based on the following results, derived mathematically. if X is a binomial random variable based on n trials with success probability p, and n is sufficiently large, then X is also approximately a n

example

n=60 trials and the success probability p=.5
X = the number of heads observed when you flip a coin 60 times
-there is a bell shaped pattern of distribution
-a normal curve could be used to approximate this distribution because both np and n(1-p) are great

approximating cumulative probabilities for a binomial random variable

-the normal curve approximation can be used to find cumulative probabilities for binomial random variables
P(X≤k) ≈ P(Z≤z*)
where
z* = (k-np)/√(np(1-p))
1. calculate a z-score for the value of interest k
2. use the standard normal curve to determine the cum

example

-you need to get 21 out of 30 on a test or 70%
-the questions are true or false and you guess every answer
-we wish to find P(X ≥ 21) for a binomial random variable with n = 30 and p = .5
-the number of correct answers is discrete so the complement of thi

normal approximation for a binomial random variable

-when both np and n(1-p) are at least 10, a binomial random variable based on n trials with success probability p can be approximated by a normal random variable with mean μ = np and standard deviation σ = √[np(1-p)]
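
To see how the approximation behaves, this sketch compares the exact binomial probability with the normal approximation for the test example (n = 30, p = .5, at least 21 correct). It uses the plain z* formula from these notes with no continuity correction, so the two numbers agree only roughly.

```python
import math
from statistics import NormalDist

n, p = 30, 0.5
mu = n * p                           # np = 15, and n(1-p) = 15, so both are at least 10
sigma = math.sqrt(n * p * (1 - p))   # sqrt(np(1-p))

# Exact binomial: P(X>=21) = sum of P(X=k) for k = 21, ..., 30.
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(21, n + 1))

# Normal approximation via the complement: P(X>=21) = 1 - P(X<=20).
z_star = (20 - mu) / sigma
approx = 1 - NormalDist().cdf(z_star)

print(round(exact, 4), round(approx, 4))
```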

sums, differences and combinations of random variables

ex. the final score is 25% of each of the first 2 exams and 50% of the last
.25exam1 + .25exam2 + .50exam3
-if you know the mean for each exam it is easy to calculate the mean final score
-(.25Xμ1) + (.25Xμ2) + (.50Xμ3)
-if the means were 72, 76, and 80
(.25X72) + (.25X76) + (.50X80)
18+19+40 = 77
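
The weighted-mean arithmetic above can be sketched as a small helper (the function name is made up for illustration):

```python
def combination_mean(weights, means):
    """Mean of L = a*X + b*Y + ... is a*mean(X) + b*mean(Y) + ..."""
    return sum(w * m for w, m in zip(weights, means))

# Exam weights .25, .25, .50 and exam means 72, 76, 80.
final_mean = combination_mean([0.25, 0.25, 0.50], [72, 76, 80])
print(final_mean)
```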

linear combinations of random variables

linear combination of random variables-> the combination of exam scores that we just considered
-in a linear combination we add and subtract variables
-some of the combined variables may be multiplied by a numerical value -> as occurred in calculating the

a linear combination of random variables

X, Y, ... is a combination of the form
L = aX + bY + ...
where a, b, and so on are numbers that could be positive or negative. the most commonly encountered linear combinations of 2 variables are sum = X+Y and difference = X-Y

mean and standard deviation for linear combinations

this rule applies to discrete and continuous variables (independent or not and regardless of their distribution)
-the assumption is that the variables that are combined all have finite means
L = aX + bY
the mean of L (which I think is the mean of a linear

statistically independent

-two variables are statistically independent if the probability for any event associated with one random variable is not altered by whether or not any particular event for the other random variable has happened
-in a more practical sense that -> two rando

variance and standard deviations of a linear combination of independent random variables

if X, Y, ... are independent random variables, a, b, and so on are numbers, and
L = aX + bY + ...
then the variance and standard deviation of L are
variance (L) = a^2 variance (X) + b^2 variance (Y) + ...
standard deviation of L = √variance (L)
in particular
variance (X
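
A sketch of the variance rule for independent variables, applied to a sum and a difference. The standard deviations 3 and 4 are hypothetical values chosen only to make the arithmetic easy, and the function name is made up.

```python
import math

def combination_sd(coeffs, sds):
    """SD of L = a*X + b*Y + ... for independent variables: sqrt(a^2*var(X) + b^2*var(Y) + ...)."""
    variance = sum((a * s) ** 2 for a, s in zip(coeffs, sds))  # (a*sd)^2 = a^2 * variance
    return math.sqrt(variance)

# Hypothetical standard deviations, for illustration only.
sd_x, sd_y = 3.0, 4.0

print(combination_sd([1, 1], [sd_x, sd_y]))    # sum X + Y
print(combination_sd([1, -1], [sd_x, sd_y]))   # difference X - Y
```

Both lines print the same value because the coefficient is squared: subtracting a variable still adds its variance.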

chapter 8 random

-knowing only the mean and standard deviation of a random variable has limited practical use
-we also need the distribution to find probabilities associated with various outcomes

combining independent normal random variables

-linear combinations of independent normal random variables
-if X, Y, etc are independent, normally distributed random variables and a,b, ... are numbers, either positive or negative, then the random variable L = aX + bY + ... is normally distributed.
X+Y

example
example 2

-Meg leaves 45 minutes before the flight leaves
-travel time is normally distributed with a mean of 25 minutes and a standard deviation of 3 minutes
-the security mean time is 15 minutes and has a standard deviation of 2 minutes (this time is also normally distri
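
The rest of this example is cut off, so the sketch below uses only the two times that survive in the notes (travel and security) and assumes they are independent. It computes the probability that the two together take 45 minutes or less. If the original example had further components, this number would not match it.

```python
import math
from statistics import NormalDist

# Only these two normal components survive in the notes; any others are not modeled.
travel_mean, travel_sd = 25, 3
security_mean, security_sd = 15, 2

total_mean = travel_mean + security_mean                  # means add
total_sd = math.sqrt(travel_sd ** 2 + security_sd ** 2)   # variances add (independence assumed)

# The sum of independent normal variables is itself normal.
total = NormalDist(total_mean, total_sd)
print(round(total.cdf(45), 3))  # P(total time <= 45 minutes)
```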

combining independent binomial random variables

-a linear combination of binomial random variables is generally not a binomial variable
-but there is one situation where it is
-when each independent binomial variable has the same success probability, the sum has a binomial distribution
-in other words

adding binomial random variables with the same success probability

if X, Y, ... are independent binomial random variables with nx, ny, ... trials and all have the same success probability p, then the sum X+Y+... is a binomial random variable with n = nx+ny+... trials and success probability p
-if the success probabilities differ the sum is not a bi
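
The same-p rule can be checked numerically. This sketch builds the distribution of X+Y directly from the two separate binomial distributions and compares it with a single binomial on nx+ny trials. The values nx = 4, ny = 6, p = 0.3 are hypothetical, as are the function names.

```python
import math

def binomial_pmf_list(n, p):
    """[P(X=0), ..., P(X=n)] for a binomial random variable."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def pmf_of_sum(pmf_x, pmf_y):
    """Distribution of X + Y for independent X and Y (discrete convolution)."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py   # X = i and Y = j contributes to X + Y = i + j
    return out

nx, ny, p = 4, 6, 0.3  # hypothetical values for illustration
summed = pmf_of_sum(binomial_pmf_list(nx, p), binomial_pmf_list(ny, p))
direct = binomial_pmf_list(nx + ny, p)

print(all(math.isclose(s, d) for s, d in zip(summed, direct)))
```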