random variable
-a random variable assigns a number to each outcome of a random circumstance. equivalently, a random variable assigns a number to each unit in a population
family of random variables
consists of all random variables for which the same formula is used to find probabilities
classes of random variables
- a discrete random variable can take one of a countable list of distinct values
-a continuous random variable can take any value in an interval or collection of intervals
-discrete variables have a finite number of outcomes, but an infinite number of out
families within the two broad classes
-variables within a family share the same structure and general rules for finding probabilities
-binomial random variables -> for counting how often a particular event happens in a number of independent tries of a random circumstance
probabilities for discrete and continuous variables
-probabilities are specified differently for the two classes of variables
-for discrete random variables we can find probabilities for exact outcomes
-for continuous random variables we cannot find probabilities for exact outcomes instead we are limited to findi
discrete random variables
-X= the random variable
-k= a specified number the discrete random variable could assume
-P(X=k) is the probability that X equals k
ex. we are interested in the probability that there will be 2 girls born in the next 3 births (the probability is 3/8)
X= n
probability distribution function
-the probability distribution function for a discrete random variable X is a table or rule that assigns probabilities to the possible values of the random variable X
-the word function can mean either a table or a formula
-the probabilities in the table a
conditions for probabilities for discrete random variables
two conditions must always apply to the probabilities for discrete random variables
condition 1- the sum of the probabilities over all possible values must equal 1
condition 2- the probability of any specific outcome for a discrete random variable must be
graphing
-the possible outcome values are placed on the horizontal axis, and their probabilities are placed on the vertical axis
-a bar is drawn centered on each possible value, with the height of the bar equal to the probability for the value
cumulative distribution
-the cumulative distribution function for a random variable X is a table or rule that provides the probabilities P(X≤k) for any real number k. generally, the term cumulative probability refers to the probability that X is less than or equal to a particular value
-is the probability
using the sample space to find probability for discrete random variables
1 possible outcomes
2 the number of girls
3 the probability that this will be the outcome
4 P(X=0) the odds there will be no girls and there is only 1 set that has the possibility of
BBB BBG BGB GBB BGG GBG GGB GGG  (8 outcomes)
0   1   1   1   2   2   2   3
1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
P(X=0) 1/8
P(X=1) 3/8
P(X=2) 3/8
P(X=3) 1/8
-there are 8 possible outcomes because there are 3 trials and 2 possible outcomes for each trial
2 X 2 X 2= 2^3= 8
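a minimal sketch of the same count in Python, assuming boys and girls are equally likely and births are independent. it lists the sample space, counts the girls in each outcome, and adds up 1/8 for each outcome with the same count. the names outcomes and distribution are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# every sequence of 3 births, each birth being B (boy) or G (girl)
outcomes = ["".join(seq) for seq in product("BG", repeat=3)]

# X = number of girls; each outcome gets probability 1/8
# (assumption: B and G are equally likely)
distribution = {}
for outcome in outcomes:
    girls = outcome.count("G")
    distribution[girls] = distribution.get(girls, 0) + Fraction(1, len(outcomes))

for k in sorted(distribution):
    print(f"P(X={k}) = {distribution[k]}")
```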
cumulative distribution function (cdf)
for a random variable X is a table or rule that provides the probabilities P(X≤k) for any real number k
cumulative probability
refers to the probability that X is less than or equal to a particular value
ex. the probability of getting 6 or fewer answers right when guessing at answers to ten (true or false)
-P(X≤k) (less than or equal to) tells us how much probability X has accumu
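a rough sketch of the true/false example, assuming each guess is right with probability .5 and the guesses are independent. it adds up the probabilities for 0 through 6 correct answers, using the binomial formula covered later in these notes. the helper name binomial_pmf is made up for this illustration.

```python
from math import comb

n = 10    # ten true/false questions
p = 0.5   # assumed chance of guessing any one answer correctly

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# cumulative probability P(X <= 6): add the probabilities for k = 0, 1, ..., 6
cumulative = sum(binomial_pmf(k, n, p) for k in range(0, 7))
print(round(cumulative, 4))
```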
sample space and simple event
the sample space is the list of all possible outcomes, and a simple event is a single one of those outcomes
mean value of the random variable
-long run average is called the expected value of the random variable
-it is the mean value of the random variable
expected value
-the expected value of a random variable X is the mean value of the variable in the sample space or population of possible outcomes. expected value can also be interpreted as the mean value that would be obtained from an infinite number of observations of
notation and formula for expected value
E(X) = the mean or expected value of a random variable X
the Greek letter mu can also represent the mean or expected value
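for a discrete random variable the usual formula is E(X) = sum of k × P(X=k) over all possible values k. a small sketch, reusing the number-of-girls distribution from the sample space example (the variable names are made up for this illustration):

```python
from fractions import Fraction

# distribution of X = number of girls in 3 births (from the sample space example)
distribution = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# E(X): multiply each value by its probability, then add the products
expected_value = sum(k * prob for k, prob in distribution.items())
print(expected_value)  # 3/2, i.e. 1.5 girls on average
```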
standard deviation for a discrete random variable
-standard deviation is a measure of spread for a quantitative variable, and we use it in the empirical rule
-standard deviation of a discrete random variable quantifies how spread out the possible values of discrete random variable might be, weighted by h
expected net gain
(net gain X probability) for each row and then add up each value
-taking the square root of the variance provides the standard deviation of the net gain plan
[(X-μ)^2]p for each row and then add up each value
-after using this in the example we were able
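a sketch of the row-by-row calculation: for each row take the squared distance from the mean times the probability, add the rows to get the variance, then take the square root. the net gain table from the example is not in these notes, so this reuses the number-of-girls distribution as a stand-in.

```python
from fractions import Fraction
from math import sqrt

# stand-in distribution: X = number of girls in 3 births
distribution = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# mean: value times probability for each row, then add
mu = sum(k * prob for k, prob in distribution.items())

# variance: (value - mean)^2 times probability for each row, then add
variance = sum((k - mu) ** 2 * prob for k, prob in distribution.items())

# standard deviation: square root of the variance
sd = sqrt(variance)
print(mu, variance, round(sd, 3))
```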
variance
1. find the mean
2. for each data value subtract the mean
3. square each of these numbers
4. add up the values from step 3 and then divide the sum by the total number of value points - 1 (n-1)
-the square of standard deviation s^2
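the four steps can be sketched as follows. the data values are hypothetical, chosen only to make the arithmetic easy to follow; the result is checked against Python's built-in statistics.variance, which also divides by n-1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data values

# step 1: find the mean
mean = sum(data) / len(data)

# step 2: subtract the mean from each data value
deviations = [x - mean for x in data]

# step 3: square each of these numbers
squared = [d ** 2 for d in deviations]

# step 4: add them up and divide by n - 1
variance = sum(squared) / (len(data) - 1)

# the standard deviation s is the square root of the variance s^2
sd = variance ** 0.5
print(mean, round(variance, 3), round(sd, 3))
```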
σ
population standard deviation
expected value (mean) and standard deviation for a population
(for a population)
we can calculate the mean and standard deviation in 2 ways
1. we can create a probability distribution for the measurements and use the definition of expected values and standard deviation
2. we can find the mean and standard deviation b
μ
population mean
binomial random variable
-a form of a discrete random variable
-a binomial random variable is a count of how many times an event occurs in a particular number of independent observations or trials that make up a random circumstance
binomial random variables
a binomial random variable is defined as X = number of times an event occurs in the n trials of a binomial experiment
binomial experiments conditions
1. there are n trials (n cannot be random; it must be established in advance)
2. the outcomes are either a success or a failure (S/F)
3. the outcomes are independent from one trial to the next
4. the probability of a success remains the same from one trial to t
additional features of a binomial random variable
(keep these points in mind when trying to determine whether or not a random variable fits the binomial description)
-there may be more than two possible simple events for each trial
ex. rolling a 4 or 6 is a success
-sample surveys (not whole population)
finding probability for binomial random variables
for a binomial random variable, the probabilities for the possible values of X are given by the formula
P(X=k) = n!/[k!(n-k)!] p^k (1-p)^(n-k) for k = 0, 1, 2, ..., n
the formula for P(X=k) is made up of two parts
1. n!/[k!(n-k)!]gives the number of simple events in the sample space (number of outcomes: success and
binomial distribution example
# of H's from flipping a coin 5 times
possible outcomes 2^5=32
P(X=0) (the probability of 0 H's) is 1/32
n!/[k!(n-k)!] -> 5!/[0!(5-0)!] = 5!/5! = 1
the formula gives 1 outcome with 0 H's, which is consistent with the 1 in 1/32
P(X=1) (the probability of 1 H's
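a short sketch of the coin example, assuming a fair coin (p = .5). it uses the standard binomial formula P(X=k) = n!/[k!(n-k)!] p^k (1-p)^(n-k), with math.comb supplying the n!/[k!(n-k)!] part. the name pmf is made up for this illustration.

```python
from math import comb

n = 5     # number of flips
p = 0.5   # assumed probability of heads on each flip

# P(X=k) for every possible number of heads k = 0, 1, ..., 5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

for k, prob in pmf.items():
    # comb(n, k) is the number of the 32 outcomes that have exactly k heads
    print(f"P(X={k}) = {comb(n, k)}/32 = {prob}")
```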
greater than or equal to
what is the probability of getting 10 or more answers correct? =
P(X≥10)
X is the binomial random variable
n=15 (out of 15 questions)
P(X≥10) = 1 - P(X≤9): the complement of getting 10 or more right is getting 9 or fewer right
another example
true or false questions and there are 15
the likelihood of getting 7 correct by guessing
P(X=7) = [15!/(7!8!)] / 2^15 = 6435/32768 = .196
using the expected mean 15X.5=7.5 (predicted mean)
expected value mean and standard deviation for a binomial random variable
the mean value for a binomial random variable is μ = np
n= number of trials
p= probability of success
ex. if you flip a coin 100 times the expected results is that you will flip heads 50 times (np = 100X.5)
(expected) standard deviation
σ = √[np(1-p)]
ex.
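a quick sketch that applies μ = np and σ = √[np(1-p)] to the 100 coin flips mentioned above, assuming a fair coin:

```python
from math import sqrt

n = 100   # number of flips
p = 0.5   # assumed probability of heads

mu = n * p                       # expected number of heads
sigma = sqrt(n * p * (1 - p))    # standard deviation of the number of heads

print(mu, sigma)
```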
continuous random variable
-the outcome can be any value in an interval or collection of intervals
-all numbers are rounded
-we use intervals because the probability that the variable equals any one exact value (to unlimited decimal places) is 0
-unlike discrete random variables continuous random variab
the probability density function
for a continuous random variable X is a curve such that the area under the curve over an interval equals the probability that X is in that interval. in other words, the probability that X is between the values a and b is the area under the density curve ove
notation for probability in an interval
-the 2 endpoints are represented using the letters a and b
-the interval of values of X that fall between a and b, including the two endpoints, is written as (a≤X≤b)
-the probability that X has a value between a and b is written P(a≤X≤b)
example
-the bus arrives every 10 minutes
-how long will someone have to wait for the bus
-X= wait until the next bus arrives
-the value of X can be anywhere between 0 and 10
-X is a continuous random variable
-possible wait time goes along the horizontal axis an
uniform random variables
-the example above shows a straight, flat line -> a uniformity to the density
-every interval with the same width has the same probability -> a random variable with this property is a uniform random variable
-this is the simplest example of a continuous random va
example of uniform random variable
ex. probability that you will be waiting between 5 and 7 minutes
-probability P(a≤X≤b) is the area under the density curve
-the area under the curve (area of the rectangle) has a width of 7-5, which is 2 minutes, and a height of .1
-the area is 2X.1=.2
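a minimal sketch of the rectangle-area calculation for the bus example, assuming the wait is uniform between 0 and 10 minutes so the density height is 1/10 = .1. the function name uniform_probability is made up for this illustration.

```python
def uniform_probability(a, b, low=0.0, high=10.0):
    """P(a <= X <= b) for X uniform on [low, high]: width times height."""
    height = 1 / (high - low)          # density height, .1 for the bus example
    a = max(a, low)                    # clip the interval to the possible values
    b = min(b, high)
    return max(b - a, 0.0) * height    # area of the rectangle

p_5_to_7 = uniform_probability(5, 7)
print(p_5_to_7)
```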
normal random variables
-normal random variables are the most commonly encountered type of continuous random variable
-a normal random variable has a specific form of bell-shaped probability density curve called a normal curve
-a normal random variable is also said to have a normal distribution
-these values are ch
features of normal curves and normal random variables
-as with any continuous random variable, the probability that a normal random variable falls into a specified interval is equivalent to an area under its density curve
-P(X=k) = 0: the probability that X exactly equals any specific value k is 0
-these are properties of continuous ran
normal curve example
-the distribution of height of college women fits a normal curve
-the mean is 65 inches
the standard deviation is 2.7 inches
-heights are on the horizontal axis
-half the data is above the mean and half below
-there is a tick mark for the mean and then 1,2
useful probability relationships for normal random variables
-cumulative probability is the probability that a random variable is less than or equal to a specific value
-the figure shows the probability for a normal random variable ****
-the probability is the area under the normal curve to the left of a specific v
cumulative probabilities
it helps to draw out the pictures and highlight the desired area
-are calculated using a system such as minitab or a table
-the following three rules are useful for using cumulative probabilities to find other types of probabilities for a normal random variable X
rule 1: P(X>a) = 1 - P(X≤a)
rule 2: P(a<X<b) = P(X≤b) -
cumulative probability example
mean = 515
standard deviation = 100
-math SAT scores
question 1-> what is the probability that a randomly selected test-taker had a score less than or equal to 600? said another way, what is the cumulative probability for a score of 600?
answer-> .8023 wh
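the notes mention minitab or a table. as an alternative sketch, Python's standard library has statistics.NormalDist, whose cdf method returns the cumulative probability. this assumes the scores are exactly normal with mean 515 and standard deviation 100.

```python
from statistics import NormalDist

# math SAT scores, assumed normal with mean 515 and standard deviation 100
sat = NormalDist(mu=515, sigma=100)

# cumulative probability for a score of 600: P(X <= 600)
p_600 = sat.cdf(600)
print(round(p_600, 4))

# rule 1 then gives the probability of scoring above 600
p_above_600 = 1 - p_600
```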
using a table to find probabilities for a normal random variable
-a normal random variable with mean = 0 and a standard deviation =1 is said to be a standard normal random variable and to have a standard normal distribution
-converting a normal variable to a z score is the same as converting a random variable of intere
calculating a standardized score
-a standardized score is also referred to as the z-score
-this is the distance between a specified value and the mean (which is measured in the number of standard deviations)
the formula for converting any value x to a z-score
z = (value - mean)/(standard deviation) = (x-μ)/σ
-where μ and σ are the mean and the standard deviation, respectively, for the random variable X
-a z-score measures the number of standard deviations that a value falls from the mean
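a small sketch of the conversion, using the SAT numbers from the cumulative probability example (mean 515, standard deviation 100). the function name z_score is made up for this illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value falls from the mean."""
    return (value - mean) / sd

# a score of 600 is 85 points above the mean of 515
z = z_score(600, 515, 100)
print(z)  # 0.85 standard deviations above the mean
```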
finding a cumulative probability P(X≤a) for any normal random variable
if you are given the mean and standard deviation of a normal random variable (X) there are two steps to finding P(X≤a) (the probability that X is less than or equal to the value a)
-step 1: calculate the z-score for the value a
-step 2: use a table (such
finding percentiles
percentile-> refers to the value of a variable
percentile rank -> corresponds to the cumulative probability (the area to the left under the density curve) for that value
-suppose the 25th percentile of pulse rates for adult males is 64 beats per minute
-t
finding a percentile or a value with specified cumulative probability
-there are 2 steps to finding a percentile for a normal random variable
-step 1: find the value z* that has the specified cumulative probability. this does not involve calculations. in the body of the table find the specified cumulative probability (or th
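the second step is cut off in these notes. the usual approach is to convert z* back to the original units with x = μ + z*σ, and the sketch below assumes that is what is meant. it uses the college women's heights from the normal curve example (mean 65 inches, standard deviation 2.7 inches) and an arbitrarily chosen 25th percentile. NormalDist().inv_cdf takes the place of the table lookup.

```python
from statistics import NormalDist

mean, sd = 65, 2.7   # heights of college women, in inches
target = 0.25        # illustrative choice: the 25th percentile

# step 1: find z* with the specified cumulative probability
z_star = NormalDist().inv_cdf(target)

# step 2 (assumed): convert z* back to a height, x = mean + z* × sd
value = mean + z_star * sd
print(round(z_star, 3), round(value, 2))
```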
approximating binomial distribution probabilities
-when X has a binomial distribution with a large number of trials, the binomial probability formula is difficult to use because the factorial expressions in the formula become very large
normal approximation to the binomial distribution
the normal approximation to the binomial distribution is based on the following results, derived mathematically. if X is a binomial random variable based on n trials with success probability p, and n is sufficiently large, then X is also approximately a n
example
n=60 trials and the success probability p=.5
X = the number of heads observed when you flip a coin 60 times
-there is a bell shaped pattern of distribution
-a normal curve could be used to approximate this distribution because both np and n(1-p) are great
approximating cumulative probabilities for a binomial random variable
-the normal curve approximation can be used to find cumulative probabilities for binomial random variables
P(X≤k) ≈ P(Z≤z*)
where
z* = (k-np)/√(np(1-p))
1. calculate a z-score for the value of interest k
2. use the standard normal curve to determine the cum
example
-you need to get 21 out of 30 on a test or 70%
-the questions are true or false and you guess every answer
-we wish to find P(X ≥ 21) for a binomial random variable with n = 30 and p = .5
-the number of correct answers is discrete so the complement of thi
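a rough sketch of this example. since X counts whole answers, P(X ≥ 21) = 1 - P(X ≤ 20), so the z-score is calculated for k = 20. the sketch applies the approximation as written in these notes, with no continuity correction (an assumption about the intended method), and also computes the exact binomial probability for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.5
k = 20   # P(X >= 21) = 1 - P(X <= 20)

# step 1: z-score for k, using mean np and standard deviation sqrt(np(1-p))
z_star = (k - n * p) / sqrt(n * p * (1 - p))

# step 2: cumulative probability from the standard normal curve, then the complement
approx = 1 - NormalDist().cdf(z_star)

# exact binomial probability: count the outcomes with 21 or more correct out of 2^30
exact = sum(comb(n, j) for j in range(21, n + 1)) / 2**n

print(round(z_star, 2), round(approx, 4), round(exact, 4))
```

the two answers differ somewhat because the normal curve is only an approximation to the discrete binomial distribution.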
normal approximation for a binomial random variable
-when both np and n(1-p) are at least 10, a binomial random variable based on n trials with success probability p can be approximated by a normal random variable with mean μ = np and standard deviation σ = √[np(1-p)]
sums, differences and combinations of random variables
ex. 25% of the first 2 exams and 50% of the last
.25exam1 + .25exam2 + .50exam3
-if you know the mean for the exam it is easy to calculate the mean final score
-(.25 X μ1) + (.25 X μ2) + (.50 X μ3), where μ1, μ2, and μ3 are the means of the three exams
-if the means were 72, 76, and 80
(.25X72) + (.25X76) + (.50X80)
18+19+40 = 77
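the same weighted calculation as a short sketch (the variable names are made up for this illustration):

```python
weights = [0.25, 0.25, 0.50]   # weight of exam 1, exam 2, exam 3
means = [72, 76, 80]           # mean score on each exam

# mean final score: multiply each exam mean by its weight, then add
final_mean = sum(w * m for w, m in zip(weights, means))
print(final_mean)
```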
linear combinations of random variables
linear combination of random variables-> the combination of exam scores that we just considered
-in a linear combination we add and subtract variables -some of the combined variables may be multiplied by a numerical value -> as occurred in calculating the
a linear combination of random variables
X, Y, ... is a combination of the form
L = aX + bY + ...
where a, b, and so on are numbers that could be positive or negative. the most commonly encountered linear combinations of 2 variables are sum = X+Y and difference = X-Y
mean and standard deviation for linear combinations
this rule applies to discrete and continuous variables (independent or not and regardless of their distribution)
-the assumption is that the variables that are combined all have finite means
L = aX + bY
the mean of L (which I think is the mean of a linear
statistically independent
-two variables are statistically independent if the probability for any event associated with one random variable is not altered by whether or not any particular event for the other random variable has happened
-in a more practical sense that -> two rando
variance and standard deviations of a linear combination of independent random variables
if X, Y, ... are independent random variables, a, b, and so on are numbers, and
L=aX + bY + ...
then the variance and standard deviation of L are
variance (L) = a^2 Variance (X) + b^2 variance (Y) +...
standard deviation of L = √variance (L)
in particular
variance (X
chapter 8 random
-knowing only the mean and standard deviation of a random variable has limited practical use
-we also need the distribution to find probabilities associated with various outcomes
combining independent normal random variables
-linear combinations of independent normal random variables
-if X, Y, etc are independent, normally distributed random variables and a,b, ... are numbers, either positive or negative, then the random variable L = aX + bY + ... is normally distributed.
X+Y
example
example 2
-meg leaves 45 minutes before the flight leaves
-travel time is normally distributed with a mean of 25 and a standard deviation of 3 minutes
-the security mean time is 15 minutes and has a standard deviation of 2 minutes (this time is also normally distri
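the rest of this example is cut off, so the sketch below rests on assumptions: only the two stages listed (travel and security) are combined, the two times are independent, and the question is whether the total is at most 45 minutes. the full example may include other stages.

```python
from math import sqrt
from statistics import NormalDist

travel_mean, travel_sd = 25, 3        # travel time, minutes
security_mean, security_sd = 15, 2    # security time, minutes

# mean of a sum is the sum of the means
total_mean = travel_mean + security_mean

# for independent variables the variances add; take the square root at the end
total_sd = sqrt(travel_sd**2 + security_sd**2)

# a sum of independent normal variables is itself normal
total = NormalDist(total_mean, total_sd)

# assumed question: probability that the total time is 45 minutes or less
p_within_45 = total.cdf(45)
print(total_mean, round(total_sd, 2), round(p_within_45, 3))
```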
combining independent binomial random variables
-a linear combination of binomial random variables is generally not a binomial variable
-but there is one situation where it is
-when each independent binomial variable has the same success probability, the sum has a binomial distribution
-in other words
adding binomial random variables with the same success probability
if X, Y, ... are independent binomial random variables with nx, ny, ... trials and all have the same success probability p, then the sum X+Y+... is a binomial random variable with n = nx+ny+... trials and success probability p
-if the success probabilities differ the sum is not a bi
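a numeric check of the same-p rule, as a sketch. the values nx = 3, ny = 5, and p = .3 are hypothetical. the distribution of X+Y is built by adding up P(X=i) × P(Y=t-i) over every way to split a total t, then compared with a single binomial with n = nx+ny.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

nx, ny, p = 3, 5, 0.3   # hypothetical trial counts and shared success probability

def sum_pmf(total):
    """P(X + Y = total) for independent X ~ binomial(nx, p), Y ~ binomial(ny, p)."""
    return sum(
        binomial_pmf(i, nx, p) * binomial_pmf(total - i, ny, p)
        for i in range(max(0, total - ny), min(nx, total) + 1)
    )

# compare with one binomial variable based on nx + ny trials
matches = all(
    isclose(sum_pmf(t), binomial_pmf(t, nx + ny, p))
    for t in range(nx + ny + 1)
)
print(matches)
```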