# Chapter 5 Biostats

Probability

A number between 0 and 1 that reflects the likelihood that a particular event occurs.

Two Areas of Statistics

Descriptive statistics (summarizing the sample) and statistical inference (drawing conclusions about the population from the sample)

Sampling Frame

Complete list of entire population

Probability Sampling

Each member of the population has a known probability of being selected.

Two Types of Sampling

Probability and Non-probability

Types of Probability Sampling

Simple, Systematic, Stratified

Types of Non-Probability Sampling

Convenience and Quota

Simple Random Sample

Enumerate all N members of the population (the sampling frame), then select n individuals at random so that each has the same probability of being selected. On any single draw each member has probability 1/N of being chosen; the probability that a given individual ends up in a sample of size n is n/N.
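A minimal Python sketch of simple random sampling. The frame of 1,000 numbered members, the sample size of 100, and the helper name `simple_random_sample` are illustrative assumptions, not part of the notes.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)   # sampling without replacement

frame = list(range(1, 1001))      # hypothetical sampling frame, N = 1,000
sample = simple_random_sample(frame, 100, seed=1)

# Chance that a given member ends up in the sample: n / N
inclusion_probability = len(sample) / len(frame)
```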

Systematic sample

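Start with the sampling frame; determine the sampling interval (N/n); select the first person at random from the first N/n on the list, then every (N/n)th person thereafter.
If the population size is 1,000 and we need a sample size of 100, then the sampling interval is 1,000/100 = 10; we select the first person at random from the first 10 on the list and every 10th person thereafter.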
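The systematic procedure can be sketched as follows. This assumes n divides N evenly; the frame and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then take every k-th member."""
    k = len(frame) // n          # sampling interval N/n (assumes n divides N)
    rng = random.Random(seed)
    start = rng.randrange(k)     # random position among the first k members
    return frame[start::k][:n]

frame = list(range(1, 1001))     # hypothetical frame, N = 1,000
sample = systematic_sample(frame, 100, seed=1)   # interval = 1,000 / 100 = 10
```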

Stratified sample

Organize population into mutually exclusive strata; select individuals at random within each stratum
Age in POPULATION: Under age 20 = 30%
age 20-49 = 40%
age 50+ = 30%
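A sketch of stratified sampling using the age strata above. Proportional allocation (each stratum's share of the sample matches its share of the population) is an assumption here, as are the population of 1,000 and the helper name `stratified_sample`.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Randomly sample within each stratum, sized by its population share."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # proportional allocation (assumed)
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of 1,000 split 30% / 40% / 30% by age
strata = {
    "under 20": list(range(0, 300)),
    "20-49": list(range(300, 700)),
    "50+": list(range(700, 1000)),
}
sample = stratified_sample(strata, 100, seed=1)
sizes = {name: len(members) for name, members in sample.items()}
```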

Convenience sample

Non-probability sample (not for inference)

Quota sample

Select a pre-determined number of individuals into sample from groups of interest (non-random)

Conditional Probability

Probability of outcome in a specific sub-population

Sensitivity

true positive fraction
= P(test +|disease)

Specificity

true negative fraction
= P(test -|disease free)

False negative fraction

P(test -|disease)
(1- sensitivity)

False positive fraction

P(test +|disease free)
(1-specificity)
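The four fractions can be computed from a 2x2 table of test results against true disease status. The counts below are made up for illustration, and `diagnostic_fractions` is a hypothetical helper.

```python
def diagnostic_fractions(tp, fn, tn, fp):
    """Conditional probabilities of test results given true disease status."""
    diseased = tp + fn        # everyone who has the disease
    disease_free = tn + fp    # everyone who does not
    return {
        "sensitivity": tp / diseased,                   # P(test + | disease)
        "specificity": tn / disease_free,               # P(test - | disease free)
        "false_negative_fraction": fn / diseased,       # 1 - sensitivity
        "false_positive_fraction": fp / disease_free,   # 1 - specificity
    }

# Hypothetical counts: 100 diseased, 200 disease-free
result = diagnostic_fractions(tp=90, fn=10, tn=160, fp=40)
```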

Independence

Events A and B are independent if P(A|B) = P(A) or, equivalently, P(B|A) = P(B)
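A small check of the independence condition on one roll of a fair die. The events A (even number) and B (4 or less) are chosen only for illustration.

```python
from fractions import Fraction

def prob(event, space):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & space), len(space))

def conditional(a, b, space):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b, space) / prob(b, space)

space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
a = {2, 4, 6}                # A: roll is even
b = {1, 2, 3, 4}             # B: roll is 4 or less

independent = conditional(a, b, space) == prob(a, space)
```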

Two types of Probability

Conditional: Sensitivity and Specificity
Bayes' Theorem
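One common use of Bayes' theorem with these quantities is reversing the conditioning: going from P(test + | disease) to P(disease | test +). The sketch below assumes the prevalence P(disease) is known; all numbers are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | test +) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(test + and disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test + and disease free)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: sensitivity 0.9, specificity 0.8, prevalence 0.1
ppv = positive_predictive_value(0.9, 0.8, 0.1)
```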

Binomial Distribution

Allows us to compute the probability of observing a specified number of successes when the process is repeated a specific number of times
Each outcome is dichotomous (success or failure)
Replications of the process are independent
P(success) is constant for each replication
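Under these conditions the probability of exactly x successes in n replications is C(n, x) p^x (1 - p)^(n - x). A minimal sketch, with n = 10 and p = 0.5 chosen arbitrarily:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent replications, each with P(success) = p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_five = binomial_pmf(5, 10, 0.5)                         # exactly 5 successes in 10
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))  # probabilities over all x
```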

Normal Distribution

Model for continuous outcome
The mean and variance, μ and σ², completely characterize the normal distribution.

Properties of Normal Distribution

Mean=median=mode
P(μ - σ < X < μ + σ) ≈ 0.68,
P(μ - 2σ < X < μ + 2σ) ≈ 0.95,
P(μ - 3σ < X < μ + 3σ) ≈ 0.997
The normal distribution is symmetric about the mean
P(a < X < b) = the area under the normal curve from a to b.
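The area between a and b can be computed as the difference of two cumulative probabilities. This sketch uses `statistics.NormalDist` from the Python standard library; the mean of 100 and standard deviation of 15 are arbitrary example values.

```python
from statistics import NormalDist

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b) for X normal with mean mu and standard deviation sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation
# Probability of falling within 1, 2, and 3 standard deviations of the mean
within = {k: prob_between(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)}
```

The three values in `within` come out close to 0.68, 0.95, and 0.997 regardless of which mean and standard deviation are used.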

kth percentile

the score that holds k percent of the scores below it.
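For a normally distributed score, the kth percentile is the value whose cumulative probability is k/100. A sketch assuming a normal model (the notes do not tie percentiles to a specific distribution), with the 90th percentile of the standard normal as the example:

```python
from statistics import NormalDist

def normal_percentile(k, mu=0.0, sigma=1.0):
    """Score with k percent of a normal(mu, sigma) distribution below it."""
    return NormalDist(mu, sigma).inv_cdf(k / 100)

p90 = normal_percentile(90)      # 90th percentile of the standard normal
below = NormalDist().cdf(p90)    # fraction of scores below p90
```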

Central Limit Theorem

Applies even when the population is non-normal
Take samples of size n; as long as n is sufficiently large (usually n > 30 suffices),
the distribution of the sample mean is approximately normal, so Z can be used to compute probabilities
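A small simulation can illustrate this. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1), a sample size of 40, and 5,000 repeated samples; all of these choices are arbitrary.

```python
import math
import random
import statistics

def sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

n = 40
means = sample_means(n, reps=5000)

mean_of_means = statistics.fmean(means)   # should be near the population mean, 1
sd_of_means = statistics.stdev(means)     # should be near the standard error
expected_se = 1 / math.sqrt(n)            # population sd / sqrt(n)
```

Even though individual exponential values are strongly right-skewed, the simulated sample means cluster around 1 with spread close to `expected_se`.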