population
the entire group of individuals about which we want information
sample
part of the population from which we actually collect information. we use information from a sample to draw conclusions about the entire population.
the first step in planning a sample survey
to say what population we want to describe
second step in planning a sample survey
to say exactly what we want to measure
final step in planning a sample survey
to decide how to choose a sample from the population
convenience sample
choosing individuals who are easiest to reach; convenience samples often produce unrepresentative data
the design of a statistical study shows bias if
it systematically favors certain outcomes
voluntary response sample
consists of people who choose themselves by responding to a general appeal. voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond.
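The bias from self-selection can be shown with a small simulation. Everything here is hypothetical: a population of 10,000 in which 30% hold a strong opinion, and assumed response rates of 50% for people with that opinion versus 5% for everyone else. The volunteers badly overstate the strong opinion even though nobody misreports anything.

```python
import random

rng = random.Random(11)
# Hypothetical population of 10,000: 30% hold a strong opinion ("oppose"), the rest do not.
population = ["oppose"] * 3_000 + ["neutral"] * 7_000

# Assumed response rates to a general appeal: strong opinions respond far more often.
response_rate = {"oppose": 0.50, "neutral": 0.05}
respondents = [p for p in population if rng.random() < response_rate[p]]

true_share = population.count("oppose") / len(population)
sample_share = respondents.count("oppose") / len(respondents)
print("share opposed in population:   ", true_share)
print("share opposed among volunteers:", round(sample_share, 2))
```

The exact volunteer share depends on the assumed response rates; the point is only that it lands far above the true 30%.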
simple random sample
consists of x individuals from the population chosen in such a way that every set of x individuals has an equal chance to be the sample actually selected
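A minimal Python sketch, assuming the population is simply a list of labels: `random.sample` draws x distinct individuals so that every possible set of x is equally likely, which is the defining property of an SRS. The function name `simple_random_sample` and the seed are illustrative choices.

```python
import random

def simple_random_sample(population, x, seed=None):
    """Return an SRS of size x: every set of x individuals is equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no individual appears twice
    return rng.sample(population, x)

students = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal"]
print(simple_random_sample(students, 3, seed=1))
```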
table of random digits
a long string of digits 0-9 with these properties:
-each entry in the table is equally likely to be any of the 10 digits 0-9
-the entries are independent of each other. that is, knowledge of one part of the table gives no information about any other part
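The two properties can be imitated in Python by drawing each digit independently and uniformly from 0-9. This is only a pseudorandom stand-in for a printed table such as Table D; the row length of 40 and the grouping into blocks of five are assumptions chosen to resemble a typical printed table.

```python
import random
from collections import Counter

def random_digit_row(length=40, seed=None):
    """Return a string of digits, each independently and equally likely to be 0-9."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in range(length))

def format_like_table(row, group=5):
    """Split a row into blocks of `group` digits for easier reading."""
    return " ".join(row[i:i + group] for i in range(0, len(row), group))

print(format_like_table(random_digit_row(seed=2024)))

# With many digits, each of 0-9 should appear close to 10% of the time.
counts = Counter(random_digit_row(100_000, seed=7))
print({d: round(counts[d] / 100_000, 3) for d in sorted(counts)})
```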
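how to choose an SRS using Table D
1. Label: give each member of the population a numerical label of the same length
2. Table: read consecutive groups of digits of the appropriate length from Table D, skipping any group that is not a label or that repeats a label already chosen, until x different labels have been found. the individuals with those labels form the sample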
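The Label and Table steps can be sketched in Python. Assumptions: labels run from 0 up to (population size - 1), zero-padded to a common length, and a string of digits stands in for a line of Table D. The digit string in the example is invented, not copied from Table D, and `srs_from_digits` is a hypothetical helper name.

```python
def srs_from_digits(population, x, digits):
    """Choose an SRS of size x by reading fixed-length labels from a string of random digits."""
    digits = digits.replace(" ", "")         # ignore the spaces between digit groups
    width = len(str(len(population) - 1))    # Step 1: every label has the same number of digits
    labels = {str(i).zfill(width): person for i, person in enumerate(population)}
    chosen = []
    # Step 2: read consecutive groups of `width` digits, left to right
    for start in range(0, len(digits) - width + 1, width):
        group = digits[start:start + width]
        if group in labels and group not in chosen:  # skip non-labels and repeats
            chosen.append(group)
            if len(chosen) == x:
                break
    # May return fewer than x individuals if the digits run out first
    return [labels[g] for g in chosen]

letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")  # labels 00-25
print(srs_from_digits(letters, 3, "07911 50733 22480 3"))  # → ['H', 'P', 'W']
```

Reading two digits at a time gives 07 (H), 91 (skipped, no such label), 15 (P), 07 (skipped, repeat), 33 (skipped), 22 (W).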
to select a stratified random sample
first classify the population into groups of similar individuals, called strata. then choose a separate SRS in each stratum and combine these SRSs to form the full sample
if the individuals in each stratum are less varied than the population as a whole
a stratified random sample can produce better information about the population than an SRS of the same size
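A simulation can illustrate both the procedure and the claim about less-varied strata. All numbers are hypothetical: 200 values split into two equal strata, "urban" and "rural", that differ a lot between strata but little within them. Repeatedly taking 10 from each stratum and comparing against an SRS of 20 shows the stratified sample means varying much less.

```python
import random
import statistics

def stratified_sample(strata, per_stratum, rng):
    """Take a separate SRS from each stratum and combine them into one sample."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

rng = random.Random(42)
# Hypothetical population: values similar within a stratum, very different between strata.
strata = {
    "urban": [rng.gauss(80, 5) for _ in range(100)],
    "rural": [rng.gauss(40, 5) for _ in range(100)],
}
population = strata["urban"] + strata["rural"]

srs_means, strat_means = [], []
for _ in range(2000):
    srs_means.append(statistics.mean(rng.sample(population, 20)))
    strat_means.append(statistics.mean(stratified_sample(strata, 10, rng)))

print("SD of SRS means:       ", round(statistics.stdev(srs_means), 2))
print("SD of stratified means:", round(statistics.stdev(strat_means), 2))
```

Because the two strata are the same size and get the same number of draws, the plain mean of the combined sample is a fair estimate; unequal strata would need weighting.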
to take a cluster sample
first divide the population into smaller groups. ideally, these clusters should mirror the characteristics of the population. then choose an SRS of the clusters. all individuals in the chosen clusters are included in the sample.
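A sketch of the cluster-sampling steps, assuming a hypothetical school where each homeroom is a cluster of 25 students: an SRS of homerooms is chosen, and every student in a chosen homeroom is included. Room names and sizes are invented.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Choose an SRS of clusters, then include every individual in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

homerooms = {f"room{r}": [f"room{r}-student{s}" for s in range(25)] for r in range(1, 21)}
picked = cluster_sample(homerooms, 3, seed=5)
sample = [student for students in picked.values() for student in students]
print(sorted(picked), len(sample))  # three room names and 75 students
```

Note the contrast with stratified sampling: here chance picks whole groups, while a stratified sample draws individuals from every group.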
inference
the process of drawing conclusions about a population on the basis of sample data
the first reason to rely on random sampling
to eliminate bias in selecting samples from the list of available individuals
the second reason to use random sampling
the laws of probability allow trustworthy inference about the population
margin of error
tells us how much sampling variability to expect
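Sampling variability can be seen directly by repeating the same random sample many times. Assuming a hypothetical large population in which exactly 60% would answer "yes" and samples of 1000, the sample proportions cluster around 0.60 and the middle 95% of them spread about 3 percentage points either way. The 1.96 × standard-error figure printed for comparison is the usual large-sample approximation, which these notes do not state.

```python
import math
import random

def sample_proportion(p_true, n, rng):
    """Proportion of 'yes' answers in one random sample of size n from a large population."""
    return sum(rng.random() < p_true for _ in range(n)) / n

rng = random.Random(0)
p_true, n, reps = 0.60, 1000, 1000
props = sorted(sample_proportion(p_true, n, rng) for _ in range(reps))

# Middle 95% of the simulated sample proportions
low, high = props[int(0.025 * reps)], props[int(0.975 * reps) - 1]
print("middle 95% of sample proportions:", low, "to", high)
print("simulated half-width: ", round((high - low) / 2, 3))
print("1.96 * standard error:", round(1.96 * math.sqrt(p_true * (1 - p_true) / n), 3))
```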
sampling errors
mistakes made in the process of taking a sample that could lead to inaccurate information about the population
sampling frame
list of individuals from which we will draw our sample; ideally, it should list every individual in the population
undercoverage
occurs when some groups in the population are left out of the process of choosing a sample
ex: a sample survey of households will miss homeless people, prison inmates, and college students
nonresponse
occurs when an individual chosen for the sample can't be contacted or refuses to participate
response bias
systematic pattern of incorrect responses in a sample survey
ex: Calvin says that he spends $500 a week on bubble gum
most important influence on the answers given to a sample survey
wording of questions
observational study
observes individuals and measures variables of interest but does not attempt to influence the responses
experiment
deliberately imposes some treatment on individuals to measure their responses
goal of an observational study
to describe a group or situation, to compare groups, or to examine relationships between variables
purpose of an experiment
to determine whether the treatment causes a change in the response
when our goal is to understand cause and effect, __________ are the only source of fully convincing data
experiments
lurking variable
variable that is not among the explanatory or response variables in a study but may influence the response variable
ex: in the example about cars extending life, the amount of money a person has is a lurking variable
confounding
occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
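Confounding can be made concrete with a simulation in which the explanatory variable has no effect at all. Everything is hypothetical: wealth (the lurking variable) raises both the chance of owning a car and lifespan, while car ownership itself does nothing. Owners still appear to live longer overall, but within each wealth level the gap disappears, so the effects of wealth and car ownership cannot be told apart from the overall comparison alone.

```python
import random
import statistics

rng = random.Random(21)
people = []
for _ in range(20_000):
    wealthy = rng.random() < 0.5                          # lurking variable
    owns_car = rng.random() < (0.9 if wealthy else 0.3)   # explanatory variable, tied to wealth
    lifespan = rng.gauss(82 if wealthy else 76, 6)        # response depends on wealth only
    people.append((wealthy, owns_car, lifespan))

def mean_lifespan(rows):
    return statistics.fmean(r[2] for r in rows)

owners = [p for p in people if p[1]]
non_owners = [p for p in people if not p[1]]
overall_gap = mean_lifespan(owners) - mean_lifespan(non_owners)
print("owners minus non-owners:", round(overall_gap, 1))

within_gaps = {}
for level in (True, False):
    group = [p for p in people if p[0] == level]
    within_gaps[level] = (mean_lifespan([p for p in group if p[1]])
                          - mean_lifespan([p for p in group if not p[1]]))
    print("within wealthy =", level, ":", round(within_gaps[level], 1))
```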
with no association between the lurking variable and the explanatory variable, there can/can't be confounding?
can't be
observational studies of the effect of one variable on another often fail because of...
confounding between the explanatory variable and one or more lurking variables
treatment
a specific condition applied to the individuals in an experiment. if an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.
experimental units
smallest collection of individuals to which treatments are applied
subjects
experimental units that are human beings
factors
another name for explanatory variables
multifactor experiment
each treatment is formed by combining a specific value (often called a level) of each of the factors
experimental units--->treatment--->measure response
design of many laboratory experiments
badly designed experiments often yield worthless results because of
confounding
if treatments are given to groups that differ greatly when the experiment begins, ____ will result
bias
random assignment
experimental units are assigned to treatments at random, that is, using some sort of chance process
comparative design
compares two or more treatments
completely randomized design
treatments are assigned to all the experimental units completely by chance
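A completely randomized design can be carried out by shuffling all the experimental units and dealing them into treatment groups. A minimal sketch, assuming hypothetical plants and treatment names and (nearly) equal group sizes; `random.shuffle` plays the role of the chance process.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin so group sizes differ by at most one
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

plants = [f"plant{i}" for i in range(1, 31)]
groups = completely_randomized(plants, ["fertilizer A", "fertilizer B", "control"], seed=3)
for treatment, members in groups.items():
    print(treatment, len(members))
```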
primary purpose of a control group
to provide a baseline for comparing the effects of the other treatments
when is a control group not needed?
if you simply want to compare the effects of several treatments and not to determine whether any of them works better than an inactive treatment
principles of experimental design
1. CONTROL for lurking variables that might affect the response: use a comparative design and ensure that the only systematic difference between the groups is the treatment administered
2. RANDOM ASSIGNMENT: Use impersonal chance to assign experimental units to treatments
placebo effect
response to a dummy treatment
double-blind experiment
neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received
single-blind
either the subjects do not know which treatment they are receiving, or the people who interact with them and measure the response variable do not know, but not both
statistically significant
an observed effect so large that it would rarely occur by chance
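One way to check whether an observed difference "would rarely occur by chance" is to re-randomize: shuffle the responses between the two groups many times and see how often a difference at least as large appears. This is a randomization (permutation) test, a technique these notes do not name; the response values are invented, and any cutoff such as 5% is only a common convention.

```python
import random
import statistics

def randomization_p_value(group_a, group_b, reps=10_000, seed=None):
    """Fraction of random re-assignments whose mean difference is at least as large as observed."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-deal the same responses into two groups by chance
        if statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:]) >= observed:
            count += 1
    return count / reps

treatment = [12, 15, 14, 16, 13, 17, 15, 16]   # hypothetical responses
control = [10, 11, 13, 12, 9, 12, 11, 10]
p = randomization_p_value(treatment, control, seed=1)
print("proportion of re-randomizations at least as extreme:", p)
```

A very small proportion means chance assignment alone almost never produces a gap this large, which is what calling the effect statistically significant expresses.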
a statistically significant association in data from a well-designed experiment does/does not imply causation
does
block
group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments; blocking is a form of control
randomized block design
the random assignment of experimental units to treatments is carried out separately within each block
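In a randomized block design the chance process is run separately inside each block. A sketch assuming hypothetical blocks formed before the experiment (subjects grouped by sex) and two treatments; within each block, half the units go to each treatment.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            # Round-robin within the block keeps the treatments balanced in every block
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {
    "men": ["M1", "M2", "M3", "M4", "M5", "M6"],
    "women": ["W1", "W2", "W3", "W4", "W5", "W6"],
}
plan = randomized_block(blocks, ["new drug", "placebo"], seed=8)
for unit, (block, treatment) in sorted(plan.items()):
    print(block, unit, treatment)
```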
matched pairs design
create blocks by matching pairs of similar experimental units and then use chance to decide which member of a pair gets which treatment
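For matched pairs, chance decides within each pair which member gets which treatment, here with a virtual coin flip. Pair members and treatment names are hypothetical. When each pair is one individual treated twice, the same flip can decide the order of the two treatments instead, so that order is not tied to treatment.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each pair, flip a fair coin to decide which member gets which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        plan.append({first: treatments[0], second: treatments[1]})
    return plan

twins = [("Amy", "Ava"), ("Ben", "Bo"), ("Cy", "Cal")]
for assignment in matched_pairs(twins, ("diet X", "diet Y"), seed=4):
    print(assignment)
```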
in a matched pairs design when each pair consists of one individual being treated twice, the order of treatments can/can't influence the response
can
individuals were randomly selected and assigned to groups
inference about cause and effect and the population
individuals were randomly selected but not randomly assigned to groups
inference about the population but not cause and effect
individuals were not randomly selected but randomly assigned to groups
inference about cause and effect but not the population
individuals were not randomly selected or assigned to groups
no inferences about population or cause and effect
lack of realism
limits our ability to apply the conclusions of an experiment to the settings of greatest interest
what are the criteria for establishing causation when we can't do an experiment?
1. the association is strong
2. the association is consistent. many studies of different kinds of people in many countries link smoking to lung cancer. that reduces the chance that a lurking variable specific to one group explains the association.
3. larg