Population
In a statistical study, the entire group of individuals we want information about (population of interest; not to be confused with population of inference)
Census
Collects data from every individual in the population
Sample
Subset of individuals in the population from which we actually collect data; collecting data from a representative sample allows us to make an inference about the population
Steps to Choosing a sample
We often draw conclusions about a whole population on the basis of a sample; in choosing a sample from a large, varied population, we must:
Step 1: Define the population we want to describe
Step 2: Say exactly what we want to measure (give exact definitio
Sample Survey
A study that uses an organized plan to choose a sample that represents some specific population
Convenience Sample
Choosing individuals from the population who are easy to reach; often produces unrepresentative data and is almost guaranteed to show bias
Bias
The design of a statistical study shows this factor if it would consistently underestimate or consistently overestimate the value you want to know (over or underrepresentation of a group)
Voluntary response sample
Self-selected sample; consists of people who choose themselves by responding to a general invitation (ex: email); shows bias because people with strong opinions or who feel strongly about an issue, often in the same direction, are most likely to respond. P
Random sampling
Involves using a chance process to determine which members of a population are included in the sample; a sample chosen by chance rules out both favoritism by the sampler and self-selection by respondents.
Simple Random Sample (SRS)
of size n, is chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample. It gives every possible sample of the desired size an equal chance to be chosen.
Table of Random Digits
In practice, people use random numbers generated by a computer or calculator to choose samples; if technology is not available, this resource can be used
Choosing an SRS with technology
Step 1: Label. Give each individual in the population a distinct numerical label from 1 to N.
Step 2: Randomize. Use a random number generator to obtain n different integers from 1 to N.
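The two steps above can be sketched in Python. This is only an illustration: the function name `choose_srs`, the population size of 50, and the sample size of 5 are assumptions made for the example, not part of the notes.

```python
import random

def choose_srs(population_size, sample_size, seed=None):
    """Return sample_size distinct labels chosen from 1..population_size."""
    # An optional seed makes the draw repeatable; leave it as None for a fresh draw.
    rng = random.Random(seed)
    # Step 1 (Label): individuals carry the labels 1 to N.
    labels = range(1, population_size + 1)
    # Step 2 (Randomize): random.sample draws without replacement,
    # so the n labels are guaranteed to be different.
    return sorted(rng.sample(labels, sample_size))

srs = choose_srs(population_size=50, sample_size=5, seed=1)
print(srs)
```

The individuals whose labels appear in `srs` form the sample.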
Choosing an SRS with Table D
Step 1: Label. Give each member of the population a numerical label with the same number of digits. Use as few digits as possible.
Step 2: Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in Table D.
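Reading a line of random digits can be mimicked in Python. This is a rough sketch under stated assumptions: the digit string below is arbitrary and is not taken from a real table of random digits, the function name `srs_from_digits` is hypothetical, and the rule of skipping out-of-range groups and repeated labels follows the usual convention rather than anything stated in the steps above.

```python
def srs_from_digits(digits, population_size, sample_size):
    """Read fixed-width groups of digits left to right and keep usable labels."""
    # Use as few digits as possible: labels 01..40 need two digits each.
    width = len(str(population_size))
    # Ignore the spaces that tables print between blocks of digits.
    digits = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Assumed convention: skip groups outside 1..N and skip repeats.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Arbitrary illustrative digits, NOT a line from an actual random digit table.
line = "31415 92653 58979 32384 62643 38327 95028 84197"
labels = srs_from_digits(line, population_size=40, sample_size=4)
print(labels)  # [31, 26, 23, 33]
```

The groups 41, 59, 53, 58, 97, 93, 84, 62, and 64 are passed over because no individual carries those labels.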
Stratified random sample
Sometimes there are statistical advantages to using more complex sampling methods. To get a ____, start by classifying the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to f
Strata
Classifications of the population into groups of similar individuals. When we choose strata that are similar within but different between, stratified random samples give more precise estimates than simple random samples of the same size.
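A stratified random sample can be sketched as one SRS per stratum, combined at the end. The strata names, sizes, and the function name `stratified_sample` are made up for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """strata maps a stratum name to its members; per_stratum maps a name to its SRS size."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # Choose a separate SRS inside each stratum, then combine them.
        sample.extend(rng.sample(members, per_stratum[name]))
    return sample

# Hypothetical population classified into two strata by class year.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 31)],
    "seniors": [f"S{i}" for i in range(1, 21)],
}
sample = stratified_sample(strata, {"freshmen": 3, "seniors": 2}, seed=2)
print(sample)
```

Every stratum is guaranteed to appear in the sample, which is part of why this design can give more precise estimates than an SRS of the same size.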
Cluster sample
A method that selects groups of individuals that are "near;" start by classifying the population into groups of individuals that are near each other. Used for practical reasons of saving money and time (higher efficiency is the greatest benefit to this sa
Clusters
Best chosen when they are different within but similar between, so that each cluster is just like the population but on a smaller scale. More varied than a stratum.
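A cluster sample can be sketched as follows, assuming the common approach of taking an SRS of whole clusters and then including every individual in the chosen clusters. The homeroom names and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster name to its members; returns the chosen names and the sample."""
    rng = random.Random(seed)
    # Randomly choose whole clusters (sorted first so a seeded draw is repeatable).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Assumption: everyone in a chosen cluster is included in the sample.
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical clusters: students grouped by homeroom.
clusters = {
    "room101": ["Ana", "Ben", "Cy"],
    "room102": ["Dee", "Eli", "Fay"],
    "room103": ["Gus", "Hal", "Ivy"],
    "room104": ["Jo", "Kai", "Lee"],
}
chosen, sample = cluster_sample(clusters, n_clusters=2, seed=4)
print(chosen, sample)
```

Unlike the stratified design, entire groups are left out here, which is why clusters work best when each one resembles the whole population.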
Inference
The purpose of a sample is to give us information about a larger population; inference is the process of drawing conclusions about a population on the basis of sample data. Larger random samples give better information about the population than smaller random samples.
Samples that do not allow inference
Convenience and voluntary response samples do not allow us to infer about the population because the sample is misleading and contains bias; therefore, it does not fairly represent the population.
Reliance on random sampling
Avoids bias in selecting samples from the list of available individuals; the laws of probability allow trustworthy inference about the population.
Margin of error
Results from random samples come with _____ that sets bounds on the size of the likely error.
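The notes do not give a formula, but one common large-sample approximation for the margin of error of a sample proportion at roughly 95% confidence is 1.96 times the square root of p(1 - p)/n. The sketch below assumes that approximation; the function name `approx_margin_of_error` and the sample sizes are illustrative only.

```python
import math

def approx_margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (assumed ~95% confidence)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Quadrupling the sample size cuts the approximate margin of error in half.
for n in (100, 400, 1600):
    print(n, approx_margin_of_error(0.5, n))
```

This is one way to see why larger random samples give better information about the population.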
4.1 Tips
Describe, fully explain, and justify full steps in processes and answers.
Things to write in a short answer asking for a sample from a table of random digits: population, sample, value you want to measure (with units), how to choose the SRS, "From table D
Unbiased samples
Will still produce estimates that differ from the value we want to know simply by chance. However, these estimates will be too small about half the time and too large the other half of the time.
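A small simulation can illustrate this. It is a sketch under assumptions: the population is the integers 1 to 100 (symmetric about its mean, so "about half the time" is a fair description), and the function name, sample size, and number of repetitions are arbitrary choices.

```python
import random

def fraction_of_low_estimates(population, sample_size, reps, seed=0):
    """Fraction of SRS sample means that fall below the true population mean."""
    rng = random.Random(seed)
    true_mean = sum(population) / len(population)
    low = 0
    for _ in range(reps):
        sample = rng.sample(population, sample_size)
        if sum(sample) / sample_size < true_mean:
            low += 1
    return low / reps

population = list(range(1, 101))  # hypothetical population with true mean 50.5
frac_low = fraction_of_low_estimates(population, sample_size=10, reps=2000)
print(frac_low)  # expected to land near 0.5
```

Individual estimates miss the true value, but the misses do not lean consistently in one direction, which is what separates chance variation from bias.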
Without replacement
Must explicitly state that the repeated integers should be ignored, or say that they will generate random integers until n different numbers are selected from the given range.
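The "generate until n different numbers are selected" wording can be sketched like this. The function name `draw_distinct_integers` and the range 1 to 20 are assumptions for the example.

```python
import random

def draw_distinct_integers(low, high, n, seed=None):
    """Generate random integers in low..high, ignoring repeats, until n different ones are found."""
    if n > high - low + 1:
        raise ValueError("cannot draw more distinct integers than the range contains")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        value = rng.randint(low, high)  # randint includes both endpoints
        if value not in chosen:         # a repeated integer is ignored
            chosen.append(value)
    return chosen

picks = draw_distinct_integers(1, 20, n=4, seed=5)
print(picks)
```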
Best variables to choose for stratification
Those that would most accurately predict the response.
Observational study
Observes individuals and measures variables of interest but does not attempt to influence the response; therefore, we cannot determine cause and effect (this includes sample surveys). The purpose is to describe a group or situation, compare groups, and ex
Experiment
deliberately and actively imposes some treatment on individuals to measure their response. When our goal is to understand cause and effect, this is the only source of fully convincing data. Directly answers question. The purpose is to determine if a treat
Confounding
Occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. Reason why observational studies of the effect of an explanatory variable often fail; occurs between the explanatory
Response variable
Measures the outcome of a study or experiment
Explanatory variable
Helps explain or predict changes in a response variable
Experiment
A statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response
Experimental units
The smallest collection/entity of individuals to which treatment is applied (can be objects, plants, animals, humans)
Subjects
Name for experimental units when they are human beings
Treatment
A specific condition applied to the individuals in an experiment; if an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.
Experiment conditions
Experiments often use a design: Experimental units -> treatment -> measure response. In the lab environment, simple designs often work well. Field experiments and experiments with animals or people deal with more variable conditions. Outside the lab, badl
Factors
Explanatory variables; each treatment is formed by combining a specific value (level) of each ___.
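Forming treatments from factors amounts to listing every combination of levels. The two factors and their levels below are invented for illustration.

```python
from itertools import product

# Hypothetical factors with their levels.
factors = {
    "fertilizer": ["none", "low", "high"],
    "watering": ["daily", "weekly"],
}

# Each treatment combines one specific level of every factor.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for treatment in treatments:
    print(treatment)
print(len(treatments))  # 6
```

With 3 levels of one factor and 2 of the other, there are 3 x 2 = 6 treatments.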
Experiment
Allows study of the combined effects of several factors; interactions of several factors can produce effects that cannot be predicted by looking at each of the factors alone
Comparative experiment
Remedy for confounding; an experiment in which some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments.
Random assignment
If treatments are given to groups that differ greatly, like self-placed groups, bias will result. We use ____ so that units are assigned to treatments using a chance process; ensures that the effects of other variables are spread evenly among the two grou
Chance
Assigns individuals to groups but will always cause some difference between the groups
Control
Prevents confounding, reduces variability in the response variable, and provides a baseline for comparison
4 Principles of experimental design
1. Comparison. Use a design that compares two or more treatments.
2. Random assignment. Use chance to assign experimental units to treatments; doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variable
Completely Randomized Design
The treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group. Using chance to assign treatments in an experiment does NOT guarantee a completely randomized design.
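Random assignment in a completely randomized design can be sketched by shuffling all the units and dealing them out to the treatments. The unit labels, treatment names, and the function name `completely_randomized` are hypothetical, and dealing in turn is just one way to get groups of equal size.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle every unit, then deal the shuffled units out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = [f"unit{i}" for i in range(1, 13)]
groups = completely_randomized(units, ["A", "B", "control"], seed=3)
for name, members in groups.items():
    print(name, members)
```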
Control group
A group that receives an inactive treatment or an existing baseline treatment. It is okay if there is no control group when researching the comparison of the effects of several treatments, as opposed to trying to determine if any one treatment works bette
Good experiments
The logic of a randomized comparative experiment depends on our ability to treat all the subjects the same in every way except for the actual treatments being compared; require careful attention to details to ensure that all subjects really are treated id
Placebo effect
A response to a dummy treatment; it can be very strong, because subjects' expectations bias the results
Double-blind experiment
Neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving
Single-blind
Either the subjects or the people who interact with them and measure the response variable know which treatment is being given, but not both; for example, the individuals interacting with the subjects know the treatments while the subjects themselves do not
Confounding variables
Confounding occurs when existing differences in the experimental units are not taken into account; different variables might systematically affect the response to treatments
Statistically Significant
In an experiment, researchers usually hope to see a difference in the responses so large that it is unlikely to happen just because of chance variation. We can use the laws of probability, which describe chance behavior, to learn whether the treatment eff
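One way to judge whether a difference is larger than chance variation would plausibly produce is a randomization test: reshuffle the responses many times and count how often a difference at least as large as the observed one appears. This is a rough one-sided sketch; the response values are made up and the function name `randomization_p_value` is hypothetical.

```python
import random

def randomization_p_value(group_a, group_b, reps=5000, seed=0):
    """Estimate how often random regrouping gives mean(a) - mean(b) at least as large as observed."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # pretend the treatment labels were assigned differently
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            count += 1
    return count / reps

# Made-up responses for illustration only.
treatment = [12, 15, 14, 16, 13]
control = [8, 9, 11, 10, 9]
p_value = randomization_p_value(treatment, control)
print(p_value)  # a small value suggests the difference is unlikely to be due to chance alone
```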
Statistically Significant association
In general, association does not imply causation, but association in data from a well-designed experiment (a randomized comparative experiment) does imply causation.
Blocking
When a population consists of groups of individuals that are similar within but different between, a stratified random sample gives a more precise estimate than a simple random sample of the same size. This same logic applies to experiments.
Block
Group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to treatments. When formed wisely, it is easier to find convincing evidence that one treatment is more effective than the ot
Randomized block design
The random assignment of experimental units to treatments is carried out separately within each block; it averages out the effects of other remaining variables and allows the unbiased comparison of the treatments
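A randomized block design can be sketched by repeating the random assignment separately inside each block. The blocks, subject labels, treatment names, and the function name `randomized_block` are all invented for the example.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # the shuffle is redone inside every block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks of subjects expected to respond differently to the treatments.
blocks = {
    "men": ["M1", "M2", "M3", "M4", "M5", "M6"],
    "women": ["W1", "W2", "W3", "W4"],
}
assignment = randomized_block(blocks, ["new", "old"], seed=6)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Because each block is split evenly between the treatments, the blocking variable cannot be confounded with the treatment.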
Helpful reminder
Control what you can, block what you can't, randomize to create comparable groups
Matched pairs design
Common type of randomized block design for comparing two treatments; the idea is to create blocks by matching pairs of similar experimental units. Chance is used to determine which unit in each pair gets each treatment. Sometimes a pair in a matched-pairs design c
Large amount of variability
We are unable to draw conclusions from an experiment if the response variable shows _____.
Design of study
Determines appropriate method of analysis
Scope of inference
Random selection allows inference about the population; random assignment allows inference about cause and effect relationships
Inference about cause and effect
Well-designed experiments randomly assign individuals to treatment groups. However, most experiments don't select experimental units at random from the larger population. That limits such experiments' inference
Inference about the population
Observational studies don't randomly assign individuals to groups, which rules out inference about cause and effect. Observational studies that use random sampling can make inferences about the population
Challenges of establishing causation
Well-designed experiments tell us that changes in the explanatory variable cause changes in the response variable. Lack of realism can limit our ability to apply the conclusions of an experiment to the settings of greatest interest (lab settings vs. realit
Causation from Observational Studies
It is sometimes possible to build a strong case for causation based on data from observational studies; criteria include: the association is strong, consistent, larger values of the explanatory variable are associated with stronger response, alleged cause p
Data Ethics
Complex issues of data ethics arise when we collect data from people
Basic Data Ethics criteria
Basic standards of data ethics that must be obeyed by all studies that gather data from human subjects, both observational studies and experiments:
1. All planned studies must be reviewed in advance by an institutional review board charged with protecting
Sampling frame
List of all individuals from which a sample will be drawn
Undercoverage
Occurs when some members of the population cannot be chosen in a sample
Nonresponse
Occurs when individuals chosen for the sample can't be contacted or refuse to participate; often exceeds 50%
Response bias
A systematic pattern of incorrect responses in a sample survey leads to _____; it can stem from factors such as ethnicity, gender, age, race, or behavior.
Wording
The wording of the questions and the order in which they are presented are the most important influences on the answers given to a sample survey. Confusing or leading questions can introduce strong bias and greatly affect the survey's outcome.