AP Statistics Chapter 4: Designing Studies

Population

In a statistical study, the entire group of individuals we want information about (population of interest; not to be confused with population of inference)

Census

Collects data from every individual in the population

Sample

Subset of individuals in the population from which we actually collect data; collecting data from a representative sample allows us to make an inference about the population

Steps to Choosing a sample

We often draw conclusions about a whole population on the basis of a sample; in choosing a sample from a large, varied population, we must:
Step 1: Define the population we want to describe
Step 2: Say exactly what we want to measure (give exact definitions)

Sample Survey

A study that uses an organized plan to choose a sample that represents some specific population

Convenience Sample

Choosing individuals from the population who are easy to reach; often produces unrepresentative data and is almost guaranteed to show bias

Bias

The design of a statistical study shows this factor if it would consistently underestimate or consistently overestimate the value you want to know (e.g., through over- or underrepresentation of a group)

Voluntary response sample

Self-selected sample consisting of people who choose themselves by responding to a general invitation (ex: email); it shows bias because people who feel strongly about an issue, often in the same direction, are the most likely to respond.

Random sampling

Involves using a chance process to determine which members of a population are included in the sample; a sample chosen by chance rules out both favoritism by the sampler and self-selection by respondents.

Simple Random Sample (SRS)

An SRS of size n is chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample; that is, every possible sample of the desired size has an equal chance to be chosen.
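To make "every possible sample has an equal chance" concrete, here is a minimal Python sketch that lists every possible sample of size n = 2 from a hypothetical population of five individuals labeled A to E. Under an SRS, each of the C(N, n) possible samples has the same probability, 1/C(N, n).

```python
from itertools import combinations
from math import comb

# A tiny hypothetical population of N = 5 labeled individuals.
population = ["A", "B", "C", "D", "E"]
n = 2  # desired sample size

# Every possible group of n individuals that could be chosen as the sample.
all_samples = list(combinations(population, n))

# In an SRS, each of these groups is equally likely to be the one selected.
prob_each = 1 / comb(len(population), n)

print(len(all_samples))  # 10
print(prob_each)         # 0.1
```

With larger populations the list becomes far too long to write out, which is why in practice we use a random number generator or a table of random digits instead.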

Table of Random Digits

In practice, people use random numbers generated by a computer or calculator to choose samples; if technology is not available, this resource can be used

Choosing an SRS with technology

Step 1: Label. Give each individual in the population a distinct numerical label from 1 to N.
Step 2: Randomize. Use a random number generator to obtain n different integers from 1 to N.
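The two steps above can be sketched in Python with the standard `random` module. This is a minimal sketch: it assumes the individuals already carry labels 1 to N, and the function name `choose_srs` and the seed are illustrative choices, not part of any standard procedure.

```python
import random

def choose_srs(N, n, seed=None):
    """Return n different labels chosen at random from 1..N (an SRS).

    Step 1 (Label): individuals are assumed to already carry labels 1..N.
    Step 2 (Randomize): random.sample draws n distinct labels, so no
    label can appear twice in the sample.
    """
    if not 0 <= n <= N:
        raise ValueError("sample size n must be between 0 and N")
    rng = random.Random(seed)  # seed only to make the example repeatable
    return sorted(rng.sample(range(1, N + 1), n))

# Example: choose 5 of 30 students (the numbers here are hypothetical).
print(choose_srs(30, 5, seed=1))
```

`random.sample` draws without replacement, so it automatically satisfies the "n different integers" requirement.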

Choosing an SRS with Table D

Step 1: Label. Give each member of the population a numerical label with the same number of digits. Use as few digits as possible.
Step 2: Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in Table D.
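The Table D procedure can be mimicked in Python to check a hand-worked answer. This is a sketch under stated assumptions: labels run from 1 to N with leading zeros (for example 01 to 50), and, as is usual practice, groups outside the label range and repeated labels are skipped. The digit string below is simply the first 25 digits of pi used as a stand-in; it is not a line from Table D.

```python
def srs_from_digits(digits, N, n):
    """Mimic reading a line of a random digit table to choose an SRS.

    Labels run 1..N and all use the same number of digits (e.g. 01..50).
    Groups of that length are read left to right; groups outside 1..N
    and repeated labels are skipped until n labels have been found.
    """
    width = len(str(N))                # as few digits as possible
    digits = digits.replace(" ", "")   # digit tables print digits in blocks of 5
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= N and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

# Stand-in digits (first 25 digits of pi), NOT copied from Table D.
line = "31415 92653 58979 32384 62643"
print(srs_from_digits(line, N=50, n=3))  # → [31, 41, 26]
```

Reading 31, 41, 59, 26: the labels 31 and 41 are kept, 59 is skipped because it is larger than 50, and 26 completes the sample. If the line runs out before n labels are found, the function returns fewer than n, which corresponds to continuing on the next line of the table.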

Stratified random sample

Sometimes there are statistical advantages to using more complex sampling methods. To get a ____, start by classifying the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.

Strata

Classifications of the population into groups of similar individuals. When we choose strata that are similar within but different between, stratified random samples give more precise estimates than simple random samples of the same size.
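A stratified random sample is just one SRS per stratum, combined. The Python sketch below assumes a hypothetical school split into two strata by grade level; the stratum names, sizes, and per-stratum sample sizes are made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Choose a separate SRS in each stratum, then combine them.

    strata: dict mapping a stratum name to the list of its individuals.
    n_per_stratum: dict mapping each stratum name to its sample size.
    """
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))  # SRS within this stratum
    return sample

# Hypothetical school split into strata by grade level.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 101)],
    "seniors": [f"S{i}" for i in range(1, 81)],
}
print(stratified_sample(strata, {"freshmen": 5, "seniors": 4}, seed=2))
```

Every stratum is guaranteed to be represented in the sample, which is what makes the estimate more precise when the strata are similar within but different between.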

Cluster sample

A method that selects groups of individuals that are "near"; start by classifying the population into groups of individuals that are near each other. Used for practical reasons of saving money and time (higher efficiency is the greatest benefit of this sampling method)

Clusters

Best chosen when they are different within but similar between, so that each cluster looks like the population on a smaller scale. Clusters are more varied inside than strata are.
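For contrast with stratified sampling: in a cluster sample we take an SRS of whole clusters and then include every individual in the chosen clusters. The Python sketch below assumes hypothetical homerooms of 25 students each as the clusters; the room names and sizes are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of whole clusters and include every individual in them."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)  # SRS of clusters
    return [person for name in chosen_names for person in clusters[name]]

# Hypothetical: 10 homerooms (clusters) of 25 students each.
clusters = {f"Room{r}": [f"R{r}-{s}" for s in range(1, 26)] for r in range(1, 11)}
sample = cluster_sample(clusters, 2, seed=3)
print(len(sample))  # 50
```

A stratified sample picks some individuals from every group; a cluster sample picks whole groups and skips the rest, which is why it saves time and money.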

Inference

The purpose of a sample is to give us information about a larger population; inference is the process of drawing conclusions about a population on the basis of sample data. Larger random samples give better information about the population than smaller random samples.

Samples that do not allow inference

Convenience and voluntary response samples do not allow us to make inferences about the population because these samples are misleading and contain bias; therefore, they do not fairly represent the population.

Reliance on random sampling

Avoids bias in selecting samples from the list of available individuals; the laws of probability allow trustworthy inference about the population.

Margin of error

Results from random samples come with a _____ that sets bounds on the size of the likely error.

4.1 Tips

Describe, fully explain, and justify full steps in processes and answers.
Things to write in a short answer asking for a sample from a table of random digits: population, sample, value you want to measure (with units), how to choose the SRS, "From table D

Unbiased samples

Will still produce estimates that differ from the value we want to know simply by chance. However, these estimates will be too small about half the time and too large the other half of the time.
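A quick simulation can illustrate both this card and the point that larger random samples give better information. The Python sketch below assumes a hypothetical population of 10,000 people in which 60% would answer "yes"; it draws 1,000 SRSs of each size and summarizes the sample proportions. The population, sizes, and seed are all made up for illustration.

```python
import random
from statistics import mean

# Hypothetical population of 10,000 people, 60% of whom answer "yes" (1).
population = [1] * 6000 + [0] * 4000
TRUE_P = 0.6

def sample_proportions(sample_size, repetitions, seed=0):
    """Proportion of "yes" answers in many SRSs of the given size."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(repetitions)]

for size in (25, 400):
    props = sample_proportions(size, 1000)
    below = sum(p < TRUE_P for p in props) / len(props)   # estimates that are too small
    above = sum(p > TRUE_P for p in props) / len(props)   # estimates that are too large
    spread = max(props) - min(props)
    print(f"n={size}: mean={mean(props):.3f}, below={below:.2f}, above={above:.2f}, range={spread:.2f}")
```

For both sizes the estimates center near 0.6 (no bias), landing below and above the true value in roughly similar proportions; some samples hit 0.6 exactly, so "below" and "above" need not add to 1. The range of estimates is much narrower for n = 400 than for n = 25.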

Without replacement

When sampling without replacement, you must explicitly state that repeated integers will be ignored, or say that you will keep generating random integers until n different numbers are selected from the given range.
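The "keep generating until n different numbers are found" procedure looks like this as a minimal Python sketch. `random.randint(low, high)` includes both endpoints; the function name and seed are illustrative only.

```python
import random

def distinct_random_integers(n, low, high, seed=None):
    """Keep generating random integers in [low, high] until n different
    values have been found; repeats are ignored (sampling without replacement)."""
    if n > high - low + 1:
        raise ValueError("range too small for n different integers")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        k = rng.randint(low, high)   # randint includes both endpoints
        if k not in chosen:          # ignore repeated integers
            chosen.append(k)
    return chosen

print(distinct_random_integers(4, 1, 20, seed=5))
```

Writing the "ignore repeats" rule directly into the loop is exactly the statement a short answer needs to include.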

Best variables to choose for stratification

Those that would most accurately predict the response.

Observational study

Observes individuals and measures variables of interest but does not attempt to influence the response; therefore, we cannot determine cause and effect (this includes sample surveys). The purpose is to describe a group or situation, compare groups, and ex

Experiment

Deliberately and actively imposes some treatment on individuals to measure their responses. When our goal is to understand cause and effect, this is the only source of fully convincing data because it directly answers the question. The purpose is to determine if a treat

Confounding

Occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. This is the reason observational studies of the effect of an explanatory variable often fail; occurs between the explanatory

Response variable

Measures the outcome of a study or experiment

Explanatory variable

Helps explain or predict changes in a response variable

Experiment

A statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response

Experimental units

The smallest collection of individuals to which treatments are applied (can be objects, plants, animals, or humans)

Subjects

Name for experimental units when they are human beings

Treatment

A specific condition applied to the individuals in an experiment; if an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.

Experiment conditions

Experiments often use a design: Experimental units -> treatment -> measure response. In the lab environment, simple designs often work well. Field experiments and experiments with animals or people deal with more variable conditions. Outside the lab, badl

Factors

Explanatory variables; each treatment is formed by combining a specific value (level) of each ___.
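Because each treatment combines one level of each factor, the list of treatments is every combination of levels. The Python sketch below assumes a hypothetical plant experiment with two factors (sunlight with 2 levels, fertilizer with 3 levels), giving 2 × 3 = 6 treatments.

```python
from itertools import product

# Hypothetical two-factor experiment; each treatment = one level of each factor.
factors = {
    "sunlight": ["low", "high"],
    "fertilizer": ["none", "10 g", "20 g"],
}
treatments = list(product(*factors.values()))  # every combination of levels
for t in treatments:
    print(dict(zip(factors, t)))
print(len(treatments))  # 6
```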

Experiment

Allows study of the combined effects of several factors; interactions among several factors can produce effects that cannot be predicted by looking at each factor alone

Comparative experiment

The remedy for confounding; an experiment in which some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments.

Random assignment

If treatments are given to groups that differ greatly, such as self-selected groups, bias will result. We use ____ so that units are assigned to treatments using a chance process; this helps ensure that the effects of other variables are spread roughly evenly between the two groups.
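Random assignment in a completely randomized design can be sketched in Python as "shuffle the units, then deal them out to the treatments." This is one reasonable way to do it, not the only one; the 20 hypothetical volunteers and the treatment names are invented for illustration.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Completely randomized design: shuffle the units, then deal them out
    to the treatments in turn so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical: 20 volunteers assigned to a new drug or a placebo.
volunteers = [f"V{i}" for i in range(1, 21)]
groups = randomly_assign(volunteers, ["drug", "placebo"], seed=7)
print(groups)
```

Running it with different seeds shows that the groups are never identical in composition, which is the point of the next card: chance balances other variables only roughly.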

Chance

Assigns individuals to groups, but it will always produce some differences between the groups

Control

Prevents confounding, reduces variability in the response variable, and provides a baseline for comparison

4 Principles of experimental design

1. Comparison. Use a design that compares two or more treatments.
2. Random assignment. Use chance to assign experimental units to treatments; doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables

Completely Randomized Design

The treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group. Using chance to assign treatments in an experiment does NOT guarantee a completely randomized design.

Control group

A group that receives an inactive treatment or an existing baseline treatment. It is okay if there is no control group when researching the comparison of the effects of several treatments, as opposed to trying to determine if any one treatment works bette

Good experiments

The logic of a randomized comparative experiment depends on our ability to treat all the subjects the same in every way except for the actual treatments being compared; good experiments require careful attention to details to ensure that all subjects really are treated identically.

Placebo effect

A response to a dummy treatment; it can be very strong because subjects' expectations can bias the results

Double-blind experiment

Neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is administered

Single-blind

Either the subjects or those who interact with them and measure the response variable (but not both) are unaware of which treatment a subject is receiving; for example, the individuals interacting with the subjects know the treatment, but the subjects do not.

Confounding variables

Confounding occurs when existing differences in the experimental units are not taken into account; different variables might systematically affect the response to treatments

Statistically Significant

In an experiment, researchers usually hope to see a difference in the responses so large that it is unlikely to happen just because of chance variation. We can use the laws of probability, which describe chance behavior, to learn whether the treatment eff

Statistically Significant association

In general, association does not imply causation, but a statistically significant association in data from a well-designed experiment (a randomized comparative experiment) does imply causation.

Blocking

When a population consists of groups of individuals that are similar within but different between, a stratified random sample gives a better estimate than a simple random sample of the same size. The same logic applies to experiments.

Block

Group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to treatments. When formed wisely, it is easier to find convincing evidence that one treatment is more effective than the ot

Randomized block design

The random assignment of experimental units to treatments is carried out separately within each block; it averages out the effects of other remaining variables and allows the unbiased comparison of the treatments
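The key difference from a completely randomized design is that the chance process runs separately inside each block. A minimal Python sketch, assuming hypothetical blocks formed by sex (6 men, 8 women) and two treatments labeled A and B:

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomized block design: carry out a separate random assignment
    of units to treatments inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize within this block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: units are grouped by a variable expected to affect the response.
blocks = {"men": [f"M{i}" for i in range(1, 7)], "women": [f"W{i}" for i in range(1, 9)]}
assignment = randomized_block_assignment(blocks, ["A", "B"], seed=8)
print(assignment)
```

Each treatment ends up with half of every block, so treatments are compared within groups of similar units.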

Helpful reminder

Control what you can, block what you can't, randomize to create comparable groups

Matched pairs design

Common type of randomized block design for comparing two treatments; the idea is to create blocks by matching pairs of similar experimental units. Chance is used to determine which unit in each pair gets each treatment. Sometimes a pair in a matched-pairs design c
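In a matched pairs design, the chance step amounts to one coin flip per pair. The Python sketch below assumes four hypothetical pairs of similar units (named P1a/P1b and so on) and two treatments labeled A and B; it covers the two-unit-pair case only.

```python
import random

def matched_pairs_assignment(pairs, treatments=("A", "B"), seed=None):
    """Within each matched pair, use chance (a virtual coin flip) to decide
    which unit gets which of the two treatments."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:  # "heads": first unit gets the first treatment
            plan.append({first: treatments[0], second: treatments[1]})
        else:                   # "tails": the assignment is reversed
            plan.append({first: treatments[1], second: treatments[0]})
    return plan

# Hypothetical pairs of similar experimental units.
pairs = [(f"P{i}a", f"P{i}b") for i in range(1, 5)]
print(matched_pairs_assignment(pairs, seed=9))
```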

Large amount of variability

We are unable to draw conclusions based off experiments if the response variable shows _____.

Design of study

Determines appropriate method of analysis

Scope of inference

Random selection allows inference about the population; random assignment allows inference about cause and effect relationships

Inference about cause and effect

Well-designed experiments randomly assign individuals to treatment groups. However, most experiments don't select experimental units at random from the larger population. That limits the scope of such experiments' inference

Inference about the population

Observational studies don't randomly assign individuals to groups, which rules out inference about cause and effect. Observational studies that use random sampling can make inferences about the population

Challenges of establishing causation

Well-designed experiments tell us that changes in the explanatory variable cause changes in the response variable. Lack of realism can limit our ability to apply the conclusions of an experiment to the settings of greatest interest (lab settings vs. reality).

Causation from Observational Studies

It is sometimes possible to build a strong case for causation based on data from observational studies; criteria include: the association is strong; the association is consistent; larger values of the explanatory variable are associated with stronger responses; the alleged cause p

Data Ethics

Complex issues of data ethics arise when we collect data from people

Basic Data Ethics criteria

Basic standards of data ethics that must be obeyed by all studies that gather data from human subjects, both observational studies and experiments:
1. All planned studies must be reviewed in advance by an institutional review board charged with protecting

Sampling frame

List of all individuals from which a sample will be drawn

Undercoverage

Occurs when some members of the population cannot be chosen in a sample

Nonresponse

Occurs when individuals chosen for the sample can't be contacted or refuse to participate; nonresponse rates often exceed 50%

Response bias

A systematic pattern of incorrect responses in a sample survey leads to _____; it can be related to factors such as ethnicity, gender, age, race, or behaviors.

Wording

The wording and order of the questions presented to individuals are the most important influences on the answers given to a sample survey. Confusing or leading questions can introduce strong bias and greatly impact the survey's outcome.