Statistics Unit 3 | Statistics

random

an outcome is random if we know the possible values it can have, but not which particular value it takes

generating random numbers

random numbers are hard to generate. nevertheless, several internet sites offer an unlimited supply of equally likely random values

simulation

models a real-world situation by using random-digit outcomes to mimic the uncertainty of a response variable of interest

simulation component

uses equally likely random digits to model simple random occurrences whose outcomes may not be equally likely

trial

the sequence of several components representing events that we are pretending will take place

response variable

values of the response variable record the results of each trial with respect to what we were interested in
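
as a minimal sketch of how component, trial, and response variable fit together, the python below simulates a hypothetical basketball player who makes 70% of free throws. the scenario, the 70% figure, and all function names are assumptions for illustration. each component draws one equally likely digit (0-6 counts as a made shot), a trial is five components in a row, and the response variable is the number of shots made in that trial.

```python
import random

def run_component(rng):
    """One component: draw an equally likely digit 0-9; digits 0-6 mean a made shot (70%)."""
    digit = rng.randrange(10)
    return digit <= 6

def run_trial(rng, shots=5):
    """One trial: a sequence of components. The response is the number of made shots."""
    return sum(run_component(rng) for _ in range(shots))

def simulate(num_trials=1000, seed=1):
    """Repeat many trials and return the mean of the response variable."""
    rng = random.Random(seed)
    responses = [run_trial(rng) for _ in range(num_trials)]
    return sum(responses) / num_trials

print(simulate())
```

the mean response should land near 3.5 made shots per trial, since 5 x 0.7 = 3.5.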

population

the entire group of individuals or instances about whom we hope to learn

sample

a representative subset of a population, examined in hope of learning about the population

sample survey

a study that asks questions of a sample drawn from some population in the hope of learning something about the entire population. polls taken to assess voter preferences are common sample surveys

bias

any systematic failure of a sampling method to represent its population is bias. biased sampling methods tend to over- or underestimate parameters. it's almost impossible to recover from bias, so efforts to avoid it are well spent

randomization

the best defense against bias is randomization, in which each individual is given a fair, random chance of selection

sample size

the number of individuals in a sample. the sample size determines how well the sample represents the population, not the fraction of the population sampled
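
the point that sample size matters more than the fraction sampled can be checked with a small simulation. the sketch below assumes a hypothetical population in which 40% of individuals have some attribute. it draws repeated random samples and measures how much the sample proportions spread out. the function name and all numbers are illustrative assumptions.

```python
import random
import statistics

def spread_of_sample_proportions(population_size, sample_size, repeats=200, seed=0):
    """Standard deviation of the sample proportion across repeated random samples."""
    rng = random.Random(seed)
    # Hypothetical population: individuals numbered below `cutoff` have the attribute (40%).
    cutoff = int(0.4 * population_size)
    proportions = []
    for _ in range(repeats):
        sample = rng.sample(range(population_size), sample_size)
        proportions.append(sum(i < cutoff for i in sample) / sample_size)
    return statistics.stdev(proportions)

# Same sample size (100), very different fractions of the population sampled.
print(spread_of_sample_proportions(10_000, 100))
print(spread_of_sample_proportions(1_000_000, 100))
# Larger sample size from the large population.
print(spread_of_sample_proportions(1_000_000, 400))
```

the first two spreads should be close to each other even though one sample is 1% of its population and the other is 0.01%, while the sample of 400 should show a noticeably smaller spread.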

census

a sample that consists of the entire population is called a census

population parameter

a numerically valued attribute of a model for a population. we rarely know the true value of a population parameter, but we do hope to estimate it from sampled data

statistic, sample statistic

statistics are values calculated for sampled data. those that correspond to, and thus estimate, a population parameter, are of particular interest. the term "sample statistic" is sometimes used, usually to parallel the corresponding term "population param

representative

a sample is said to be representative if the statistics computed from it accurately reflect the corresponding population parameters

simple random sample (SRS)

a simple random sample of sample size n is a sample in which each set of n elements in the population has an equal chance of selection
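
one way to draw an SRS in python is `random.sample`, which picks n distinct elements so that every set of n elements is equally likely. the roster below is hypothetical, and the helper name `simple_random_sample` is an assumption for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every set of n is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame of 30 students.
roster = [f"student_{i}" for i in range(1, 31)]
srs = simple_random_sample(roster, 5, seed=42)
print(srs)
```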

sampling frame

a list of individuals from whom the sample is drawn is called the sampling frame. individuals who may be in the population of interest, but who are not in the sampling frame, cannot be included in any sample

sampling variability

the natural tendency of randomly drawn samples to differ, one from another. sometimes, unfortunately, called sampling error, sampling variability is no error at all, but just the natural result of random sampling

stratified random sampling

a sampling design in which the population is divided into several sub-populations, and random samples are then drawn from each sub-population. if the sub-populations are homogeneous, but are different from each other, a stratified sample may yield more con
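
a minimal sketch of stratified random sampling, assuming a hypothetical student body split into strata by class year. the data and the helper name `stratified_sample` are illustrative assumptions. the population is first grouped into strata, and then a separate random sample is drawn inside each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, seed=None):
    """Group individuals into strata, then draw a random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical population: 20 students in each of three class years.
students = [(f"s{i}", year)
            for year in ("junior", "senior", "sophomore")
            for i in range(20)]
chosen = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=4, seed=7)
print(chosen)
```

every stratum contributes exactly four students, so no class year can be missed by chance.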

cluster sample

a sampling design in which entire groups, or clusters, are chosen at random. cluster sampling is usually selected as a matter of convenience, practicality, or cost. each cluster should be representative of the population, so all the clusters should be het
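
a minimal sketch of cluster sampling, assuming hypothetical homerooms as the clusters. the helper name `cluster_sample` and the data are illustrative assumptions. whole clusters are chosen at random, and everyone in a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Choose whole clusters at random and include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen_names for member in clusters[name]]

# Hypothetical clusters: 6 homerooms with 5 students each.
homerooms = {f"room_{r}": [f"room_{r}_student_{i}" for i in range(5)] for r in range(6)}
sampled = cluster_sample(homerooms, 2, seed=3)
print(sampled)
```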

multistage sample

sampling schemes that combine several sampling methods are called multistage samples

systematic sample

a sample is drawn by selecting individuals systematically from a sampling frame. where there is no relationship between the order of the sampling frame and the variables of interest, a systematic sample can be representative
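
a minimal sketch of a systematic sample, assuming the common approach of picking a random starting point and then taking every k-th individual from the sampling frame. the helper name `systematic_sample` and the numbered frame are illustrative assumptions.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th individual, where k = len(frame) // n."""
    rng = random.Random(seed)
    step = len(frame) // n          # assumes the frame has at least n individuals
    start = rng.randrange(step)     # random start within the first k positions
    return [frame[start + i * step] for i in range(n)]

# Hypothetical sampling frame: individuals numbered 0 through 99.
frame = list(range(100))
picked = systematic_sample(frame, 10, seed=5)
print(picked)
```

if the order of the frame were related to the variable of interest (for example, a list sorted by income), this scheme could produce an unrepresentative sample.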

pilot

a small trial run of a survey to check whether questions are clear. a pilot study can reduce errors due to ambiguous questions

voluntary response bias

bias introduced to a sample when individuals can choose on their own whether to participate in the sample. samples based on voluntary response are always invalid and cannot be recovered, no matter how large the sample size

convenience sample

consists of the individuals who are conveniently available. convenience samples often fail to be representative because not every individual in the population is equally convenient to sample

undercoverage

a sampling scheme that biases the sample in a way that gives a part of the population less representation than it has in the population suffers from undercoverage

non-response bias

bias introduced when a large fraction of those sampled fails to respond. those who do respond are likely not to represent the entire population. voluntary response bias is a form of non-response bias, but non-response may occur for other reasons.

response bias

anything in a survey design that influences responses falls under the heading of response bias. one typical response bias arises from the wording of questions, which may suggest a favored response.

observational study

a study based on data in which no manipulation of factors has been employed

retrospective study

an observational study in which subjects are selected and then their previous conditions or behaviors are determined. retrospective studies need not be based on random samples and they usually focus on estimating differences between groups or associations

prospective study

an observational study in which subjects are followed to observe future outcomes. because no treatments are deliberately applied, a prospective study is not an experiment. nevertheless, prospective studies typically focus on estimating differences among g

experiment

manipulates factor levels to create treatments, randomly assigns subjects to these treatment levels, and then compares the responses of the subject groups across treatment levels

random assignment

to be valid, an experiment must assign experimental units to treatment groups at random
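
a minimal sketch of random assignment, assuming a hypothetical list of 12 volunteers and two treatments. the helper name `randomly_assign` is an assumption for illustration. the units are shuffled and then dealt out to the treatment groups in turn, so every unit has the same chance of landing in any group.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatment groups in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 12 volunteers, two treatments.
volunteers = [f"volunteer_{i}" for i in range(12)]
groups = randomly_assign(volunteers, ["new drug", "placebo"], seed=11)
print(groups)
```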

factor

a variable whose levels are manipulated by the experimenter. experiments attempt to discover the effects that differences in factor levels may have on the responses of the experimental units

response

a variable whose values are compared across different treatments. in a randomized experiment, large response differences can be attributed to the effect of differences in treatment level

experimental units

individuals on whom an experiment is performed. usually called subjects or participants when they are human

level

the specific values that the experimenter chooses for a factor are called the levels of the factor

treatment

the process, intervention, or other controlled circumstance applied to randomly assigned experimental units. treatments are the different levels of a single factor or are made up of combinations of levels of two or more factors
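
as a small illustration of treatments built from combinations of levels, the sketch below assumes a hypothetical plant-growth experiment with two factors. the factor names and levels are made up for the example. `itertools.product` lists every combination of one level from each factor.

```python
from itertools import product

# Hypothetical factors and their levels.
factors = {
    "fertilizer": ["none", "half dose", "full dose"],
    "watering": ["daily", "weekly"],
}

# Each treatment is one level of fertilizer paired with one level of watering.
treatments = list(product(*factors.values()))
for treatment in treatments:
    print(treatment)
```

three levels of one factor and two levels of the other give 3 x 2 = 6 treatments.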

principles of experimental design

control, randomize, replicate, and block

statistically significant

when an observed difference is too large for us to believe that it is likely to have occurred naturally, we consider the difference to be statistically significant. subsequent chapters will show specific calculations and give rules, but the principle rema

control group

the experimental units assigned to a baseline treatment level, typically either the default treatment, which is well understood, or a null, placebo treatment. their responses provide a basis for comparison

blinding

any individual associated with an experiment who is not aware of how the subjects have been allocated to treatment groups is said to be blinded

single-blind & double-blind

there are two main classes of individuals who can affect the outcome of an experiment: those who could influence the results (the subjects, treatment administrators, or technicians), and those who evaluate the results (judges, treating physicians, etc.) w

placebo

a treatment known to have no effect, administered so that all groups experience the same conditions. many subjects respond to such a treatment (a response known as a placebo effect). only by comparing with a placebo can we be sure that the observed effect

placebo effect

the tendency of many human subjects (often 20% or more of experiment subjects) to show a response even when administered a placebo

blocking

when groups of experimental units are similar, it is often a good idea to gather them together into blocks. by blocking, we isolate the variability attributable to the differences between the blocks so that we can see the differences caused by the treatme
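
a minimal sketch of a randomized block design, assuming hypothetical subjects blocked by age group. the helper name `randomized_block_design` and the data are illustrative assumptions. randomization happens separately inside each block, so every block receives each treatment equally often.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Within each block, shuffle the units and deal them out to the treatments in turn."""
    rng = random.Random(seed)
    assignment = {}
    for block_name in sorted(blocks):
        units = list(blocks[block_name])
        rng.shuffle(units)
        for i, unit in enumerate(units):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks: subjects grouped by age before assignment.
blocks = {
    "under_40": ["a1", "a2", "a3", "a4"],
    "over_40": ["b1", "b2", "b3", "b4"],
}
assignment = randomized_block_design(blocks, ["drug", "placebo"], seed=2)
print(assignment)
```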

matching

in a retrospective or prospective study, subjects who are similar in ways not under study may be matched and then compared with each other on the variables of interest. matching, like blocking, reduces unwanted variation

designs

in a completely randomized design, all experimental units have an equal chance of receiving any treatment

confounding

when the levels of one factor are associated with the levels of another factor in such a way that their effects cannot be separated, we say that these two factors are confounded