data
observations (measurements, genders, survey responses) that have been collected
statistics
a collection of methods for planning experiments, obtaining data and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data
population
the complete collection of ALL elements to be studied
census
collection of data from EVERY element in a population
sample
a SUB COLLECTION of elements selected from a population
(Statistical thinking) When analyzing data collected the following factors should be considered:
Context, Source, Sampling Method, Conclusions and Practical implications
Bad sample examples:
Self-selected survey, GIGO, small samples
Self-selected survey/voluntary response sample
one in which the respondents themselves decide whether to be included
GIGO
garbage in, garbage out
(Statistical Significance) is achieved when
you get a result that is very unlikely to occur by chance (e.g., winning the lottery twice)
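A minimal sketch of "very unlikely to occur by chance", assuming a fair coin as the chance model. The helper name `prob_at_least` and the 18-of-20 scenario are hypothetical, chosen only for illustration.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials,
    each succeeding with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of seeing 18 or more heads in 20 flips of a fair coin.
chance = prob_at_least(18, 20)
print(f"{chance:.6f}")  # about 0.0002, so such a result is very unlikely by chance alone
```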
Parameter
a numerical measurement describing some characteristic of a POPULATION
Statistic
a numerical measurement describing some characteristic of a SAMPLE
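The parameter/statistic distinction can be sketched as follows. The population here is made-up data (simulated heights) generated only for illustration; the sample size of 50 is an arbitrary assumption.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated heights in cm (illustrative data only).
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: computed from EVERY element of the population (as in a census).
population_mean = statistics.mean(population)

# Statistic: computed from a sample, a sub collection drawn from the population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```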
Quantitative data
consist of NUMBERS representing counts or measurements
Discrete data
data which results from either a finite number of possible values or a countable number of possible values (1,2,3...)
Continuous data
data which results from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps, interruptions, or jumps (measurements such as weight, height, etc.)
Qualitative (categorical or attribute) data:
nonnumeric data that can be separated into different CATEGORIES
Nominal
characterized by data that consists of names, labels, or categories only. Data cannot be arranged in an ordering scheme, or order is not meaningful (fave color, SS#, etc.)
Ordinal
involves data that may be arranged in some order, but differences (subtraction) between data values either cannot be determined or are meaningless (t-shirt sizes, low-med-high ratings)
Interval
like the ordinal level, with the additional property that we can determine meaningful amounts of differences between data. However, data at this level do not have a natural zero starting point (subtraction is meaningful but ratios are not, ex: temperature in degrees Celsius or Fahrenheit)
Ratio
the interval level modified to include the natural zero starting point where zero indicates that none of the quantity is present. Differences and ratios are both meaningful. Ratio test: if one number is twice the other, is the quantity being measured also twice as large?
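A small sketch of the ratio test, assuming temperature in degrees Celsius as the interval-level example and the Kelvin scale (which has a natural zero) as the ratio-level comparison. The variable names are illustrative.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins (0 K is a natural zero)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Interval level: the difference is meaningful (10 degrees warmer).
difference = warm_c - cool_c

# The Celsius numbers have ratio 2.0, but 20 C is NOT "twice as hot" as 10 C,
# because 0 C does not mean "no temperature".
celsius_ratio = warm_c / cool_c

# Ratio level: on the Kelvin scale the ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference, celsius_ratio, round(kelvin_ratio, 3))  # kelvin ratio is about 1.035
```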
Observational study
we observe and measure specific characteristics without attempting to modify the subjects being studied
experiment
we apply some treatment and then observe its effects on the subjects
systematic sampling
select some starting point and then select every kth element in the population (calling every kth person listed in the phone book)
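Systematic sampling can be sketched as below, assuming the population is an ordered list and the starting point is chosen at random among the first k elements. The function name `systematic_sample` and the `phone_book` data are hypothetical.

```python
import random

def systematic_sample(population, k, rng=None):
    """Pick a random start in the first k positions, then take every kth element."""
    if rng is None:
        rng = random.Random()
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical phone book with 100 entries; take every 10th entry.
phone_book = [f"person_{i}" for i in range(100)]
sample = systematic_sample(phone_book, k=10, rng=random.Random(1))
print(sample)
```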
convenience sampling
use results that are readily available or very easy to get (family and friends)
stratified sampling
subdivide the population into subgroups that share the same characteristics then draw a simple random sample from each subgroup (driving to different schools in Indiana)
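A minimal sketch of stratified sampling, assuming each record can be assigned to exactly one subgroup and the same number of elements is drawn from each. The function name `stratified_sample` and the `students` data are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, n_per_stratum, rng):
    """Group records into strata, then draw a simple random sample from EACH stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 200 students spread over 4 schools.
students = [{"id": i, "school": f"school_{i % 4}"} for i in range(200)]
picked = stratified_sample(students, lambda s: s["school"], 5, random.Random(2))

for school, members in sorted(picked.items()):
    print(school, [m["id"] for m in members])
```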
cluster sampling
divide the population into sections, randomly select some of those clusters, and then choose all members from the selected clusters
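Cluster sampling differs from stratified sampling in that only some sections are chosen, and then everyone in them is used. A sketch under the assumption that the population is already divided into named clusters (the `cluster_sample` helper and the `classrooms` data are hypothetical):

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters, then keep ALL members of each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 6 classrooms of 5 students each.
classrooms = {f"room_{r}": [f"student_{r}_{i}" for i in range(5)] for r in range(6)}
selected = cluster_sample(classrooms, n_clusters=2, rng=random.Random(4))

for room, members in selected.items():
    print(room, members)
```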
replication
the repetition of an experiment on a sample of subjects that is large enough so that we can see the true nature of any effects
confounding
occurs in an experiment when you are not able to distinguish among the effects of different factors
sampling error
the difference between a sample result and the true population result; such an error results from chance sample fluctuation
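Sampling error can be illustrated by drawing several random samples from a known population and comparing each sample mean with the population mean. The population (the integers 1 to 100) and the sample size of 10 are assumptions made only for this sketch.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population whose true mean is known exactly.
population = list(range(1, 101))
population_mean = statistics.mean(population)  # 50.5

# Each random sample gives a slightly different result purely by chance.
errors = []
for _ in range(5):
    sample = rng.sample(population, 10)
    errors.append(statistics.mean(sample) - population_mean)

print([round(e, 2) for e in errors])  # sampling errors fluctuate from sample to sample
```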
non sampling error
occurs when sample data is incorrectly collected, recorded, or analyzed (ex: collecting a biased sample)
nonrandom sampling error
the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample