Data
collections of observations
Statistics
the science of planning studies and experiments; obtaining data; and then organizing, summarizing, presenting, analyzing, and interpreting those data and then drawing conclusions based on them
population
complete collection of all measurements or data that are being considered
census
collection of data from every member of the population
sample
a sub-collection of members selected from a population
voluntary response sample (self-selected sample)
the respondents themselves decide whether to be included
parameter
numerical measurement describing some characteristic of a population
statistic
numerical measurement describing some characteristic of a sample
Quantitative (numerical) data
consist of numbers representing counts or measurements
Qualitative (categorical) data
consist of names or labels that are not numbers representing counts or measurements
Quantitative data can be further described by?
discrete and continuous
Discrete data
result when the data values are quantitative and the number of values is finite or "countable"
Continuous data
result from infinitely many possible quantitative values, where the collection of values is not countable
Nominal level of measurement
categories only; the data cannot be arranged in order
Ordinal level of measurement
data can be arranged in order, but differences either cannot be found or are meaningless
Interval level of measurement
differences are meaningful, but there is no natural zero starting point and ratios are meaningless
Ratio level of measurement
there is a natural zero starting point and ratios are meaningful
example of ratio
heights, lengths, distances, volumes
example of interval
body temperatures in degrees Fahrenheit or Celsius
example of ordinal
ranks of colleges in US News & World Report
example of nominal
eye colors
Observational study
we observe and measure specific characteristics but we don't attempt to modify the subjects being studied
Experiment
we apply some treatment and then proceed to observe its effects on the subjects
Simple random sample
a sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen
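A minimal sketch of a simple random sample, assuming the population is a list of hypothetical subject IDs. The function name `simple_random_sample` and the `seed` parameter are illustrative, not part of the definition; the seed only makes the draw repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement so every size-n subset is equally likely."""
    rng = random.Random(seed)  # seeded only for reproducibility (an assumption)
    return rng.sample(list(population), n)

population = list(range(1, 101))  # hypothetical subject IDs 1..100
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```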
Systematic sampling
we select some starting point and then select every kth (such as every 50th) element in the population
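A minimal sketch of systematic sampling. Choosing the starting point at random among the first k elements is an assumption made here; the definition only says "some starting point". The name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a starting index in 0..k-1, then take every kth element after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start is an assumption, not required by the definition
    return population[start::k]

population = list(range(1, 1001))  # hypothetical IDs 1..1000
sample = systematic_sample(population, 50, seed=0)
print(len(sample), sample[:3])
```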
Convenience sampling
we simply use results that are very easy to get
Stratified sampling
we subdivide the population into at least two different subgroups (strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket). Then we draw a sample from each subgroup
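A minimal sketch of stratified sampling, assuming each subject is a hypothetical `(id, age_bracket)` pair and that the same number of subjects is drawn from every stratum. Equal allocation is only one possible choice.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Group subjects by stratum, then draw a random sample from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of(s)].append(s)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical subjects: 20 people labeled by age bracket.
subjects = [("id%d" % i, "under 40" if i % 2 == 0 else "40 and over")
            for i in range(20)]
sample = stratified_sample(subjects, lambda s: s[1], 3, seed=2)
print(sample)
```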
Cluster sampling
we divide the population area into sections (or clusters), then randomly select some of those clusters and choose all the members from the selected clusters
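A minimal sketch of cluster sampling, assuming the clusters are given as a dictionary of hypothetical precincts. Unlike stratified sampling, only some clusters are selected, and every member of a selected cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters of unequal size.
clusters = {
    "precinct A": ["a1", "a2", "a3"],
    "precinct B": ["b1", "b2"],
    "precinct C": ["c1", "c2", "c3", "c4"],
    "precinct D": ["d1"],
}
chosen, sample = cluster_sample(clusters, 2, seed=4)
print(chosen, sample)
```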
Cross-sectional study
data are observed, measured, and collected at one point in time, not over a period of time
retrospective (case-control) study
data are collected from a past time period by going back in time (through records, interviews, and so on)
prospective study
data are collected in the future from groups that share common factors
randomization
used when subjects are assigned to different groups through a process of random selection
replication
the repetition of an experiment on more than one subject
blinding
the subject does not know whether he or she is receiving a treatment or a placebo
placebo effect
occurs when an untreated subject reports an improvement in symptoms
double-blind
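blinding occurs at two levels: (1) the subject does not know whether he or she is receiving the treatment or a placebo, and (2) the experimenter does not know whether he or she is administering the treatment or a placebo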
sampling error
occurs when the sample has been selected with a random method but there is a discrepancy between a sample result and the true population result
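A minimal sketch of sampling error, assuming a simulated population of hypothetical heights. The population mean plays the role of the parameter and the sample mean the role of the statistic; the gap between them arises from chance alone, because the sample is drawn at random.

```python
import random
import statistics

rng = random.Random(3)  # seeded only for reproducibility
# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

parameter = statistics.mean(population)  # describes the whole population
sample = rng.sample(population, 50)      # simple random sample of 50
statistic = statistics.mean(sample)      # describes only the sample

sampling_error = statistic - parameter
print(round(parameter, 2), round(statistic, 2), round(sampling_error, 2))
```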
non-sampling error
result of human error, such as wrong data entries, computing errors, or questions with biased wording
nonrandom sampling error
result of using a sampling method that is not random
lower class limits
smallest numbers that can belong to the different classes
upper class limits
largest numbers that can belong to the different classes
class boundaries
numbers used to separate the classes, but without the gaps created by class limits
class midpoints
the values in the middle of the classes
class width
difference between two consecutive lower class limits in a frequency distribution
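A minimal sketch of how class width, midpoints, and boundaries follow from the class limits, assuming a hypothetical frequency distribution with classes 50-69, 70-89, 90-109, and 110-129.

```python
# Hypothetical class limits (integer data assumed).
lower_limits = [50, 70, 90, 110]
upper_limits = [69, 89, 109, 129]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: halfway between each lower and upper limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one upper limit and the next lower limit.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print(class_width)  # 20
print(midpoints)    # [59.5, 79.5, 99.5, 119.5]
print(boundaries)   # [49.5, 69.5, 89.5, 109.5, 129.5]
```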
frequency distribution
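shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them; helpful in organizing and summarizing data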
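A minimal sketch of building a frequency distribution, assuming hypothetical integer data and classes of width 20 starting at 50. With integer data, each upper class limit is taken to be the lower limit plus the width, minus 1.

```python
# Hypothetical integer data values.
data = [52, 68, 71, 75, 88, 90, 93, 101, 104, 109, 115, 127]
lower_limits = [50, 70, 90, 110]
class_width = 20

freq = {}
for lo in lower_limits:
    hi = lo + class_width - 1  # upper class limit, assuming integer data
    freq[(lo, hi)] = sum(lo <= x <= hi for x in data)

for (lo, hi), count in freq.items():
    print(f"{lo}-{hi}: {count}")
```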
histogram
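a graph of a frequency distribution made of adjacent bars of equal width, with classes on the horizontal scale and frequencies on the vertical scale; often a better tool than a frequency distribution for seeing the shape of the data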
data skewed
if it is not symmetric and extends more to one side than to the other
skewed right (positively skewed)
have a longer right tail
skewed left (negatively skewed)
have a longer left tail
normal distribution
normal if the pattern of the points in the normal quantile plot is reasonably close to a straight line
not a normal distribution
if the points in the normal quantile plot do not lie reasonably close to a straight line
measure of center
a value at the center or middle of a data set
mean
average: the sum of the data values divided by the number of data values
median
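the middle value when the data are arranged in order; half of the values fall below it and half above (50/50)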
mode
the value that occurs with the greatest frequency (the most repeated value)
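A minimal sketch of the three measures of center using Python's standard `statistics` module on a small hypothetical data set. `multimode` (Python 3.8+) is used because a data set can have more than one mode.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data values

mean = statistics.mean(data)        # 30 / 6 = 5
median = statistics.median(data)    # middle two values are 3 and 5, so 4.0
modes = statistics.multimode(data)  # 3 occurs twice, so [3]

print(mean, median, modes)
```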