Statistics, Chapter 1

Statistics

Deals with the collection, classification, and interpretation of data in order to draw conclusions

Descriptive statistics

Describes a data set (sample)
STATISTIC -- SAMPLE

Inferential statistics

Conclusions about a larger set (population) -- obtained from a part (sample) of the population
PARAMETER -- POPULATION

Experimental unit

Unit, individual

Population (n)

A set (or totality or collection) of all units of interest [ex. totality of all registered voters]

Totality

Proportion or percentage

Population size (N)

Number of units in a population [ex. 120 million]

Variable

Characteristic or property of an individual population unit [ex. opinion of a voter]

Measurement

Process to assign numbers (labels) to variables of interest

Parameter

Summary measure computed to describe a characteristic of a population [as opposed to opinion of each voter]

Census

Measurement of every unit in a population [feasible when the population is small enough to measure every unit]

Representative sample

Exhibits properties typical of those possessed by the target population

Random sample

Every unit in the population has the same chance of selection

Simple random sampling

Every possible subset of size n of the population has the same chance of selection -- N --> n (n units drawn from a population of N units)
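
A minimal Python sketch of simple random sampling, assuming the population fits in a list. The function name simple_random_sample, the voter IDs, and the seed are hypothetical and used only for illustration; random.sample from the standard library draws n distinct units so that each subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seed is only here to make the run repeatable
    return rng.sample(population, n)

# Hypothetical population of N = 10 voter IDs
population = list(range(1, 11))
sample = simple_random_sample(population, n=3, seed=0)
print(sample)
```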

Stratified sample

Obtained by separating the population into homogeneous, non-overlapping groups (strata) and then obtaining a simple random sample from each stratum
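
The two steps above (split into strata, then take a simple random sample from each stratum) can be sketched in Python. The names stratified_sample, stratum_of, and n_per_stratum, and the urban/rural voter data, are made up for illustration; taking the same number of units from every stratum is an assumption, not something the definition requires.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # non-overlapping groups
    sample = []
    for label in sorted(strata):  # fixed order keeps the run repeatable
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical voters labeled by region (the stratum)
voters = [("v%d" % i, "urban" if i % 2 == 0 else "rural") for i in range(20)]
sample = stratified_sample(voters, stratum_of=lambda v: v[1], n_per_stratum=3, seed=1)
print(sample)
```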

Systematic sample

Obtained by selecting every kth individual from population
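
A short Python sketch of selecting every kth individual. Picking the starting position at random among the first k units is a common convention assumed here, not something stated in these notes; the function name systematic_sample and the numbered population are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Take every kth unit, starting from a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed: random start in positions 0..k-1
    return population[start::k]

# Hypothetical population of 20 numbered individuals
population = list(range(1, 21))
sample = systematic_sample(population, k=5, seed=2)
print(sample)
```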

Cluster sample

Obtained by selecting all individuals within randomly selected groups (clusters) of individuals
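
A Python sketch of cluster sampling: whole groups are chosen at random, and every individual in a chosen group enters the sample. The function name cluster_sample and the school data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every individual inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical clusters: students grouped by school
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
}
sample = cluster_sample(clusters, n_clusters=2, seed=3)
print(sample)
```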

Convenience sample

One in which the individuals in the sample are easily obtained

Statistic (estimate)

Summary measure that is computed to describe a characteristic from only a sample of the population
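
To contrast a statistic with a parameter, the Python sketch below computes the same summary measure (the mean) once over a whole population and once over a sample drawn from it. The ages are randomly generated, made-up data, and the variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(4)

# Made-up population of N = 1000 ages
population_ages = [rng.randint(18, 90) for _ in range(1000)]
parameter = statistics.mean(population_ages)  # describes the whole population

# Simple random sample of n = 50 units from that population
sample_ages = rng.sample(population_ages, 50)
statistic = statistics.mean(sample_ages)  # describes only the sample

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The statistic usually lands near the parameter but rarely equals it, which is why an inference comes with a measure of reliability.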

Statistical inference

Generalization about a population based on information contained in a sample -- using info contained in the smaller sample to draw conclusions about the larger population

Measure of reliability

Statement about the degree of uncertainty associated with a statistical inference

Data

List of measurements (observations) of a variable
Ex. observations - M or F; variable - gender

Classification of variables and data

Single variable - univariate data set
Two variables - bivariate data set
More than two variables - multivariate

Quantitative (numerical) data

Measurements that can be recorded on a numerical scale; arithmetic operations provide meaningful results
Ex. Age; GPA; Salary

Qualitative (categorical) data

Measurements that cannot be recorded on a numerical scale [instead - categories]; arithmetic operations do not provide meaningful results

Nominal data

Qualitative
Data that consist of names, labels, or categories - UNORDERED
Ex. gender (M or F)

Ordinal data

Qualitative
Data that consist of names, labels, categories - ORDERED
Ex. health status (poor, good, very good, etc.)

Discrete variable

Quantitative
Countable number of possible values (ex. only the whole numbers in [0,6])
Ex. spilled marbles; number of emails received by one student

Continuous variable

Quantitative
Infinite, not countable, number of possible values; can take on all values in a certain interval (ex. every number in [0,6])
Ex. spilled coffee; AGE; HEIGHT; WEIGHT; SPEED OF A CAR

Interval level of measurement

Properties of nominal (categories) + ordinal (categories can be arranged in some order) + differences in values of the variable are meaningful (addition + subtraction); zero does not mean absence of the quantity

Ratio level of measurement

Properties of nominal (categories) + ordinal (categories can be arranged in some order) + interval (differences are meaningful; add/sub) + ratios of values are meaningful (mul + div); zero means the absence of the quantity

Summary of levels of measurement

Nominal - categories only
Ordinal - categories with some order
Interval - differences are meaningful, but no natural starting point
Ratio - differences and ratios are meaningful and there is a natural starting point
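
A small Python illustration of the interval/ratio distinction, using temperature as an example that is not in these notes: Celsius has no natural zero (interval level), while Kelvin does (ratio level). The helper celsius_to_kelvin and the two readings are chosen only for illustration.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15

a_c, b_c = 10.0, 20.0  # two made-up temperature readings in Celsius

# Differences are meaningful on both scales and agree with each other
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are meaningful only where zero means absence of the quantity
ratio_c = b_c / a_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)  # about 1.035

print(diff_c, diff_k, ratio_c, ratio_k)
```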

Observational study

Observe units in natural settings without trying to influence the outcome of the study

Designed experiment

Strict control over the experiment, the units and the values of variables in the experiment

Lurking variable

Additional variable that influences the two variables being studied

Explanatory variable

Ex. frequency/level of cellphone usage

Response variable

Ex. whether or not brain cancer was contracted

Sample without replacement

Individual selected is then removed from population and cannot be chosen again

Sample with replacement

Selected individual is placed back into population
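
The two selection schemes can be compared with Python's standard library: random.sample never returns the same individual twice, while random.choices puts each selected individual back before the next draw. The five-letter population and the seed are made up for illustration.

```python
import random

rng = random.Random(5)
population = ["A", "B", "C", "D", "E"]  # hypothetical population, N = 5

# Without replacement: each individual can appear at most once
without_replacement = rng.sample(population, 3)

# With replacement: individuals can repeat, so drawing more than N is possible
with_replacement = rng.choices(population, k=8)

print(without_replacement)
print(with_replacement)
```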