Statistics

Variable

any characteristic of some event, object, or person that may vary

Dependent Variable (DV)

The variable that is measured to determine if the independent variable has any effect

Independent Variable (IV)

The variable that is manipulated or varied and is presumed to be responsible for an effect on the dependent variable

Categorical Variable

A variable whose values are categories that differ in quality (kind), not in numerical magnitude

Quantitative Variable

A variable whose values are numbers that differ in magnitude (how much or how many); it can be discrete or continuous

Discrete

Possible values are separate, countable values with nothing in between (e.g., 0, 1, 2, ... children in a family)

Continuous

Possible values can fall anywhere within a range, so there are infinitely many possibilities (e.g., 8.654, 7.4563, ...)

Nominal Scale of Measurement

The variable has discrete, mutually exclusive categories with no order (e.g., eye color).

Ordinal Scale of Measurement

The variable has discrete categories with a specific order, but the distance between categories is not necessarily equal (e.g., letter grades A, B, C, D).

Interval Scale of Measurement

Values have a specific order and equal numerical distances between them, but zero is arbitrary rather than the absence of the property (e.g., temperature in degrees Celsius or Fahrenheit).

Ratio Scale of Measurement

Like interval (values have a specific order and equal numerical distances between them), but with a meaningful zero point, where zero means the absence of the property (e.g., height, weight, reaction time).
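A small Python sketch of why the zero point matters: on an interval scale, ratios change when you change units, while on a ratio scale they survive. Celsius/Fahrenheit and kilograms/pounds are assumed here as standard textbook examples; `c_to_f` and `KG_TO_LB` are illustrative names.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: 20 °C is not "twice as hot" as 10 °C; the ratio depends on the unit.
print(20 / 10, c_to_f(20) / c_to_f(10))   # 2.0 1.36

# Ratio scale: 20 kg really is twice 10 kg; the ratio survives a unit change.
KG_TO_LB = 2.20462   # approximate conversion factor
print(20 / 10, (20 * KG_TO_LB) / (10 * KG_TO_LB))
```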

Bar Graph

Shows counts for discrete categories, usually from a nominal or ordinal scale; the bars are separated by gaps

Pie Chart

Each slice represents the percentage of observations in one category

Histogram

Shows frequencies for a small or large range of continuous scores from an interval or ratio scale; scores are grouped into adjacent intervals, so the bars touch

Polygon

Shows a large range (population or close to population) of continuous scores from an interval or ratio scale. Created by connecting the midpoints of the tops of a histogram's bars

Operationalization

The process of specifying how something will be measured

True Experiment

IV (manipulated)
DV
Random assignment to groups
Can make causal inference

Quasi-Experiment

IV (might be manipulated)
DV
NO random assignment (pre-existing groups)
Limited causal inference possible

Correlational Design

Variable X
Variable Y
Relation between two variables
Can't make causal inference

Population

The entire group of interest (larger than a sample); a numerical summary of a population is called a parameter

Sample

A subset of the population (smaller than the population); a numerical summary of a sample is called a statistic. Good samples are representative of the larger population

Simple Random Sampling

A simple random sample of n subjects from a population is one in which each possible sample of that size has the same probability (chance) of being selected
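A minimal Python sketch, assuming the population is an in-memory list: the standard library's `random.sample` draws without replacement so that every subset of size n is equally likely. The names list and `simple_random_sample` helper are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every possible subset of size n is equally likely."""
    rng = random.Random(seed)   # seeded generator so results can be reproduced
    return rng.sample(population, n)

names = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal"]   # hypothetical population
print(simple_random_sample(names, 3, seed=1))
```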

Systematic Random Sampling

Choose a random starting point, then select every kth person (e.g., every 3rd person on a list of names)
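A sketch of systematic sampling in Python, assuming the common textbook variant where the start is chosen at random among the first k positions. `systematic_sample` and the roster of ID numbers are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random index in 0..k-1
    return population[start::k]

roster = list(range(1, 13))   # hypothetical list of 12 ID numbers
print(systematic_sample(roster, 3, seed=0))
```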

Stratified Random Sampling

Divide the population into groups or strata (e.g., gender, race), then randomly sample within each stratum
disproportional: the number sampled from each stratum does not match that stratum's share of the population
proportional: the number sampled from each stratum matches that stratum's share of the population
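A Python sketch contrasting proportional and disproportional allocation, assuming a hypothetical population split 60/40 into two strata. `proportional_sizes` rounds each share, which is an assumption: for other splits the rounded sizes may not add up exactly to the requested total.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample within each stratum; sizes maps stratum -> n."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

def proportional_sizes(strata, total_n):
    """Allocate total_n in proportion to each stratum's share of the population."""
    pop_size = sum(len(members) for members in strata.values())
    return {name: round(total_n * len(members) / pop_size) for name, members in strata.items()}

strata = {"A": list(range(60)), "B": list(range(60, 100))}   # hypothetical 60/40 split
prop = proportional_sizes(strata, 10)   # {'A': 6, 'B': 4}, matching the 60/40 split
disprop = {"A": 5, "B": 5}              # equal sizes, so stratum B is over-represented
print(stratified_sample(strata, prop, seed=2))
```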

Cluster Sampling

Divide the population into clusters (e.g., city blocks),
randomly select some of the clusters,
and sample everyone in the selected clusters
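A Python sketch of cluster sampling, assuming clusters are stored as a hypothetical dictionary from block name to residents. The random draw is over whole clusters, not individuals.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

blocks = {   # hypothetical city blocks -> residents
    "Block 1": ["a", "b", "c"],
    "Block 2": ["d", "e"],
    "Block 3": ["f", "g", "h", "i"],
    "Block 4": ["j"],
}
picked = cluster_sample(blocks, 2, seed=3)
print(picked)
```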

Selecting Appropriate Measure of Central Tendency

A. Interval / ratio ---> mean
B. Ordinal ---> median
C. Nominal ---> mode
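(** A measure from a lower tier can also be calculated for a higher tier, but not the reverse: interval/ratio data also have a median and mode, but nominal data have no median or mean)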
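A Python sketch matching each scale to its measure with the standard `statistics` module. The data are hypothetical, and coding letter grades as A=4 ... D=1 is an assumption made only so the ordinal values can be sorted.

```python
from statistics import mean, median, mode

reaction_ms = [310, 295, 330, 305, 300]             # interval/ratio -> mean
grade_codes = [4, 3, 3, 2, 1]                       # ordinal (A=4 ... D=1) -> median
eye_color = ["brown", "blue", "brown", "green"]     # nominal -> mode

print(mean(reaction_ms))    # 308
print(median(grade_codes))  # 3
print(mode(eye_color))      # brown

# Lower tiers still work for higher scales, e.g. median(reaction_ms) or mode(grade_codes),
# but mean(eye_color) raises a TypeError because category labels are not numbers.
```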

Outlier

Data points that lie far above or far below the rest of the data (and the mean).

Skewed Left

Longer left tail; most observations are medium or large (e.g., age at death from natural causes).
The mean is affected most (pulled toward the tail).
mean < median < mode

Skewed Right

Longer right tail; most observations are small or medium (e.g., salary)
mode < median < mean
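A Python sketch of a right-skewed data set, using hypothetical salaries in thousands of dollars with one very large value. It shows mode < median < mean, and that removing the extreme value moves the mean much more than the median.

```python
from statistics import mean, median, mode

# Hypothetical salaries in $1,000s: most are modest, one is very large (long right tail).
salaries_k = [30, 35, 35, 40, 45, 50, 250]
print(mode(salaries_k), median(salaries_k), round(mean(salaries_k), 1))   # 35 40 69.3

# Dropping the extreme value shifts the mean far more than the median.
trimmed = salaries_k[:-1]
print(median(trimmed), round(mean(trimmed), 1))   # 37.5 39.2
```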

Range

Difference between the highest and lowest scores of a distribution.

Standard Deviation

Roughly, the typical distance between a score and the mean; it is the square root of the variance

Variance

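Square of the standard deviation: the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n - 1)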
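A Python sketch computing the range, variance, and standard deviation for hypothetical scores, first by hand and then with the standard `statistics` module. `pvariance`/`pstdev` divide by N (population); `variance`/`stdev` divide by n - 1 (sample).

```python
import math
from statistics import pstdev, pvariance, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5

data_range = max(scores) - min(scores)   # 9 - 2 = 7

# By hand: average squared deviation from the mean (population version, divide by N).
m = sum(scores) / len(scores)
squared_devs = [(x - m) ** 2 for x in scores]
pop_var_by_hand = sum(squared_devs) / len(scores)   # 32 / 8 = 4.0
pop_sd_by_hand = math.sqrt(pop_var_by_hand)         # 2.0

# Library versions: population (divide by N) and sample (divide by n - 1).
print(pvariance(scores), pstdev(scores))
print(variance(scores), stdev(scores))
```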

Probability

The chance of an event occurring. With a random sample or randomized experiment, the probability that an observation has a particular outcome is the proportion of times that outcome would occur in a very long sequence of observations
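A Python simulation of the long-run idea, assuming a fair six-sided die: as the number of rolls grows, the proportion of sixes settles near 1/6. `long_run_proportion` is a hypothetical helper.

```python
import random

def long_run_proportion(trials, seed=None):
    """Proportion of simulated fair-die rolls that come up 6."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return hits / trials

for n in (10, 1_000, 100_000):
    print(n, long_run_proportion(n, seed=42))   # drifts toward 1/6 = 0.1667 as n grows
```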

Joint Probabilities

The probability of selecting from the sample space an element where two conditions are present at the same time: P(A and B)
(in a Venn diagram, the region where A and B overlap)

Disjoint Probabilities

Events are disjoint (mutually exclusive) if they cannot occur at the same time, so P(A and B) = 0

Addition Rules

Disjoint (mutually exclusive) events:
P(A or B) = P(A) + P(B)
Overlapping events (that can occur together):
P(A or B) = P(A) + P(B) - P(A and B)
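A Python check of both addition rules on one roll of a fair die, using exact fractions. Events are modeled as sets of outcomes; the event names and the helper `p` are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def p(event):
    """Classical probability: favorable outcomes / all outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

even = {2, 4, 6}
high = {5, 6}   # roll greater than 4
one = {1}

print(one & even)                                          # set(): no overlap, so disjoint
print(p(one | even), p(one) + p(even))                     # 2/3 2/3
print(p(even | high), p(even) + p(high) - p(even & high))  # 2/3 2/3
```

Without subtracting P(even and high), the outcome 6 would be counted twice.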

Independent Probabilities

Two events are independent if the occurrence of one does not affect the probability that the other will occur

Multiplication Rule

Multiplication rule for independent events:
P(A and B)= P(A)*P(B)
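A Python sketch of the multiplication rule, assuming a fair coin flip and a fair die roll as the two independent events. It enumerates the 12 equally likely joint outcomes and checks that P(heads and six) equals P(heads) * P(six).

```python
from fractions import Fraction
from itertools import product

# Joint sample space for flipping a fair coin and rolling a fair die.
outcomes = list(product("HT", range(1, 7)))   # 12 equally likely (coin, die) pairs

def p(pred):
    """Probability that an outcome satisfies pred."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

p_heads = p(lambda o: o[0] == "H")                # 1/2
p_six = p(lambda o: o[1] == 6)                    # 1/6
p_both = p(lambda o: o[0] == "H" and o[1] == 6)   # 1/12

print(p_both == p_heads * p_six)   # True: coin and die are independent
```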

Conditional Probability

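What is the probability of A given that B has occurred?
P(A | B) = P(A and B) / P(B), provided P(B) > 0
** If A and B are independent, P(A | B) = P(A), so the general multiplication rule P(A and B) = P(A | B) * P(B) reduces to the rule for independent events, P(A and B) = P(A) * P(B)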
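A Python sketch of conditional probability on one roll of a fair die, using exact fractions. It shows one dependent pair (six, even) and one independent pair (even, roll of 4 or less); the event names and helpers are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def p(event):
    """Classical probability: favorable outcomes / all outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def p_given(a, b):
    """P(A | B) = P(A and B) / P(B); raises ZeroDivisionError if P(B) = 0."""
    return p(a & b) / p(b)

six, even, low = {6}, {2, 4, 6}, {1, 2, 3, 4}

print(p_given(six, even), p(six))    # 1/3 1/6 -> knowing "even" changes P(six): dependent
print(p_given(even, low), p(even))   # 1/2 1/2 -> P(even | low) = P(even): independent
```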

Normal Distribution

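68-95-99.7 rule: about 68%, 95%, and 99.7% of scores fall within 1, 2, and 3 standard deviations of the mean
Total area under the curve = 1; the standard normal distribution has mean = 0 and standard deviation = 1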
Symmetric
Asymptotic (extreme ends of distribution are approaching zero but never actually reach zero)
Unimodal
The percentage of scores falling in any range is predictable (e.g., from a z table)
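A Python check of the 68-95-99.7 rule using `statistics.NormalDist` from the standard library. The second distribution (mean 100, SD 15) is a hypothetical scale used to show that the same percentages hold for any normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)   # area within k SDs of the mean
    print(f"within {k} SD: {area:.4f}")

# Any normal distribution gives the same areas when measured in SD units.
scale = NormalDist(mu=100, sigma=15)   # hypothetical scale
print(f"{scale.cdf(115) - scale.cdf(85):.4f}")
```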

Why is Normal important?

Because the total area under the curve is 1, any area under the curve equals a proportion of scores, so z-scores can be used to find an individual's relative location in a data set

Standardized Scores

to "standardize" is to transform scores that are on different metrics into a single metric so that they can be compared

Z-scores

Standardized scores in terms of standard deviation units
z = (y - mean) / s
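A Python sketch of standardizing scores from two hypothetical exams on different scales, then using `statistics.NormalDist` to turn each z-score into the proportion of scores below it. That last step assumes the scores are roughly normal.

```python
from statistics import NormalDist

def z_score(y, mean, s):
    """Standardize raw score y: how many SDs it lies above (+) or below (-) the mean."""
    return (y - mean) / s

# Hypothetical exams on different scales.
z_math = z_score(85, mean=70, s=10)   # 1.5
z_bio = z_score(70, mean=60, s=5)     # 2.0 -> relatively better despite the lower raw score

# If scores are roughly normal, the z-score gives the proportion of scores below y.
print(round(NormalDist().cdf(z_math), 4))   # 0.9332
print(round(NormalDist().cdf(z_bio), 4))    # 0.9772
```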