drawing inferences
testing hypotheses
good research design
ensures that the data we examine will allow us to draw inferences and answer our research question
purpose of research design
to ensure that the inferences we draw are valid
how to achieve valid inferences
1. account for threats to valid inference
2. identify an appropriate unit of observation
3. identify an appropriate temporal-spatial domain
comparison
cornerstone of research design
- compare values of Y across space
- compare values of Y across time
THE DESIGN IDENTIFIES THE __________ WE WILL MAKE
causal process
must be the same across units
- make sure there are no other variables that could be causing Y
two units
we must compare ___________ or more _________ to assess whether we may infer that X causes Y
maximizing comparability
the units should be identical except for the independent variable of interest
- random sampling
- statistical control
- matching
internal validity
does the change in our independent variable really cause the change we observe in our dependent variable?
threats to internal validity
history
maturation
testing
regression to the mean
selection
history as a threat to validity
any historical or external event that occurred during the course of the study that may be responsible for the effects instead of the independent variable
maturation as a threat to validity
a natural process that leads participants to change on the dependent measure, such as getting older and therefore concentrating better on a concentration exam
testing as a threat to validity
an improvement of scores due to taking a pretest
selection bias as a threat to validity
refers to any difference between the groups before the start of the study
regression to the mean as a threat to validity
the tendency of units that started at an extreme score to move back toward the mean on a later measurement, regardless of whether or not they received the independent variable
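Regression to the mean can be illustrated with a small simulation. This is only a sketch under assumed numbers (a stable "true score" plus independent noise on each test, and a top-10% cutoff, all hypothetical): nobody receives any treatment, yet the group chosen for extreme first scores lands closer to the mean the second time.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Each person has a stable true score; each test adds independent noise.
true_scores = [random.gauss(100, 10) for _ in range(5000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Select the people with extreme (roughly top 10%) scores on the first test.
cutoff = sorted(test1)[int(0.9 * len(test1))]
extreme = [i for i, score in enumerate(test1) if score >= cutoff]

# Compare the same people on both tests. No treatment was applied.
mean_first = statistics.mean(test1[i] for i in extreme)
mean_second = statistics.mean(test2[i] for i in extreme)

print(f"extreme group, test 1: {mean_first:.1f}")
print(f"extreme group, test 2: {mean_second:.1f}")
```

If a treatment had been given to this group between the two tests, the drop could easily be mistaken for a treatment effect.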
how to isolate a causal effect (increase internal validity)
- experimental designs seek to ensure that ONLY X changes
- manipulate X while holding all other possible relevant variables constant
- GOAL: rule out all rival explanations for change in Y, except the change in X
standard experimental design
- randomly divide subjects into two groups (treatment and control)
- do not present control group with stimulus (it gets a different value of X)
- measure Y in each group afterwards
- any difference in Y across the two groups was caused by the treatment.
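The steps above can be sketched as a simulation. Everything here is assumed for illustration (the subject pool, the noise levels, and the constant `TRUE_EFFECT`); the point is only that with random assignment, the difference in group means lands near the effect that was built in.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical pool of 1,000 subjects, each with a baseline outcome level.
subjects = [random.gauss(50, 10) for _ in range(1000)]

# Randomly divide the subjects into treatment and control groups.
random.shuffle(subjects)
treatment_baseline = subjects[:500]
control_baseline = subjects[500:]

TRUE_EFFECT = 5.0  # assumed effect of the stimulus on Y

# Measure Y afterwards: only the treatment group receives the stimulus.
y_treatment = [b + TRUE_EFFECT + random.gauss(0, 2) for b in treatment_baseline]
y_control = [b + random.gauss(0, 2) for b in control_baseline]

# The difference in group means is the estimate of the treatment effect.
estimated_effect = statistics.mean(y_treatment) - statistics.mean(y_control)
print(f"estimated effect: {estimated_effect:.2f}")
```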
key elements of experimental design
- random selection (each case has the same chance of being in the experiment)
- random assignment (each case has the same chance of being assigned to control or treatment)
- used together, random selection and random assignment ensure that the groups are equivalent
randomization
makes the groups identical (on average) in all ways except for the treatment
- eliminates the threat of a spurious relationship
- controls for observables (things we could measure but have not)
- controls for unobservables (differences between units we cannot measure)
challenges of the experimental design
- the value of many independent variables cannot be randomly assigned (e.g., war, gender, religion)
- differences between the lab and the actual world (do the results generalize?): a question of external validity
- convenience samples: the sample we can get may not always be ideal
field experiment
randomly assign individuals into groups, but perform the manipulation in the real world
natural experiment
an event outside the social scientist's control separates people into "control" and "treatment" groups
non experimental designs
must cross the four hurdles
when we cannot conduct experiments
we collect data as they occur and study them, but the logic of inference is precisely the same
- there are different challenges to this type of inference
observational study
- take "world as it is" and study naturally occurring differences between units
- cross sectional
- time series
cross sectional observational study
many units sampled over one time period
time-series
single unit sampled over many time periods
controlling variables in non experimental studies
- we measure the X's we want to hold "constant"
- statistics permit us to estimate the impact of a given X upon Y, as if other Xs had been held constant
- PROBLEM: we can only "control" for other Xs that we measure and include in our study
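One way to see what "holding other Xs constant" means is stratification, used here as a simple stand-in for regression-based control. In this sketch the data are simulated and every number is an assumption: a confounder Z makes X more likely and also raises Y, so the naive comparison overstates the effect of X, while comparing only within groups that share the same Z value gets close to the assumed `TRUE_EFFECT`.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 2.0  # assumed effect of X on Y
rows = []
for _ in range(20000):
    z = random.random() < 0.5                    # confounder Z
    x = random.random() < (0.8 if z else 0.2)    # Z makes X more likely
    y = TRUE_EFFECT * x + 5.0 * z + random.gauss(0, 1)  # Z also raises Y
    rows.append((x, z, y))

def mean_diff(data):
    """Mean of Y where X is true minus mean of Y where X is false."""
    treated = [y for x, _, y in data if x]
    untreated = [y for x, _, y in data if not x]
    return statistics.mean(treated) - statistics.mean(untreated)

naive = mean_diff(rows)  # ignores Z, so it mixes the effects of X and Z

# "Control" for Z: compare only within groups that share the same Z value.
within = [mean_diff([r for r in rows if r[1] == level]) for level in (True, False)]
controlled = statistics.mean(within)

print(f"naive: {naive:.2f}  controlled: {controlled:.2f}")
```

The fix works only because Z was measured and included. A confounder left out of `rows` could not be controlled this way.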
good research design summary
- theory drives designs
- good research design...
- helps establish validity of causal inferences
- considers other factors that may be moving the dependent variable
- spuriousness
- controlling for other factors to allow comparison across like units
operationalization
must be able to measure theoretical concepts of interest (DV, IV, controls) in order to test for suspected cause and effect
- without good measurement, inference is suspect
steps for measuring social and political phenomena
- begin with good theoretical understanding of phenomenon of interest
- construct good theoretical definition
- use that theoretical definition to develop the operational definition
- we want:
VALID MEASURES
RELIABLE MEASURES
UNBIASED MEASURES
conceptual clarity
- define the characteristics and boundaries of concept or construct of interest
- know your unit of interest (individuals? states? countries?)
- know your variation of interest (over time? between units?)
- be precise
validity
- extent to which your instrument measures the construct of interest
- is your measure of a construct related to other measures (of other variables of interest) as predicted by theory?
- face validity
- content validity
- construct validity
face validity
validity at face value: whether the test "looks" like it is going to measure what it is supposed to measure
content validity
shows how much your measure captures every component of the concept being measured
construct validity
the degree to which the measure is related to the other measures that theory says it should be related to
reliability
extent to which re-application of a measurement method produces identical values for a variable
- if you cannot generate same values for dependent variable or independent variable successively, your confidence in your result is diminished
test retest reliability
used to assess the consistency of a measure from one time to another: the same test is given to the same units on different occasions
alternative form reliability
uses a second form of the test that is not the same as the first measure but is similar, and compares the results of the two forms
split-halves reliability
a test is given once, divided into halves, and each half is scored separately; the scores on one half are then compared to the scores on the other half to assess reliability
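A minimal sketch of a split-halves check, assuming hypothetical data (200 respondents, a 10-item test, an odd/even split) and using the Pearson correlation between the two half scores as the comparison. Other ways of splitting or comparing the halves are possible.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5

# Hypothetical data: each item score is the respondent's underlying trait
# plus item-level noise.
traits = [random.gauss(0, 1) for _ in range(200)]
responses = [[t + random.gauss(0, 1) for _ in range(10)] for t in traits]

# Score the odd-numbered and even-numbered items separately.
half_a = [sum(r[0::2]) for r in responses]
half_b = [sum(r[1::2]) for r in responses]

split_half_r = pearson(half_a, half_b)
print(f"split-half correlation: {split_half_r:.2f}")
```

A correlation close to 1 suggests the two halves measure the same thing consistently; a low correlation suggests an unreliable measure.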
inter-rater reliability
used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon
bias (systematic measurement error)
- measurement is reliable but is consistently "off the mark"
- consistently records values for your variable of interest that are either too high or too low
- can still uncover associations between dependent variable and independent variable
- but must be
discrete variables
can take on only certain separate values, not every value in between (such as when people rank something on a scale of 1, 2, 3, 4, or 5)
continuous variable
can take on any value within its range, including fractions (in principle, any number between negative and positive infinity)
level of measurement
the mathematical qualities of the values assigned
nominal
- cannot be ranked or "operated" on by any mathematical function
- categories must be mutually exclusive and collectively exhaustive
ordinal
- discrete
- observations are in categories that can be ranked
- the distance between those ranks is undefined
- categories must be mutually exclusive and collectively exhaustive
interval-ratio
- might be discrete or continuous
- constant distance between values
- interval: arbitrary zero point (temperature in Celsius or Fahrenheit)
- ratio: zero is meaningful (elapsed time)
- categories must be mutually exclusive and exhaustive
samples
any well defined set of units of analysis
- drawn from a theoretically constituted population
- sample statistics are estimates of population parameters (our real goal)
- the larger the sample, the smaller the sampling error (estimates of population parameters ar
describing nominal variables
- can be described based on their frequency
- the most suitable descriptive statistic is the mode
describing ordinal variables
- mode: the most frequent value
- median
- interquartile range (IQR): the 75th percentile minus the 25th percentile
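These three statistics can be computed with Python's standard `statistics` module. The responses below are hypothetical, and quartile conventions differ between textbooks and software, so the IQR from `statistics.quantiles` (default method) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical ordinal responses on a 1-5 agreement scale.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 3]

mode = statistics.mode(responses)      # the most frequent value
median = statistics.median(responses)  # the middle value once sorted

# quantiles(n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(responses, n=4)
iqr = q3 - q1  # 75th percentile minus 25th percentile

print(mode, median, iqr)
```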
describing interval-ratio level variables
- we can describe the "moments" of the variable
- moments describe the "central tendency" of a variable and the distribution of value around it
mean (1st moment)
the sum of all scores divided by the number of scores
zero sum property of the mean
the sum of the differences between each Y value and Ybar is equal to zero
least squares property
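the sum of squared differences between each Y value and Ybar is smaller than the sum of squared differences between each Y value and any other number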
mean is considered the "expected value" of the variables
because of the zero sum property of the mean and the least squares property
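Both properties can be checked on a small made-up set of scores. The least squares check here is only a spot check against a handful of candidate guesses (an assumption of this sketch), not a proof that the mean minimizes the sum of squared differences.

```python
import statistics

y = [2, 4, 4, 5, 10]         # hypothetical scores
y_bar = statistics.mean(y)   # 5

# Zero-sum property: deviations from the mean add up to zero.
sum_of_deviations = sum(v - y_bar for v in y)

def sum_sq(center):
    """Sum of squared differences between each score and `center`."""
    return sum((v - center) ** 2 for v in y)

# Least squares property: among these guesses, the mean gives the
# smallest sum of squared differences.
candidates = [3, 4, 4.5, 5, 5.5, 6, 7]
best = min(candidates, key=sum_sq)

print(sum_of_deviations, best)
```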
the expected value
- the mean is best guess
- essentially our first model
the effect of outliers
- median is more resistant to this
- mean will be pulled by these
- always look at your data
- look at the range and check out those outliers
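A quick illustration with hypothetical incomes: adding one extreme value moves the mean a great deal and the median only slightly.

```python
import statistics

incomes = [30, 32, 35, 38, 40]     # hypothetical incomes, in thousands
with_outlier = incomes + [1000]    # add one extreme value

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(incomes)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```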
mean
- does not perfectly represent all of the data points
deviation (dispersion)
the difference between an observed value and the mean
TELLS US HOW SPREAD OUT THE DATA IS
small deviations
the data is clustered around the mean
large deviations
the data is spread out
the sum of deviations
is zero
the mean of deviations
is zero
we need to get rid of negative signs in deviations
so we square them
sum of squared errors
the deviations squared and then added together
variance
the sum of squared errors divided by the number of observations (or by that number minus one for a sample)
- tells us how far, on average, data points fall from the mean, in squared units (this is the second moment)
standard deviation
the square root of the variance
calculations of variance use different denominators
POPULATION: population size N
SAMPLE: sample size minus one (n-1)
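The chain from deviations to sum of squared errors to variance to standard deviation, worked through on a small made-up data set. The hand calculations are cross-checked against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n-1).

```python
import statistics

y = [2, 4, 4, 5, 10]         # hypothetical scores
y_bar = statistics.mean(y)   # 5

deviations = [v - y_bar for v in y]        # -3, -1, -1, 0, 5
sse = sum(d ** 2 for d in deviations)      # sum of squared errors: 36

pop_variance = sse / len(y)                # population: divide by N   -> 7.2
sample_variance = sse / (len(y) - 1)       # sample: divide by n - 1   -> 9.0
sample_sd = sample_variance ** 0.5         # standard deviation        -> 3.0

print(pop_variance, statistics.pvariance(y))
print(sample_variance, statistics.variance(y))
```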
degrees of freedom (why we divide by n-1 for a sample)
- we are using the variance and standard deviation of the sample to estimate the true variance and standard deviation (of the population)
- in order to do so we are going to assume that the sample mean is the population mean
-ONE VALUE HAS TO BE OF A CERTAIN SIZE TO MA
key point of degrees of freedom
because we hold the population mean to be the sample mean, we must exclude one value from the calculation... so we divide by the sample size minus one
standard error of the mean
equals the standard deviation of the sample divided by the square root of the sample size
- we use this so we don't have to pull hundreds of repeated samples
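The sketch below compares the formula with the brute-force alternative it replaces. The population, the sample size of 50, and the 2,000 repeated samples are all assumptions chosen for illustration; the two numbers should come out close but not identical.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 scores.
population = [random.gauss(100, 15) for _ in range(100_000)]
n = 50

# Formula: one sample's standard deviation over the square root of n.
sample = random.sample(population, n)
se_formula = statistics.stdev(sample) / n ** 0.5

# Brute force: draw many samples and see how much their means vary.
means = [statistics.mean(random.sample(population, n)) for _ in range(2000)]
se_empirical = statistics.stdev(means)

print(f"formula: {se_formula:.2f}  repeated samples: {se_empirical:.2f}")
```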
skew
- not symmetric
- the most frequent scores cluster at one end of the scale
- can be positive or negative
normal distributions
- bell shaped and symmetric
- majority of scores lie around middle of distribution
positive skew
most scores cluster at the lower end, with a tail of few scores at the upper end of the scale
negative skew
most scores cluster at the upper end, with a tail of few scores at the lower end of the scale
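Skew can also be put into a number. The helper below is one common definition (the average cubed z-score, using the population standard deviation); it is an assumption of this sketch, and statistical packages may apply slightly different corrections. The data are made up.

```python
import statistics

def skewness(values):
    """Average cubed z-score (third standardized moment, population form)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

positive = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]  # a few high scores: tail to the right
negative = [16 - v for v in positive]       # mirror image: tail to the left

print(f"positive skew: {skewness(positive):.2f}")
print(f"negative skew: {skewness(negative):.2f}")

# In the positively skewed data the tail pulls the mean above the median.
print(statistics.mean(positive), statistics.median(positive))
```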