Statistics
a set of tools and techniques used for describing, organizing and interpreting information or data.
Goals of science? How does statistics help?
3 basic goals: description, prediction and explanation. Helps understand the world!
Descriptive Statistics
used to organize and describe data.
Inferential Statistics
used to make inferences from a small group to a larger group.
Sample
the group you actually ask or get data from. This is a smaller group (subset) of the larger group you are interested in.
Population
the group you are actually interested in.
Variable
something that can change or has different values for different individuals. Variables represent the concepts we are interested in.
Data
information we collect from the sample on the variables we're interested in.
Continuous Data
data measured on a continuum. Height, weight, age, self-esteem. All numbers between two endpoints are possible scores.
Categorical Data
data that sorts people into categories; there are only so many options. Example: gender or major.
Central Tendency
3 ways to measure: mean, median and mode! Single number that best represents an entire group of scores!
Mean
Average! VERY sensitive to outliers! USED THE MOST!
Median
Midpoint!
Mode
FREQUENCY! Used with categorical data.
Bimodal
two modes! (More than two modes is called multimodal.)
When to use MODE?
when the data are categorical.
Which method is MOST influenced by extreme scores?
THE MEAN!
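A quick sketch of the three measures, and of how one extreme score moves the mean far more than the median or mode. The scores are made up for illustration; only Python's standard `statistics` module is used.

```python
from statistics import mean, median, mode

# Hypothetical scores, made up for illustration.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [40]  # same scores plus one extreme score

# Central tendency without the outlier
mean_a, median_a, mode_a = mean(scores), median(scores), mode(scores)

# Central tendency with the outlier
mean_b, median_b, mode_b = (
    mean(with_outlier),
    median(with_outlier),
    mode(with_outlier),
)

print(mean_a, median_a, mode_a)  # 3.4 3 3
print(mean_b, median_b, mode_b)  # 9.5 3.5 3
```

The mean jumps from 3.4 to 9.5, while the median barely moves and the mode does not move at all.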
Variability
tells us how different the scores are from each other, as a spread or dispersion.
Why is variability important?
helps us understand the nature of our sample and the nature of our variables.
Measures of variability
range, standard deviation and variance.
Range
most general, how far apart the scores are from one another; determined by subtracting the lowest number from the highest.
Standard Deviation
most reported, average amount of variability in a set of scores: how far away the data points are from the mean. Low SD- close to the mean. High SD- far away from the mean.
Variance
the SD squared! Same formula as the SD, but DON'T TAKE THE SQUARE ROOT!
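A minimal sketch of the three measures of variability, assuming made-up scores and the sample (n - 1) versions of variance and SD from Python's standard `statistics` module.

```python
from statistics import stdev, variance

# Hypothetical scores, made up for illustration.
scores = [4, 6, 8, 10, 12]

score_range = max(scores) - min(scores)  # highest minus lowest
var = variance(scores)                   # sample variance (n - 1 denominator)
sd = stdev(scores)                       # sample SD = square root of the variance

print(score_range)  # 8
print(var, sd)
```

Squaring the SD gives back the variance, which is why the variance is in squared units and the SD is in the original units.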
Identifying Extreme Values
At least 3 standard deviations away from the mean!
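One way to apply the 3 SD rule is to turn each score into a z-score (how many SDs it sits from the mean) and flag anything at 3 or beyond. This is only a sketch: the scores are hypothetical and the sample SD is assumed.

```python
from statistics import mean, stdev

# Hypothetical scores: twenty typical values plus one extreme value.
scores = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51,
          49, 52, 48, 50, 51, 49, 50, 52, 48, 50, 90]

m = mean(scores)
sd = stdev(scores)  # sample SD (n - 1 denominator)

# z-score: number of SDs a score is above (+) or below (-) the mean
z_scores = [(x - m) / sd for x in scores]

# Keep only the scores that are at least 3 SDs from the mean
extreme = [x for x, z in zip(scores, z_scores) if abs(z) >= 3]
print(extreme)  # [90]
```

Note that in a very small sample no score can reach 3 SDs, because the extreme score itself inflates the SD.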
SD to understand a data point
understand the "nature" of your data.
Histograms
see the distribution of our data. The height of each bar is the number of times each value occurs in our data set.
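A rough text version of a histogram, assuming made-up scores: count how often each value occurs, then draw one `#` per occurrence so the bar length stands in for the bar height.

```python
from collections import Counter

# Hypothetical scores, made up for illustration.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]

counts = Counter(scores)  # value -> number of times it occurs

# One row per value; the number of # marks is the frequency
rows = [f"{value}: {'#' * counts[value]}" for value in sorted(counts)]
print("\n".join(rows))
```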
Skewness of a Histogram
lack of symmetry.
Kurtosis of a Histogram
refers to how peaked vs. flat the distribution is. Low-relatively flat, more variability. High-relatively peaked, less variability.
Bar graphs vs. Histograms
Bar graphs- compare the frequency of categorical responses!
Histograms- compare the distribution of continuous variables.
Misleading graphs
BEWARE THE SCALE OF THE AXES!
Correlations
how changes in the value of one variable are related to changes in the value of another variable!
When do you use correlations?
when every individual has 2 scores on 2 continuous variables.
Scatterplots
plots one variable on the x axis, one on the y axis. Useful for looking at relationships between 2 variables.
Strength
how close the relationship is to perfect. r ranges from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship). The closer to -1 or +1, the stronger!
Direction
+ or direct relationships: the two variables move in the same direction.
- or indirect relationships: the two variables move in opposite directions. The sign only shows the direction of the slope, not the strength.
Limitations of correlation coefficient
can only identify LINEAR relationships!
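The points about strength, direction, and linearity can be checked with a small hand-written Pearson r. This is a sketch, not a reference implementation: `pearson_r` is a helper name made up here, and all the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfect positive: both rise together
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfect negative: one rises, one falls

# A perfect but CURVED relationship (y = x squared) that r misses completely
curve_x = [-2, -1, 0, 1, 2]
curve_y = [v ** 2 for v in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(r_up, r_down, r_curve)
```

Here r is +1 and -1 for the two straight-line cases, but 0 for the curve, even though y is completely determined by x.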
Reporting correlations
example: There was a strong positive correlation between age and happiness (r=.61), suggesting that as age increases, so does happiness.
Correlation Matrix
simple way to report a number of correlations at one time.
Correlation vs. Causation
CORRELATION DOES NOT EQUAL CAUSATION!
Measurement
the act or process of assigning numbers to phenomena according to a rule (pounds, inches and GPA).
Independent Variable vs. Dependent Variables
IV- conditions of an experiment ("I" manipulate)
DV- the outcome you are investigating (D=data)
Observed score
the score you actually got.
Error Score
discrepancy between observed and true. What could have caused an error?
Reliability
consistent
Test-Retest
determining whether the same person will receive the same score when the measure is administered on two separate occasions.
Parallel Forms
different forms of the same measure should give equivalent scores.
Internal Consistency/Cronbach's Alpha
similarity between items that are meant to measure the same thing; ranges from 0 to 1, and closer to 1 is better.
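A minimal sketch of Cronbach's alpha using the common textbook formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the 3-item, 5-person data set are made up for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    # Each respondent's total score across all items
    totals = [sum(responses) for responses in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical 3-item scale answered by 5 people
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # 0.89
```

The items rise and fall together across people, so the total-score variance is large relative to the item variances and alpha comes out close to 1.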
Inter-rater reliability
consistency of observations made by two people.
Validity
accuracy
Content Validity
the items are a good sample of the universe of items that could be asked about this topic; should consult experts.
Criterion Validity
does the measure relate to a criterion (an outcome it should be associated with), either right now or in the future?
Concurrent
right now
Predictive
future
Construct Validity
measures what it should and doesn't measure what it should not.
Convergent
relates to things that it should.
Discriminant
does NOT relate to things that it shouldn't relate to.
What if we wanted to know the average height of A&M students? Variable? Data? Population? and Example of a sample?
Variable: height
Data: the values in inches
Population: all A&M students
Sample: this class
Why do we square (x-xbar)? SD question!
to get rid of the negative numbers!
Why n-1? SD question!
by making the denominator smaller, our standard deviation will be bigger. Penalty for small samples!
Why take the square root? SD question!
to get back to the original unit!
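The three SD questions above can be traced step by step. A sketch with made-up scores, written out by hand rather than with a library call so each step is visible.

```python
from math import sqrt

# Hypothetical scores, made up for illustration.
scores = [2, 4, 4, 6, 9]
n = len(scores)
xbar = sum(scores) / n  # mean = 5.0

deviations = [x - xbar for x in scores]
sum_dev = sum(deviations)  # negatives and positives cancel out to 0

squared = [d ** 2 for d in deviations]  # squaring gets rid of the negatives
var_n = sum(squared) / n                # dividing by n
var_n1 = sum(squared) / (n - 1)         # dividing by n - 1 gives a bigger value
sd = sqrt(var_n1)                       # square root: back to the original unit

print(sum_dev, var_n, var_n1, sd)
```

Without squaring, the deviations sum to 0 and say nothing about spread; the n - 1 version (7.0) is larger than the n version (5.6).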
Central Tendency of a Histogram
three distributions differing in central tendency but not anything else.
Variability of a Histogram
three distributions with the same central tendency but different amounts of variability!
Reliable but not valid
Darts clustered tightly in one area away from the bull's-eye: reliable but not valid.
When to use the MEAN?
when the data are continuous and you don't have any extreme scores.
When to use the MEDIAN?
when the data are continuous and you think the mean is misleading because of extreme scores!
Low SD means?
The data points are close to the mean!
High SD means?
The data points are far away from the mean!
Tail to the left?
Negatively skewed!
Tail to the right?
Positively skewed!
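The sign of skewness can be checked numerically. This sketch assumes one common definition (the average cubed deviation divided by the SD cubed, using n in the denominators); the helper name `skewness` and the data are made up for illustration.

```python
def skewness(xs):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical data
right_tail = [1, 2, 2, 3, 3, 3, 4, 12]  # one high score stretches the right tail
left_tail = [-x for x in right_tail]    # mirror image: tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail) > 0)  # True: positively skewed
print(skewness(left_tail) < 0)   # True: negatively skewed
print(skewness(symmetric))       # 0.0: no skew
```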
Low Kurtosis
flat, more variability!
High Kurtosis
peaked, less variability!
Reverse causation
the cause runs the other way from what you assumed: shy children cause dominant mothers.
Reciprocal causation
shy children cause dominant mothers and dominant mothers cause shy children.
Common-causal variables
a dominant father could be an unknown third variable that affects both.
Nominal
"nameable!" (home state, they have names!)
Ordinal
"ordering" (favorite class taken at A&M, can be ordered)
Interval
"equal intervals" (temperature, the degrees are equal segments)
Ratio
"zerO=ratiO!" (grades, possible to get a 0)
True score
the true reflection of what you know.
Valid but not reliable
Darts scattered all over the dart board, but centered on the bull's-eye on average: valid but not reliable.
Valid and Reliable
Close to the bulls eye is valid and reliable!