Statistics 203 Exam 1

Statistics

a set of tools and techniques that is used for describing, organizing and interpreting information or data.

Goals of science? How does statistics help?

3 basic goals: description, prediction and explanation. Statistics helps us understand the world!

Descriptive Statistics

used to organize and describe data.

Inferential Statistics

used to make inferences from a small group to a larger group.

Sample

the group you actually ask or get data from. This is a smaller group (subset) of the larger group you are interested in.

Population

the group you are actually interested in.

Variable

something that can change or has different values for different individuals. Variables represent the concepts we are interested in.

Data

information we collect from the sample on the variables we're interested in.

Continuous Data

data measured on a continuum. Height, weight, age, self-esteem. All numbers between two endpoints are possible scores.

Categorical Data

data that sort people into categories, with only so many options. Example: gender or major.

Central Tendency

3 ways to measure: mean, median and mode! Single number that best represents an entire group of scores!

Mean

Average! VERY sensitive to outliers! USED THE MOST!

Median

Midpoint of the scores once they are put in order!

Mode

Most FREQUENT score! Used with categorical data.

Bimodal

two modes!

When to use MODE?

when the data are categorical.

Which method is MOST influenced by extreme scores?

THE MEAN!
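A minimal Python sketch of mean, median and mode using the standard `statistics` module. The quiz scores and majors are made up for illustration. Adding one extreme score pulls the mean far up but leaves the median almost unchanged. `multimode` lists every tied mode, which shows a bimodal categorical variable.

```python
from statistics import mean, median, mode, multimode

# Hypothetical quiz scores, made up for illustration
scores = [70, 72, 75, 75, 78]
with_outlier = scores + [200]  # the same scores plus one extreme score

print("mean:  ", mean(scores), "->", mean(with_outlier))      # 74 -> 95: pulled way up
print("median:", median(scores), "->", median(with_outlier))  # 75 -> 75.0: barely affected
print("mode:  ", mode(scores), "->", mode(with_outlier))      # 75 -> 75

# The mode also works for categorical data; multimode returns every tied mode
majors = ["Biology", "Psychology", "Biology", "History", "Psychology"]
print(multimode(majors))  # ['Biology', 'Psychology']: two modes, so bimodal
```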

Variability

tells us how different the scores are from each other (their spread or dispersion).

Why is variability important?

helps us understand the nature of our sample and the nature of our variables.

Measures of variability

range, standard deviation and variance.

Range

most general; how far apart the scores are from one another. Determined by subtracting the lowest score from the highest score.
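A tiny Python sketch of the range, assuming a hypothetical set of scores: highest minus lowest.

```python
# Hypothetical set of scores
scores = [98, 86, 77, 56, 48]

# Range = highest score minus lowest score
score_range = max(scores) - min(scores)
print(score_range)  # 50
```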

Standard Deviation

most reported; the average amount of variability in a set of scores, or how far away the data points are from the mean. Low SD: scores close to the mean. High SD: scores far away from the mean.
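For reference, the sample standard deviation is usually written as follows, where x is each score, x̄ is the mean and n is the number of scores. Note the squared deviations, the n - 1 denominator and the final square root:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$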

Variance

the standard deviation squared. Calculated just like the SD, but DON'T TAKE THE SQUARE ROOT!
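A minimal Python sketch, using made-up scores, that computes the sample variance step by step and then takes its square root to get the SD. It uses the n - 1 denominator.

```python
import math

# Hypothetical scores, made up for illustration
scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

n = len(scores)
x_bar = sum(scores) / n                             # mean = 6.0
squared_devs = [(x - x_bar) ** 2 for x in scores]   # (x - x_bar)^2 for each score
variance = sum(squared_devs) / (n - 1)              # variance: stop here, no square root
sd = math.sqrt(variance)                            # SD = square root of the variance

print(round(variance, 2), round(sd, 2))  # 3.11 1.76
```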

Identifying Extreme Values

At least 3 standard deviations away from the mean!
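A minimal sketch of the 3-SD rule in Python. `extreme_values` is a hypothetical helper name and the data are made up. It measures each score's distance from the mean in SD units (its z-score) and flags anything at or beyond the cutoff.

```python
from statistics import mean, stdev

def extreme_values(scores, cutoff=3):
    """Return the scores that lie at least `cutoff` standard deviations from the mean."""
    m = mean(scores)
    s = stdev(scores)  # sample SD (n - 1 in the denominator)
    return [x for x in scores if abs(x - m) / s >= cutoff]

# Hypothetical data: 20 ordinary scores plus one suspicious score
scores = [48, 49, 50, 51, 52] * 4 + [100]
print(extreme_values(scores))  # [100]
```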

SD to understand a data point

understand the "nature" of your data.

Histograms

see the distribution of our data. The height of each bar is the number of times each value occurs in our data set.
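A rough text-only histogram in Python, assuming hypothetical ages. Each row is one value, and the bar length is how many times that value occurs. `collections.Counter` does the counting.

```python
from collections import Counter

# Hypothetical ages of 10 students (made up)
ages = [19, 20, 20, 21, 21, 21, 22, 22, 23, 25]
counts = Counter(ages)  # value -> number of times it occurs

# One row per value; bar length = frequency (a missing value gets an empty bar)
for value in range(min(ages), max(ages) + 1):
    print(f"{value}: {'#' * counts[value]}")
```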

Skewness of a Histogram

lack of symmetry.

Kurtosis of a Histogram

refers to how peaked vs. flat the distribution is. Low-relatively flat, more variability. High-relatively peaked, less variability.

Bar graphs vs. Histograms

Bar graphs- compare the frequency of categorical responses!
Histograms- compare the distribution of continuous variables.

Misleading graphs

BEWARE THE SCALE OF THE AXES!

Correlations

how changes in the value of one variable relate to changes in the value of another variable!

When do you use correlations?

when every individual has a score on each of 2 continuous variables.

Scatterplots

plots one variable on the x axis, one on the y axis. Useful for looking at relationships between 2 variables.

Strength

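how close the relationship is to perfect. r ranges from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship); the closer r is to -1 or +1, the stronger the relationship.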
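A minimal sketch of the Pearson correlation coefficient r in plain Python. `pearson_r` is a hypothetical helper and the paired scores are made up. The sign gives the direction; the distance from 0 gives the strength.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired scores x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))  # 1.0: perfect positive
print(pearson_r(hours, [10, 8, 6, 4, 2]))  # -1.0: perfect negative, just as strong
print(pearson_r(hours, [2, 1, 4, 3, 5]))   # 0.8: strong positive, not perfect
```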

Direction

+ or direct relationships move in the same direction.
- or indirect relationships move in opposite directions. The sign only shows the direction of the relationship, not its strength.

Limitations of correlation coefficient

can only identify LINEAR relationships!

Reporting correlations

example: There was a strong positive correlation between age and happiness (r=.61), suggesting that as age increases, so does happiness.

Correlation Matrix

simple way to report a number of correlations at one time.

Correlation vs. Causation

CORRELATION DOES NOT EQUAL CAUSATION!

Measures

the act or process of assigning numbers to phenomena according to a rule (e.g., pounds, inches and GPA).

Independent Variable vs. Dependent Variables

IV- conditions of an experiment ("I" manipulate)
DV- the outcome you are investigating (D=data)

Observed score

the score you actually got.

Error Score

discrepancy between the observed score and the true score. What could have caused the error?
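In classical test theory, the observed, true and error scores are usually related like this, so the error score is whatever separates the observed score from the true score:

$$ \text{Observed score} = \text{True score} + \text{Error score} $$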

Reliability

consistent

Test-Retest

determining whether the same person receives the same score when the test is administered on two separate occasions.

Parallel Forms

several forms of the same measure should be equivalent.

Internal Consistency/Cronbach's Alpha

how consistently items that are meant to measure the same thing agree with one another; usually ranges from 0 to 1, and closer to 1 is better.
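A minimal sketch of Cronbach's alpha using the standard formula alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. `cronbach_alpha` is a hypothetical helper and the responses are made up.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]      # each respondent's total score
    item_var_sum = sum(variance(item) for item in items)  # sum of the item variances
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical answers of 5 respondents to 3 items on a 1-5 scale
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
print(round(cronbach_alpha(items), 2))  # 0.9
```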

Inter-rater reliability

consistency of observations made by two people.

Validity

accuracy

Content Validity

the items are a good sample of the universe of items that could be asked about this topic; experts should be consulted to judge this.

Criterion Validity

whether the test reflects a criterion right now or in the future.

Concurrent

right now

Predictive

future

Construct Validity

measures what it should and doesn't measure what it should not.

Convergent

relates to things that it should.

Discriminant

does NOT relate to things that it shouldn't.

What if we wanted to know the average height of A&M students? Variable? Data? Population? and Example of a sample?

Variable: height
Data: the values in inches
Population: all A&M students
Sample: this class

Why do we square (x-xbar)? SD question!

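to get rid of the negative numbers! Otherwise the negative deviations cancel out the positive ones (deviations from the mean always sum to 0).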
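A quick Python check with hypothetical scores: the raw deviations from the mean add up to 0, while the squared deviations do not.

```python
# Hypothetical scores
scores = [2, 4, 6, 8, 10]
x_bar = sum(scores) / len(scores)          # mean = 6.0
deviations = [x - x_bar for x in scores]   # [-4.0, -2.0, 0.0, 2.0, 4.0]

print(sum(deviations))                     # 0.0: negatives cancel the positives
print(sum(d ** 2 for d in deviations))     # 40.0: squaring keeps every distance positive
```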

Why n-1? SD question!

by making denominator smaller, our standard deviation will be bigger. Penalty for small samples!
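A minimal Python sketch, with a hypothetical small sample, that compares dividing by n with dividing by n - 1. The loop shows how many times bigger the n - 1 version is for a few sample sizes; the "penalty" shrinks as the sample grows.

```python
import math

# Hypothetical small sample
scores = [2, 4, 6, 8, 10]
n = len(scores)
x_bar = sum(scores) / n
ss = sum((x - x_bar) ** 2 for x in scores)   # sum of squared deviations = 40.0

sd_divide_by_n = math.sqrt(ss / n)           # sqrt(8), about 2.83
sd_divide_by_n_1 = math.sqrt(ss / (n - 1))   # sqrt(10), about 3.16: bigger

# Ratio of the n - 1 SD to the n SD for different sample sizes
for size in (5, 30, 1000):
    print(size, round(math.sqrt(size / (size - 1)), 3))
```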

Why take the square root? SD question!

to get back to the original unit!

Central Tendency of a Histogram

three distributions differing in central tendency but not anything else.

Variability of a Histogram

three distributions with the same central tendency but different amounts of variability!

Reliable but not valid

Darts clustered in one area away from the bull's-eye: reliable but not valid.

When to use the MEAN?

when the data are continuous and you don't have any extreme scores.

When to use the MEDIAN?

when the data are continuous and you think the mean is misleading because of extreme scores!

Low SD means?

The scores are close to the mean!

High SD means?

The scores are spread far away from the mean!

Tail to the left?

Negatively skewed!

Tail to the right?

Positively skewed!

Low Kurtosis

flat, more variability!

High Kurtosis

peaked, less variability!

Reverse causation

shy children cause dominant mothers, rather than dominant mothers causing shy children.

Reciprocal causation

shy children cause dominant mothers and dominant mothers cause shy children.

Common-causal variables

a dominant father could be an unknown third variable that causes both shy children and dominant mothers.

Nominal

nameable! (home state: they have names!)

Ordinal

ordering! (favorite class taken at A&M: can be ordered)

Interval

equal intervals! (temperature: the distance between each degree is the same, but 0 is not a true zero)

Ratio

zerO = ratiO! (grades: a true 0 is possible)

True score

the true reflection of what you know.

Valid but not reliable

Darts scattered all over the dart board: valid but not reliable.

Valid and Reliable

Darts clustered close to the bull's-eye: valid and reliable!