Psych 100A Final Exam Review

Right-skewed distribution

The right side of the graph has a long tail

Left-skewed distribution

The left side of the graph has a long tail

What best displays categorical variables?

Frequency tables ("tally()") or bar graphs ("gf_bar()")

What best displays quantitative variables?

Histograms or relative frequency histograms ("gf_histogram()" or "gf_dhistogram()"), boxplots ("gf_boxplot()"), and scatterplots ("gf_point()")

Type I error (alpha)

False positive results
i.e., rejecting the null hypothesis when it is actually true

Why is the mean a good model for most distributions?

It balances the deviations above and below the mean

If the mean of a variable is 22.5, what would the empty model predict for each observation in that variable?

The empty model would predict 22.5 for every observation

If the mean of a variable is 22.5 and a given observation is 26.7, what is the data?

26.7

If the mean of a variable is 22.5 and a given observation is 20.1, what is the residual?

-2.4
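
The residual on this card follows from DATA = MODEL + ERROR. A minimal sketch in base R, using only the numbers from the cards above (the variable names are illustrative):

```r
# Numbers from the cards above; the variable names are made up for illustration
model_prediction <- 22.5   # the mean, which is what the empty model predicts
observation <- 20.1        # the data for one case

# DATA = MODEL + ERROR, so ERROR (the residual) = DATA - MODEL
residual <- observation - model_prediction
print(residual)
```

A negative residual means the observation falls below the model's prediction.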

Out of these three histograms:
1. of the variable
2. of the empty model
3. of the residuals
Which two would have a similar shape?

1. of the variable
3. of the residuals

In the GLM notation, what represents the model (or prediction) of the sample?

b0

Knowing that the mean is the number that minimizes SS, what happens to SS when you pick any other number, higher or lower than the mean?

The SS is larger than it was at the mean.
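
One way to check this is to compute SS around the mean and around two other guesses. The scores below are made up for illustration, and `ss()` is a hypothetical helper, not a course function:

```r
# Made-up scores whose mean is 22.5
y <- c(18, 21, 22.5, 24, 27)

# Hypothetical helper: sum of squared deviations of y from any guess
ss <- function(guess) sum((y - guess)^2)

ss_at_mean <- ss(mean(y))       # SS around the mean
ss_lower   <- ss(mean(y) - 2)   # SS around a number below the mean
ss_higher  <- ss(mean(y) + 2)   # SS around a number above the mean

print(c(ss_at_mean, ss_lower, ss_higher))
```

Both alternative guesses give a larger SS than the mean does, whichever direction you move.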

What is the difference between a residual and standard deviation?

A residual is how far one individual's score is from the model's prediction (the mean, in the empty model)
The standard deviation summarizes all the residuals at once: roughly, the typical distance of scores from the mean

What is the difference between SSerror and SStotal on a SUPERNOVA table?

They are sums of squared residuals from different models; SSerror comes from the complex model (the one with an explanatory variable, categorical or quantitative), and SStotal comes from the simple/NULL/empty model.

In a histogram and a density histogram, what parts are the same?

The range of the x-axis, the shape of the distribution, and which values are most likely

In magnitude~depth and magnitude~longitude, why are the SStotals the same? (we are referring to earthquakes)

Because they are both from the simple model of magnitude. In other words, they use the same outcome variable.

Which of the following F-ratios indicate that the explanatory variable isn't explaining more variance per degree of freedom than the simple model?

Any value less than 1

What characteristic is the same across the simple, categorical predictor, and quantitative predictor models?

If we sum the residuals, it would equal zero for each model.
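
This can be checked numerically by fitting all three kinds of model with `lm()`. The data frame below is simulated purely for illustration; its column names and values are assumptions, not course data:

```r
# Simulated data for illustration only
set.seed(1)
d <- data.frame(group = rep(c("a", "b"), each = 10), x = rnorm(20))
d$y <- 5 + 2 * d$x + ifelse(d$group == "b", 1, 0) + rnorm(20)

empty_model        <- lm(y ~ NULL, data = d)   # simple/empty model (same as y ~ 1)
categorical_model  <- lm(y ~ group, data = d)  # categorical predictor
quantitative_model <- lm(y ~ x, data = d)      # quantitative predictor

# Each sum is zero, apart from tiny floating-point rounding error
residual_sums <- c(
  empty        = sum(resid(empty_model)),
  categorical  = sum(resid(categorical_model)),
  quantitative = sum(resid(quantitative_model))
)
print(round(residual_sums, 10))
```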

What can we say about the power of aggregation?

By the law of large numbers, the larger the sample, the closer the sample mean tends to be to the population mean.

What is the key factor of a quantitative predictor model?

When you explain a quantitative variable with another quantitative variable (magnitude explained by longitude for earthquakes)

What is the key factor of a categorical predictor model?

When you explain a quantitative variable with a categorical variable (weight explained by type of food eaten)

Why is it important to examine both PRE and F-ratio (in SUPERNOVA)?

PRE gives the proportion of error reduced by the model, and the F-ratio adjusts that for model complexity (the number of parameters used)
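
Both quantities can be built by hand from the two sums of squares. This is a sketch in base R on simulated data (the data and variable names are assumptions), not the course's `supernova()` output:

```r
# Simulated data for illustration only
set.seed(2)
d <- data.frame(x = rnorm(30))
d$y <- 3 + 1.5 * d$x + rnorm(30)

empty_model   <- lm(y ~ NULL, data = d)  # one parameter (b0)
complex_model <- lm(y ~ x, data = d)     # two parameters (b0 and b1)

ss_total <- sum(resid(empty_model)^2)    # error left by the empty model
ss_error <- sum(resid(complex_model)^2)  # error left by the complex model
ss_model <- ss_total - ss_error          # error reduced by adding x

# PRE: proportion of the empty model's error that the complex model removes
pre <- ss_model / ss_total

# F: error reduced per added parameter, relative to leftover error per df
df_model <- 1
df_error <- nrow(d) - 2
f_ratio <- (ss_model / df_model) / (ss_error / df_error)

print(c(PRE = pre, F = f_ratio))
```

With one predictor, these should match the R-squared and F-statistic reported by `summary(complex_model)`.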

Is variance impacted by sample size?

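No. Unlike SS, variance is an average (SS divided by n - 1), so it does not grow just because there are more observations. That lets us compare error across two samples of different sizes.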

Is variance a sample statistic or a population parameter?

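Sample statistic, when it is computed from a sample; it serves as an estimate of the population variance, which is a parameter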

Variance

Roughly the average of the squared deviations from the mean (SS divided by n - 1)

Standard deviation

Roughly how far scores typically fall from the mean; the square root of the variance
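
Variance and standard deviation can both be built up from SS. A small sketch with made-up scores, compared against R's built-in `var()` and `sd()`:

```r
# Made-up scores for illustration
y <- c(18, 21, 22.5, 24, 27)

ss <- sum((y - mean(y))^2)        # sum of squared deviations from the mean
variance <- ss / (length(y) - 1)  # SS divided by n - 1
std_dev <- sqrt(variance)         # back in the original units of y

# The hand-built values agree with the built-in functions
print(c(variance, var(y)))
print(c(std_dev, sd(y)))
```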

Larger z-scores mean

Larger residuals relative to the standard deviation (the score is farther from the mean)

What is similar about residuals from the empty model and the complex model?

Both represent the difference between the data and the model's prediction

Margin of error refers to variability of a/an

Estimate/Statistic

SS and SAD are both

measures of total error

Sampling distributions tell us that each time we take a random sample from the population...

There will be variability in the sample statistics

What is a correct interpretation of a CI for carbon and steel bikes (106.16/110.52)?

Our data would be considered likely (likely = not among the most extreme 5% of outcomes) if the population mean commute time for carbon bikes were anywhere between 106.16 and 110.52.

With smaller sample sizes, should we use confint.default() or confint()?

confint(); for an lm() model it uses the t-distribution, which is safer with smaller samples.

Does confint.default() or confint() produce a slightly wider CI?

confint()
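
The two functions can be compared on a small sample. For an `lm()` fit, `confint()` uses the t-distribution, while `confint.default()` uses the normal distribution. The eight commute times below are simulated for illustration and are not the course data:

```r
# Simulated small sample (n = 8); values are made up
set.seed(3)
d <- data.frame(y = rnorm(8, mean = 108, sd = 5))

empty_model <- lm(y ~ NULL, data = d)

ci_t <- confint(empty_model)          # based on the t-distribution
ci_z <- confint.default(empty_model)  # based on the normal distribution

# Width of each interval: upper bound minus lower bound
width_t <- ci_t[1, 2] - ci_t[1, 1]
width_z <- ci_z[1, 2] - ci_z[1, 1]

print(c(t_based = width_t, normal_based = width_z))
```

The t-based interval is the wider of the two, and the gap shrinks as the sample size grows.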

What is true about the sampling distribution of PREs?

They aren't normally distributed
They shouldn't be modeled with a t-dist
They don't center around the sample PRE

Why doesn't the sampling distribution of PREs cluster around the sample PRE?

Because the PREs were generated by randomization (shuffling), which mimics a DGP in which any differences between groups are due to random chance alone.
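
A shuffled sampling distribution of PREs can be sketched in base R. The data are simulated, and `pre_of()` is a hypothetical helper written for this example; the course's own shuffling functions are not used here:

```r
# Simulated commute times for two bike types; values are made up
set.seed(4)
d <- data.frame(group = rep(c("carbon", "steel"), each = 15))
d$time <- ifelse(d$group == "carbon", 105, 110) + rnorm(30, sd = 5)

# Hypothetical helper: PRE (R-squared) of a one-predictor model
pre_of <- function(outcome, predictor) {
  summary(lm(outcome ~ predictor))$r.squared
}

sample_pre <- pre_of(d$time, d$group)

# Shuffling the group labels breaks any real link between group and time,
# so these PREs show what random chance alone tends to produce
shuffled_pres <- replicate(1000, pre_of(d$time, sample(d$group)))

# Proportion of shuffled PREs at least as large as the sample PRE
p_value <- mean(shuffled_pres >= sample_pre)

print(c(sample_PRE = sample_pre, median_shuffled_PRE = median(shuffled_pres)))
```

Shuffled PREs cannot go below 0 and tend to pile up near it with a tail to the right, which is why the distribution is neither normal nor centered on the sample PRE.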

What does an F value of 3.30 mean?

That the variance explained by the model per parameter added (MS model) is 3.30 times the variance left unexplained per degree of freedom (MS error)

If sample PRE and sample F are higher, then...

the p-value should be lower

If sample PRE and sample F values are lower, then...

the p-value should be higher, which means the model explains little of the variation in our sample.