data
All empirical research is based on observation and measurement, resulting in numbers, or scores, which are our data.
data point
a dot plotted on a graph to represent a pair of X and Y scores.
percent
A proportion multiplied by 100.
proportion
A decimal between 0 and 1 that indicates a fraction of the total. To transform a score to a proportion, divide the score by the total.
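The proportion and percent transformations above can be sketched in a few lines of Python (the score and total here are hypothetical example values):

```python
# Sketch of the proportion/percent transformation.
# score and total are hypothetical example values.
score, total = 15, 60

proportion = score / total   # decimal between 0 and 1: fraction of the total
percent = proportion * 100   # a proportion multiplied by 100

print(proportion)  # 0.25
print(percent)     # 25.0
```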
statistical notation
The standardized code for symbolizing the mathematical operations performed in the formulas and the answers obtained.
transformation
a procedure for systematically converting one set of scores into a different set of scores. Transformations make scores easier to work with and make different kinds of scores comparable.
Why do researchers need to learn statistics?
To conduct research and to understand the research of others.
If given no other information, what is the order in which to perform mathematical operations?
PEMDAS: Parentheses, Exponents, Multiplication and Division, then Addition and Subtraction.
A researcher measures the IQ scores of a group of college students. What four things will the researcher use statistics for?
Organize, Summarize, Communicate, and Conclude
population
the group of all individuals to which research applies
sample
the subset of the population that is actually measured
variable
anything that, when measured, can produce two or more different scores
quantitative
measuring a quantity or amount
qualitative
measuring a quality or category
relationship
as the scores on one variable change, the scores on the other variable tend to change in a consistent fashion
strength
the extent to which one or close to one value of Y tends to be associated with only one value of X
perfectly strong relationship
one value of Y is associated with one X
weaker relationship
one batch of Y scores is associated with one X, and a different batch of Y scores is associated with a different X.
given/independent variable
the X variable; it dictates the outcome of the dependent variable
descriptive stats
used to organize, summarize, and describe sample data, and to predict an individual's Y score using the relationship with X
inferential stats
for deciding whether the sample data actually represent the relationship that occurs in the population
statistic
a number that describes a characteristic of a sample of scores, symbolized using a letter from the English alphabet.
parameter
a number that describes a characteristic of a population of scores, symbolized using a letter from the Greek alphabet
design
the way in which the study is laid out
experiment
we manipulate the independent variable and then measure participants' scores on the dependent variable.
condition, treatment, level
a specific amount or category of the independent variable
correlational study
neither variable is actively manipulated. Scores on both variables are simply measured and then the relationship is examined
nominal
numbers name or identify a quality or characteristic; bar graph
ordinal
numbers indicate rank order; bar graph
interval
numbers measure a specific amount, but with no true zero; histogram/polygon
ratio scale
numbers measure a specific amount and 0 indicates truly zero amount; histogram/polygon
continuous variable
measure in fractional amounts
discrete variable
measured only in whole numbers
dichotomous variable
discrete variable that has only two amounts or categories
How are samples used to make conclusions about the population? What are researchers really referring to when they talk about the population?
We assume that the relationship found in a sample reflects the relationship found in the population. When researchers talk about the population, they are really referring to all relevant individuals in the world, in nature.
What are the two aspects of a study to consider when deciding on the particular descriptive or inferential statistics that you should employ?
The design of the study and the scale of measurement used.
What is the general purpose of all research, whether experiments or correlational studies?
To discover relationships between variables which may reflect how nature operates.
N
the number of scores in the data
f
the symbol for simple frequency. A simple frequency distribution shows the frequency (on Y) of each score (on X)
normal distribution forming a normal curve
extreme high and low scores are relatively infrequent, scores closer to the middle score are more frequent, and the middle score occurs most frequently. The low-frequency, extreme low and extreme high scores are in the tails of a normal distribution.
negatively skewed
low-frequency, extreme low scores, but not low-frequency, extreme high scores.
positively skewed
low-frequency, extreme high scores, but not low-frequency, extreme low scores.
bimodal distribution
two areas showing relatively high-frequency scores.
rectangular distribution
all scores have the same frequency
rel. f
relative frequency is the proportion of time that the score occurred
relative frequency distribution
same as simple frequency distribution except that the Y axis is labeled in increments between 0 and 1.
proportion of total area under the normal curve
occupied by particular scores equals the combined relative frequency of those scores
cf
cumulative frequency of a score: the frequency of all scores at or below the score
percentile
percent of all scores at or below a given score; on the normal curve, the percentile of a score is the percent of the area under the curve to the left of the score.
ungrouped distribution
the f, rel. f, cf, or percentile of each individual score is reported
grouped distribution
different scores are grouped together, and the f, rel. f, cf, or percentile for each group is reported.
What is the difference between graphing a relationship as we did in Chapter 2 and graphing a frequency distribution?
The graph showed the relationship where, as scores on the X variable change, scores on the Y variable change. A frequency distribution shows the relationship where, as X scores change, their frequency (shown on Y) changes.
What is the advantage of computing relative frequency instead of simple frequency? What is the advantage of computing percentile instead of cumulative frequency?
Relative frequency may be easier to interpret than simple frequency. Percentile may be easier to interpret than cumulative frequency.
What is the difference between scores in the left-hand tail and scores in the right-hand tail?
Scores in the left-hand tail are the lowest scores in the distribution; scores in the right-hand tail are the highest.
measures of central tendency
summarize the location of a distribution on a variable, indicating where the center of the distribution tends to be.
mode
most frequently occurring score or scores in a distribution; used primarily to summarize nominal data
median
symbolized by Mdn, is the score at the 50th percentile. It is used primarily with ordinal data and with skewed interval or ratio data.
mean
the average score, located at the mathematical center of a distribution. It is used with interval or ratio data that form a symmetrical distribution, such as the normal distribution. The symbol for a sample mean is X with a line over it, and the symbol for a population mean is the Greek letter mu.
X - X̄ (X minus the mean)
the amount a score deviates from the mean
sum of the deviations around the mean
equals zero; makes the mean the best score to use when predicting any individual score, because the total error across all such predictions will equal zero.
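The zero-sum property above is easy to demonstrate directly (the scores here are hypothetical):

```python
# Demonstrates that deviations (X - mean) around the mean sum to zero.
# The scores are hypothetical.
scores = [2, 5, 7, 9, 12]
mean = sum(scores) / len(scores)       # 7.0

deviations = [x - mean for x in scores]
print(sum(deviations))                 # 0.0 (within floating-point error)
```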
What two pieces of information about the location of a raw score does a deviation score convey?
Deviations convey whether a score is above or below the mean and how far the score is from the mean.
measures of variability
indicate how much the scores differ from each other and how much the distribution is spread out
range
the difference between the highest and the lowest score
variance
used with the mean to describe a normal distribution of interval or ratio scores. It is the average of the squared deviations of scores around the mean. Variance equals the squared standard deviation.
standard deviation
used with the mean to describe a normal distribution of interval/ratio scores. It can be thought of as somewhat like the "average" amount that scores deviate from the mean. For any roughly normal distribution, the standard deviation will equal about one-sixth of the range.
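The variance and standard deviation definitions above map directly onto Python's standard-library `statistics` functions (the scores are hypothetical and treated as a complete population):

```python
from statistics import pvariance, pstdev

# Variance and standard deviation of a hypothetical set of scores,
# treating the scores as a complete population.
scores = [2, 5, 7, 9, 12]

variance = pvariance(scores)   # average squared deviation around the mean
sd = pstdev(scores)            # square root of the variance

print(variance)       # 11.6
print(round(sd, 2))   # 3.41
```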
normal distribution
34% of the scores are between the mean and the score that is a distance of one standard deviation from the mean. Therefore, approximately 68% of the distribution lies between the two scores that are plus and minus one standard deviation from the mean.
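The 34% and 68% figures above can be verified with the standard-library `NormalDist`, using the standard normal curve as the model:

```python
from statistics import NormalDist

# Checks the 34%/68% areas under the standard normal curve.
nd = NormalDist(mu=0, sigma=1)

# Area between the mean (z = 0) and one standard deviation above it:
between = nd.cdf(1) - nd.cdf(0)
print(round(between, 4))      # 0.3413 -> about 34%

# Area between -1 and +1 standard deviations:
within_one = nd.cdf(1) - nd.cdf(-1)
print(round(within_one, 4))   # 0.6827 -> about 68%
```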
standard deviations
when they're relatively small, the scores in the conditions are similar, and so a more consistent relationship is present (stronger)
Why is describing the variability important?
It is needed for a complete description of the data, indicating how spread out scores are and how accurately the mean summarizes them.
When is the range used as the sole measure of variability?
With nominal or ordinal scores or with interval/ratio scores that cannot be accurately described by other measures.
relative standing
of a score reflects a systematic evaluation of the score relative to a sample or population
z-score
indicates a score's relative standing by indicating the distance the score is from the mean when measured in standard deviations
positive z-score
indicates that the raw score is above the mean
negative z-score
indicates that the raw score is below the mean
the larger the absolute value of z
the farther the raw score is from the mean, so the less frequently the z-score and raw score occur
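The z-score transformation above amounts to one line of arithmetic (the mean and standard deviation here are hypothetical, chosen to resemble an IQ-style scale):

```python
# z-score: distance of a raw score from the mean, in standard deviations.
# mean and sd are hypothetical example values (an IQ-style scale).
mean, sd = 100, 15

def z_score(raw):
    return (raw - mean) / sd

print(z_score(130))   # 2.0  -> above the mean, relatively infrequent
print(z_score(85))    # -1.0 -> below the mean
```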
z-distribution
produced by transforming all raw scores in a distribution into z-scores
standard normal curve
a perfect normal z-distribution that is our model of the z-distribution resulting from approximately normally distributed interval or ratio scores.
sampling distribution of means
the frequency distribution of all possible sample means that occur when an infinite number of samples of the same size N are randomly selected from one raw score population.
central limit theorem
shows that a sampling distribution of means will be approximately normal, that its mean will equal the mean of the underlying raw score population used to create it, and that its variability (the standard error of the mean) will equal the standard deviation of the raw score population divided by the square root of N.
standard error of the mean
the standard deviation of the sampling distribution of means.
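Per the central limit theorem, the standard error of the mean equals the population standard deviation divided by the square root of N; a minimal sketch with hypothetical values:

```python
import math

# Standard error of the mean: sigma / sqrt(N).
# sigma and N are hypothetical example values.
sigma = 15   # population standard deviation
N = 25       # sample size

sem = sigma / math.sqrt(N)
print(sem)   # 3.0
```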
scatterplot
a graph that shows the location of each pair of X-Y scores in the data.
outlier
a data point that lies outside of the general pattern in the scatterplot
regression line
summarizes a relationship by passing through the center of the scatterplot
linear relationship
as the X scores increase, the Y scores tend to change in only one direction
circular/elliptical scatterplots
produce horizontal regression lines, indicating no relationship
correlation coefficient
describes the type of relationship (the direction Y scores change) and the strength of the relationship.
Pearson correlation coefficient
used to describe the type and the strength of the linear relationship between two interval and/or ratio variables.
Spearman rank-order correlation coefficient
used to describe the type and strength of the linear relationship between two ordinal variables
restriction of range problem
occurs when the range of scores from one or both variables is limited. Then the correlation coefficient underestimates the true strength of the relationship that would be found if the range were not restricted.
linear regression
the procedure for predicting unknown Y scores based on correlated X scores
linear regression line
the best-fitting straight line that summarizes a linear relationship
slope
indicates the rate and direction the Y scores change
Y intercept
which is the value of Y when the line crosses the Y axis and the starting value from which the Y scores change
Y′ (Y prime)
the predicted Y score, computed from the regression equation for a given X
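The slope, Y intercept, and Y′ cards above fit together as Y′ = a + bX; a sketch with hypothetical X-Y pairs, computing the least-squares coefficients by hand:

```python
# Linear regression sketch: Y' = a + bX, where b is the slope and
# a is the Y intercept. The X-Y pairs are hypothetical.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: sum of cross-products of deviations over sum of squared X deviations.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x   # Y intercept

y_prime = a + b * 3       # predicted Y for X = 3
print(round(b, 2), round(a, 2), round(y_prime, 2))   # 0.6 2.2 4.0
```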
predictor variable
X variable
criterion variable
Y variable
standard error of the estimate
interpreted as the "average error" when using Y prime to predict Y scores
homoscedastic
the spread in the Y scores around all Y prime values is the same, and the Y scores at each X are normally distributed around their corresponding value of Y prime.
proportion of variance accounted for
the proportional improvement in accuracy that is achieved by using the relationship to predict Y scores, compared to using Y with a line over it to predict scores.
coefficient of determination
computed by squaring the correlation coefficient
coefficient of alienation
the proportion of variance not accounted for: the proportion of the prediction error that is not eliminated when Y prime, rather than the mean of Y, is the predicted score.
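The coefficients of determination and alienation above always sum to 1; a sketch using a hypothetical Pearson r:

```python
# Coefficient of determination (r squared) and coefficient of
# alienation (1 - r squared). r is a hypothetical correlation.
r = 0.8

determination = r ** 2    # proportion of variance accounted for
alienation = 1 - r ** 2   # proportion of variance not accounted for

print(round(determination, 2))                # 0.64
print(round(determination + alienation, 2))   # 1.0
```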
multiple correlation and multiple regression
procedures for describing the relationship when multiple predictor variables are simultaneously used to predict scores on one criterion variable.
probability
indicates the likelihood of an event when random chance is operating
random sampling
selecting a sample so that all elements or individuals in the population have an equal chance of being selected.
independent events
the probability of one event is not influenced by the occurrence of the other
dependent events
the probability of one event is influenced by the occurrence of the other.
sampling with replacement
replacing individuals or events back into the population before selecting again
sampling without replacement
not replacing individuals or events back into the population before selecting again.
representative sample
the individuals and scores in the sample accurately reflect the types of individuals and scores found in the population.
sampling error
results when chance produces a sample statistic that is different from the population parameter that it represents.
region of rejection
the extreme tail or tails of a sampling distribution. Sample means here are unlikely to represent the underlying raw score population.
criterion probability
the probability (.05) that defines samples as unlikely to represent the underlying raw score population.
critical value
the minimum z-score needed for a sample mean to lie in the region of rejection.
parametric inferential procedures
require assumptions about the raw score populations being represented. They are performed when we compute the mean.
nonparametric
do not require stringent assumptions about the populations being represented. They are performed when we compute the median or mode.
alternative hypothesis
the statistical hypothesis that describes the population mus being represented if the predicted relationship exists.
null hypothesis
the statistical hypothesis that describes the population mus being represented if the predicted relationship does not exist.
two-tailed test
used when we do not predict the direction in which the dependent scores will change.
one-tailed test
used when the direction of the relationship is predicted
alpha
the symbol for our criterion probability, which defines the size of the region of rejection and the probability of a Type I error.
z-test
the parametric procedure used in a one-sample experiment if the population contains normally distributed interval or ratio scores and the standard deviation of the population is known.
z-obt beyond z-crit
then the corresponding sample mean is unlikely to occur when sampling from the population described by the null. Therefore, we reject the null and accept the alternative. This is a significant result and is evidence of the predicted relationship in the population.
z-obt within z-crit
then the corresponding sample mean is likely to occur when sampling the population described by the null. Therefore, we retain the null. This is a nonsignificant result and is not evidence for or against the predicted relationship.
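The z-obt versus z-crit decision rule above can be sketched with hypothetical numbers (z-crit = 1.96 for a two-tailed test at alpha = .05):

```python
import math

# Two-tailed z-test decision rule at alpha = .05.
# Population values and sample mean are hypothetical.
mu, sigma, N = 100, 15, 36
sample_mean = 106

sem = sigma / math.sqrt(N)         # standard error of the mean = 2.5
z_obt = (sample_mean - mu) / sem   # 2.4

z_crit = 1.96
if abs(z_obt) > z_crit:
    print("significant: reject the null")   # this branch runs here
else:
    print("nonsignificant: retain the null")
```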
type 1 error
occurs when a true null is rejected. Its theoretical probability equals alpha. The theoretical probability of avoiding a Type 1 error when retaining the null is 1-alpha.
type 2 error
occurs when a false null is retained. Its theoretical probability is beta. The theoretical probability of avoiding a Type 2 error when rejecting the null is 1-beta.
power
the probability of rejecting a false null, and it equals 1-beta.
null is true; we reject the null due to our data
type 1 error
null is true (the relationship does not exist in nature); we retain the null because of our data
we avoid type 1 error
null is false (the relationship exists in nature); we retain null
type 2 error
null is false (the relationship exists in nature); we reject the null
avoid type 2 error
significant
p<.05
nonsignificant
p>.05
one-sample t-test
for testing a one sample experiment when the standard deviation of the raw score population is not known
t-distribution
a sampling distribution of all possible values of t that occur when a raw score population is infinitely sampled using a particular N. How closely a t-distribution approximates a perfect normal curve depends on the degrees of freedom of the samples used.
point estimation
a mu is assumed to be at a point on the variable equal to the sample mean. Because the sample probably contains sampling error, a point estimate is likely to be incorrect.
interval estimation
a mu is assumed to lie within a specified interval. Interval estimation is performed by computing a confidence interval.
confidence interval for a single mu
describes a range of mus, one of which the sample mean is likely to represent. The interval contains the highest and lowest values of mu that are not significantly different from the sample mean.
ρ (rho)
the Pearson correlation coefficient in the population
ρs (rho sub S)
the Spearman correlation coefficient in the population
sampling distribution of the Pearson (r)
shows all possible values of r that occur when samples are drawn from a population in which ρ is zero.
sampling distribution of the Spearman (rs)
shows all possible values of rs that occur when samples are drawn from a population in which ρs is zero.
maximize the power of experiments
-creating large differences in scores between the conditions of the independent variable
-minimizing the variability of the scores within each condition
-increasing the N of small samples
maximize the power of a correlation coefficient
-avoiding a restricted range
-minimizing the variability in Y at each X
-increasing the N of small samples
independent samples
participants are randomly selected for each sample, without regard to who else has been selected, and each participant is in only one condition
independent-samples t-test
two independent samples, normally distributed interval or ratio scores, and homogeneous variance
homogeneity of variance
that the variances in the populations being represented are equal
confidence interval for the difference between two mus
contains a range of differences between two mus, any one of which is likely to be represented by the difference between our two sample means.
related samples
we match each participant in one condition to a participant in the other condition, or when we use repeated measures of one group of participants tested under both conditions
confidence interval for muD
contains a range of values of muD, any one of which is likely to be represented by our sample's D-bar (D with a line over it).
power of a two-sample t-test
increased by larger differences in scores between the conditions, smaller variability of scores within each condition, and larger ns. The related-samples t-test is more powerful than the independent-samples t-test.
effect size
indicates the amount of influence that changing the conditions of the independent variable had on the dependent scores.
Cohen's d
measures effect size as the magnitude of the difference between the conditions.
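For two independent samples, Cohen's d is the difference between the condition means divided by the square root of the pooled variance; a sketch with hypothetical values:

```python
import math

# Cohen's d sketch: difference between condition means divided by the
# pooled standard deviation. All values are hypothetical.
mean1, mean2 = 24.0, 20.0
pooled_variance = 25.0   # pooled estimate from the two conditions

d = (mean1 - mean2) / math.sqrt(pooled_variance)
print(d)   # 0.8 -> conventionally a "large" effect
```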
proportion of variance accounted for
measures effect size as the consistency of scores produced within each condition. The larger the proportion, the more accurately the mean of a condition predicts individual scores in that condition.
ANOVA factor
independent variable
ANOVA level or treatment
condition
ANOVA sum of squares (SS)
sum of squared deviations
ANOVA mean square
variance
ANOVA treatment effect
effect of independent variable
one-way analysis of variance
tests for significant differences between the means from two or more levels of a factor.
between-subjects factor
the conditions involve independent samples.
within-subjects factor
the conditions involve related samples
experiment-wise error rate
the probability that a Type 1 error will occur in an experiment. ANOVA keeps the experiment-wise error rate equal to alpha.
ANOVA tests...
the null hypothesis that all mus being represented are equal; the alternative hypothesis that not all mus are equal
mean square within groups
measures the differences among the scores within the conditions
mean square between groups
measures the differences among the level means
F-obt
computed using the F-ratio, which equals the mean square between groups divided by the mean square within groups
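The F-ratio above (mean square between divided by mean square within) can be computed by hand for a small hypothetical one-way design:

```python
# F-obt sketch for a one-way, between-subjects ANOVA.
# The three groups of scores are hypothetical.
groups = [[2, 3, 4], [4, 5, 6], [8, 9, 10]]

k = len(groups)                       # number of levels of the factor
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Sum of squares between: n * (group mean - grand mean)^2, summed over groups.
ss_bn = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Sum of squares within: squared deviations of scores around their group mean.
ss_wn = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_bn = ss_bn / (k - 1)          # mean square between; df = k - 1
ms_wn = ss_wn / (n_total - k)    # mean square within;  df = N - k
f_obt = ms_bn / ms_wn
print(round(f_obt, 2))           # 28.0
```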
If F-obt is significant w/ more than two levels
perform post hoc comparisons to determine which means differ significantly.
If all ns are not equal,
perform Fisher's protected t-test on each pair of means
If all ns are equal,
perform Tukey's HSD test
eta squared (η²)
describes the effect size - the proportion of variance in dependent scores accounted for by the levels of the independent variable.
two-way, between-subjects ANOVA
two independent variables, and all conditions of both factors contain independent samples.
two-way, within-subjects ANOVA
performed when both factors involve related samples
two-way, mixed design ANOVA
performed when one factor has independent samples and one factor has related samples
complete factorial design
all levels of one factor are combined with all levels of the other factor
cell
each cell is formed by a particular combination of a level from each factor
two-way ANOVA
we compute an F-obt for the main effect of A, for the main effect of B, and for the interaction of A X B
main effect means
for a factor are obtained by collapsing across (combining the scores from) the levels of the other factor. Collapsing across factor B produces the main effect means for factor A. Collapsing across factor A produces the main effect means for factor B.
significant main effect
indicates differences between the main effect means, indicating a relationship is produced when we manipulate one independent variable by itself.
significant two-way interaction effect
indicates that the cell means differ such that the relationship between one factor and the dependent scores depends on the level of the other factor that is present. When graphed, an interaction effect produces nonparallel lines.
post hoc comparisons
perform on each significant effect having more than two levels to determine which specific means differ significantly.
unconfounded
the means from two cells are unconfounded if the cells differ along only one factor.
confounded
the cells differ along more than one factor.
chi square
used with one or more nominal variables, and the data are the frequencies with which participants fall into each category
one-way chi square
compares the frequency of category membership along one variable
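A one-way chi-square compares observed category frequencies to expected frequencies; a sketch assuming equally likely categories (the observed counts are hypothetical):

```python
# One-way chi-square sketch: observed category frequencies compared to
# the frequencies expected if all categories were equally likely.
# The observed counts are hypothetical.
observed = [20, 30, 10]
expected = [sum(observed) / len(observed)] * len(observed)   # 20 each

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)   # 10.0
```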
two-way chi square
tests whether category membership for one variable is independent of category membership for the other variable.
phi coefficient
describes the strength of the relationship in a 2 X 2 design with a significant two-way chi square
contingency coefficient (C)
describes the strength of the relationship if the design is not a 2 X 2.
Mann-Whitney U test
nonparametric, independent-samples t-test
Wilcoxon T test
nonparametric, related-samples t-test for ranks
Kruskal-Wallis H test
nonparametric, one-way, between-subjects ANOVA for ranks
Friedman X^2 test
nonparametric, one-way, within-subjects ANOVA for ranks