Statistics

Population

Any collection of individuals (units) that are subject to investigation

Sample

A sub-set of the population that represents (in theory) the population

Variable

Characteristics of a population that differ between individuals

Observation

A single measurement that forms part of a sample (e.g. the gender of a single survey respondent)

Random sampling

Used when the characteristics of the study site are approximately homogeneous; every individual has an equal chance of selection

Systematic sampling

Even/regular temporal or spatial intervals (e.g. sample along an environmental gradient) - does this capture variability?

Stratified sampling

Used when significant heterogeneity (variation) exists; even weighting must be given to each subset (stratum)

Accuracy

Difference between sample estimates and true population value

Precision

Ability of a measurement to be consistently reproduced

Bias

Systematic variation from the population parameter of interest

Discontinuous data

Data that can only take discrete values, such as integer counts or categories (e.g. blood type)

Continuous data

Values at any point along an uninterrupted scale (e.g. length, weight)

Nominal data

Discontinuous, categories that are mutually exclusive, these categories can be coded but the numbers used have no numerical meaning

Ordinal data

Discontinuous, refers to quantities that exhibit a natural ranking (e.g. the order a race is finished in), can also include labelled scales (e.g. strongly disagree to strongly agree); the intervals between values don't have to be equal (e.g. the gaps between race finishers)

Interval data

Continuous data where the intervals between values are equally split (e.g. ruler); mathematical power to add/subtract but not multiply/divide; no absolute zero, so we can't say something is 'x' times bigger than something else (e.g. dates or temperature in °C)

Ratio data

Continuous data with a natural zero point, all arithmetic is possible (e.g. temperature in Kelvin)

Gaussian/Normal distribution

A symmetric, bell-shaped distribution where the mean, median and mode are closely aligned

Bimodal distribution

A distribution with two peaks

Standard deviation

Characterisation of the spread (width of distribution) around the central value (mean), where a higher value indicates greater spread

Variance

The spread of data with respect to the mean (the square of the standard deviation); where this is zero, all values of x are identical
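A minimal sketch of the central value and these two spread measures, using Python's standard-library `statistics` module (population formulas; the data values are illustrative):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(data)      # central value (first moment)
var = statistics.pvariance(data)  # population variance
sd = statistics.pstdev(data)      # population standard deviation (sqrt of variance)

print(mean, var, sd)  # 5.0 4.0 2.0
```

Note that `statistics.variance`/`statistics.stdev` (with an n - 1 divisor) would be used instead when estimating from a sample.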

Skewness

A measure of symmetry in distribution, characterising its shape, this affects the way we analyse a distribution

Kurtosis

A measure of peakedness - whether the peak is tall or shorter but more spread out

Moments of distribution

The different characteristics that make up a distribution

First moment

Central value (mean), the highest point in the distribution

Second moment

A measure of spread (variability) around the central value (standard deviation/variance)

Third moment

Skewness

Fourth moment

Kurtosis
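The four moments above can be sketched from first principles in plain Python (population formulas; assumes the data are not all identical, so the standard deviation is non-zero):

```python
def moments(xs):
    """Return (mean, variance, skewness, kurtosis) of a dataset."""
    n = len(xs)
    mean = sum(xs) / n                                   # 1st moment: central value
    var = sum((x - mean) ** 2 for x in xs) / n           # 2nd moment: spread
    sd = var ** 0.5
    skew = sum(((x - mean) / sd) ** 3 for x in xs) / n   # 3rd moment: symmetry (0 = symmetric)
    kurt = sum(((x - mean) / sd) ** 4 for x in xs) / n   # 4th moment: peakedness (normal ≈ 3)
    return mean, var, skew, kurt

m, v, s, k = moments([1, 2, 3, 4, 5])  # symmetric data, so skewness is 0
```

A symmetric dataset like `[1, 2, 3, 4, 5]` gives skewness 0; a long right tail would give positive skewness.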

Parametric data

Data that is interval or ratio, normally distributed, and has more than 30 observations

Non-parametric data

Data that is nominal or ordinal OR isn't normally distributed OR has fewer than 30 observations

Subjective probability

Probability based on the judgement of an individual

Theoretical probability

Assumes that all possible outcomes are equally likely; requires a known sample space, logical reasoning and a controlled environment

Experimental probability

Probability based on observations that have been collected

Sample space

All the possible outcomes of an experiment

Event

Any subset of a sample space

Probability if all outcomes are equally likely

Number of outcomes corresponding to an event/total number of outcomes
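This formula can be illustrated with a fair six-sided die (the event chosen here, "roll an even number", is just an example); `Fraction` keeps the result exact:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                       # all possible outcomes
event = {x for x in sample_space if x % 2 == 0}         # subset: "roll an even number"

# P(event) = outcomes in the event / total outcomes
p = Fraction(len(event), len(sample_space))
print(p)  # 1/2
```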

Mutually exclusive event

Zero probability of two events occurring together

Independent events

Events that have no influence on each other, one outcome won't affect another

Rescaling

When a value from a histogram is turned into a probability (y-axis value/number of observations)
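A minimal sketch of rescaling: divide each histogram count by the number of observations so the bars sum to 1 (the blood-type observations are made up for illustration):

```python
from collections import Counter

observations = ["A", "B", "A", "O", "A", "B"]
counts = Counter(observations)          # histogram: A=3, B=2, O=1
n = len(observations)

# rescale each count into a probability
probs = {category: count / n for category, count in counts.items()}
print(probs)  # {'A': 0.5, 'B': 0.3333..., 'O': 0.1666...}
```

After rescaling the probabilities sum to 1, turning the histogram into an (experimental) probability distribution.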

Null hypothesis

When there is no statistically significant difference between a sample and a population, between two or more samples, or between two or more variables

Kolmogorov-Smirnov test (K-S test)

A test for normality; compares the sample distribution against a reference distribution (e.g. the normal distribution)
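The K-S statistic itself is just the largest gap between the sample's empirical CDF and the reference CDF. A hand-rolled sketch against a normal reference (in practice a library routine such as one from scipy would be used; `ks_statistic` here is an illustrative name):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(xs, mu, sigma):
    """Largest vertical distance between the empirical CDF and the normal CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # compare the CDF just before and just after each observed step
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d
```

A large D (relative to the critical value for the sample size) leads to rejecting the null hypothesis of normality.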

Alternative hypothesis

There is a statistically significant difference between the sample distribution and the normal distribution (there is a difference between our data and the normal distribution)

One sample t test

Parametric test for the difference between a sample and a population
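The t statistic for this test can be computed directly from its textbook formula (a sketch; the data and hypothesised mean are illustrative, and a library routine would normally supply the p-value):

```python
import statistics

def one_sample_t(xs, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(xs)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)          # sample standard deviation (n - 1 divisor)
    t = (mean - mu0) / (s / n ** 0.5)
    return t, n - 1

t, df = one_sample_t([5, 6, 7, 8, 9], 5.0)
```

The resulting t is then compared against the t distribution with df degrees of freedom.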

Two sample t test

Parametric test for two different samples

ANOVA - Analysis of Variance

Parametric test comparing the means of more than two samples
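One-way ANOVA's F statistic compares variation between the group means with variation within the groups. A compact sketch from the definitions (illustrative data; a library routine would supply the p-value):

```python
def anova_f(groups):
    """One-way ANOVA F = (between-group MS) / (within-group MS)."""
    all_x = [x for g in groups for x in g]
    n, k = len(all_x), len(groups)
    grand = sum(all_x) / n
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # variation of observations around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

When all group means are identical, ss_between is zero and F is 0; large F values indicate the means differ more than within-group noise would explain.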

Product Moment Correlation Coefficient

Looks at the scatter of data about a straight line - the tighter the scatter, the higher the r value (Pearson's r)
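Pearson's r is the covariance of the two variables scaled by their spreads, giving a value between -1 and +1. A from-scratch sketch (assumes both variables vary, so the denominator is non-zero):

```python
def pearson_r(xs, ys):
    """Product moment correlation coefficient between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Perfectly linear increasing data gives r = 1; perfectly linear decreasing data gives r = -1.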

One way chi squared

Non-Parametric test for differences in nominal or ordinal data between a sample and a population - the difference between what we observe and what we expect
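The "difference between what we observe and what we expect" is exactly the chi-squared formula, summed over categories (the counts below are illustrative; the statistic is then compared against the chi-squared distribution):

```python
def chi_squared(observed, expected):
    """One-way chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# e.g. observed counts in three categories vs. an "all equal" expectation
stat = chi_squared([10, 20, 30], [20, 20, 20])
print(stat)  # 10.0
```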

Two way chi squared test

Non-parametric test for the difference between two or more samples, used for nominal or ordinal data, non-parametric equivalent of a t test

Mann-Whitney test

Non-parametric test comparing two samples (their medians/rank distributions) of equal or unequal size; can be used for ordinal, interval or ratio data and is the non-parametric equivalent of a t test
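The Mann-Whitney U statistic is built from the ranks of the pooled samples. A minimal sketch (assumes no tied values, which would need average ranks; a library routine would supply the p-value):

```python
def mann_whitney_u(a, b):
    """U statistic for two independent samples (no ties assumed)."""
    combined = sorted(a + b)
    rank = {v: i + 1 for i, v in enumerate(combined)}   # 1-based rank of each value
    r1 = sum(rank[v] for v in a)                        # rank sum of sample a
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)
```

Completely separated samples give U = 0 (the strongest possible evidence of a difference); heavily overlapping samples give U near its maximum.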

Kruskal-Wallis test

Non-parametric test which ranks the observations across 3 or more samples; non-parametric equivalent of ANOVA

Spearman's rank correlation

Good for looking at Likert-scale ('every time', 'never', etc.) responses; data for both variables are ranked

Simple linear regression

Fitting a line of best fit through the data, modelling y as a function of x
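The least-squares line of best fit has closed-form solutions for its slope and intercept (a sketch; assumes the x values are not all identical):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y over variance of x
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx   # line passes through the mean point (mx, my)
    return a, b

a, b = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])  # exact line y = 1 + 2x
```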

Total deviation

Total distance between observed data point and mean value of y

Explained deviation

Difference between mean value of y and predicted value of y

Unexplained deviation/Residual

Difference between observed value of y and predicted value of y; the unexplained part is the vertical distance between the data point and the regression line

F ratio

Explained variance divided by unexplained variance; we want it to be above 1, as this means there is more explained than unexplained variation

Coefficient of explanation

Proportion of variance that is explained by the model (r²)
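The three deviations above decompose the total sum of squares, and r² falls straight out of that decomposition. A sketch tying the regression cards together (illustrative data):

```python
def decompose(xs, ys):
    """Fit a least-squares line and split the variation in y into
    explained and unexplained (residual) sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    preds = [a + b * x for x in xs]

    ss_total = sum((y - my) ** 2 for y in ys)                     # total deviation
    ss_explained = sum((p - my) ** 2 for p in preds)              # explained deviation
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, preds))    # unexplained/residual
    r2 = ss_explained / ss_total                                  # coefficient of explanation
    return ss_total, ss_explained, ss_residual, r2

tot, expl, res, r2 = decompose([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 4.0, 6.0])
```

Note that ss_total = ss_explained + ss_residual, so r² is the explained fraction of the total.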

Homoscedasticity

The spread (scatter) of the residuals stays the same as you move along the values of x

Heteroscedasticity

The spread (scatter) of the residuals changes as you move along the values of x

Durbin-Watson d-statistic

Test for autocorrelation
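The d statistic compares successive residuals: d = Σ(e_t - e_{t-1})² / Σe_t². A sketch (values near 2 suggest no autocorrelation, near 0 positive autocorrelation, near 4 negative):

```python
def durbin_watson(residuals):
    """Durbin-Watson d statistic on a sequence of regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# alternating residuals (strong negative autocorrelation) push d towards 4
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(d)  # 3.0
```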

Non-linear regression

When the changes in x are not matched by uniform changes in y