Population
Any collection of individuals (units) that are subject to investigation
Sample
A sub-set of the population that represents (in theory) the population
Variable
Characteristics of a population that differ between individuals
Observation
A single measurement that forms part of a sample (e.g. the gender of a single survey respondent)
Random sampling
Used where the characteristics of the study site are approximately homogeneous; every unit has an equal chance of selection
Systematic sampling
Even/regular temporal or spatial intervals (e.g. sample along an environmental gradient) - does this capture variability?
Stratified sampling
Used where significant heterogeneity (variation) exists; even weighting must be given to each subset (stratum)
Accuracy
Difference between sample estimates and true population value
Precision
Ability of a measurement to be consistently reproduced
Bias
Systematic variation from the population parameter of interest
Discontinuous data
Integer data, counts or discrete categories (e.g. blood type)
Continuous data
Values at any point along an uninterrupted scale (e.g. length, weight)
Nominal data
Discontinuous, categories that are mutually exclusive, these categories can be coded but the numbers used have no numerical meaning
Ordinal data
Discontinuous, refers to quantities that exhibit a natural ranking (e.g. the order a race is finished in); can also include labelled scales (e.g. strongly disagree to strongly agree), and the intervals between values don't have to be equal (e.g. the gaps between race positions)
Interval data
Continuous data where the intervals between values are equally split (e.g. a ruler); mathematical power to add/subtract but not multiply/divide; no absolute zero, so we can't say something is 'x' times bigger than something else (e.g. dates or temperature in °C)
Ratio data
Continuous data with a natural zero point, all arithmetic is possible (e.g. temperature in Kelvin)
Gaussian/Normal distribution
A distribution where the mean, median and mode are closely aligned
Bimodal distribution
A distribution with two peaks
Standard deviation
Characterisation of the spread (width of distribution) around the central value (mean), where a higher value indicates greater spread
Variance
The spread of data with respect to the mean; where this is zero, all values of x are identical
Skewness
A measure of symmetry in distribution, characterising its shape, this affects the way we analyse a distribution
Kurtosis
A measure of peakedness - whether the peak is tall or shorter but more spread out
Moments of distribution
The different characteristics that make up a distribution
First moment
Central value (mean), the highest point in the distribution
Second moment
A measure of spread (variability) around the central value (standard deviation/variance)
Third moment
Skewness
Fourth moment
Kurtosis
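The four moments above can be sketched in Python; this assumes numpy and scipy are installed, and the data values are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (any list of measurements would do)
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()            # first moment: central value
sd = data.std()               # second moment: spread (population SD, ddof=0)
skew = stats.skew(data)       # third moment: asymmetry of the distribution
kurt = stats.kurtosis(data)   # fourth moment: peakedness (excess kurtosis)

print(mean, sd, skew, kurt)   # → 5.0 2.0 0.65625 -0.21875
```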
Parametric data
Data that is interval or ratio, normally distributed, and has more than 30 observations
Non-parametric data
Data that is nominal or ordinal, OR isn't normally distributed, OR has fewer than 30 observations
Subjective probability
Probability based on the judgement of an individual
Theoretical probability
Assumes that all possible outcomes are equally likely; known sample space, logical reasoning and a controlled environment
Experimental probability
Probability based on observations that have been collected
Sample space
All the possible outcomes of an experiment
Event
Any subset of a sample space
Probability if all outcomes are equally likely
Number of outcomes corresponding to an event/total number of outcomes
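The formula above can be worked through with a fair die as a minimal example (stdlib only; the die is an assumed scenario, not from the source):

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]                 # all outcomes of one die roll
event = [x for x in sample_space if x % 2 == 0]   # event: roll an even number

# P(event) = outcomes corresponding to the event / total outcomes
p = Fraction(len(event), len(sample_space))
print(p)  # → 1/2
```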
Mutually exclusive event
Zero probability of two events occurring together
Independent events
Events that have no influence on each other, one outcome won't affect another
Rescaling
When a value from a histogram is turned into a probability (y-axis value/number of observations)
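A quick sketch of rescaling, with made-up histogram counts: each bar's count is divided by the total number of observations so the bars sum to 1.

```python
# Hypothetical histogram counts per category
counts = {"A": 10, "B": 25, "C": 15}
n = sum(counts.values())                     # 50 observations in total

# Rescale: count / number of observations → probability
probs = {k: v / n for k, v in counts.items()}
print(probs)  # → {'A': 0.2, 'B': 0.5, 'C': 0.3}
```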
Null hypothesis
When there is no statistically significant difference between a sample and population/between more than one sample/between more than one variable
Kolmogorov-Smirnov test (K-S test)
A test for whether a sample is normally distributed
Alternative hypothesis
There is a statistically significant difference between the sample distribution and the normal distribution (there is a difference between our data and the normal distribution)
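A sketch of the K-S normality test using scipy (assumed available); the sample here is simulated with a fixed seed rather than real data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=200)  # simulated normal data

# H0: the sample is drawn from a standard normal distribution
stat, p = stats.kstest(sample, "norm")

# A large p-value (> 0.05) means we fail to reject the null hypothesis,
# i.e. no significant difference from the normal distribution was found.
print(stat, p)
```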
One sample t test
Parametric test for the difference between a sample and a population
Two sample t test
Parametric test for two different samples
ANOVA - Analysis of Variance
Parametric test comparing the means of more than two samples
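The parametric tests above map onto scipy.stats functions; this is a sketch with invented sample values, assuming scipy and numpy are installed.

```python
import numpy as np
from scipy import stats

# Three hypothetical samples
a = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8])
b = np.array([5.6, 5.4, 5.8, 5.5, 5.7, 5.3])
c = np.array([6.0, 6.2, 5.9, 6.1, 6.3, 5.8])

t1, p1 = stats.ttest_1samp(a, popmean=5.0)  # one sample t test vs population mean
t2, p2 = stats.ttest_ind(a, b)              # two sample t test
f, p3 = stats.f_oneway(a, b, c)             # ANOVA across three samples
r, p4 = stats.pearsonr(a, b)                # product moment correlation coefficient

print(p1, p2, p3, r)
```

Here `b` is just `a` shifted by 0.5, so the two-sample test finds a significant difference and r comes out at exactly 1.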
Product Moment Correlation Coefficient
Looks at the scatter of data - tighter scatter gives a higher r value
One way chi squared
Non-Parametric test for differences in nominal or ordinal data between a sample and a population - the difference between what we observe and what we expect
Two way chi squared test
Non-parametric test for the difference between two or more samples, used for nominal or ordinal data; non-parametric equivalent of a t test
Mann-Whitney test
Non-parametric test comparing two samples of equal or unequal sizes; can be used for ordinal, interval or ratio data and is the non-parametric equivalent of a t test
Kruskal-Wallis test
Non-parametric test which ranks the observations of 3 or more samples; non-parametric equivalent of ANOVA
Spearman's rank correlation
Good for looking at Likert-scale responses (e.g. 'every time', 'never'); data for both variables are ranked
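The non-parametric tests above also have scipy.stats counterparts; a sketch with made-up data, assuming scipy is installed:

```python
from scipy import stats

# One-way chi-squared: observed vs expected counts in nominal categories
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
chi, p1 = stats.chisquare(observed, f_exp=expected)

# Mann-Whitney: two independent samples (ordinal or non-normal data)
u, p2 = stats.mannwhitneyu([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])

# Kruskal-Wallis: ranks observations across three or more samples
h, p3 = stats.kruskal([1, 2, 3], [4, 5, 6], [7, 8, 9])

# Spearman's rank: correlation on ranked data (e.g. Likert responses)
rho, p4 = stats.spearmanr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])

print(chi, p1, p2, p3, rho)
```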
Simple linear regression
Line of best fit
Total deviation
Total distance between observed data point and mean value of y
Explained deviation
Difference between mean value of y and predicted value of y
Unexplained deviation/Residual
Difference between observed value of y and predicted value of y; the unexplained part is the distance between the data point and the regression line
F ratio
Explained variance/unexplained variance; we want it to be above 1, as this means there is more explained than unexplained variation
Coefficient of explanation
Proportion of variance that is explained by the model
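The regression ideas above (line of best fit, residuals, coefficient of explanation) can be sketched with scipy's `linregress`; the x/y values are invented.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # roughly linear, hypothetical data

res = stats.linregress(x, y)
r_squared = res.rvalue ** 2                 # coefficient of explanation

# Predicted values and residuals (unexplained deviation)
y_hat = res.intercept + res.slope * x
residuals = y - y_hat

print(res.slope, res.intercept, r_squared)
```

Since y rises by about 2 for every unit of x, the slope lands near 2 and r² is close to 1 (most variance is explained by the model).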
Homoscedasticity
The spread of values stays the same as you move along the values of x
Heteroscedasticity
The amount of scatter changes as you move along the values of x
Durbin-Watson d-statistic
Test for autocorrelation
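A minimal sketch of the Durbin-Watson statistic computed directly on regression residuals (made-up values); d near 2 suggests no autocorrelation, d near 0 or 4 suggests positive or negative autocorrelation.

```python
import numpy as np

# Hypothetical residuals from a fitted regression
residuals = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2])

# d = sum of squared successive differences / sum of squared residuals
d = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(d)
```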
Non-linear regression
When the changes in x are not matched by uniform changes in y