Statistics

Arcsine Transform

When the data are proportions it is usually recommended that they be transformed with the arcsine transform. This takes the original data x and converts it to the transformed data y using this formula. (Jk08-28)
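A minimal sketch of the transform in Python. The entry does not show the formula, so the form used here is an assumption: the common version y = arcsin(sqrt(x)), optionally scaled by 2/pi so that y also runs from 0 to 1. The function name `arcsine_transform` and the `scaled` flag are hypothetical.

```python
import math

def arcsine_transform(x, scaled=False):
    """Arcsine transform of a proportion x in [0, 1].

    Assumption: y = asin(sqrt(x)); with scaled=True the result is
    multiplied by 2/pi so that it also lies between 0 and 1.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a proportion between 0 and 1")
    y = math.asin(math.sqrt(x))
    return y * 2.0 / math.pi if scaled else y

# Illustrative proportions (made-up values, not from the source).
proportions = [0.0, 0.05, 0.5, 0.95, 1.0]
transformed = [arcsine_transform(p, scaled=True) for p in proportions]
```

The transform stretches the ends of the scale, so proportions near 0 or 1 are spread further apart than proportions near 0.5.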

Average Deviation

The average deviation, or the mean absolute deviation, measures the absolute difference between the mean and each observation. This measure of deviation is not as well defined as is the standard deviation, partly because the mean is the least squares esti
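As a rough illustration, the mean absolute deviation can be computed as follows. The data values are made up for the example, and the function name is hypothetical.

```python
def mean_absolute_deviation(values):
    """Average absolute difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative values; the mean is 5
mad = mean_absolute_deviation(data)  # (3+1+1+1+0+0+2+4) / 8 = 1.5
```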

Bimodal Frequency Distribution

A bimodal distribution is like a combination of two normal distributions -- there are two peaks. If you find that your data fall in a bimodal distribution, you might consider whether the data actually represent two separate populations of measurements.
(Jk08

Central Tendency

In statistics, a central tendency (or more commonly, a measure of central tendency) is a central value or a typical value for a probability distribution. It is occasionally called an average or just the center of the distribution.
(Jk08-22)

Data Reduction

Summarize trends, capture the common aspects of a set of observations such as the average, standard deviation, and correlation among variables.
(Jk08-304)
In data reduction, we can describe the whole frequency distribution with just two numbers -the mean

Degree of Freedom

In trying to measure variance we have to keep in mind that our estimate of the central tendency x-bar is probably wrong to a certain extent. We take this into account by giving up a "degree of freedom" in the sample formula. Degree of freedom is a measu

x-bar

Dispersion

We usually want to also know how closely clustered the data are around the central point or most typical value in the data. That is, how dispersed are the data values away from the center of the distribution? The minimum possible amount of dispersion is t

Frequency Distribution

A frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table su
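A small sketch of building such a table of counts in Python, using `collections.Counter` from the standard library. The observations are invented for illustration.

```python
from collections import Counter

# Illustrative observations (e.g. number of speech errors per utterance).
observations = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1]

# Map each distinct value to the number of times it occurs.
frequency = Counter(observations)

# Print the table in order of the observed value.
for value in sorted(frequency):
    print(value, frequency[value])
```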

Histogram

A histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson.
(Wikipedia)

Inference

Generalize from a representative set of observations to a large universe of possible observations using hypothesis tests such as the t-test or analysis of variance.
(Jk08-210)
The normal distribution provides a basis for drawing inferences about the accur

Interval

This is a property that is measured on a scale that does not have a true zero value. In an interval scale, the magnitude of differences of adjacent observations can be determined (unlike the adjacent items on an ordinal scale), but because the zero value o

J-Shaped Frequency Distribution

The J-shaped distribution is a kind of skewed distribution with most observations coming from the very end of the measurement scale. For example, if you count speech errors per utterance you might find that most utterances have a speech error count of cer

Least squares estimates of central tendency

This means that if we take the difference between the mean and each value in our data set, square these differences and add them up, we will have a smaller value than if we were to do the same thing with the median or any other estimate of the "mid-point
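The least squares property can be checked numerically. This is only a sketch on made-up data: it compares the sum of squared deviations around the mean with the sum around the median. The helper name is hypothetical.

```python
import statistics

def sum_squared_deviations(values, center):
    """Sum of squared differences between each value and a chosen center."""
    return sum((v - center) ** 2 for v in values)

data = [1, 2, 2, 3, 10]           # illustrative values with one large outlier
mean = statistics.mean(data)      # 3.6
median = statistics.median(data)  # 2

ss_mean = sum_squared_deviations(data, mean)
ss_median = sum_squared_deviations(data, median)

# The mean gives the smaller sum of squares.
print(ss_mean < ss_median)
```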

Mean

The mean value, or the arithmetic average, is the least squares estimate of central tendency. To calculate the mean, sum the data values and then divide by the number of values in the data set.
(Jk08:660)

Measures of Central Tendency

1. Mode
2. Median (center of gravity)
3. Mean (arithmetic average)
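
A brief sketch of the three measures using Python's standard `statistics` module. The data set is invented for illustration.

```python
import statistics

data = [3, 5, 5, 6, 8, 9, 13]  # illustrative values

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # sum of the values divided by their count
```

For this data set the three measures disagree (5, 6, and 7), which is typical when a distribution is skewed.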

Mode

The mode of the distribution is the most frequently occurring value in the distribution --the tip of the frequency distribution.
(Jk08:645)

Nominal

Named properties --they have no meaningful order on a scale of any type.
(Jk08:323)
Examples: What language is being observed? What dialect?

Normal Distribution

The "normal distribution" is an especially useful theoretical function... If this is a good description of the source of variability in our measurements, then we can model this situation by assuming that the underlying property is at the center of the fre

Normal Distribution (2)

In the normal --bell shaped-- distribution, measurements tend to congregate around a typical value and values become less and less likely as they deviate further from this central value.
(Jk08-461)

Normal Distribution (3)

The normal curve is defined by two parameters --what the central tendency is (M) and how quickly probability goes down as you move away from the center of the distribution (s).
(Jk08-475)
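As a sketch, the normal curve can be written as a function of the two parameters named above, M (center) and s (spread). The formula is the standard normal density. The function name `normal_pdf` is hypothetical.

```python
import math

def normal_pdf(x, M, s):
    """Height of the normal curve at x, for center M and spread s (s > 0)."""
    if s <= 0:
        raise ValueError("s must be positive")
    exponent = -((x - M) ** 2) / (2 * s ** 2)
    return math.exp(exponent) / (s * math.sqrt(2 * math.pi))

# The curve is highest at the center and falls off symmetrically.
at_center = normal_pdf(0.0, M=0.0, s=1.0)
one_sd_away = normal_pdf(1.0, M=0.0, s=1.0)
```

A larger s gives a flatter, wider curve; a smaller s gives a taller, narrower one.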

Observations - Descriptive Properties

Each observation will have several descriptive properties --some will be qualitative and some will be quantitative-- and descriptive properties (variables) come in one of four types: (i) Nominal, (ii) Ordinal, (iii) Interval, and (iv) Ratio.
(Jk08-323)

Ordinal

Orderable properties --they aren't observed on a measurable scale, but this kind of property is transitive so that if <a> is less than <b> and <b> is less than <c> then <a> is also less than <c>. (e.g. excellent, good, fair, poor)
(Jk08:323)

Probability

One of the main goals of quantitative analysis is the exploration of processes that may have a basis in probability: theoretical modeling, say in information theory, or practical contexts such as probabilistic sentence parsing.
(Jk08:515)
We can quan

Probability Density Function

As probability theory is used in quite diverse applications, terminology is not uniform and sometimes confusing.
The term probability density function (p.d.f.) is most often reserved for continuous random variables.
(Wikipedia)
The probability density function (p.d.f

Probability Plot

The probability plot is a graphical technique for assessing whether or not a data set follows a given distribution such as the normal or Weibull. The data are plotted against a theoretical distribution in such a way that the points should form approximate

Advantages of q-q plot

The advantages of the q-q plot are: (1) The sample sizes do not need to be equal. (2) Many distributional aspects can be simultaneously tested. For example, shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can all be

Quantile

By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
(Jk08-568)
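A minimal sketch of the idea in Python. There are several conventions for computing quantiles from a sample; the linear interpolation between sorted points used here is one of them and is an assumption, not necessarily the source's method. Both function names are hypothetical.

```python
def fraction_below(values, x):
    """Fraction of the data points that fall strictly below x."""
    return sum(1 for v in values if v < x) / len(values)

def quantile(values, q):
    """Value at quantile q (0 <= q <= 1), by linear interpolation
    between neighbouring points of the sorted data (one convention of several)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

data = list(range(1, 101))  # illustrative data: the integers 1 to 100
q30 = quantile(data, 0.3)   # the point with 30% of the data below it
```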

Quantile-quantile (q-q) plot

The q-q plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. A 45-degree reference line i

Quantitative Analysis

The four main goals of quantitative analysis are: (1) data reduction, (2) inference, (3) discovery of relationships, and (4) exploration of processes that may have a basis in probability.
(Jk08-304)

Range

A simple, but not very useful measure of dispersion is the range of the data values. This is the difference between the maximum and minimum values in the data set.
(Jk08-78)

Ratio

This is a property that we measure on a scale that does have an absolute zero value. This is called a ratio scale because ratios of these measurements are meaningful. Examples: acoustic measures --frequency, duration, frequency counts, reaction time.
(Jk0

Relationships Discovery

Find descriptive or causal patterns in data which may be described in multiple regression models or in factor analysis.
(Jk08-515)

Root Mean Square (RMS)

The variance is the average squared deviation --the units are squared-- so to get back to the original unit of measure we take the square root of the variance. This is the same as the value known as the RMS (root mean square), a measure of deviation used in

Sum of The Squared Deviations

Skewed Frequency Distribution

If measurements are taken on a scale, as we approach one end of the scale the frequency distribution is bound to be skewed because there is a limit beyond which the data values cannot go. We most often run into skewed frequency distributions when dealing

Standardizing a data set

We can relate the frequency distribution of our data to the normal distribution because we know the mean and standard deviation of both. The key is to be able to express any value in a data set in terms of its distance in standard deviations from the mean

Transformation

One standard method that is used to make a data set fall on a more normal distribution is to transform the data from the original measurement scale and put it on a scale that is stretched or compressed in helpful ways.
(Jk08-630)

Types of Distribution

Data come in a variety of shapes of frequency distributions: (a) uniform, (b) skewed, (c) bimodal, (d) normal, (e) J-shaped, (f) U-shaped.

U-shaped Frequency Distribution

A very polarized distribution of results. If you ask a number of people how strongly they supported the US invasion of Iraq, most people would be either strongly in favor or strongly opposed, with not too many in the middle.
(Jk08-491)

Uniform Frequency Distribution

If every outcome is equally likely then the distribution is uniform. This happens for example with the six sides of a die -- each one is (supposed to be) equally likely, so if you count up the number of rolls that come up "1" it should be on average 1 out

Variance

Variance is like the mean absolute deviation except that we square the deviations before averaging them.
The variance is the average squared deviation --the units are squared--.
(Jk08-734)

Population Variance

Sample Variance
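
The formulas for these two entries are not shown, so the following is a sketch of the standard definitions: the population variance divides the sum of squared deviations by n, and the sample variance divides it by n - 1, giving up one degree of freedom because the mean was estimated from the same data. The data values are invented.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; the mean is 5
n = len(data)
mean = sum(data) / n

# Sum of the squared deviations from the mean.
sum_sq = sum((x - mean) ** 2 for x in data)

population_variance = sum_sq / n    # divide by n
sample_variance = sum_sq / (n - 1)  # divide by n - 1 (one degree of freedom lost)

# The square root returns to the original unit of measure.
population_sd = math.sqrt(population_variance)

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
```

The sample variance is always slightly larger than the population variance computed from the same numbers, and the difference shrinks as n grows.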

Weighted Mean

Suppose you asked someone to rate the grammaticality of a set of sentences, but you also let the person rate their ratings, to say that they feel very sure or not very sure at all about the rating given. These confidence values can be used as weights (Wi
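A sketch of the usual weighted mean, the sum of weight times value divided by the sum of the weights. The ratings and confidence weights are invented, and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

ratings = [1, 4, 5]     # illustrative grammaticality ratings
confidence = [1, 1, 2]  # illustrative weights: the rater is most sure of the last rating

wm = weighted_mean(ratings, confidence)  # (1*1 + 1*4 + 2*5) / 4 = 3.75
```

With equal weights the result is the ordinary mean; here the high-confidence rating pulls the average toward 5.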

Z-scores

The data values are converted into z-scores when each data value is replaced by the distance between it and the sample mean where the distance is measured as the number of standard deviations between the data value and the mean. Z-scores always have a mea
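A minimal sketch of the conversion in Python, assuming the sample standard deviation (n - 1 divisor) is the one intended. The data are invented and the function name is hypothetical.

```python
import statistics

def z_scores(values):
    """Replace each value by its distance from the sample mean,
    measured in sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / sd for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
z = z_scores(data)

# Mean and standard deviation of the standardized data.
z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
```

Values above the mean get positive z-scores and values below it get negative ones, so the sign shows direction and the size shows distance.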