The Practice of Statistics - Chapter 1

Individual

An object described by a set of data.

Variable

Any characteristic of an individual that can take different values for different individuals.

Categorical Variable

A characteristic of an individual that places the individual into one of several groups.

Quantitative Variable

A characteristic of an individual that takes numerical values for which arithmetic operations such as adding and averaging make sense.

Distribution

The pattern of variation of a variable that describes what values the variable takes and how often it takes these values.

Range

The difference between the largest and smallest observations, which serves as a nonresistant measure of spread.
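A minimal Python sketch of the range, using a small made-up data set (the numbers are illustrative only). It also shows why the range is nonresistant: replacing a single extreme value changes it a lot.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)

scores = [62, 70, 74, 78, 81, 85, 90]
print(data_range(scores))  # 90 - 62 = 28

# Replace the largest value with an extreme one: the range jumps.
scores_with_extreme = [62, 70, 74, 78, 81, 85, 150]
print(data_range(scores_with_extreme))  # 150 - 62 = 88
```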

Spread

The variability of the data, that is, how spread out the observations are.

Outlier

An individual observation that falls outside the overall pattern of the distribution. A common rule flags an observation as an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile, that is, above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR.
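A minimal Python sketch of the 1.5 × IQR rule. It assumes the quartile convention used in this glossary (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when n is odd) and at least two observations; software such as NumPy uses other interpolation rules by default and can report slightly different quartiles. The data are made up.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data
    (the overall median is excluded from both halves when n is odd)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 == 1 else s[half:]
    return median(lower), median(upper)

def outliers(values):
    """Observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [5, 7, 8, 9, 10, 11, 12, 13, 40]
print(quartiles(data))  # (7.5, 12.5), so IQR = 5 and the fences are 0 and 20
print(outliers(data))   # [40]
```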

Center

The midpoint or typical value of a distribution, usually measured by the mean (the balancing point of the distribution) or the median.

Shape

The overall form of a distribution as seen in its graph, for example skewed left, skewed right, or symmetric.

Skew

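A distribution is skewed right if the right side of its display (containing the larger values) extends much farther out than the left side, and skewed left if the left side (containing the smaller values) extends much farther out than the right side.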

Symmetry

A distribution in which the right and left sides of the display are approximately mirror images of each other.

Dot Plot

A graphical display that shows each observation as a dot above its value on a horizontal axis, so the number of dots stacked above a value shows how often that value occurs.
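A minimal text-only sketch of a dot plot in Python, assuming whole-number data. To fit in a terminal it is turned sideways: each row is one value, and the stars play the role of the stacked dots. The function name and data are hypothetical.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a sideways dot plot: one row per whole-number value from the
    smallest to the largest observation, with one '*' per observation."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        # Values that never occur still get a row, just like gaps on the axis.
        lines.append((f"{value:>3} | " + "*" * counts[value]).rstrip())
    return "\n".join(lines)

goals = [0, 1, 1, 2, 2, 2, 3, 5]
print(text_dot_plot(goals))
```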

Histogram

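A graph of a quantitative variable that looks like a bar graph, except that the bars touch and each bar represents a range (class) of values. The height of each bar shows how many observations fall in that class.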
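The counting step behind a histogram can be sketched in Python as follows, assuming equal-width classes chosen by hand. Each class is treated as half-open, so a value on a boundary goes into the higher class, and values outside the chosen classes are ignored. The function name, class choices, and data are illustrative assumptions.

```python
def histogram_counts(values, start, width, num_bins):
    """Count observations in the classes [start + k*width, start + (k+1)*width)
    for k = 0 .. num_bins - 1. Values outside every class are skipped."""
    counts = [0] * num_bins
    for x in values:
        k = int((x - start) // width)
        if 0 <= k < num_bins:
            counts[k] += 1
    return counts

ages = [21, 23, 25, 28, 31, 34, 35, 38, 42, 47]
counts = histogram_counts(ages, start=20, width=10, num_bins=3)
for k, c in enumerate(counts):
    low = 20 + 10 * k
    # Labels like "20-29" assume whole-number data.
    print(f"{low}-{low + 9}: {'#' * c}")
```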

Stemplot

Represents data by separating each value into two parts: the stem (the leftmost digit or digits) and the leaf (the final digit).
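A minimal Python sketch of building a stemplot, assuming non-negative whole numbers so that the stem is the value divided by 10 and the leaf is the last digit. Leaves are sorted on each stem, and stems with no leaves are still listed. The data are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Stemplot for non-negative integers: stem = all but the last digit,
    leaf = last digit. Returns one text row per stem."""
    leaves = defaultdict(list)
    for x in values:
        leaves[x // 10].append(x % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row_leaves = "".join(str(d) for d in sorted(leaves[stem]))
        rows.append(f"{stem:>2} | {row_leaves}".rstrip())
    return "\n".join(rows)

scores = [58, 62, 65, 67, 71, 74, 74, 79, 83, 90]
print(stemplot(scores))
```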

Split Stemplot

An ordinary stemplot in which each stem is split into two entries, one for leaves 0 to 4 and one for leaves 5 to 9. This way, instead of many leaves crowding onto one stem, they are divided between two rows.
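A minimal Python sketch of a split stemplot, assuming non-negative whole numbers and the 0-to-4 / 5-to-9 split. Each stem appears on two rows; the first row holds leaves 0 to 4 and the second holds leaves 5 to 9. The data are made up.

```python
from collections import defaultdict

def split_stemplot(values):
    """Split stemplot for non-negative integers: each stem gets two rows,
    the first for leaves 0-4 and the second for leaves 5-9."""
    leaves = defaultdict(list)
    for x in values:
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1
        leaves[(stem, half)].append(leaf)
    stems = [x // 10 for x in values]
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):
            row_leaves = "".join(str(d) for d in sorted(leaves[(stem, half)]))
            rows.append(f"{stem:>2} | {row_leaves}".rstrip())
    return "\n".join(rows)

scores = [61, 63, 64, 66, 68, 70, 72, 75, 77, 79]
print(split_stemplot(scores))
```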

Back-to-Back Stemplot

Used to compare the distribution of a quantitative variable for two groups. Both groups share a common set of stems, arranged in a vertical column with the smallest at the top; the leaves for one group extend to the right of the stems and the leaves for the other group extend to the left.

Time Plot

A graphical display that is useful for showing the pattern of change in a variable over time. Each observation of the variable is plotted against the time at which it was measured, and time is shown on the horizontal axis.

Mean

The arithmetic average of a distribution, denoted x̄ (x-bar), which is found by adding the values of all the observations and dividing the sum by the number of observations. It is a nonresistant measure of center.
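A minimal Python sketch of the mean, using made-up data: add the observations, then divide by how many there are.

```python
def mean(values):
    """x-bar: the sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

heights = [64, 66, 67, 70, 73]
print(mean(heights))  # 340 / 5 = 68.0
```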

Summation

The sum of all of the numbers in a set, denoted by the capital Greek letter sigma (Σ).
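In symbols, with n observations labeled x_1 through x_n (the subscript labels are a standard convention, not taken from the text), summation and the mean built from it can be written as:

```latex
\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```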

Median

The midpoint of a distribution, denoted M. To find it, arrange all of the observations in order from smallest to largest; the median is the center observation, or the mean of the two center observations when the number of observations is even. It is a resistant measure of center.
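A minimal Python sketch of the median, covering both the odd and even cases described above. The example lists are made up.

```python
def median(values):
    """Center of the sorted data; for an even count, the mean of the two center values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # sorted [1, 5, 7] -> 5
print(median([7, 1, 5, 3]))  # sorted [1, 3, 5, 7] -> (3 + 5) / 2 = 4.0
```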

Resistant

A statistic is resistant if it is not strongly affected by extreme observations. The median is resistant; the mean and the range are not.
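A small Python illustration of resistance, using the standard library's `statistics.mean` and `statistics.median` on made-up data. Replacing the largest value with an extreme one drags the mean far upward, while the median does not move.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]        # made-up values, in thousands
with_extreme = [30, 32, 35, 38, 400]  # largest value replaced by an extreme one

print(mean(incomes), median(incomes))            # mean = 35, median = 35
print(mean(with_extreme), median(with_extreme))  # mean = 107, median = 35
```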

Quartile

The quartiles mark off the middle half of the data. The first quartile (Q1) is the median of the observations to the left of the overall median, and the third quartile (Q3) is the median of the observations to the right of it. Quartiles are resistant.

Interquartile Range

The distance between the first and the third quartiles, denoted IQR, and found by subtracting the first quartile from the third.

Five-Number Summary

A description of a data set that consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
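A minimal Python sketch of the five-number summary and the IQR. It assumes the quartile convention described under Quartile (medians of the lower and upper halves, excluding the overall median when n is odd) and at least two observations. The data are made up.

```python
def _median(s):
    """Median of an already-sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # Leave the overall median out of both halves when n is odd.
    upper = s[half + 1:] if n % 2 == 1 else s[half:]
    return (s[0], _median(lower), _median(s), _median(upper), s[-1])

data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]
summary = five_number_summary(data)
q1, q3 = summary[1], summary[3]
print(summary)          # (2, 4, 7.5, 11, 15)
print("IQR =", q3 - q1) # IQR = 7
```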

Minimum

The smallest observation in a data set.

Maximum

The largest observation in a data set.

Boxplot

A graphical display of the five-number summary, which uses a central box to span the quartiles, a line in the box to mark the median, and lines extending from the box to the smallest and largest observations. It is useful for side-by-side comparisons.

Modified Boxplot

A graphical display of the five-number summary, which uses a central box to span the quartiles, a line in the box to mark the median, lines extending from the box to the smallest and largest observations that are not outliers, and individual "x" marks to plot the outliers.

Standard Deviation

The square root of the variance, denoted s. To calculate it, add the squared deviations from the mean, multiply the sum by 1/(n − 1), where n is the number of observations, and take the square root. It is a nonresistant measure of the spread about the mean.

Variance

The average of the squared deviations from the mean, denoted s², computed by dividing the sum of the squared deviations by n − 1 rather than by n.
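A minimal Python sketch of the sample variance and standard deviation as defined above, dividing by n − 1. The data are made up; the mean is 5 and the squared deviations sum to 32.

```python
import math

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
print(sample_sd(data))        # about 2.138
```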

Bar Graph

A graphical display of a categorical variable that uses one bar for each category; the values shown do not necessarily add to 100%.

Pie Chart

A circular graphical display of a categorical variable in which the slices make up one whole, so the percentages must add to 100%.

Exploratory Data Analysis

The use of graphs and numerical summaries to describe the variables in a data set and the relations among them.

Count

The number of individuals that fall within each category of a categorical variable. Dividing a count by the total number of individuals gives the percentage in that category.
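A minimal Python sketch of counts and percentages for a categorical variable, using `collections.Counter` on made-up data. Because every individual falls in exactly one category, the percentages add to 100%, which is what a pie chart requires.

```python
from collections import Counter

def counts_and_percents(categories):
    """Map each category to (count, percent of all individuals)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

colors = ["red", "blue", "red", "green", "red", "blue", "blue", "red"]
table = counts_and_percents(colors)
for cat, (n, pct) in table.items():
    print(f"{cat}: {n} ({pct:.1f}%)")
```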