Statistics Chapter 2

Distribution

A distribution is a way to describe the structure of a particular data set or population.

Frequency distribution

Display of the values that occur in a data set and how often each value, or range of values, occurs. The objective is to provide an overview of the data.

Probability distribution

A theoretical distribution used to predict the probabilities of particular data values occurring in a population.

Two basic types of frequency distributions

Grouped distributions
Ungrouped distributions

Ungrouped frequency distribution

A frequency distribution in which each category represents a single value, and the frequency (f), or count of data values, is listed for each category.
Letter grades (A, B, C, D, F). Each data value has its own category or class. It would be strange to gr
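As a rough illustration of an ungrouped frequency distribution, the sketch below counts letter grades, one class per grade. The grade list is made-up data, and the names `grades` and `frequency` are chosen only for this example.

```python
from collections import Counter

# Hypothetical letter grades (illustrative data only, not from the notes).
grades = ["A", "B", "B", "C", "A", "D", "B", "F", "C", "B"]

# Each distinct grade is its own class; its count is the frequency f.
frequency = Counter(grades)

# Print the classes in grade order rather than in order of appearance.
for grade in ["A", "B", "C", "D", "F"]:
    print(grade, frequency[grade])
```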

Frequencies

The numbers/counts of data values in the categories of a frequency distribution.

Class

A category of data in a frequency distribution.

Grouped frequency distribution

Data are often grouped into ranges of values. The classes are ranges of possible values. Grouped distributions are more common and take more skill to create.
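A minimal sketch of tallying data into grouped classes is shown below. The data values and the class ranges are assumptions made up for illustration; each class is written as a (lower limit, upper limit) pair.

```python
# Hypothetical whole-number data (illustrative only).
data = [12, 15, 18, 21, 27, 33, 34, 38, 41, 52]

# Assumed classes, each a range of possible values (lower limit, upper limit).
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

# Count how many data values fall inside each class, limits included.
frequencies = [
    sum(1 for value in data if lower <= value <= upper)
    for lower, upper in classes
]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower}-{upper}: {f}")
```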

Steps for constructing a frequency distribution

1. Decide how many classes should be in the distribution: there are typically between 5 and 20 classes in a frequency distribution. Several different methods can be used to determine the number of classes that will show the data most clearly.
2. Choose an

Finding the class width

The difference between the lower limits or upper limits of two consecutive classes of a frequency distribution.
Begin by subtracting the lowest number in the data set from the highest number in the data set and dividing the difference by the number of cla
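The calculation described above can be sketched as follows. The data and the number of classes are hypothetical, and rounding the result up to a whole number is an assumed convention for this example, not something the notes state in full.

```python
import math

# Hypothetical data set and an assumed number of classes.
data = [12, 15, 21, 27, 33, 38, 41, 46, 52, 58]
num_classes = 5

# (highest value - lowest value) / number of classes
raw_width = (max(data) - min(data)) / num_classes

# Assumption: round up so the classes are sure to cover every data value.
class_width = math.ceil(raw_width)

print(raw_width, class_width)
```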

Lower class limit

The smallest number that can belong to a particular class. Using the minimum data value, or a smaller number, as the lower limit of the first class is a good place to begin. Choose the first lower limit so that reasonable classes will be produced.

Upper class limit

The largest number that can belong to a particular class. The upper limit of each class is chosen so that the classes do not overlap.
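One way to generate non-overlapping class limits is sketched below. It assumes whole-number data, so each upper limit is one less than the next lower limit; the starting limit, class width, and number of classes are hypothetical values.

```python
# Hypothetical starting point and class width (assumed for illustration).
first_lower_limit = 10
class_width = 10
num_classes = 5

classes = []
for i in range(num_classes):
    # Consecutive lower limits differ by exactly the class width.
    lower = first_lower_limit + i * class_width
    # Assumption: whole-number data, so the upper limit stops one short
    # of the next lower limit and the classes cannot overlap.
    upper = lower + class_width - 1
    classes.append((lower, upper))

print(classes)
```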

Class boundaries

The value that lies halfway between the upper limit of one class and the lower limit of the next class. In other words, split the difference in the gap between the two limits.
To find a class boundary, add the upper lim
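The "halfway between" definition can be sketched in a few lines. The classes below are assumed example values, and `boundaries` is a name introduced only for this illustration.

```python
# Hypothetical classes as (lower limit, upper limit) pairs.
classes = [(10, 19), (20, 29), (30, 39)]

# The boundary between two consecutive classes is the value halfway between
# the upper limit of one class and the lower limit of the next.
boundaries = [
    (classes[i][1] + classes[i + 1][0]) / 2
    for i in range(len(classes) - 1)
]

print(boundaries)
```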

Midpoint

The midpoint, or class mark, of a class is the sum of the lower and upper limits of the class divided by 2. The midpoints are often used for estimating the average value in each class.
Class midpoint = (Lower limit + Upper limit) / 2
Once you find the fir
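The midpoint formula above, applied to a few assumed example classes:

```python
# Hypothetical classes as (lower limit, upper limit) pairs.
classes = [(10, 19), (20, 29), (30, 39)]

# Class midpoint = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print(midpoints)
```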

Relative frequency

The fraction or percentage of the data set that falls into a particular class. It is calculated by dividing the class frequency by the sample size. It is useful because fractions or percentages make it easier to quickly analyze the data set as a whole.
Relat

Sample size

n = sample size
The sample size for a frequency distribution can be found by adding all of the class frequencies together.
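A short sketch of finding the sample size and the relative frequencies from a list of class frequencies. The frequencies are hypothetical values chosen for illustration.

```python
# Hypothetical class frequencies (illustrative only).
frequencies = [3, 7, 10, 4, 1]

# Sample size n is the sum of all class frequencies.
n = sum(frequencies)

# Relative frequency = class frequency / sample size.
relative_frequencies = [f / n for f in frequencies]

for f, rel in zip(frequencies, relative_frequencies):
    print(f, f"{rel:.2f}", f"{rel:.0%}")
```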

Cumulative frequency

The sum of frequencies of a given class and all previous classes. The cumulative frequency of the last class equals the sample size.
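Cumulative frequencies are running totals, which can be sketched with `itertools.accumulate`. The class frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies (illustrative only).
frequencies = [3, 7, 10, 4, 1]

# Each entry is the frequency of that class plus all previous classes.
cumulative = list(accumulate(frequencies))

print(cumulative)
# The last cumulative frequency should equal the sample size.
print(cumulative[-1] == sum(frequencies))
```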

Purpose of graphs

Graphs have several advantages over other forms of data display such as lists, ordered arrays, texts, or tables.
-Graphs convey information immediately.
-Graphs can have more impact than text, lists or tables.
-Graphs are persuasive.
-Graphs can often bri

Characteristics of a good graph

A good graph should be able to stand alone.
-A title is important; it should describe the topic
-Legend
-Labels and scales
-Source should be included

Pie charts

Useful for displaying a frequency table where the x variable is discrete. A pie chart shows how large each category is in relation to the whole; it is created from a frequency distribution by using the RELATIVE FREQUENCIES. The size, or central angle, of
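As a sketch of how relative frequencies turn into pie slices, the example below multiplies each relative frequency by 360 degrees to get a central angle. The category names and counts are made up for illustration.

```python
# Hypothetical category counts (illustrative only).
counts = {"Bus": 10, "Car": 25, "Bike": 5}

n = sum(counts.values())

# Central angle of a slice = relative frequency * 360 degrees.
angles = {category: (f / n) * 360 for category, f in counts.items()}

for category, angle in angles.items():
    print(category, angle)
```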

Bar graphs

Another way to display QUALITATIVE data. Bar graphs are used to represent categorical data. The height of the bar represents the amount of data in that category. The horizontal axis contains the qualitative categories, and the vertical axis represents the

Pareto chart

The bars from largest to smallest (descending order). Pareto charts are typically used with NOMINAL data. The reason for this is that if a Pareto chart were created from ordinal or quantitative data, the values on the x-axis might seem out of order after
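The descending bar order of a Pareto chart amounts to sorting the categories by frequency. A minimal sketch with hypothetical nominal categories and counts:

```python
# Hypothetical nominal categories and counts (illustrative only).
counts = {"Late delivery": 12, "Wrong item": 30, "Damaged": 7, "Other": 18}

# Sort categories from largest frequency to smallest (descending order).
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, f in pareto_order:
    print(category, f)
```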

Side-by-side graph

Used when we want to create a bar graph that compares different groups. To do so, create a bar for each class and for each category. Identify the bars in some way, such as different colors, to denote which bars represent a given class. In this type of gra

Stacked bar graph

Similar to a side-by-side graph but the data is stacked instead of side by side. Allows the reader to view different groups in a category as one in order to make comparisons between the categorical data.

Line graph

A line graph is used to show specific trends in data, normally over time, that show how two variables are related to one another. To construct a line graph, the x-axis will represent the independent variable in the data given and the y-axis will represent

Frequency histogram

A bar graph of a frequency distribution.
To construct a frequency histogram:
1. Find the class boundaries of the frequency distribution.
2. Mark the class boundaries of every class on the horizontal axis, which is a real number line.
3. The width of the b

Relative frequency histogram

There are times at which it is beneficial to display the relative frequency of a distribution. A relative frequency histogram is identical to a regular histogram, except that the heights of the bars represent the relative frequencies of each class rather

Frequency polygon

Using the class midpoints, we can also construct what is called a frequency polygon. A frequency polygon is a visual display of the frequencies of each class using the midpoints from the histogram.
Steps for constructing a frequency polygon:
1. Mark the c

Ogive

An ogive is another type of line graph which depicts cumulative frequency of each class from a frequency table. Begin by tabulating the cumulative frequencies for each class. Unlike creating a frequency polygon, we only include an extra class at the lower

Stem and leaf plots

Retains the original data. The leaves are usually the last digit in each data value and the stems are the remaining digits. For example, in the number 189, 9 is the leaf and 18 is the stem. Be sure to include a legend.
Steps for creating a stem and leaf p
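The stem and leaf split can be sketched as below, using the last digit as the leaf and the remaining digits as the stem. The data values are hypothetical whole numbers, and the dictionary layout is just one possible way to hold the plot.

```python
# Hypothetical whole-number data (illustrative only).
data = [189, 172, 175, 181, 183, 189, 194]

plot = {}
for value in sorted(data):
    # The leaf is the last digit; the stem is the remaining digits.
    stem, leaf = divmod(value, 10)
    plot.setdefault(stem, []).append(leaf)

print("Key: 18 | 9 = 189")  # legend
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```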

Dot plots

Displays the data without grouping certain points together like a stem and leaf plot does. Instead, only data which are exactly the same appear together. As such, these plots are useful for identifying extreme values and clusters in data sets. Because a d

Analyzing a graph

When you are analyzing a graph, you are first trying to determine the overall pattern of the data. Is it symmetrical, but not uniform? Does the majority of the data lie to one side or the other? Is the frequency the same for all categories?
We also need t

Basic shapes of a distribution

1. Uniform
2. Symmetrical, but not uniform
3. Skewed to the right
4. Skewed to the left

Uniform distribution

The frequency of each class is approximately the same. The distribution will have a RECTANGULAR shape.

Symmetrical, but not uniform distribution

The data lies evenly on both sides of the distribution. The right and left side of the curve, histogram, etc., are mirror images of each other.

Skewed to the right

The majority of the data falls on the left of the distribution. Also, the right side of the distribution will extend farther out than the extension on the left side.
The definitions seem backward; that's because the names are based on what happens to the me

Skewed to the left

The majority of the data falls on the right of the distribution. Also, the left side of the distribution will extend farther out than the extension on the right side.
The definitions seem backward; that's because the names are based on what happens to the

Time series graph

A picture of how data change over time, with a time variable on the horizontal axis.
Ex: Consumer price index between the years 1920 and 1990.
Common types: line graph
Time series study: A historian gathers data over the past hundred years to determ

Cross-sectional graph

Picture of the data at a given moment in time. Neither axis will have a variable of time in this case.

Line graphs

Depict the change in value over time. Constructed by joining data points in order with line segments.