Statistics - Organizing Quantitative Data (Section 2.2)

Organize Discrete Data in Tables

When the number of distinct data values is small, we use the values of the discrete variable themselves to create the classes (categories of data).

Creating the Frequency and Relative Frequency Distributions

We tally the number of observations in each category, count the tallies, and create the frequency and relative frequency distributions. The relative frequency of a category is its frequency divided by the total number of observations.
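
As a rough illustration of the tallying step, here is a minimal Python sketch using `collections.Counter` from the standard library. The function name `frequency_distribution` and the survey data are hypothetical, chosen only for this example.

```python
from collections import Counter


def frequency_distribution(data):
    """Return (value, frequency, relative frequency) rows, one per distinct value."""
    counts = Counter(data)  # tally the observations for each distinct value
    n = len(data)           # total number of observations
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]


# Hypothetical data: number of cars per household in a small survey
cars = [0, 1, 1, 2, 2, 2, 3, 1, 2, 0]

for value, freq, rel in frequency_distribution(cars):
    print(f"{value}\t{freq}\t{rel:.2f}")
```

The relative frequencies always add up to 1 (up to rounding), which is a quick check on the tally.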

Class

(A class is a category of data.)

Histograms

(Bars of equal width that touch one another, displaying the data values or classes against their frequency or relative frequency.)
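
To see how frequencies become bar lengths, the following minimal Python sketch prints a text-only histogram. It is an illustration under assumptions: the helper `text_histogram` and the frequency table are hypothetical, and the bars are drawn horizontally (one row per class) rather than vertically.

```python
def text_histogram(freq_table, symbol="#"):
    """Render (class label, frequency) pairs as horizontal bars, one row per class.

    Each bar's length equals the class frequency, so all bars share one scale.
    """
    width = max(len(str(label)) for label, _ in freq_table)  # align the labels
    rows = [f"{str(label):>{width}} | {symbol * freq}" for label, freq in freq_table]
    return "\n".join(rows)


# Hypothetical frequency table: (class, frequency)
table = [("25-34", 3), ("35-44", 5), ("45-54", 2)]
print(text_histogram(table))
```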

Remark Re: Class

Most classes have a lower class limit and an upper class limit.

Class Width

(The difference between consecutive lower class limits is called the class width. For example, the classes 25-34 and 35-44 have a class width of 35 - 25 = 10.)

Overlap?

The classes of a histogram do not overlap.
This avoids confusion as to which class a data value belongs to.

Start with...

Choose a "convenient" number equal to or lower than the smallest data entry as the lower class limit of the first class. Then choose a reasonable and logical number of classes (typically between 5 and 20).

Class Width

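Class width ≈ (largest data entry - smallest data entry) / (number of classes). Round the result up to a convenient number. For example, if the smallest entry is 25, the largest is 74, and there are 5 classes, then (74 - 25) / 5 = 9.8, which rounds up to a class width of 10.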
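
The class-width rule and the choice of a first lower class limit can be sketched in Python as below. This is a minimal sketch assuming whole-number data; `class_width` and `class_limits` are hypothetical names. One assumption deserves a flag: when the quotient is already a whole number, this sketch goes up to the next whole number, so that the largest entry still falls inside the last class when the first lower limit equals the smallest entry.

```python
def class_width(smallest, largest, num_classes):
    """Approximate (largest - smallest) / num_classes, rounded up to a whole number.

    Assumption: whole-number data. If the quotient is already a whole number,
    go one higher so the last class still contains the largest entry.
    """
    return (largest - smallest) // num_classes + 1


def class_limits(first_lower, width, num_classes):
    """Return non-overlapping (lower, upper) class limits for whole-number data.

    Consecutive lower limits differ by the class width; each upper limit is
    one less than the next class's lower limit.
    """
    return [(first_lower + i * width, first_lower + (i + 1) * width - 1)
            for i in range(num_classes)]


w = class_width(25, 74, 5)   # 49 / 5 = 9.8, rounded up to 10
print(w)                     # 10
print(class_limits(25, w, 5))
# [(25, 34), (35, 44), (45, 54), (55, 64), (65, 74)]
```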

Histograms vs. Bar Graphs

Histograms are to quantitative data what bar graphs are to qualitative data.

Intervals of Numbers

When a data set consists of a large number of different discrete values, or consists of continuous data, there are no natural predetermined classes. Instead, the classes must be created from intervals of numbers.

Typical Frequency Distribution Created from Continuous Data

Ages from 25 to 74, grouped as 25-34, 35-44, 45-54, etc.
The data is categorized, or grouped, by intervals of numbers. Each interval represents a class.
The lower class limit is the smallest value within a class (25 for the class 25-34), and the upper class limit is the largest value within a class (34 for the class 25-34).
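
Once the class limits are fixed, each observation can be assigned to its interval and counted. The minimal Python sketch below assumes every class has the same width and that a class runs from its lower class limit up to, but not including, the next class's lower limit, so a continuous value such as 34.5 falls in the class 25-34. The function `grouped_frequency` and the ages are hypothetical.

```python
def grouped_frequency(data, first_lower, width, num_classes):
    """Count the observations in each class of equal width.

    A class runs from its lower limit up to, but not including, the next
    class's lower limit, so the classes never overlap.
    """
    counts = [0] * num_classes
    for x in data:
        i = int((x - first_lower) // width)  # index of the class containing x
        if not 0 <= i < num_classes:
            raise ValueError(f"{x} does not fall in any class")
        counts[i] += 1
    return counts


ages = [27, 31, 38, 44, 45, 52, 58, 61, 66, 73, 29, 40]  # hypothetical ages
for i, f in enumerate(grouped_frequency(ages, 25, 10, 5)):
    lower = 25 + 10 * i
    print(f"{lower}-{lower + 9}: {f}")
```

Raising an error for out-of-range values is a design choice: it flags data that the chosen classes fail to cover instead of silently dropping it.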

Open-Ended

A frequency distribution is open-ended when the first class has no lower class limit or the last class has no upper class limit (for example, a last class of "60 or older").

Note

Constructing a frequency distribution is something of an art form. The distribution that seems to provide the best overall summary of the data is the one that should be used.

Stem and Leaf Plot - 1

A stem-and-leaf plot is another way to represent quantitative data graphically.

Stem and Leaf Plot - 2

The stem consists of all digits to the left of the rightmost digit.
The leaf is the rightmost digit.
For example, for the data value 147, the stem is 14 and the leaf is 7.
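
Splitting a whole number into stem and leaf amounts to integer division by 10. A minimal Python sketch, assuming non-negative whole-number data (the name `stem_and_leaf` is hypothetical):

```python
def stem_and_leaf(value):
    """Split a non-negative whole number into (stem, leaf).

    The leaf is the rightmost digit; the stem is everything to its left.
    """
    return divmod(value, 10)


print(stem_and_leaf(147))  # (14, 7)
```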

Stems

Stems appear vertically in increasing order.

Leaves

Leaves are written to the right of their corresponding stems, separated from them by a vertical line.
Leaves are written from left to right in increasing order.
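
Putting the stem and leaf rules together, the minimal Python sketch below builds a plot as a list of text lines. It assumes non-negative whole-number data; the function `stem_leaf_plot` and the data are hypothetical. As a design choice, it also lists stems that have no leaves, so gaps in the data remain visible.

```python
from collections import defaultdict


def stem_leaf_plot(data):
    """Return the lines of a stem-and-leaf plot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting puts each stem's leaves in increasing order
        stem, leaf = divmod(value, 10)  # leaf = rightmost digit, stem = the digits to its left
        leaves[stem].append(leaf)
    lines = []
    # Stems run vertically in increasing order; stems with no leaves are kept
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem:>3} | {row}".rstrip())
    return lines


data = [147, 152, 138, 145, 141, 156, 133, 149]  # hypothetical data
print("\n".join(stem_leaf_plot(data)))
```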

Stems = Class

...

Leaves = Frequency

...

Sideways Histograms

In this way, stem-and-leaf plots are similar to "sideways" histograms.

Remark:

The actual data entries appear in a stem-and-leaf plot. In that sense, unlike a histogram of grouped data, the data is not "lost."

Split Stems

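To increase the number of classes and better represent the data, you can use split stems: each stem is listed more than once, for example once for leaves 0-4 and once for leaves 5-9.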
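
A minimal Python sketch of split stems, assuming non-negative whole numbers and a two-way split (leaves 0-4 on one line, 5-9 on the next). Other splits are possible; the function name `split_stem_plot` and the data are hypothetical.

```python
from collections import defaultdict


def split_stem_plot(data):
    """Stem-and-leaf plot with each stem split into two lines.

    The first line for a stem holds leaves 0-4, the second holds leaves 5-9.
    Assumes non-negative whole numbers.
    """
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1     # 0: leaves 0-4, 1: leaves 5-9
        rows[(stem, half)].append(leaf)
    first, last = min(rows), max(rows)   # (stem, half) tuples compare in plot order
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:  # skip half-lines outside the data
                row = " ".join(str(leaf) for leaf in rows[(stem, half)])
                lines.append(f"{stem:>3} | {row}".rstrip())
    return lines


data = [147, 152, 138, 145, 141, 156, 133, 149]  # hypothetical data
print("\n".join(split_stem_plot(data)))
```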

Recall:

A distribution is a representation of the shape and spread of the data, that is, a picture of the data (pie chart, stem-and-leaf plot, etc.).

Symmetric Distribution - 1

Uniform - Frequencies are evenly spread.

Symmetric Distribution - 2

Bell-shaped - The highest frequency appears in the middle, and the frequencies decrease roughly symmetrically toward both ends.
Normal distributions have a bell shape.

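Skewed Distributions

Skewed - Left or right. A skewed distribution is not symmetric. In a distribution skewed right, the highest frequencies appear on the left and the tail extends to the right; in a distribution skewed left, the highest frequencies appear on the right and the tail extends to the left.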