BUS 3507 Chapter 2 Organizing and Chart


2.1 Raw Data
When data are collected, the information obtained from each member of a population or sample is recorded in the sequence in which it becomes available. This sequence of data recording is random and unranked. Such data, before they are grouped or ranked, are called raw data.
Definition of Raw Data

Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data.

Suppose we collect information on the ages (in years) of 50 students selected from a university. The data values, in the order they are collected, are recorded in Table 2.1. For instance, the first student’s age is 21, the second student’s age is 19 (second number in the first row), and so forth. The data in Table 2.1 are quantitative raw data.

Table 2.1 Ages of 50 Students
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24

Suppose we ask the same 50 students about their student status. The responses of the students are recorded in Table 2.2. In this table, F, SO, J, and SE are the abbreviations for freshman, sophomore, junior, and senior, respectively. This is an example of qualitative (or categorical) raw data.

Table 2.2 Status of 50 Students
J  F  SO SE J  J  SE J  J  J
F  F  J  F  F  F  SE SO SE J
J  F  SE SO SO F  J  F  SE SE
SO SE J  SO SO J  J  SO F  SO
SE SE F  SE J  SO F  J  SO SO

The data presented in Tables 2.1 and 2.2 are also called ungrouped data. An ungrouped data set contains information on each member of a sample or population individually.

2.2 Organizing and Graphing Qualitative Data 
This section discusses how to organize and display qualitative (or categorical) data. Data sets are organized into tables, and data are displayed using graphs.

2.2.1 Frequency Distributions 
A sample of 100 students enrolled at a university were asked what they intended to do after graduation.  Forty-four said they wanted to work for private companies/businesses, 16 said they wanted to work for the federal government, 23 wanted to work for state or local governments, and 17 intended to start their own businesses. Table 2.3 lists the types of employment and the number of students who intend to engage in each type of employment. In this table, the variable is the type of employment, which is a qualitative variable. The categories (representing the  type  of  employment)  listed  in  the  first  column  are  mutually  exclusive.  In other words, each of the 100 students belongs to one and only one of these categories. The number of students who belong to a certain category is called the frequency of that category. A frequency distribution exhibits how the frequencies are distributed over various categories. Table 2.3 is called a frequency distribution table or simply a frequency table.

Table 2.3 Type of Employment Students Intend to Engage In
Type of Employment  Number of Students
Private companies/businesses 44
Federal government 16
State/local government 23
Own business 17
Sum = 100

Definition of Frequency Distribution for Qualitative Data

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.


Example 2-1
A sample of 30 employees from large companies was selected, and these employees were asked how stressful their jobs were. The responses are recorded below, where very represents very stressful, somewhat means somewhat stressful, and none stands for not stressful at all.

somewhat  none      somewhat  very      very      none
very      somewhat  somewhat  very      somewhat  somewhat
very      somewhat  none      very      none      somewhat
somewhat  very      somewhat  somewhat  very      none
somewhat  very      very      somewhat  none      somewhat

Construct a frequency distribution table for these data. 
Solution 
Note  that  the  variable  in  this  example  is  how  stressful  is  an  employee’s  job. This variable is classified into three categories: very stressful, somewhat stressful, and not stressful at all. We record these categories in the first column of Table 2.4. Then we read each employee’s response from the given data and mark a tally, denoted by the symbol |, in the second column of Table 2.4 next to the corresponding category. For example, the first employee’s response is that his or her job is somewhat stressful. We show this in the frequency table by marking a tally in the second column next to the category somewhat. Note that the tallies are marked in blocks of five for counting convenience. Finally, we record the total of the tallies for each category in the third column of the table. This column is called the column of frequencies and is usually denoted by f. The sum of the entries in the frequency column gives the sample size or total frequency. In Table 2.4, this total is 30, which is the sample size.

Table 2.4 Frequency Distribution of Stress on Job
Stress on Job   Tally               Frequency (f)
Very            ||||| |||||         10
Somewhat        ||||| ||||| ||||    14
None            ||||| |             6
                                    Sum = 30
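
The tallying procedure of Example 2-1 can also be sketched in a few lines of Python. This is only an illustration and not part of the original notes; the names responses and frequency are assumptions chosen for the example, and the 30 responses are copied from the data listed above.

```python
from collections import Counter

# The 30 responses of Example 2-1, read row by row.
responses = """
somewhat none somewhat very very none
very somewhat somewhat very somewhat somewhat
very somewhat none very none somewhat
somewhat very somewhat somewhat very none
somewhat very very somewhat none somewhat
""".split()

# Counter plays the role of the tally column: one running count per category.
frequency = Counter(responses)

for category in ("very", "somewhat", "none"):
    print(f"{category:<10}{frequency[category]:>3}")
print(f"{'Sum':<10}{sum(frequency.values()):>3}")
```

The printed counts should agree with the frequency column of Table 2.4, and their sum with the sample size.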


2.2.2 Relative Frequency and Percentage Distributions 
The relative frequency of a category is obtained by dividing the frequency of that category by the sum of all frequencies. Thus, the relative frequency shows what fractional part or proportion of the total frequency belongs to the corresponding category. A relative frequency distribution lists the relative frequencies for all categories.

Calculating Relative Frequency of a Category

Relative frequency of a category = Frequency of that category / Sum of all frequencies

The percentage for a category is obtained by multiplying the relative frequency of that category by 100. A percentage distribution lists the percentages for all categories.

Calculating Percentage
Percentage = (Relative frequency) x 100 

Example 2-2
Determine the relative frequency and percentage distributions for the data of Table 2.4. 
Solution 
The relative frequencies and percentages from Table 2.4 are calculated and listed in Table 2.5. Based on this table, we can state that 0.333 or 33.3% of the employees said that their jobs are very stressful.  By  adding  the  percentages  for  the  first  two  categories, we  can  state that 80% of the employees said that their jobs are very or somewhat stressful. The other numbers in Table 2.5 can be interpreted the same way. 
Notice that the sum of the relative frequencies is always 1.00 (or approximately 1.00 if the  relative  frequencies  are  rounded), and  the  sum  of  the  percentages  is  always  100  (or  approximately 100 if the percentages are rounded).

Table 2.5 Relative Frequency and Percentage Distributions of Stress on Job
Stress on Job   Relative Frequency   Percentage
Very            10/30 = 0.333        0.333 x 100 = 33.3
Somewhat        14/30 = 0.467        0.467 x 100 = 46.7
None            6/30 = 0.200         0.200 x 100 = 20.0
                Sum = 1.000          Sum = 100
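
As a quick check on Example 2-2, the two formulas can be applied directly to the frequencies of Table 2.4. The sketch below is illustrative only; the dictionary names are assumptions made for this example.

```python
# Frequencies from Table 2.4.
frequency = {"Very": 10, "Somewhat": 14, "None": 6}
total = sum(frequency.values())

# Relative frequency = frequency of that category / sum of all frequencies.
relative = {c: f / total for c, f in frequency.items()}

# Percentage = (relative frequency) x 100.
percentage = {c: r * 100 for c, r in relative.items()}

for c in frequency:
    print(f"{c:<10}{relative[c]:.3f}   {percentage[c]:.1f}")
```

Because the code keeps the unrounded values, the relative frequencies here sum to 1 up to floating-point error; the rounding happens only when the values are printed.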



2.2.3 Graphical Presentation of Qualitative Data
All of us have heard the adage “a picture is worth a thousand words.” A graphic display can reveal at a glance the main characteristics of a data set. The bar graph and the pie chart are two types of graphs used to display qualitative data.

Bar Graphs 
To construct a bar graph (also called a bar chart), we mark the various categories on the horizontal axis as in Figure 2.1. Note that all categories are represented by intervals of the same width. We mark the frequencies on the vertical axis. Then we draw one bar for each category such that the height of the bar represents the frequency of the corresponding category. We leave a small gap between adjacent bars. Figure 2.1 gives the bar graph for the frequency distribution of Table 2.4.

Figure 2.1 Bar graph for the frequency distribution of Table 2.4

Definition of Bar Graph
A  graph  made  of  bars  whose  heights  represent  the  frequencies  of  respective categories is called a bar graph.

The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the frequencies, on the vertical axis. Sometimes a bar graph is constructed by marking the categories on the vertical axis and the frequencies on the horizontal axis.

BUS 3507        BUSINESS STATISTICS

Pie Charts 
A pie chart is more commonly used to display percentages, although it can be used to display frequencies or relative frequencies. The whole pie (or circle) represents the total sample or population. Then we divide the pie into different portions that represent the different categories.
Definition of Pie Chart

A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.


As we know, a circle contains 360 degrees. To construct a pie chart, we multiply 360 by the relative frequency of each category to obtain the degree measure or size of the angle for the corresponding category. Table 2.6 shows the calculation of angle sizes for the various categories of Table 2.5.
Table 2.6 Calculating Angle Sizes for the Pie Chart
Stress on Job   Relative Frequency   Angle Size
Very            10/30 = 0.333        360 x 0.333 = 119.88
Somewhat        14/30 = 0.467        360 x 0.467 = 168.12
None            6/30 = 0.200         360 x 0.200 = 72.00
                Sum = 1.000          Sum = 360
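
The angle calculation of Table 2.6 can be reproduced as follows. This is a minimal sketch, assuming the relative frequencies are first rounded to three decimals as in Table 2.5; the variable names are illustrative.

```python
# Relative frequencies rounded to three decimals, as in Table 2.5.
relative = {"Very": 0.333, "Somewhat": 0.467, "None": 0.200}

# Angle size = 360 x relative frequency.
angle = {c: 360 * r for c, r in relative.items()}

for c, a in angle.items():
    print(f"{c:<10}{a:7.2f}")
print(f"{'Sum':<10}{sum(angle.values()):7.2f}")
```

If the unrounded fractions 10/30, 14/30, and 6/30 were used instead, the angles would come out as exactly 120, 168, and 72 degrees.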

Figure 2.2 shows the pie chart for the percentage distribution of Table 2.5, which uses the angle sizes calculated in Table 2.6.
Figure 2.2 Pie chart for the percentage distribution of Table 2.5.



2.3 Organizing and Graphing Quantitative Data
In the previous section we learned how to group and display qualitative data. This section explains how to group and display quantitative data.

2.3.1 Frequency Distributions
Table  2.7  gives  the  weekly  earnings  of  100  employees  of  a  large  company.  The  first  column lists  the  classes, which  represent  the  (quantitative)  variable  weekly  earnings.  For quantitative data, an interval that includes all the values that fall within two numbers, the lower and upper limits, is called a class. Note that the classes always represent a variable. As we can observe, the  classes  are  nonoverlapping;  that  is, each  value  on  earnings  belongs  to  one  and  only  one class. The second column in the table lists the number of employees who have earnings within each class.  For  example, nine  employees  of  this  company  earn  $401  to  $600  per  week.  The numbers listed in the second column are called the frequencies, which give the number of values that belong to different classes. The frequencies are denoted by f.

Table 2.7 Weekly Earnings of 100 Employees of a Company 

Weekly Earnings (dollars)   Number of Employees (f)
401 to 600                  9
601 to 800                  22
801 to 1000                 39
1001 to 1200                15
1201 to 1400                9
1401 to 1600                6

For quantitative data, the frequency of a class represents the number of values in the data set that fall in that class. Table 2.7 contains six classes. Each class has a lower limit and an upper limit. The values 401, 601, 801, 1001, 1201, and 1401 give the lower limits, and the values 600, 800, 1000, 1200, 1400, and 1600 are the upper limits of the six classes, respectively. The data presented in Table 2.7 are an illustration of a frequency distribution table for quantitative data. Whereas the data that list individual values are called ungrouped data, the data presented in a frequency distribution table are called grouped data.

Definition of Frequency Distribution for Quantitative Data
A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.


To find the midpoint of the upper limit of the first class and the lower limit of the second class in Table 2.7, we divide the sum of these two limits by 2. Thus, this midpoint is
      
(600 + 601) / 2 = 600.5
      
The value 600.5 is called the upper boundary of the first class and the lower boundary of the second class.  By  using  this  technique, we  can  convert  the  class  limits  of  Table  2.7  to  class boundaries, which  are  also  called  real  class  limits. The second column of Table 2.8 lists the boundaries for Table 2.7.

Definition of Class Boundary
The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class.

The difference between the two boundaries of a class gives the class width. The class width is also called the class size.
Finding Class Width
Class width = Upper boundary - Lower boundary
Thus, in Table 2.8,
Width of the first class = 600.5 - 400.5 = 200

The  class  widths  for  the  frequency  distribution  of  Table  2.7  are  listed  in  the  third column  of Table 2.8. Each class in Table 2.8 (and Table 2.7) has the same width of 200. 

The class midpoint or mark is obtained by dividing the sum of the two limits (or the two boundaries) of a class by 2.

Calculating Class Midpoint or Mark

Class Midpoint or Mark = (Lower Limit + Upper Limit ) / 2
                        
Thus, the midpoint of the first class in Table 2.7 or Table 2.8 is calculated as follows:

Midpoint of the 1st class = (401 + 600) / 2 = 500.5

The class midpoints for the frequency distribution of Table 2.7 are listed in the fourth column of Table 2.8.

Table 2.8 Class Boundaries, Class Widths, and Class Midpoints for Table 2.7
Class Limits    Class Boundaries             Class Width   Class Midpoint
401 to 600      400.5 to less than 600.5     200           500.5
601 to 800      600.5 to less than 800.5     200           700.5
801 to 1000     800.5 to less than 1000.5    200           900.5
1001 to 1200    1000.5 to less than 1200.5   200           1100.5
1201 to 1400    1200.5 to less than 1400.5   200           1300.5
1401 to 1600    1400.5 to less than 1600.5   200           1500.5

Note  that  in  Table  2.8, when  we  write  classes  using  class  boundaries, we  write  to  less than to ensure that each value belongs to one and only one class. As we can see, the upper boundary of the preceding class and the lower boundary of the succeeding class are the same.
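
The entries of Table 2.8 can be derived mechanically from the class limits of Table 2.7. The sketch below is one way to do this; it assumes, as the table itself does, that the gap between consecutive classes (here 1 dollar) also applies below the first class and above the last one. All names are illustrative.

```python
# Class limits from Table 2.7.
limits = [(401, 600), (601, 800), (801, 1000),
          (1001, 1200), (1201, 1400), (1401, 1600)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]   # 601 - 600 = 1

rows = []
for lower, upper in limits:
    lower_boundary = lower - gap / 2          # midpoint with the class below
    upper_boundary = upper + gap / 2          # midpoint with the class above
    width = upper_boundary - lower_boundary   # class width
    midpoint = (lower + upper) / 2            # class midpoint or mark
    rows.append((lower_boundary, upper_boundary, width, midpoint))

for (lower, upper), row in zip(limits, rows):
    print(f"{lower} to {upper}: {row[0]} to less than {row[1]}, "
          f"width {row[2]:.0f}, midpoint {row[3]}")
```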

2.3.2 Constructing Frequency Distribution Tables
When constructing a frequency distribution table, we need to make the following three major decisions.
Number of Classes 
Usually the number of classes for a frequency distribution table varies from 5 to 20, depending mainly on the number of observations in the data set. It is preferable to have more classes as the size of a data set increases. The decision about the number of classes is arbitrarily made by the data organizer.

Class Width 
Although it is not uncommon to have classes of different sizes, most of the time it is preferable to have the same width for all classes. To determine the class width when all classes are the same size, first find the difference between the largest and the smallest values in the data. Then, the approximate width of a class is obtained by dividing this difference by the number of desired classes.
Calculation of Class Width

Approximate class width = ( Largest Value - Smallest Value ) / Number of Classes

Usually this approximate class width is rounded to a convenient number, which is then used as the class width. Note that rounding this number may slightly change the number of classes initially intended.

Lower Limit of the First Class or the Starting Point 
Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
Example 2-3 illustrates the procedure for constructing a frequency distribution table for quantitative data.

Example 2-3
Table 2.9 gives the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season. Construct a frequency distribution table.

Solution 
In these data, the minimum value is 135 and the maximum value is 242. Suppose we decide to group these data using five classes of equal width. Then,

Approximate width of each class = (242 - 135) / 5 = 21.4

Now we round this approximate width to a convenient number—say, 22. The lower limit of the first class can be taken as 135 or any number less than 135. Suppose we take 135 as the lower limit of the first class. Then our classes will be

 135–156,  157–178,  179–200,  201–222,  and  223–244  

We record these five classes in the first column of Table 2.10.
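
The width calculation and the resulting class limits can be sketched as follows. The choices of 22 for the width and 135 for the starting point are the ones made in the solution; subtracting 1 to get each upper limit is an assumption that holds here because home run totals are whole numbers.

```python
smallest, largest = 135, 242
number_of_classes = 5

# Approximate class width = (largest value - smallest value) / number of classes.
approximate_width = (largest - smallest) / number_of_classes

width = 22          # convenient rounded width chosen in the solution
lower_limit = 135   # starting point chosen in the solution

# Each class starts where the previous one ended plus 1 (integer data assumed).
classes = [(lower_limit + i * width, lower_limit + (i + 1) * width - 1)
           for i in range(number_of_classes)]

print(f"approximate width = {approximate_width:.1f}")
for lower, upper in classes:
    print(f"{lower}-{upper}")
```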

One rule to help decide on the number of classes is Sturges' formula:

c = 1 + 3.3 log n 

where c is the number of classes and n is the number of observations in the data set. The value of log n can be obtained by entering the value of n on the calculator and pressing the log key.
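
For illustration, the formula can be evaluated for the 30 teams of Example 2-3. The function name sturges_classes is hypothetical, and the logarithm is taken to base 10, which is what the log key on a calculator computes.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by c = 1 + 3.3 log n (base-10 logarithm)."""
    return 1 + 3.3 * math.log10(n)

c = sturges_classes(30)
print(f"c = {c:.2f}")
```

For n = 30 the formula gives a value a little under 6, whereas the solution above uses five classes. The formula is only a guide; the final choice remains with the data organizer.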

Table 2.9 Home Runs Hit by Major League Baseball Teams During the 2004 Season
Team                 Home Runs   Team                        Home Runs
Arizona              135         Milwaukee                   135
Atlanta              178         Minnesota                   191
Baltimore            169         Montreal (now Washington)   151
Boston               222         New York Mets               185
Chicago Cubs         235         New York Yankees            242
Chicago White Sox    242         Oakland                     189
Cincinnati           194         Philadelphia                215
Cleveland            184         Pittsburgh                  142
Colorado             202         St. Louis                   214
Detroit              201         San Diego                   139
Florida              148         San Francisco               183
Houston              187         Seattle                     136
Kansas City          150         Tampa Bay                   145

Now we read each value from the given data and mark a tally in the second column of Table 2.10 next to the corresponding class. The first value in our original data is 135, which belongs to the 135–156 class. To record it, we mark a tally in the second column next to the 135–156 class. We continue this process until all the data values have been read and entered in the tally column. Note that tallies are marked in blocks of five for counting convenience. After the tally column is completed, we count the tally marks for each class and write those numbers in the third column. This gives the column of frequencies. These frequencies represent the number of teams that belong to each of the five different classes representing the total home runs. For example, 10 of the 30 Major League Baseball teams hit a total of 135–156 home runs during the 2004 season.



Table 2.10 Frequency Distribution for the Data of Table 2.9
Total Home Runs   Tally           Frequency (f)
135 – 156         ||||| |||||     10
157 – 178         |||             3
179 – 200         ||||| ||        7
201 – 222         ||||| |         6
223 – 244         ||||            4
                                  Sum = 30

In Table 2.10, we can denote the frequencies of the five classes by f1, f2, f3, f4, and f5, respectively. Therefore,
f1 = Frequency of the first class = 10
Similarly, f2 = 3, f3 = 7, f4 = 6, and f5 = 4.
Hence, the sum of the frequencies of all classes
= f1 + f2 + f3 + f4 + f5
= 10 + 3 + 7 + 6 + 4
= 30

The number of observations in a sample is usually denoted by n. The number of observations in a population is denoted by N. Consequently, the sum of the frequencies is equal to n for sample data and to N for population data. Because the data set on the total home runs by Major League Baseball teams in Table 2.10 covers all 30 teams, it represents the population, so here the sum of the frequencies is N = 30.

Note that when we present the data in the form of a frequency distribution table, as in Table 2.10, we lose the information on individual observations. We cannot know the exact number of home runs hit by any particular Major League Baseball team from Table 2.10. All we know is that the home runs hit by 10 of these teams during the 2004 season were between 135 and 156, and so forth.

2.3.3 Relative Frequency and Percentage Distributions
Using Table 2.10, we can compute the relative frequency and percentage distributions the same way we did for qualitative data in Section 2.2.2. The relative frequencies and percentages for a quantitative data set are obtained as follows.

Relative frequency of a class = Frequency of that class / Sum of all frequencies

Percentage = (Relative frequency) x 100 

Example 2-4 illustrates how to construct relative frequency and percentage distributions.

Example 2-4 
Calculate the relative frequencies and percentages for Table 2.10.

Solution 
The relative frequencies and percentages for the data in Table 2.10 are calculated and listed in the third and fourth columns, respectively, of Table 2.11 here. Note that the class boundaries are listed in the second column of Table 2.11.

Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10
Total Home Runs   Class Boundaries           Relative Frequency   Percentage
135–156           134.5 to less than 156.5   0.333                33.3
157–178           156.5 to less than 178.5   0.100                10.0
179–200           178.5 to less than 200.5   0.233                23.3
201–222           200.5 to less than 222.5   0.200                20.0
223–244           222.5 to less than 244.5   0.133                13.3
                                             Sum = 0.999          Sum = 99.9%
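
The following sketch recomputes the last two columns of Table 2.11 from the frequencies of Table 2.10 and shows why the sums come to 0.999 and 99.9 rather than 1 and 100. The variable names are illustrative; the rounding to three decimals mirrors the table.

```python
classes = ["135-156", "157-178", "179-200", "201-222", "223-244"]
frequency = [10, 3, 7, 6, 4]   # from Table 2.10
N = sum(frequency)             # population size, 30

# Round each relative frequency to three decimals, as in Table 2.11.
relative = [round(f / N, 3) for f in frequency]
percentage = [round(r * 100, 1) for r in relative]

for c, r, p in zip(classes, relative, percentage):
    print(f"{c}   {r:.3f}   {p:.1f}")
print(f"Sum       {sum(relative):.3f}   {sum(percentage):.1f}")
```

The shortfall from 1.000 comes entirely from rounding each relative frequency before adding; the unrounded fractions 10/30, 3/30, 7/30, 6/30, and 4/30 sum to exactly 1.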

Using Table 2.11, we can make statements about the percentage of teams with home runs within a certain interval. For example, 33.3% of the Major League Baseball teams in this population hit between 135 and 156 total home runs during the 2004 season. By adding the percentages for the first two classes, we can state that about 43.3% of these teams hit between 135 and 178 home runs during the 2004 season. Similarly, by adding the percentages of the last two classes, we can state that about 33.3% of these teams hit between 201 and 244 home runs during the 2004 season.


2.3.4 Graphing Grouped Data
Grouped  (quantitative)  data  can  be  displayed  in  a  histogram or  a  polygon. This section describes how to construct such graphs. We can also draw a pie chart to display the percentage distribution for a quantitative data set. The procedure to construct a pie chart is similar to the one for qualitative data explained in Section 2.2.3; it will not be repeated in this section.

Histograms 
A histogram can be drawn for a frequency distribution, a relative frequency distribution, or a percentage distribution. To draw a histogram, we first mark classes on the horizontal axis and frequencies (or relative frequencies or percentages) on the vertical axis.  Next, we draw a bar for each class so that its height represents the frequency of that class. The bars in a histogram are drawn adjacent to each other with no gap between them. A histogram is called a frequency histogram, a relative frequency histogram, or a percentage histogram depending on whether frequencies, relative frequencies, or percentages are marked on the vertical axis.

Definition of Histogram
A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative  frequencies, or  percentages  are  represented  by  the  heights  of  the  bars.  In a histogram, the bars are drawn adjacent to each other.

Figures 2.3 and 2.4 show the frequency and the relative frequency histograms, respectively, for the data of Tables 2.10 and 2.11 of Sections 2.3.2 and 2.3.3. The two histograms look alike because they represent the same data. A percentage histogram can be drawn for the percentage distribution of Table 2.11 by marking the percentages on the vertical axis.

The symbol –//– used in the horizontal axes of Figures 2.3 and 2.4 represents a break, called the truncation, in the horizontal axis. It indicates that the entire horizontal axis is not shown in these figures. Notice that the 0 to 134.5 portion of the horizontal axis has been omitted in each figure.



Figure 2.3 Frequency histogram for Table 2.10.

Figure 2.4 Relative frequency histogram for Table 2.11.

Polygons 
A  polygon is  another  device  that  can  be  used  to  present  quantitative  data  in  graphic  form. To  draw  a  frequency  polygon, we  first  mark  a  dot  above  the  midpoint  of  each  class  at  a height equal to the frequency of that class. This is the same as marking the midpoint at the top of each bar in a histogram. Next we mark two more classes, one at each end, and mark their midpoints. Note that these two classes have zero frequencies. In the last step, we join the adjacent dots with straight lines. The resulting line graph is called a frequency polygon or simply a polygon. 

A  polygon  with  relative  frequencies  marked  on  the  vertical  axis  is  called  a  relative  frequency polygon. Similarly, a polygon with percentages marked on the vertical axis is called a percentage polygon.

Definition of Polygon
A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
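
The points that are joined to form the frequency polygon for Table 2.10 can be computed as in the sketch below. The class width of 22 is taken from Example 2-3; the list names are illustrative.

```python
limits = [(135, 156), (157, 178), (179, 200), (201, 222), (223, 244)]
frequency = [10, 3, 7, 6, 4]   # from Table 2.10
width = 22                     # class width used in Example 2-3

# One dot above the midpoint of each class.
midpoints = [(lower + upper) / 2 for lower, upper in limits]

# Add one zero-frequency class at each end, one class width away.
x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
y = [0] + frequency + [0]
points = list(zip(x, y))

for px, py in points:
    print(f"({px}, {py})")
```

Joining these seven points in order with straight lines gives the polygon; the first and last points lie on the horizontal axis.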


Figure 2.5 shows the frequency polygon for the frequency distribution of Table 2.10.

Figure 2.5 Frequency polygon for Table 2.10.

2.5 Cumulative Frequency Distributions
Consider again Example 2-3 of Section 2.3.2 about the home runs hit by Major League Baseball teams. Suppose we want to know how many teams hit a total of 200 or fewer home runs during the 2004 season. Such a question can be answered using a cumulative frequency distribution. Each class in a cumulative frequency distribution table gives the total number of values that fall below a certain value. A cumulative frequency distribution is constructed for quantitative data only.

Definition of Cumulative Frequency Distribution
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

In  a  cumulative  frequency  distribution  table, each  class  has  the  same  lower  limit  but  a different upper limit. Example 2–5 illustrates the procedure to prepare a cumulative frequency distribution.

Example 2-5
Using the frequency distribution of Table 2.10, reproduced here, prepare a cumulative frequency distribution for the home runs hit by Major League Baseball teams during the 2004 season.
Total Home Runs   f
135–156           10
157–178           3
179–200           7
201–222           6
223–244           4
Solution 
Table  2.12  gives  the  cumulative  frequency  distribution  for  the  home  runs  hit  by Major League Baseball teams. As we can observe, 135 (which is the lower limit of the first class in Table 2.10) is taken as the lower limit of each class in Table 2.12. The upper limits of all classes in Table 2.12 are the same as those in Table 2.10. To obtain the cumulative frequency of a class, we add the frequency of that class in Table 2.10 to the frequencies of all preceding classes. The cumulative frequencies are recorded in the third column of Table 2.12. The second column of this table lists the class boundaries.

Table 2.12 Cumulative Frequency Distribution of Home Runs by Baseball Teams
Class Limits   Class Boundaries           Cumulative Frequency
135–156        134.5 to less than 156.5   10
135–178        134.5 to less than 178.5   10 + 3 = 13
135–200        134.5 to less than 200.5   10 + 3 + 7 = 20
135–222        134.5 to less than 222.5   10 + 3 + 7 + 6 = 26
135–244        134.5 to less than 244.5   10 + 3 + 7 + 6 + 4 = 30
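
The running sums in the last column of Table 2.12 can be obtained with a short sketch like the one below, which uses accumulate from Python's standard itertools module. The list names are illustrative.

```python
from itertools import accumulate

upper_limits = [156, 178, 200, 222, 244]
frequency = [10, 3, 7, 6, 4]   # from Table 2.10

# Each cumulative frequency adds the class frequency to all preceding ones.
cumulative = list(accumulate(frequency))

for upper, cf in zip(upper_limits, cumulative):
    print(f"135-{upper}   {cf}")
```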

From Table 2.12, we can determine the number of observations that fall below the upper limit or boundary of each class. For example, 20 Major League Baseball teams hit a total of 200 or fewer home runs.
The cumulative relative frequencies are obtained by dividing the cumulative frequencies by the total number of observations in the data set. The cumulative percentages are obtained by multiplying the cumulative relative frequencies by 100.
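Calculating Cumulative Relative Frequency and Cumulative Percentage

Cumulative relative frequency = Cumulative frequency of a class / Total observations in the data set

Cumulative percentage = (Cumulative relative frequency) x 100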

Table  2.13  contains  both  the  cumulative  relative  frequencies  and  the  cumulative  percentages  for Table  2.12. We can observe, for example, that 66.7% of the Major League Baseball teams hit 200 or fewer home runs during the 2004 season.

Table 2.13 Cumulative Relative Frequency and Cumulative Percentage Distributions for Home Runs Hit by Baseball Teams
Class Limits   Cumulative Relative Frequency   Cumulative Percentage
135–156        10/30 = 0.333                   33.3
135–178        13/30 = 0.433                   43.3
135–200        20/30 = 0.667                   66.7
135–222        26/30 = 0.867                   86.7
135–244        30/30 = 1.000                   100.0
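
As a check, the cumulative relative frequencies and cumulative percentages can be recomputed from the cumulative frequencies of Table 2.12. This is a minimal sketch with illustrative names.

```python
cumulative = [10, 13, 20, 26, 30]   # from Table 2.12
total = cumulative[-1]              # total number of observations, 30

# Cumulative relative frequency = cumulative frequency / total observations.
cum_relative = [round(cf / total, 3) for cf in cumulative]

# Cumulative percentage = (cumulative relative frequency) x 100.
cum_percentage = [round(cf / total * 100, 1) for cf in cumulative]

for r, p in zip(cum_relative, cum_percentage):
    print(f"{r:.3f}   {p:.1f}")
```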




Ogives 
When plotted on a diagram, the cumulative frequencies give a curve that is called an ogive. Figure 2.6 gives an ogive for the cumulative frequency distribution of Table 2.12. To draw the ogive in Figure 2.6, the variable, which is total home runs, is marked on the horizontal axis and the cumulative frequencies on the vertical axis. Then the dots are marked above the upper boundaries of various classes at the heights equal to the corresponding cumulative frequencies. The  ogive  is  obtained  by  joining  consecutive  points  with  straight  lines.  Note that the ogive starts at the lower boundary of the first class and ends at the upper boundary of the last class.

Figure 2.6 Ogive for the cumulative frequency distribution of Table 2.12
Definition of Ogive
An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes.


One advantage of an ogive is that it can be used to approximate the cumulative frequency for any interval. For example, we can use Figure 2.6 to find the number of Major League Baseball teams with 188 or fewer home runs. First, draw a vertical line from 188 on the horizontal axis up to the ogive. Then draw a horizontal line from the point where this line intersects the ogive to the vertical axis. This point gives the cumulative frequency of the class 135–188.  In Figure 2.6, this cumulative frequency is (approximately) 16 as shown by the dashed line. Therefore, 16 baseball teams had 188 or fewer home runs during the 2004 season. 
 We can draw an ogive for cumulative relative frequency and cumulative percentage distributions the same way we did for the cumulative frequency distribution.
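
Reading a value off the ogive amounts to linear interpolation between two neighboring dots. The sketch below does this numerically for 188 home runs; it assumes the ogive starts at height 0 at the lower boundary of the first class, and the function name ogive_value is hypothetical.

```python
# Dots of the ogive: (class boundary, cumulative frequency), from Table 2.12.
boundaries = [134.5, 156.5, 178.5, 200.5, 222.5, 244.5]
cumulative = [0, 10, 13, 20, 26, 30]

def ogive_value(x):
    """Linearly interpolate the cumulative frequency at x along the ogive."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cumulative[-1])
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cumulative[i - 1], cumulative[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

estimate = ogive_value(188)
print(f"Estimated teams with 188 or fewer home runs: {estimate:.2f}")
```

The interpolated value is close to 16, in line with the graphical reading described above. Like the graphical method, it is only an approximation, since the grouped table no longer shows the individual team totals.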