2.1.2 Box Plots
The box plot shows three main features of a variable: its center, its
spread, and its outliers. A box plot is made up of a box (a rectangle) with
various lines and points added to it. The box plot produced by GRAPHID has
the following features.
- The base (width) of the rectangle is proportional to the number of
cases, and the lower and upper boundaries of the box mark the lower and
upper quartiles, respectively. The length of the box is thus equal to the
interquartile range (IQR), which is a convenient and popular measure of
spread.
- For each variable, a set of boxes is plotted, one for each
group. For example, with the IDAMS dataset ANJU.DAT, we can simultaneously
examine the activity patterns of academic scientists in eight activities
in four types of institutions: 32 box plots in all, four in each of eight
windows.
- The white line inside the box indicates the mean value and the
green line indicates the median. The distance between the white line and
the green line is an indicator of skewness: the greater the distance
between these lines, the greater the skewness. When the mean and median
coincide, only the white line is shown; this is consistent with a
symmetric distribution, although it does not by itself guarantee one.
The mean and median for all the cases together are shown by the dotted
lines.
- The left side of the window shows the scale of the variable.
- For each selected variable, GRAPHID plots a set of boxes, one for
each group of cases; if no groups are specified, box plots of eight
variables are plotted. The box plots can be zoomed, one at a time, to
show their features more clearly.
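
The quantities described in the list above can be computed by hand for a
small example. The following Python sketch is only an illustration and is
not GRAPHID's own code: the function name box_summary, the group labels and
the data values are hypothetical, and the quartile rule used here (the
"inclusive" method of Python's statistics module) is an assumption, since
the rule GRAPHID applies is not stated in this section.

```python
import statistics


def box_summary(values):
    """Return the quantities that a box plot of this kind displays."""
    # Assumed quartile rule: linear interpolation, "inclusive" method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    return {
        "n": len(values),                   # number of cases (box width)
        "lower_quartile": q1,               # lower boundary of the box
        "upper_quartile": q3,               # upper boundary of the box
        "iqr": q3 - q1,                     # length of the box
        "mean": mean,                       # white line
        "median": median,                   # green line
        "mean_minus_median": mean - median, # rough indicator of skewness
    }


# Hypothetical data: one list of values per group of cases.
groups = {
    "A": [1, 2, 3, 4, 5],   # symmetric: mean and median coincide
    "B": [1, 2, 2, 3, 12],  # one large value pulls the mean upward
}

summaries = {name: box_summary(vals) for name, vals in groups.items()}

for name, s in summaries.items():
    print(name, s)
```

In this example the mean and median of group A coincide, so its two lines
would overlap, while in group B the mean lies above the median, which
points to a distribution skewed towards high values.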