A standard bar plot can be a very useful tool, but it often conveys relatively little information: how one variable varies across some grouping variable. The “data-ink ratio” of such a plot is low. This page shows how to build up from the basic bar plot in R by adding another categorical grouping to the summary, labels with the values on the bars themselves, and confidence intervals.

We will use the **hsb2** dataset, looking at mean values of **math** by
**ses**, then by **ses** and **female**.

## The basic bar plot

We can construct the basic bar plot using the **barplot** function in base
R. We will label each bar with its SES level and set the y-axis range based on
the summary values.

```r
# Read the hsb2 data and attach it so columns can be referenced by name
hsb2 <- read.table('http://www.ats.ucla.edu/stat/r/faq/hsb2.csv', header = TRUE, sep = ",")
attach(hsb2)

# Mean math score for each level of ses
sesmeans <- tapply(math, ses, mean)
sesmeans
##        1        2        3
## 49.17021 52.21053 56.17241

barplot(sesmeans, main = "Math by SES", xlab = "SES", ylab = "Mean Math Score",
        ylim = c(0, 60), names.arg = c("Low", "Mid", "High"))
```
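The code above relies on **attach**, which puts the columns of **hsb2** on the search path and can mask other objects that share those names. A common alternative is **with()**, which evaluates an expression inside a data frame without attaching it. The minimal sketch below assumes a small made-up data frame, `toy` (hypothetical values), standing in for **hsb2** so that it runs without downloading anything.

```r
# Hypothetical stand-in for hsb2: a few made-up math scores with ses codes 1-3
toy <- data.frame(
  math = c(40, 50, 60, 45, 55, 65),
  ses  = c(1, 1, 2, 2, 3, 3)
)

# with() evaluates tapply() using the columns of toy; nothing is attached
toymeans <- with(toy, tapply(math, ses, mean))
toymeans
```

The result is the same kind of named summary that **barplot** accepts, so the plotting call does not need to change.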

## Adding another grouping variable

We are currently summarizing our data by **ses**. We might also want to separate
the observations by both **ses** and **female**. We can create a table of the
means of **math** by these two variables.

```r
# Mean math score for each combination of ses (rows) and female (columns)
femaleses <- tapply(math, list(as.factor(ses), as.factor(female)), mean)
femaleses
##          0        1
## 1 47.60000 49.90625
## 2 53.46809 50.97917
## 3 54.86207 57.48276
```
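Because **tapply** names the rows and columns after the raw codes (1 to 3 for **ses**, 0 and 1 for **female**), the bar names and legend have to be supplied by hand later on. One option, sketched here under the assumption that ses codes 1, 2, 3 mean low, middle, and high and that female = 0, 1 means male and female (as the labels used later on this page suggest), is to give the matrix descriptive dimnames. The values are the means printed above, typed in directly so the snippet is self-contained.

```r
# The cell means printed above, entered by hand (rows: ses 1-3, columns: female 0/1)
means <- matrix(c(47.60000, 53.46809, 54.86207,
                  49.90625, 50.97917, 57.48276),
                nrow = 3)

# Descriptive dimnames (assumed meaning of the codes)
dimnames(means) <- list(SES = c("Low", "Mid", "High"),
                        Gender = c("Male", "Female"))
means

# barplot() uses the column names as group labels, and legend.text = TRUE
# builds the legend from the row names
barplot(means, beside = TRUE, legend.text = TRUE, ylim = c(0, 90))
```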

Again we can use **barplot** for this data. When the “height” argument is a
matrix, indicating **beside = TRUE** creates grouped bars: each column of the
matrix becomes one group, and each row becomes one bar within every group.
**femaleses** has three rows (SES levels) and two columns (female = 0, 1), so it
produces two groups of three bars. Transposing **femaleses** with **t()** swaps
this arrangement, giving three groups of two bars.

```r
# Two panels side by side: the original matrix, then its transpose
par(mfrow = c(1, 2))
barplot(femaleses, beside = TRUE)
barplot(t(femaleses), beside = TRUE)
```
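One way to check which dimension becomes the groups is to ask **barplot** for the bar midpoints without drawing anything, using **plot = FALSE**. The sketch below uses a small hypothetical 3 x 2 matrix, `m`, with the same shape as **femaleses**. The returned midpoints form a matrix with one row per bar within a group and one column per group.

```r
# Hypothetical 3 x 2 matrix with the same shape as femaleses
m <- matrix(1:6, nrow = 3)

# plot = FALSE returns the bar midpoints without drawing
mid   <- barplot(m, beside = TRUE, plot = FALSE)
mid_t <- barplot(t(m), beside = TRUE, plot = FALSE)

dim(mid)    # 3 bars in each of 2 groups
dim(mid_t)  # 2 bars in each of 3 groups
mid         # one column of x positions per group
```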

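We can add a title, axis labels, bar names, and a legend with the code below, and we will also specify different colors. The upper limit in **ylim** is set well above the largest mean, which leaves room for the legend in the top right corner.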

```r
par(mfrow = c(1, 1))
barplot(femaleses, beside = TRUE,
        main = "Math by SES and gender",
        col = c("red", "green", "blue"),
        xlab = "Gender", names.arg = c("Male", "Female"),
        ylab = "Mean Math Score",
        legend.text = c("Low", "Medium", "High"),
        args.legend = list(title = "SES", x = "topright", cex = .7),
        ylim = c(0, 90))
```

## Labeling bars with values

While the heights of the bars indicate which groups have relatively high or low
means, we might also wish to show the actual mean values on the plot. We can add
text so that the means are printed on the bars. To do this, we save the result
of **barplot** in an object: with **beside = TRUE** and a matrix of heights,
**barplot** returns a matrix of the x locations (midpoints) of the bars. We then
use the **text** function to print the bar heights, rounded to one decimal
place, at these x locations with y = 0. With **pos = 3**, we specify that the
text should be placed above the given locations, so each value appears just
above the base of its bar. We will use lighter colors for the bars to make the
added text more readable.

```r
# Save the bar midpoints (a 3 x 2 matrix, the same shape as femaleses)
bp <- barplot(femaleses, beside = TRUE,
              main = "Math by SES and gender",
              col = c("lightblue", "mistyrose", "lavender"),
              xlab = "Gender", names.arg = c("Male", "Female"),
              ylab = "Mean Math Score",
              legend.text = c("Low", "Medium", "High"),
              args.legend = list(title = "SES", x = "topright", cex = .7),
              ylim = c(0, 90))
# Print each rounded mean just above y = 0 at its bar's x position
text(bp, 0, round(femaleses, 1), cex = 1, pos = 3)
```
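A variation not used on this page, and sometimes easier to read, is to print each value just above the top of its bar by passing the bar heights themselves as the y coordinates. The sketch below reuses the cell means printed earlier (typed in by hand, with row and column names assumed from the labels used on this page) so it runs on its own.

```r
# Cell means printed earlier (rows: SES low/mid/high, columns: male/female)
means <- matrix(c(47.60000, 53.46809, 54.86207,
                  49.90625, 50.97917, 57.48276),
                nrow = 3,
                dimnames = list(c("Low", "Mid", "High"), c("Male", "Female")))

# barplot() returns the x midpoints as a matrix with the same shape as means
bp <- barplot(means, beside = TRUE, ylim = c(0, 90),
              col = c("lightblue", "mistyrose", "lavender"),
              legend.text = TRUE,
              args.legend = list(title = "SES", x = "topright", cex = .7))

# y = means puts each label at the top of its own bar; pos = 3 nudges it above
text(bp, means, round(means, 1), cex = 1, pos = 3)
```

Because **bp** and **means** have the same shape, each label lines up with its bar without any reindexing.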

## Adding confidence bars

Bar plots often depict mean values, but adding some indication of variability
can greatly enhance the plot. The **gplots** package includes an “enhanced bar
plot” function called **barplot2**, which we will use to add confidence
intervals to the plot above. Setting the argument **plot.ci = TRUE** draws the
intervals, and their upper and lower bounds are passed through the **ci.u** and
**ci.l** arguments. We will also turn the bars sideways by indicating
**horiz = TRUE**.

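```r
library(gplots)

# Standard deviation and sample size of math for each ses-by-female cell
mathsd <- tapply(math, list(as.factor(ses), as.factor(female)), sd)
mathn  <- tapply(math, list(as.factor(ses), as.factor(female)), length)

# Approximate 95% confidence interval for each mean: mean +/- 1.96 * standard error
# (1.96 * sd alone would describe the spread of individual scores, not of the mean)
mathse <- mathsd / sqrt(mathn)
upper <- femaleses + 1.96 * mathse
lower <- femaleses - 1.96 * mathse

# horiz = TRUE turns the bars sideways, so bp holds the y positions of the bars
bp <- barplot2(femaleses, beside = TRUE, horiz = TRUE,
               names.arg = c("Male", "Female"),
               plot.ci = TRUE, ci.u = upper, ci.l = lower,
               col = c("lightblue", "mistyrose", "lightcyan"),
               xlim = c(0, 110), legend.text = c("Low", "Mid", "High"),
               main = "Mean math scores by SES and gender")
# Print each rounded mean to the right of x = 0
text(0, bp, round(femaleses, 1), cex = 1, pos = 4)
```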
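The interval above uses the normal critical value 1.96. For small cells, a t critical value is a common alternative, and error bars can also be drawn in base R with **arrows()** instead of **barplot2**. The sketch below shows both ideas, assuming a hypothetical helper, `mean_ci()`, and made-up scores; it is not part of **gplots** and is only one way to do this.

```r
# Hypothetical helper: group means with t-based confidence limits
mean_ci <- function(x, group, level = 0.95) {
  m <- tapply(x, group, mean)
  s <- tapply(x, group, sd)
  n <- tapply(x, group, length)
  # half-width = t quantile * standard error of the mean
  half <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
  list(mean = m, lower = m - half, upper = m + half)
}

# Made-up scores for two groups, used only to illustrate the helper
score <- c(48, 52, 50, 55, 61, 58)
grp   <- factor(c("A", "A", "A", "B", "B", "B"))
ci <- mean_ci(score, grp)

# Draw the bars, then one error bar per bar with flat ends (angle = 90, code = 3)
mid <- barplot(ci$mean, ylim = c(0, 80), col = "lightblue")
arrows(mid, ci$lower, mid, ci$upper, angle = 90, code = 3, length = 0.1)
```

With only three observations per group the t quantile (about 4.3) is much larger than 1.96, so these intervals are noticeably wider than a normal-based interval would be.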