Category: R

Setting graph margins in R using the par() function and lots of cow milk

It is fairly straightforward to set the margins of a graph in R by calling the par() function with the mar (for margin!) argument. For example,

par(mar=c(5.1,4.1,4.1,2.1))

sets the bottom, left, top and right margins respectively of the plot region in number of lines of text.

Another way is by specifying the margins in inches using the mai argument:

par(mai=c(1.02,0.82,0.82,0.42))

The numbers used above are the default margin settings in R. You can verify this by firing up the R prompt and typing par("mar") or par("mai"). You should get back a vector with the above values. The bottom, left and top margins are the largest because that’s where annotations and titles are most likely to be placed.

Since we can specify margins both in terms of lines of text and inches, let’s find out how high one line of text is by default:

par("mai")/par("mar")
[1] 0.2 0.2 0.2 0.2

0.2 inches!

There are ways to change this line height but that’s a useful number to keep in mind.
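As a rough sketch of that relationship, the conversion between margin lines and inches can be written out explicitly. This assumes the standard pdf() device with its default 7 x 7 inch size and 12-point font, opened with file = NULL so that nothing is written to disk, and otherwise untouched graphics settings. The helper name lines_to_inches is made up for illustration.

```r
# Open an off-screen PDF device (file = NULL writes nothing to disk),
# so the graphics parameters can be queried without a display.
pdf(NULL)

# Height of one margin line in inches, as computed in the text.
line_height <- par("mai") / par("mar")

# Hypothetical helper: convert a number of margin lines to inches.
# par("csi") is the default character height in inches and par("mex")
# is the expansion factor R uses when converting between mar and mai.
lines_to_inches <- function(n_lines) n_lines * par("csi") * par("mex")

default_mar <- par("mar")
default_mai <- par("mai")
converted   <- lines_to_inches(default_mar)

invisible(dev.off())
```

Under these defaults, converted should match default_mai, which is one way of seeing that mar and mai describe the same margins in different units.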

The default size of the figure region is approximately 7 inches wide by 7 inches high. You can verify this by typing par("fin") at the R prompt. At 0.2 inches per line, that makes the default figure about 35 lines high and wide (7 / 0.2 = 35). One way to verify this is by trying to run the following code:

par(mar=c(35,35,0,0))
plot(1:10)

What happens? We get an error saying “figure margins too large”. That was bound to happen because we used up all of the figure region in margins and left no space for the plot to be drawn! You are probably never going to set such large margins, but in my experience errors like that occur when I’m working with multiple plot layouts (using the mfrow argument – I might write a post about that some time).
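When margins are computed automatically, it can help to catch this error rather than let it stop the script. The following is a minimal sketch, assuming an off-screen 7 x 7 inch pdf() device; the helper name try_plot_with_margins is hypothetical.

```r
# Hypothetical helper: try to draw a plot with the given margins (in lines)
# and return either "ok" or the error message R raises.
try_plot_with_margins <- function(mar) {
  pdf(NULL, width = 7, height = 7)   # off-screen 7 x 7 inch device
  on.exit(invisible(dev.off()))      # always close the device again
  par(mar = mar)
  tryCatch({
    plot(1:10)
    "ok"
  }, error = function(e) conditionMessage(e))
}

default_result <- try_plot_with_margins(c(5.1, 4.1, 4.1, 2.1))
huge_result    <- try_plot_with_margins(c(40, 40, 0, 0))
```

Calling try_plot_with_margins(c(35, 35, 0, 0)) reproduces the case described in the text.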

Margin lines are numbered starting from 0. We already know the number of margin lines from par("mar") but let’s make a graph to illustrate this point and see how the margin lines are numbered:

plot(1:10,ann=FALSE,type="n",xaxt="n",yaxt="n")
for(j in 1:4) for(i in 0:10) mtext(as.character(i),side=j,line=i)

In the above example, we used the mtext() function (which as the name suggests places text in the margins) to label the margin lines.

The mgp argument of the par() function is a vector of 3 values which specify the margin line for the axis title, axis labels and axis line. The default value of mgp is c(3,1,0), which means that the axis title is drawn in the fourth line of the margin starting from the plot region, the axis labels are drawn in the second line and the axis line itself is the first line.

Sometimes the axis labels may be very long and overlap with the axis title (for example, large numbers in scientific notation on the y axis). To overcome this we can use par() to first increase the left margin and then use mgp to set the axis title line. Note that mgp applies the same set of margin lines to the axes on all four sides. Alternatively, we can suppress the default axis title and place it with the mtext() function, setting its line argument to a value higher than the default. Let’s look at an interesting example to try this out.

Recently, there was a blog post showing some interesting data about the milk production powers of Wisconsin’s super-efficient cows. So, let’s pick up that data and plot the total milk production for the last few decades.


cows<-read.csv("http://public.tableausoftware.com/vizql028/export/sessions/d87316bf-0:11/views/Dairycowsandmilkproduction19242009_156022745?format=text/csv&")

#This bit of code is to remove the commas in numerical fields in the original dataset. I don't know of any automatic ways to do this in R.
for (i in 1:ncol(cows)) {
  if (length(grep(",", cows[[i]])) > 0)
    cows[[i]] <- as.numeric(gsub(",", "", cows[[i]]))
}

plot(cows$Total.milk.production~cows$Year,las=1,xlab="Year", ylab="Total Milk Production (in pounds?)")
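The comma-stripping step can be tried on its own without downloading anything. Here is a small sketch on a made-up data frame; the values are invented, not taken from the cow dataset, and the function name strip_commas is hypothetical.

```r
# Hypothetical helper: convert columns that contain thousands separators
# (e.g. "1,000,000") to numeric, leaving all other columns untouched.
strip_commas <- function(df) {
  for (i in seq_along(df)) {
    if (any(grepl(",", df[[i]], fixed = TRUE))) {
      df[[i]] <- as.numeric(gsub(",", "", df[[i]], fixed = TRUE))
    }
  }
  df
}

# Invented values, only to demonstrate the conversion.
toy <- data.frame(
  Year = c(2001, 2002, 2003),
  Total.milk.production = c("1,000,000", "2,500,000", "3,000,000"),
  stringsAsFactors = FALSE
)

clean <- strip_commas(toy)
```

Wrapping the loop in a function keeps the original data frame intact, which makes it easier to compare the columns before and after.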

Now, you may argue that showing cow milk production in scientific (E) notation is a bit too nerdy, but I think it’s a good enough example here. The Y axis title overlaps the axis labels, making the graph hard to read and a bit ugly. So, let’s fix it by increasing the left margin using mar and placing the axis title in a higher margin line using mgp:


par(mar=c(5,6,4,2)+0.1,mgp=c(5,1,0))
plot(cows$Total.milk.production~cows$Year,las=1,xlab="Year", ylab="Total Milk Production (in pounds?)")

That looks better, but did you notice we knocked out the X axis title? That happened because, as I wrote earlier, mgp applies to both axes. So we asked par() to place both axis titles on line number 5. Since line numbering starts at 0, that’s the sixth line in the margin. But we only left 5 lines’ worth of margin space below the X axis. So the X axis title did not fit within the figure.
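The arithmetic behind the missing title can be sketched as a small check. It rests on the rule of thumb used above, namely that text on margin line n needs roughly n + 1 lines of margin. That rule is an approximation, not R's exact layout logic, and the helper name title_fits is invented for this example.

```r
# Hypothetical helper: does a title placed on margin line `title_line`
# fit inside a margin that is `mar` lines wide? Line numbering starts
# at 0, so line n needs roughly n + 1 lines of space.
title_fits <- function(mar, title_line) {
  (title_line + 1) <= mar
}

mar  <- c(5, 6, 4, 2) + 0.1            # bottom, left, top, right
fits <- title_fits(mar[1:2], title_line = 5)
names(fits) <- c("bottom", "left")
```

With these settings the check fails for the bottom margin (5.1 lines) and passes for the left margin (6.1 lines), which matches what the plot shows.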

To get around this problem, there are at least three solutions. Let’s first look at the hardest one.


par(mar=c(5,6,4,2)+0.1)
plot(cows$Total.milk.production~cows$Year,las=1,xlab="Year", ylab="")
mtext("Total Milk Production (in pounds?)",side=2,line=5)

Voila! So we dropped the mgp argument, set the left margin wide, suppressed the default Y axis title and then used mtext() to place the title on line 5. Thus, we used the defaults for the X axis title and a custom function call for the Y axis title.
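A variant of the same idea, sketched here with made-up numbers in place of the cow data, is to call title() with its line argument instead of mtext(). This is offered as an alternative to experiment with, not as the method used for the graphs in this post; the off-screen pdf(NULL) device is only there so the snippet runs anywhere.

```r
# Made-up data, used only so the example is self-contained.
year <- 2000:2009
milk <- seq(2.0e10, 2.5e10, length.out = length(year))

pdf(NULL)                              # off-screen device; swap in your own
par(mar = c(5, 6, 4, 2) + 0.1)         # wider left margin, mgp left alone
plot(milk ~ year, las = 1, xlab = "Year", ylab = "")

# Place only the Y axis title further out; the X axis title keeps its default line.
title(ylab = "Total Milk Production (in pounds?)", line = 5)

mgp_used <- par("mgp")                 # mgp itself was never changed
invisible(dev.off())
```

Because mgp is never modified, the axis labels and the X axis title stay where the defaults put them.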

What are the easy solutions then? Just use lattice or ggplot2 – they will take care of the margins automatically in almost all cases without you having to worry about it.

If you are wondering why I am wrestling with these base graphics settings, there is a good reason. I’m building a web-based graphing application using R, so I need to automatically and quickly create good looking graphs for a variety of use cases. In my experience, the base graphics functions are faster than using lattice and ggplot2 simply because loading the packages takes a few too many seconds. In building the code for Pretty Graph to handle all sorts of user input data and help people visualise them as different types of graphs, I am having to hack around the base R code to make it produce good graphs consistently. I think that one can make very good graphs using the basic functions if one spends some time learning the different parameters. This is the first in a series of blog posts where I talk about my experience in building R graphs and some interesting quirks of R graphics functions. I hope it will be a good learning experience for me.

Response to Flowingdata Challenge: Graphing obesity trends

Nathan at Flowingdata put up another interesting challenge today to improve the following graphic showing obesity trends in America.

Here’s my attempt:

I transposed the data so that the cohorts are on the X axis and each separate line represents an age group. So each line shows the percentage of obese people in a particular age group. This way the graph tells you the probability of you being obese at a given age in a particular decade.

For each age group, the line roughly trends upwards. For example, the lavender (violet? 3rd from bottom) line shows that if you were in your twenties during World War II, the chances of you being obese were just below 10%, but if you were in your twenties between the mid-60s and the 70s (rocking out to The Doors?) you were more than twice as likely to be obese.

So, from a cursory look, it does seem that Americans have been getting obese faster.

For those interested, here’s my modified data file and the R code:

o<-read.csv("obesity_edit.csv")

colnames(o)<-c("Age group","2-9 Y","10-19 Y","20-29 Y","30-39 Y","40-49 Y","50-59 Y","60-69 Y","70-79 Y")

library(RColorBrewer)
pal<-brewer.pal(length(colnames(o)),"Set1")

plot(o[,2],pch=19,xaxt="n",col=pal[2],type="o",ylim=c(0,max(o[,-1],na.rm=T)),xlab="Cohort by Decade",ylab="Percentage of Obese People",main="Obesity trends by Age Group")

for(i in 3:length(colnames(o))) {
  points(o[,i],pch=19,xaxt="n",col=pal[i])
  lines(o[,i],pch=19,xaxt="n",col=pal[i])
}

axis(1,at=1:length(o[,1]),labels=o[,1],cex.axis=0.75)

legend("topright",legend=colnames(o)[-1],col=pal[-1],lty=1,pch=19,bty="n")

Responding to the Flowingdata GDP Graph Challenge

Nathan Yau of Flowingdata put up a challenge earlier today to improve upon a graph showing government spending as a percentage of GDP, published in the Economist.

The underlying data wasn’t available. So I put on my graph-to-numbers glasses and pulled out some data. Here it is in case you want to have a go.

I took on the first part of the challenge, i.e. “Can you think of a way to make this graph easier to read?”

The Original Graph from the Economist:

Total government spending as percentage of GDP

I hacked up the following version in R. It was a bit of a challenge to get it right given the constraints of fitting all the data and legend within a 290 x 300 image.

Total government spending as percentage of GDP

Do you think this is an improvement? Leave a comment below or in Nathan’s original post at http://flowingdata.com/2010/02/25/challenge-make-this-graph-easier-to-read/.

And here’s the R code:

#Read the file
gdp<-read.delim("gdp_long.txt",header=T)

#Reset the column name from United.States to United States; R replaces spaces in variable names with dots; you’ll see why below.
colnames(gdp)[6]<-"United States"

#Define our colour palette so that we can edit it in one place and refer to colours by index as shown below
pal=c("black","darkorange","blue","forestgreen","tomato")

#Start PNG device with the given constraints of 300×290 (boy that’s a small image!)
png("gdp.png",height=300,width=290,units="px")

#Plot settings
par(mar=c(2,2,3,1) #Small images call for small margins
,las=1) #For some reason, the default orientation (las=0) of the axis labels is parallel to the axis. This works OK for the X axis but makes it hard to read Y axis labels, so set to horizontal.

#Finally, the main plot command
plot(Canada~Year,data=gdp
,type="l"
,xaxt="n" #Don’t draw default X axis; we’ll draw a custom one below.
,xaxs="i" #X axis style (internal) just finds an axis with pretty labels that fits within the original data range. If you don’t use this then an extra space is added at the edges even if you set xlim
,yaxs="i" #Style – Same reason as X Axis
,main="Total government spending \n(% of GDP by year)" #Got rid of ‘The shape of the beast’ for space constraints
,col=pal[1]
,ylim=c(30,70) #Setting the top Y axis limits to allow space for the legend
,lwd=4 #Quite unusually high line width but good for improving visibility in a small graph.
)

#Custom X axis
axis(side=1 #That’s the bottom X axis side
,at=gdp$Year[2:16] #labels starting from 1996; using at=gdp$Year places the labels at odd years. (Year is referenced through gdp because axis() has no data argument.)
,labels=substr(gdp$Year[2:16],3,4)) #Instead of using the full year, use just 2 digits.

grid(lwd=0.4,lty=1,col="#000000") #Very faint grid to guide the eyes.

#Add the rest of the lines
#France
lines(France~Year,data=gdp,col=pal[2],lwd=4)

#Germany
lines(Germany~Year,data=gdp,col=pal[3],lwd=4)

#Britain
lines(Britain~Year,data=gdp,col=pal[4],lwd=4)

#United States; Note we can’t use United States~Year because of the space between United and States. This calls for use of the data[["variable name"]] notation.
lines(gdp[["United States"]]~Year,data=gdp,col=pal[5],lwd=4)

#Lastly the legend
legend("top" #Align it at the top in the center
,ncol=2 #Number of columns to spread the legend labels over; 2 works best for our graph.
,legend=colnames(gdp)[2:6]
,lty=1
,lwd=4
,col=pal #Make sure to use the same colour palette as the graph lines!
,bg="#FFFFFF" #White background to make it merge with the plot background.
,inset=0.01) #Inset the legend so that it doesn’t quite touch the border of the plot.

dev.off() #Close the graphics device
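The five near-identical lines() calls could also be written as a loop over the column names, which sidesteps the problem with spaces in names because the columns are looked up with [[ ]]. This is only a sketch of that idea: the data frame below is made up (it is not the Economist data), the country names are placeholders, and an off-screen pdf(NULL) device stands in for the png() call so that the snippet runs without writing a file.

```r
# Made-up series, only to keep the example self-contained.
years <- 1995:2010
demo <- data.frame(
  Year         = years,
  Alpha        = seq(45, 40, length.out = length(years)),
  Beta         = seq(50, 55, length.out = length(years)),
  `Gamma Land` = seq(35, 42, length.out = length(years)),
  check.names  = FALSE   # keep the space in "Gamma Land"
)

series <- setdiff(colnames(demo), "Year")   # every column except Year
pal    <- c("black", "darkorange", "blue")  # one colour per series

pdf(NULL)                                   # off-screen stand-in for png(...)
par(mar = c(2, 2, 3, 1), las = 1)

# Empty plot first, then one line per series.
plot(demo$Year, demo[[series[1]]], type = "n", ylim = c(30, 70),
     xaxs = "i", yaxs = "i", xlab = "", ylab = "",
     main = "Made-up spending data")
for (k in seq_along(series)) {
  lines(demo$Year, demo[[series[k]]], col = pal[k], lwd = 4)
}

legend("top", ncol = 2, legend = series, lty = 1, lwd = 4, col = pal,
       bg = "#FFFFFF", inset = 0.01)
invisible(dev.off())
```

Since the legend labels and the lines are both driven by series and pal, adding or removing a country only means changing the data and the palette.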