Try Pretty Graph without signing up and buy it if you like it


I’ve been meaning to make this change for a long time and it’s finally here!

New users can now try out Pretty Graph without even having to sign up. Just go straight to https://app.prettygraph.com and you can give it a go without having to create yet another account for yet another website :-)

The no-signup free version of the application is limited in some ways. For example, you cannot download PDF versions of the graph or save your uploaded data files. If you wish to use these functionalities, you need to buy a Basic Account. Currently, you can buy the full-blown version in three flavours:

(a) try it out for a week for just $5

(b) buy a monthly subscription for only $9 per month (billed every month)

(c) buy an annual subscription for $49 per year (yes that’s less than 6 monthly subscriptions!)

If you are one of our early beta users, well firstly, thank you so much for your support. You get to use the full basic version of the application until the end of 2010 for free. If you still find it useful, you are welcome to purchase a subscription. Meanwhile, keep those feature requests and bug reports coming.

Please pass on the free trial link https://app.prettygraph.com to as many people as you think would find it useful.

Heatmaps added to Pretty Graph


After a lot of user requests, heatmaps have finally been added to Pretty Graph. You can now visualize multiple variables in a dataset using the heatmap option in our web-based data visualization application. An example is a correlation heatmap as shown below:

(Click to see the full graph with the scale)

Heatmaps are also a great way to visualize trends or hotspots in large datasets, for example monthly sales data across various countries.

Currently, the heatmaps functionality works like this:

1. Load a data file and select heatmap as the graph type from the thumbnails on the right hand side control panel.

2. Choose a column containing row names (denoted by “Rows” in the control panel) and choose the columns of data you want to visualize.

3. Choose one of the many color schemes available (reds, blues, greens, purples etc. based on RColorBrewer).

4. The breakpoints between colors are chosen automatically (6 equally spaced values from min to max), but you can specify up to 10 custom breakpoints as comma separated values in the Breaks field.
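
As a rough sketch of what the automatic and custom break options amount to in plain R (this illustrates the general technique and is not Pretty Graph's actual code; the data values and the breaks_field string are made up):

```r
# Made-up data values standing in for the cells of one heatmap
values <- c(2.5, 7.1, 4.0, 9.5, 1.5)

# Automatic breaks: 6 equally spaced values from min to max
auto_breaks <- seq(min(values), max(values), length.out = 6)

# Custom breaks: parse a comma separated string (hypothetical user input)
breaks_field <- "0, 2, 4, 6, 8, 10"
custom_breaks <- as.numeric(trimws(strsplit(breaks_field, ",")[[1]]))

# Six breakpoints define five colour bins; cut() assigns each value to a bin
bins <- cut(values, breaks = auto_breaks, include.lowest = TRUE)
print(auto_breaks)
print(table(bins))
```

Note that six breakpoints give five colour bins, so the number of colours needed is always one less than the number of breaks.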

Below is a screenshot of the heatmaps controls in Pretty Graph (click to enlarge):

What are you waiting for? Give it a try and tell us what you think. Please tell us what you like and dislike about it and what you would like us to add.

Some Ad Stats for Bing, Google and Yahoo


Found this story about Bing’s growth on Hacker News: Bing Continues Growth, Ad Impressions & Clicks Way Up: Report.

The graphs in that story are not very well done. Splitting three trends for the same time period along the X axis makes no sense. I made the following improved version in Pretty Graph. It needs some more polish, and boy, I can’t wait to release the fully functional multiple layouts and the ability to save graph configurations.

Edit: I forgot to add – the bigger problem with the graphs is that they are showing percentage changes, NOT absolute numbers, which is why Bing looks so good.
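
To see why percentage changes can flatter a smaller player, here is a tiny sketch with invented numbers (these are not the figures from the report, and the engine names are placeholders):

```r
# Invented numbers for illustration only; NOT the figures from the report
clicks_before <- c(SmallEngine = 100, LargeEngine = 10000)
clicks_after  <- c(SmallEngine = 150, LargeEngine = 11000)

# Relative growth versus absolute growth
pct_change <- 100 * (clicks_after - clicks_before) / clicks_before
abs_change <- clicks_after - clicks_before

print(rbind(pct_change, abs_change))
```

In this made-up case the smaller engine shows the larger percentage change even though the larger engine gains far more clicks in absolute terms.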

Video: How to use Pretty Graph (the new updated UI)


Here’s a new video showing you how to use Pretty Graph. It features the new updated user interface with a lot more polish and new features. It’s a short one (under 3 minutes) and you will learn how to make a simple graph, download it as a PDF and email it too! Check it out and feel free to share.

How to use Pretty Graph from Pretty Graph on Vimeo.

Setting graph margins in R using the par() function and lots of cow milk


It is fairly straightforward to set the margins of a graph in R by calling the par() function with the mar (for margin!) argument. For example,

par(mar=c(5.1,4.1,4.1,2.1))

sets the bottom, left, top and right margins respectively of the plot region in number of lines of text.

Another way is by specifying the margins in inches using the mai argument:

par(mai=c(1.02,0.82,0.82,0.42))

The numbers used above are the default margin settings in R. You can verify this by firing up the R prompt and typing par("mar") or par("mai"). You should get back a vector with the above values. The bottom, left and top margins are the largest because that’s where annotations and titles are most likely to be placed.

Since we can specify margins both in terms of lines of text and inches, let’s find out how high one line of text is by default:

par("mai")/par("mar")
[1] 0.2 0.2 0.2 0.2

0.2 inches!

There are ways to change this line height but that’s a useful number to keep in mind.

The default size of the figure region is approximately 7 inches wide by 7 inches high. You can verify this by typing par("fin") at the R prompt. So, by default the figure is 35 lines high and wide. One way to verify this is by trying to run the following code:

par(mar=c(35,35,0,0))
plot(1:10)

What happens? We get an error saying “figure margins too large”. That was bound to happen because we used up all of the figure region in margins and left no space for the plot to be drawn! You are probably never going to set such large margins, but in my experience errors like that occur when I’m working with multiple plot layouts (using the mfrow argument – I might write a post about that some time).
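
Here is a hedged sketch that checks the arithmetic and captures the error message instead of stopping. It assumes a 7 by 7 inch null PDF device with default text size (opened with pdf(NULL)) so that it runs without a screen; on other devices or sizes the numbers may differ.

```r
# Null PDF device: nothing is drawn to screen or written to disk
pdf(NULL, width = 7, height = 7)

line_height <- par("mai") / par("mar")          # inches per margin line
lines_in_figure <- par("fin") / line_height[1]  # figure width and height in lines

# Margins that use up the whole figure region should trigger the error
par(mar = c(35, 35, 0, 0))
result <- tryCatch({
  plot(1:10)
  "plot drawn"
}, error = function(e) conditionMessage(e))

invisible(dev.off())
print(lines_in_figure)
print(result)
```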

Margin lines are numbered starting from 0. We already know the number of margin lines from par(“mar”) but let’s make a graph to illustrate this point and see how the margin lines are numbered:

plot(1:10,ann=FALSE,type="n",xaxt="n",yaxt="n")
for(j in 1:4) for(i in 0:10) mtext(as.character(i),side=j,line=i)

In the above example, we used the mtext() function (which as the name suggests places text in the margins) to label the margin lines.

The mgp argument of the par() function is a vector of 3 values which specify the margin line for the axis title, axis labels and axis line. The default value of mgp is c(3,1,0), which means that the axis title is drawn in the fourth line of the margin starting from the plot region, the axis labels are drawn in the second line and the axis line itself is the first line.
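
A minimal sketch of reading and changing mgp, run on a null PDF device (an assumption made so it works without a screen). The value c(2, 0.5, 0) is just an example that pulls the titles and labels closer to the axes:

```r
pdf(NULL)   # null device: nothing is written to disk

default_mgp <- par("mgp")   # margin lines for axis title, axis labels, axis line
par(mgp = c(2, 0.5, 0))     # move titles to line 2 and labels to line 0.5
plot(1:10, xlab = "X title on margin line 2", ylab = "Y title on margin line 2")
tight_mgp <- par("mgp")

invisible(dev.off())
print(default_mgp)
print(tight_mgp)
```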

Sometimes the axis labels may be very long and overlap with the axis title (for example, large numbers in scientific notation on the y axis). To overcome this we can use par() to first increase the left margin and then use mgp to set the axis title line. Note that using mgp applies the same set of margin line values to axes on all four sides. Alternatively, we can suppress the drawing of the default axis title and use the mtext() function with the line argument set to a value higher than the default. Let’s look at an interesting example to try this out.

Recently, there was a blog post showing some interesting data about the milk production powers of Wisconsin’s super-efficient cows. So, let’s pick up that data and plot the total milk production for the last few decades.


cows<-read.csv("http://public.tableausoftware.com/vizql028/export/sessions/d87316bf-0:11/views/Dairycowsandmilkproduction19242009_156022745?format=text/csv&")

# This bit of code removes the commas in numerical fields in the original dataset. I don't know of any automatic ways to do this in R.
for (i in 1:ncol(cows)) {
  if (length(grep(",", cows[[i]])) > 0)
    cows[[i]] <- as.numeric(gsub(",", "", cows[[i]]))
}

plot(cows$Total.milk.production~cows$Year,las=1,xlab="Year", ylab="Total Milk Production (in pounds?)")
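
If the download link is unavailable, the comma-stripping step can still be tried on a tiny stand-in data frame. This is only a sketch: the values are invented, and strip_commas is a hypothetical helper name, not something from the original code.

```r
# Tiny stand-in for the downloaded data; the values are invented
cows_demo <- data.frame(
  Year = c(2008, 2009),
  Total.milk.production = c("1,000,000", "2,500,000"),
  stringsAsFactors = FALSE
)

# Convert a column to numeric only if it contains commas (hypothetical helper)
strip_commas <- function(x) {
  if (any(grepl(",", x, fixed = TRUE))) {
    as.numeric(gsub(",", "", x, fixed = TRUE))
  } else {
    x
  }
}

# Apply the helper to every column while keeping the data frame structure
cows_demo[] <- lapply(cows_demo, strip_commas)
str(cows_demo)
```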

Now, you may argue that showing cow milk production in scientific (E) notation is a bit too nerdy, but I think it’s a good enough example here. The Y axis title overlaps the axis labels, making the graph hard to read and a bit ugly. So, let’s fix it by increasing the left margin using mar and placing the axis title in a higher margin line using mgp:


par(mar=c(5,6,4,2)+0.1,mgp=c(5,1,0))
plot(cows$Total.milk.production~cows$Year,las=1,xlab="Year", ylab="Total Milk Production (in pounds?)")

That looks better, but did you notice we knocked out the X axis title? That happened because, as I wrote earlier, mgp applies to both axes. So we asked par() to place the axis titles in line number 5. Since line numbering starts at 0, that’s the sixth line in the margin. But we only left 5 lines’ worth of margin space on the bottom X axis, so the X axis title did not fit within the figure.

To get around this problem, there are at least three solutions. Let’s first look at the hardest one.


par(mar=c(5,6,4,2)+0.1)
plot(cows$Total.milk.production~cows$Year,las=1,xlab="Year", ylab="")
mtext("Total Milk Production (in pounds?)",side=2,line=5)

Voila! So we dropped the mgp argument, set the left margin wide, suppressed the default Y axis label and then used mtext to place the title in line 5. Thus, we used the defaults for the X axis title and used a custom function call for the Y axis title.
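
A variation on the same idea, sketched with invented numbers rather than the real dataset: base R's title() function also accepts a line argument, so it can stand in for mtext() when placing the Y axis title. The null PDF device is an assumption so the sketch runs without a screen.

```r
pdf(NULL)   # null device so the sketch runs anywhere

# Invented values, chosen only to be large enough for wide Y tick labels
year <- 2000:2009
milk <- seq(2.0e10, 2.5e10, length.out = 10)

par(mar = c(5, 6, 4, 2) + 0.1)                        # wider left margin
plot(milk ~ year, las = 1, xlab = "Year", ylab = "")  # suppress default Y title
title(ylab = "Total Milk Production", line = 5)       # Y title on margin line 5
mar_used <- par("mar")

invisible(dev.off())
print(mar_used)
```

The X axis title still uses the default mgp setting, so only the Y axis title moves.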

What are the easy solutions then? Just use lattice or ggplot2. They will take care of the margins automatically in almost all cases without you having to worry about it.

If you are wondering why I am wrestling with these base graphics settings, there is a good reason. I’m building a web-based graphing application using R, so I need to automatically and quickly create good looking graphs for a variety of use cases. In my experience, the base graphics functions are faster than using lattice and ggplot2 simply because loading the packages takes a few too many seconds. In building the code for Pretty Graph to handle all sorts of user input data and help people visualise them as different types of graphs, I am having to hack around the base R code to make it produce good graphs consistently. I think that one can make very good graphs using the basic functions if one spends some time learning the different parameters. This is the first in a series of blog posts where I talk about my experience in building R graphs and some interesting quirks of R graphics functions. I hope it will be a good learning experience for me.

Noise Visualization in the Tenderloin area of San Francisco by Movity


Movity, a San Francisco based startup, has created the following beautiful visualisation of noise in the city’s Tenderloin area. Tendernoise is an applied acoustic ecology project, which aims to show the effect of noise on the quality of life.

The image above is just a snapshot of the animation showing the noise patterns over two and a half days. Most of the time, the noise levels are around 60-80 dB. The most striking thing is that there is never a quiet moment!

The lengths of the coloured lines in the bottom panel denote the range between the highest and lowest dB reading at that given moment.

PS. Movity is funded by Y Combinator. I found this on Hacker News.

Legends now automatically added to scatter plots and line graphs


If you make a scatter plot or line graph with more than one variable on the Y axis, Pretty Graph automatically adds a legend to the right. Here’s an example of a line graph showing monthly rainfall in some cities:

Next, we will be adding options to choose the location of the legend (top, right, bottom).

Lots more little improvements are in the works.

If you have any comments, suggestions or questions please leave a comment below or email me at hrishi@prettygraph.com.

Don’t forget to sign up for the private beta. It’s free!

An update on Pretty Graph


The application is still fairly basic but you can do a bunch of things with it. You can create five different types of graphs: scatterplots, line graphs, bar plots, histograms and boxplots.
Over the last few days we have added some additional functionality such as:
  • plotting more than one variable on the same Y axis
  • making a multi-plot layout i.e. one image with multiple graphs
  • automatically adding legends to plots with multiple data sets
  • downloading the graphs as image (PNG) or PDF
  • sharing graphs by email
  • user data file management
We are now frequently updating the application (almost daily) and soon you will be able to see a lot more useful functionality added, some of which includes:
  • high resolution image downloads in PNG and other formats for use in publications.
  • new graph type: heatmaps.
  • graph style templates for quickly choosing a suitable format
  • batch processing to automatically produce the same type of graph for a number of data files
In addition to adding useful features to help you make better graphs more easily, we will also be producing more help material such as screencasts and tutorials to get you started with Pretty Graph properly.
We will also open up the application to everyone soon, so that we can test it out well in the wild before the paid subscription version goes live.
Stay tuned for more!

Response to FlowingData Challenge: Graphing obesity trends


Nathan at FlowingData put up another interesting challenge today to improve the following graphic showing obesity trends in America.

Here’s my attempt:

I transposed the data so that the cohorts are on the X axis and each separate line represents an age group. So each line shows the percentage of obese people in a particular age group. This way the graph tells you the probability of you being obese at a given age in a particular decade.

For each age group, the line roughly trends upwards. For example, the lavender (violet? 3rd from bottom) line shows that if you were in your twenties during World War II, the chances of you being obese were just below 10%, but if you were in your twenties during the mid 60s-70s (rocking out to The Doors?) you were more than twice as likely to be obese.

So, from a cursory look, it does seem that Americans have been getting obese faster.

For those interested, here’s my modified data file and the R code:

o<-read.csv("obesity_edit.csv")

colnames(o)<-c("Age group","2-9 Y","10-19 Y","20-29 Y","30-39 Y","40-49 Y","50-59 Y","60-69 Y","70-79 Y")

library(RColorBrewer)
pal<-brewer.pal(length(colnames(o)),"Set1")

plot(o[,2],pch=19,xaxt="n",col=pal[2],type="o",ylim=c(0,max(o[,-1],na.rm=T)),xlab="Cohort by Decade",ylab="Percentage of Obese People",main="Obesity trends by Age Group")

for(i in 3:length(colnames(o))) {
  points(o[,i],pch=19,xaxt="n",col=pal[i])
  lines(o[,i],pch=19,xaxt="n",col=pal[i])
}

axis(1,at=1:length(o[,1]),labels=o[,1],cex.axis=0.75)

legend("topright",legend=colnames(o)[-1],col=pal[-1],lty=1,pch=19,bty="n")
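
As an aside, the plot-then-loop pattern above could probably be collapsed into a single matplot() call. The sketch below is one way to do that, under stated assumptions: the o_demo data frame and its percentages are invented stand-ins for the real file, and plain colour names replace the RColorBrewer palette so that no extra package is needed.

```r
pdf(NULL)   # null device so the sketch runs without a screen

# Invented percentages: rows are cohorts, columns are age groups
o_demo <- data.frame(
  cohort = c("1930s", "1940s", "1950s"),
  `20-29 Y` = c(8, 12, 18),
  `30-39 Y` = c(14, 19, 26),
  check.names = FALSE
)
cols <- c("steelblue", "firebrick")

# One call draws a line with points for every age-group column
matplot(o_demo[, -1], type = "o", pch = 19, lty = 1, col = cols,
        xaxt = "n", ylim = c(0, max(o_demo[, -1])),
        xlab = "Cohort by Decade", ylab = "Percentage of Obese People")
axis(1, at = seq_len(nrow(o_demo)), labels = o_demo$cohort)
legend("topleft", legend = colnames(o_demo)[-1], col = cols,
       lty = 1, pch = 19, bty = "n")

invisible(dev.off())
```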

Airspace Rebooted: A visualisation of the northern European airspace returning to use


ItoWorld has created this beautiful visualisation of the European airspace coming back to life after the Icelandic volcano eruption last week.

Airspace Rebooted from ItoWorld on Vimeo.