Screencast: Introduction to Pretty Graph


This is a quick demonstration to get you started with the beta version of Pretty Graph. It’s best to watch it in full screen mode so that you can see all the details.

Sign up now to make your own graphs! Just click on the red button below and you will be on your way.



How to make a Scatter Plot using matplotlib in Python


What is matplotlib?

matplotlib is a Python-based, object-oriented plotting library for data visualization. It produces publication-quality plots in a variety of formats.

Plotting data graphically is an important step in statistical data analysis. Scatter plots help visualize the correlation between variables, and they are often the clearest way to display the relationship between two variables.
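A scatter plot shows correlation visually. To put a number on it, the usual companion is Pearson's correlation coefficient r. The sketch below uses NumPy's np.corrcoef on small made-up arrays (not the article's dataset): points on a rising straight line give r = 1, and points on a falling straight line give r = -1.

```python
import numpy as np

# Made-up toy data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2.0 * x + 1.0      # rises in a straight line with x
y_down = -3.0 * x + 20.0  # falls in a straight line with x

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry [0, 1] is Pearson's r between the two inputs.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(round(r_up, 3), round(r_down, 3))  # 1.0 -1.0
```

Real data scatters around any trend line, so r will usually fall somewhere between these two extremes.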

This article gives you a head start on making a scatter plot with matplotlib and Python.

The following scatter plot was generated using matplotlib (data source: National Geographic):

Python Code

# Import the required libraries.
import matplotlib
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
import matplotlib.mlab as mlab

# Read data from a CSV file (HealthExpenditure.csv).
# Note: mlab.csv2rec() exists only in older matplotlib releases (it was removed in 3.1).
r = mlab.csv2rec('HealthExpenditure.csv')

# Create a figure with size 6 x 6 inches.
fig = Figure(figsize=(6,6))

# Create a canvas and add the figure to it.
canvas = FigureCanvas(fig)

# Create a subplot.
ax = fig.add_subplot(111)

# Set the title.
ax.set_title("Health Expenditure Across The World",fontsize=14)

# Set the X Axis label.
ax.set_xlabel("Expenditure per person (US Dollars)",fontsize=12)

# Set the Y Axis label.
ax.set_ylabel("Average Life Expectancy at Birth (Years)",fontsize=12)

# Display Grid.
ax.grid(True,linestyle='-',color='0.75')

# Generate the Scatter Plot.
ax.scatter(r.expenditure,r.life_expectancy,s=20,color='tomato')

# Save the generated Scatter Plot to a PNG file.
canvas.print_figure('healthvsexpense.png',dpi=500)

Step by Step

r = mlab.csv2rec('HealthExpenditure.csv')
The mlab module provides functions written to be compatible with the MATLAB commands of the same names. Here, we use csv2rec() to read the data from a CSV file into a record array, whose columns can then be accessed as attributes, for example r.expenditure.
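mlab.csv2rec() has since been removed from matplotlib, so that line only runs on old releases such as the 0.98 version used here. On current versions, a rough stand-in is NumPy's np.genfromtxt with names=True, which also takes field names from the header row and replaces spaces with underscores; case_sensitive="lower" lower-cases them as well. The function name below is hypothetical, and the sketch assumes the CSV headers normalize to expenditure and life_expectancy, as the tutorial's code implies.

```python
import numpy as np


def read_csv_as_record_array(source):
    """Read CSV data with a header row into a NumPy record array.

    A rough stand-in for the removed mlab.csv2rec(): field names come
    from the header, spaces become underscores, names are lower-cased.
    `source` may be a file path, a file object or a list of lines.
    """
    data = np.genfromtxt(source, delimiter=",", names=True,
                         case_sensitive="lower", dtype=None,
                         encoding="utf-8")
    # Viewing as a recarray enables attribute access such as r.expenditure.
    return data.view(np.recarray)
```

With that helper, the tutorial's first step would become r = read_csv_as_record_array('HealthExpenditure.csv'), and the rest of the code can stay unchanged.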

fig = Figure(figsize=(6,6))
A Figure is the top-level container that defines the boundary of the plots, and one Figure can contain multiple Axes. The figsize argument sets the dimensions of the figure in the form figsize=(w,h), where w is the width and h is the height, both measured in inches.
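To make the "multiple Axes" point concrete, here is a small sketch that places two Axes in one 6 x 6 inch Figure using add_axes(), whose [left, bottom, width, height] rectangle is given as fractions of the figure size. The layout values are arbitrary.

```python
from matplotlib.figure import Figure

# A 6 x 6 inch figure, as in the tutorial.
fig = Figure(figsize=(6, 6))

# add_axes takes [left, bottom, width, height] in figure-relative
# coordinates (0 to 1), so two Axes can share one Figure.
top = fig.add_axes([0.1, 0.55, 0.8, 0.35])
bottom = fig.add_axes([0.1, 0.1, 0.8, 0.35])

print(fig.get_size_inches())  # [6. 6.]
print(len(fig.axes))          # 2
```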

canvas = FigureCanvas(fig)
FigureCanvas is a container that holds the Figure instance; its primary purpose is to render the figure. The FigureCanvasAgg class imported here renders with the Agg library, which produces raster images such as PNG.
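Rendering does not have to end in a file. Assuming a recent matplotlib (3.x), where FigureCanvasAgg.buffer_rgba() is available, the sketch below draws the figure into memory and inspects the pixel array. At figsize=(6,6) and dpi=100 the result is 600 x 600 pixels with four RGBA channels.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

fig = Figure(figsize=(6, 6), dpi=100)
canvas = FigureCanvas(fig)
fig.add_subplot(111).plot([0, 1], [0, 1])

# draw() renders the figure with the Agg rasterizer; buffer_rgba()
# exposes the resulting pixels as (height, width, RGBA).
canvas.draw()
pixels = np.asarray(canvas.buffer_rgba())
print(pixels.shape)  # (600, 600, 4)
```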

ax = fig.add_subplot(111)
As mentioned earlier, a figure can contain multiple plots, called subplots. The add_subplot() method adds a subplot to a figure. The argument 111 describes a grid of 1 row and 1 column and selects subplot number 1 in it.
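The same three-digit shorthand extends to larger grids. A minimal sketch with an arbitrary 2 x 2 layout: 221 and (2, 2, 1) mean the same thing, and the index counts left to right, then top to bottom.

```python
from matplotlib.figure import Figure

fig = Figure(figsize=(6, 6))

# "221" is shorthand for (rows=2, cols=2, index=1).
# Indices run left to right, then top to bottom, starting at 1.
ax1 = fig.add_subplot(221)       # top left
ax2 = fig.add_subplot(2, 2, 2)   # top right, same grid spelled out
ax3 = fig.add_subplot(2, 2, 3)   # bottom left
ax4 = fig.add_subplot(2, 2, 4)   # bottom right

print(len(fig.axes))  # 4
```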

ax.set_title("Health Expenditure Across The World",fontsize=14)
ax.set_xlabel("Expenditure per person (US Dollars)",fontsize=12)
ax.set_ylabel("Average Life Expectancy at Birth (Years)",fontsize=12)
The three statements above set the text of the title, the x axis label and the y axis label, respectively. The text can be formatted with any of the properties of the matplotlib.text.Text class, such as fontsize here.
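As a sketch of that formatting, the title and x axis label below take extra Text properties as keyword arguments (fontweight, fontstyle and color are standard Text properties; the particular values are arbitrary). Because the results are Text objects, their properties can also be read back.

```python
from matplotlib.figure import Figure

fig = Figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# Any matplotlib.text.Text property can be passed as a keyword argument.
ax.set_title("Health Expenditure Across The World",
             fontsize=14, fontweight="bold", color="dimgray")
ax.set_xlabel("Expenditure per person (US Dollars)",
              fontsize=12, fontstyle="italic")

# The title and axis labels are Text objects, so their properties
# can be read back (or changed later with the matching set_* methods).
print(ax.title.get_fontsize())         # 14.0
print(ax.xaxis.label.get_fontstyle())  # italic
```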

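ax.grid(True,linestyle='-',color='0.75')
The grid() method draws horizontal and vertical gridlines on the plot. Here they are solid lines (linestyle='-') in light gray (color='0.75' is a grayscale level between 0 for black and 1 for white). Alternatively, you can draw only the horizontal or only the vertical gridlines with ax.yaxis.grid() or ax.xaxis.grid(), respectively.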
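A short sketch of the axis-specific variant: calling grid() on ax.yaxis draws only the horizontal gridlines, which sit at the y tick positions. set_axisbelow(True) is an optional extra that keeps the gridlines behind the plotted markers.

```python
from matplotlib.figure import Figure

fig = Figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# Horizontal gridlines only: they are drawn at the y axis tick positions.
# Same style as the tutorial: solid lines, grayscale level 0.75.
ax.yaxis.grid(True, linestyle="-", color="0.75")

# Optional: draw gridlines (and ticks) underneath the plotted data.
ax.set_axisbelow(True)
```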

ax.scatter(r.expenditure,r.life_expectancy,s=20,color='tomato')
The scatter() method is the one that actually draws the scatter plot. In this example, the x values are r.expenditure, the y values are r.life_expectancy, s is the size of each point (its area in points squared) and color is the color of the points. For the complete list of parameters, refer to the scatter() documentation.
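A sketch of the two parameters discussed above, on a few hypothetical points: a scalar s gives every point the same area, while a list gives each point its own area; color accepts any matplotlib color name such as 'tomato'.

```python
from matplotlib.colors import to_rgba
from matplotlib.figure import Figure

fig = Figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# Hypothetical sample values, only to show the parameters.
x = [1, 2, 3]
y = [4, 5, 6]

# s is the marker area in points squared: a scalar applies to every
# point, a sequence as long as x gives each point its own size.
same = ax.scatter(x, y, s=20, color="tomato")
varied = ax.scatter(x, y, s=[10, 40, 90], color="tomato", alpha=0.5)

print(same.get_sizes())
print(varied.get_sizes())
print(to_rgba("tomato"))  # the RGBA fill colour used for the first set
```

Because s is an area, doubling a marker's diameter means multiplying s by four.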

canvas.print_figure('healthvsexpense.png',dpi=500)
The print_figure() method of the FigureCanvas class writes the plot to an image file. The statement above generates a PNG file at a resolution of 500 dots per inch; for the 6 x 6 inch figure, that is 3000 x 3000 pixels.
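The pixel size of the output is simply figure size times resolution, so 6 inches at 500 dpi gives 3000 pixels per side. The sketch below checks this at a smaller 50 dpi (300 x 300 pixels) by saving to an in-memory buffer and reading the width and height from the PNG header. The png_size() helper is hypothetical, written only for this check.

```python
import io
import struct

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def png_size(data):
    """Return (width, height) from the IHDR chunk of PNG bytes."""
    # 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
    # then width and height as big-endian 32-bit integers.
    return struct.unpack(">II", data[16:24])


fig = Figure(figsize=(6, 6))
canvas = FigureCanvas(fig)
fig.add_subplot(111)

buf = io.BytesIO()
canvas.print_figure(buf, format="png", dpi=50)
print(png_size(buf.getvalue()))  # (300, 300)
```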

My Development Setup
Ubuntu 9.04
Python 2.6.2
matplotlib 0.98.5.2-1ubuntu3

References

The official matplotlib site


Responding to the Flowingdata GDP Graph Challenge


Nathan Yau of Flowingdata put up a challenge earlier today to improve upon a graph showing government spending as a percentage of GDP, published in the Economist.

The underlying data wasn’t available, so I put on my graph-to-numbers glasses and pulled the numbers out of the graph. Here it is in case you want to have a go.

I took on the first part of the challenge, i.e. “Can you think of a way to make this graph easier to read?”

The Original Graph from the Economist:

Total government spending as percentage of GDP

I hacked up the following version in R. Getting it right was a bit of a challenge, given the constraint of fitting all the data and the legend within a 290 x 300 pixel image.

Total government spending as percentage of GDP

Do you think this is an improvement? Leave a comment below or in Nathan’s original post at http://flowingdata.com/2010/02/25/challenge-make-this-graph-easier-to-read/.

And here’s the R code:

#Read the file
gdp<-read.delim("gdp_long.txt",header=TRUE)

#Rename the column from United.States to "United States"; read.delim replaces spaces in column names with dots. You'll see why below.
colnames(gdp)[6]<-"United States"

#Define our colour palette so that we can edit it in one place and refer to colours by index as shown below
pal=c("black","darkorange","blue","forestgreen","tomato")

#Start PNG device with the given constraints of 290 x 300 pixels, width x height (boy, that's a small image!)
png("gdp.png",height=300,width=290,units="px")

#Plot settings
par(mar=c(2,2,3,1) #Small images call for small margins
,las=1) #For some reason, the default orientation (las=0) of the axis labels is parallel to the axis. This works OK for the X axis but makes it hard to read Y axis labels, so set to horizontal.

#Finally, the main plot command
plot(Canada~Year,data=gdp
,type="l"
,xaxt="n" #Don't draw the default X axis; we'll draw a custom one below.
,xaxs="i" #Internal X axis style: finds an axis with pretty labels that fits within the original data range. Without this, extra space is added at the edges even if you set xlim.
,yaxs="i" #Same reason as for the X axis
,main="Total government spending \n(% of GDP by year)" #Dropped 'The shape of the beast' to save space
,col=pal[1]
,ylim=c(30,70) #Set the Y axis limits to leave room at the top for the legend
,lwd=4 #Unusually thick lines, but they improve visibility in a small graph.
)

#Custom X axis
axis(side=1 #That's the bottom X axis
,at=gdp$Year[2:16] #Labels starting from 1996; using at=gdp$Year places the labels at odd years.
,labels=substr(gdp$Year[2:16],3,4)) #Instead of the full year, use just the last 2 digits.

grid(lwd=0.4,lty=1,col="#000000") #Very faint grid to guide the eye.

#Add the rest of the lines
#France
lines(France~Year,data=gdp,col=pal[2],lwd=4)

#Germany
lines(Germany~Year,data=gdp,col=pal[3],lwd=4)

#Britain
lines(Britain~Year,data=gdp,col=pal[4],lwd=4)

#United States; Note we can’t use United States~Year because of the space between United and States. This calls for use of the data[["variable name"]] notation.
lines(gdp[["United States"]]~Year,data=gdp,col=pal[5],lwd=4)

#Lastly the legend
legend("top" #Place it at the top, centred
,ncol=2 #Number of columns to spread the legend labels over; 2 works best for our graph.
,legend=colnames(gdp)[2:6]
,lty=1
,lwd=4
,col=pal #Make sure to use the same colour palette as the graph lines!
,bg="#FFFFFF" #White background to make it merge with the plot background.
,inset=0.01) #Inset the legend so that it doesn't quite touch the border of the plot.

dev.off() #Close the graphics device
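For comparison, here is a rough matplotlib translation of the same chart, in Python like the scatter plot tutorial above. It is a sketch, not a pixel-for-pixel port: plot_gdp() is a hypothetical helper, the data is passed in as a dict rather than read from gdp_long.txt, and matplotlib line widths are measured in points, so linewidth=4 will not look exactly like R's lwd=4. The comments map each setting to its R counterpart.

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

# Same colour order as pal in the R code.
PALETTE = ["black", "darkorange", "blue", "forestgreen", "tomato"]


def plot_gdp(years, series):
    """Draw one thick line per country, with a two-column legend on top.

    years  -- list of ints, one per row of the data
    series -- dict mapping country name to a list of % of GDP values
    """
    # 2.9 x 3.0 inches at 100 dpi gives the 290 x 300 pixel image.
    fig = Figure(figsize=(2.9, 3.0), dpi=100)
    FigureCanvas(fig)  # attach an Agg canvas so fig.savefig() can render
    ax = fig.add_subplot(111)

    for colour, (name, values) in zip(PALETTE, series.items()):
        ax.plot(years, values, color=colour, linewidth=4, label=name)

    ax.set_title("Total government spending\n(% of GDP by year)")
    ax.set_xlim(years[0], years[-1])  # no padding at the edges, like xaxs="i"
    ax.set_ylim(30, 70)               # headroom for the legend, like ylim=c(30,70)

    # Two-digit year labels, like substr(Year, 3, 4) in R:
    # R's 1-based characters 3..4 are the Python slice [2:4].
    ticks = years[1:]
    ax.set_xticks(ticks)
    ax.set_xticklabels([str(y)[2:4] for y in ticks])

    ax.grid(True, linewidth=0.4, color="0.85")  # faint guide lines
    ax.legend(loc="upper center", ncol=2)       # like legend("top", ncol=2)
    return fig, ax
```

Usage would be fig, ax = plot_gdp(years, series) followed by fig.savefig("gdp.png", dpi=100), with years and series filled from the same data file the R code reads.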

Introducing Pretty Graph


EDIT: This post has been updated to reflect the change in the signup process.

Pretty Graph is web-based software for making graphs and charts quickly and easily.

To try it, just register with your email address and we will provide you with a free beta account.

You can start by making scatterplots, but we’ll be adding line charts, bar graphs, stock charts, pie charts, histograms, heatmaps and many more chart types very soon. Please leave a comment to tell us what you’d like included, or about any other problems you have making pretty pictures out of your data.

So try it now for free! Make a graph, put it on your website or email it to a friend.