Dave's Matplotlib Basic Examples

The Matplotlib home page is the place to start for help. Click on the thumbnail gallery, scan for something similar to what you want, then click on it for details of how to do it. However, I'm building up my own list of simplified examples because (1) the gallery examples tend to be somewhat more complicated than pedagogically necessary; and (2) sometimes it's hard to match my question or need to one of those examples (often just because of the terminology I bring to it, having grown up using other packages). So I want to archive some of my hard-won knowledge.

Scatterplots and spatial binning

The upper two subplots below show scatterplots in which I wanted to represent some third quantity (in addition to x and y location) by either a color (left, scatter(ra,dec,c=mag,cmap=cm.hsv)) or a size (right, scatter(ra,dec,s=(20-mag)*3)). (The variable names came about because the data is a list of stars, where ra and dec represent position on the sky and mag represents brightness.)

The lower two subplots show the easiest way to do spatial binning, with hexbin. The lower left plot shows just the number of data points in each hexagonal cell (zero for most cells, hence the monotonous red), using

hexbin(ra,dec,cmap=cm.hsv) 
colorbar()

The lower right plot shows how to average some other quantity in the hexagonal cells, with

hexbin(ra,dec,C=mag,gridsize=25,cmap=cm.hsv) 
colorbar()

The number of hex cells is now reduced from the default 100, so that I sometimes get more than one data point in a cell! The absence of data is shown as white; I probably should have used a denser data set for this example. If you don't like hexagonal bins, scroll down to "More Spatial Binning" to see an alternative, which takes more than one line of code.

To do outlier rejection, try plotting the median rather than the mean of the quantities in each cell: hexbin(ra,dec,C=mag,gridsize=25,cmap=cm.hsv,reduce_C_function=numpy.median)

Example data for making these plots. If you want to reproduce these plots with the commands above, first do

data = numpy.loadtxt('example.fiat')
ra = data[:,3]
dec = data[:,4] 
mag = data[:,5]

(after your normal import statements, of course). Note: on very old installations of matplotlib, the hexbin function does not exist. If you need it, update your system!
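If you don't have example.fiat handy, here is a minimal sketch that puts the four panels together using synthetic random data in place of the star catalog. The values of ra, dec, and mag below are made up, so the picture will not match the one above, and the output filename is just a placeholder. It uses the pyplot object interface rather than the pylab-style calls shown earlier.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic stand-in for example.fiat (assumption: NOT the real catalog).
rng = np.random.default_rng(42)
n = 500
ra = rng.uniform(0.0, 1.0, n)
dec = rng.uniform(0.0, 1.0, n)
mag = rng.uniform(14.0, 20.0, n)

fig, axes = plt.subplots(2, 2, figsize=(9, 8))

# Upper left: color encodes magnitude.
sc = axes[0, 0].scatter(ra, dec, c=mag, cmap="hsv")
fig.colorbar(sc, ax=axes[0, 0])

# Upper right: marker size encodes magnitude (brighter = bigger).
axes[0, 1].scatter(ra, dec, s=(20 - mag) * 3)

# Lower left: number of points per hexagonal cell.
hb_counts = axes[1, 0].hexbin(ra, dec, cmap="hsv")
fig.colorbar(hb_counts, ax=axes[1, 0])

# Lower right: median magnitude per cell, with coarser cells.
hb_median = axes[1, 1].hexbin(ra, dec, C=mag, gridsize=25, cmap="hsv",
                              reduce_C_function=np.median)
fig.colorbar(hb_median, ax=axes[1, 1])

fig.savefig("scatter_hexbin_sketch.png")
```

Passing the artist returned by scatter or hexbin to fig.colorbar ties each colorbar to its own panel, which the bare colorbar() call only does for the most recently drawn one.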

Fiddling with the points to make scatterplots look better

If you have LOTS of points on a scatterplot, they start to overlap. You can shrink them with the option s=5 or so, but each marker's colored area shrinks while its black border keeps the same linewidth, so in the limit of very small markers you get almost nothing but black border. You can get rid of the black borders entirely with the edgecolors='none' option (edgecolors is the keyword for scatter; if you draw the points with plot instead, the corresponding keyword is markeredgecolor). This allows you to shrink markers as much as you want without having the border dominate the appearance. However, it looks bad when the markers are TOO small.

The way I prefer to accommodate crowded scatterplots, which I inherited from sm, the plotting package I used prior to matplotlib, is to make points "open," that is, they have a perimeter but nothing in the center so that it is easy to see exactly how markers overlap. In matplotlib this is accomplished with

plot(v1, v2, 'o', markerfacecolor='None')

Before a reader told me about this solution, my workaround was to keep the markers big enough to overlap, but make them partially transparent with the option alpha=0.2 or so.

In summary, you can separately control the color of the face and the perimeter of the marker, including by specifying the string 'none' for either color (it must be quoted; a bare Python None just means "use the default"). This was not obvious to me just by looking at the example gallery.
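Here is a small sketch that puts the three approaches side by side: tiny borderless markers, open circles, and larger transparent markers. The data are random numbers invented for the illustration, and the filename is a placeholder.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up crowded data set.
rng = np.random.default_rng(1)
v1 = rng.normal(size=2000)
v2 = rng.normal(size=2000)

fig, (ax_small, ax_open, ax_alpha) = plt.subplots(1, 3, figsize=(12, 4))

# Small filled markers with no border: scatter() takes edgecolors.
ax_small.scatter(v1, v2, s=5, edgecolors="none")

# Open circles: plot() takes markerfacecolor and markeredgecolor.
(open_line,) = ax_open.plot(v1, v2, "o",
                            markerfacecolor="none", markeredgecolor="k")

# Larger markers made partially transparent so overlaps show up as darker areas.
ax_alpha.scatter(v1, v2, s=40, alpha=0.2, edgecolors="none")

fig.savefig("marker_styles_sketch.png")
```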

A challenge for matplotlib experts out there: how do I make a plot which is a scatterplot when the density of points is low, but transitions to a contour plot when the density of points is high? I've seen these types of plot prepared by other graphics systems, and they seem to me to be the optimal way to represent density when both high and low density regions are interesting.
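One possible approach, offered as a rough sketch rather than a polished answer: bin the points with numpy's histogram2d, draw individual points only where the bin count is below some threshold, and draw filled contours of the counts everywhere else. This is not a built-in matplotlib feature as far as I know; the data, the number of bins, and the threshold are all hypothetical choices you would need to tune.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic data: dense in the middle, sparse in the wings.
rng = np.random.default_rng(0)
x = rng.normal(size=20000)
y = rng.normal(size=20000)

bins = 40
threshold = 20  # hypothetical cutoff: bins with fewer points are drawn as dots

hist, xedges, yedges = np.histogram2d(x, y, bins=bins)

# Find the bin each point falls in (clip so points on the last edge stay in range).
ix = np.clip(np.digitize(x, xedges) - 1, 0, bins - 1)
iy = np.clip(np.digitize(y, yedges) - 1, 0, bins - 1)
sparse = hist[ix, iy] < threshold

# Bin centers, needed to place the contours in data coordinates.
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
# Individual points only in the low-density bins.
ax.plot(x[sparse], y[sparse], "k.", markersize=2)
# Filled contours of the counts above the threshold (note the transpose).
levels = np.linspace(threshold, hist.max(), 6)
ax.contourf(xcenters, ycenters, hist.T, levels=levels, cmap="Greys")

fig.savefig("scatter_contour_sketch.png")
```

The boundary between dots and contours follows the square bins, so it looks blocky unless the bins are fairly fine.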

More spatial binning

The hexbin function used above is the easiest out-of-the-box solution, but sometimes you just really want square bins, or maybe you need some tweak that hexbin doesn't provide. In principle, it's easy to make square bins yourself in two steps, with numpy's histogram2d and then matplotlib's imshow. But you have to be careful with the details, because (1) imshow likes to put the origin in the upper left (the computer graphics convention), (2) imshow also uses a different convention for which index (x or y) changes more rapidly, leading to transposition of the image, and (3) imshow by itself would label the axes with pixel coordinates, NOT the coordinates of your bins.

Here is the simplest test case. It generates Gaussian distributions with different offsets and variances in x and y so you can verify that it's really doing the right thing. Notice the use of the transpose, hist.T.

from pylab import *
import numpy

# the x distribution will be centered at -1, the y distro
# at +1 with twice the width.
x = numpy.random.randn(3000)-1
y = numpy.random.randn(3000)*2+1

hist,xedges,yedges = numpy.histogram2d(x,y,bins=40,range=[[-6,4],[-4,6]])
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1] ]
imshow(hist.T,extent=extent,interpolation='nearest',origin='lower')
colorbar()
show()

On my machine at least, it produces this image:
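To convince yourself that the transpose is needed without looking at a picture, here is a tiny numpy-only check. It uses three made-up points and deliberately different numbers of bins in x and y, so the two axes of the array cannot be confused.

```python
import numpy as np

# Three made-up points, all at large x (near 3) and positive y.
x = np.array([2.9, 3.0, 3.1])
y = np.array([0.5, 1.0, 1.5])

# 4 bins in x, 2 bins in y.
hist, xedges, yedges = np.histogram2d(x, y, bins=[4, 2],
                                      range=[[-4, 4], [-2, 2]])

print(hist.shape)    # (4, 2): the FIRST index is the x bin
print(hist.T.shape)  # (2, 4): rows are y bins, which is what imshow draws as rows
print(hist[3, 1])    # all three points land in the last x bin, upper y bin
```

Since imshow treats the first index as the row (vertical) direction, passing hist directly would put x on the vertical axis; hist.T puts it back on the horizontal one.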

Colormaps

A list of colormap choices is here. On your own system, you can use this to see all the choices:

maps=[m for m in cm.datad if not m.endswith("_r")]
print(maps)

Every built-in map name also has a _r version that reverses it.
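An alternative sketch that asks pyplot for the registered colormap names instead of reading cm.datad. I am assuming a reasonably recent matplotlib here; the list it returns also includes the newer maps such as viridis.

```python
import matplotlib.pyplot as plt

# All registered colormap names, including the reversed "_r" variants.
names = plt.colormaps()

# Keep only the base names.
base = [m for m in names if not m.endswith("_r")]
print(base)

# Check that a reversed variant exists for a given map.
print("hsv_r" in names)
```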

Subplots

The first plot on this page has four subplots, generated by subplot(221) which means "split the page into a 2x2 array of subplots and put me in the first (upper left) one", subplot(222) which means "put me in the second (upper right) one", subplot(223) which means "put me in the third (lower left) one", etc. But sometimes you want to squeeze the plots together so they can share axes. For example, if they can share both vertical and horizontal axes, use subplots_adjust(wspace=0,hspace=0). But this doesn't do what you would like with the ticklabels, namely get rid of them for half of the subplot axes, like this:

To do this, you need to also set the tick formatter on the relevant axes to null. For example, for the upper right subplot you want to do it on both axes:

from matplotlib.ticker import NullFormatter
...
ax = subplot(222)
ax.xaxis.set_major_formatter( NullFormatter() )
ax.yaxis.set_major_formatter( NullFormatter() )

You may also need to adjust the locations of the labels. In the above example, matplotlib wanted to put labels at every 0.025 on the x axes, so the +0.025 in one panel collided with the -0.025 in the adjacent panel. So I used

from matplotlib.ticker import MultipleLocator
...
ax.xaxis.set_major_locator(MultipleLocator(0.01))

To make it clear that the label "exposure residual (mag)" applies to all subplots, I used figtext which places text in overall figure coordinates (0-1 in each direction, from lower left) rather than xlabel, which places it relative to your current subplot.

Note that this plot also makes use of the alpha transparency option discussed briefly above. You can use alpha with lots of routines.

To see how all these bits fit together, read the full script used to make the above plot.
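If you just want the skeleton, here is a minimal sketch of the same recipe: four touching subplots, tick labels suppressed on the inner axes, a common tick spacing, and one shared label placed with figtext. The data are random numbers invented for the illustration, and the axis limits and text position are guesses you would adjust for your own figure.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, NullFormatter

rng = np.random.default_rng(2)
fig = plt.figure()
fig.subplots_adjust(wspace=0, hspace=0)  # squeeze the panels together

axes = []
for i in range(4):
    ax = fig.add_subplot(2, 2, i + 1)
    # Made-up residuals, drawn with transparency so overlaps are visible.
    ax.plot(rng.normal(0, 0.01, 200), rng.normal(0, 0.01, 200), "o", alpha=0.2)
    ax.set_xlim(-0.035, 0.035)
    ax.set_ylim(-0.035, 0.035)
    ax.xaxis.set_major_locator(MultipleLocator(0.01))  # same ticks in every panel
    axes.append(ax)

upper_left, upper_right, lower_left, lower_right = axes

# Hide x tick labels on the top row and y tick labels on the right column.
for ax in (upper_left, upper_right):
    ax.xaxis.set_major_formatter(NullFormatter())
for ax in (upper_right, lower_right):
    ax.yaxis.set_major_formatter(NullFormatter())

# One label for the whole figure, in figure coordinates (0-1 from lower left).
plt.figtext(0.5, 0.02, "exposure residual (mag)", ha="center")

fig.savefig("shared_axes_sketch.png")
```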

Legends, labels, decorations

The standard x and y labels are set using xlabel(string) and ylabel(string). However, by default the labels are really small. They are readable on the screen, but basically unreadable after the figure is shrunk into a journal page. So use:

xlabel(string,fontdict={'fontsize':20})

Here is a comparison of a shrunken figure with and without this optional argument:

Even 20-point font may not be big enough for the y label in this case! There are ways of changing your default font size, but personally I think the numeric tick labels should be in a smaller font than the name of the variable.
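A minimal sketch of that preference, with large axis labels and smaller numeric tick labels set separately. The data, label text, and the particular sizes are placeholders, not taken from the figure above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

# Large font for the variable names (placeholder label text).
ax.set_xlabel("x variable", fontsize=20)
ax.set_ylabel("y variable", fontsize=20)

# Smaller font for the numeric tick labels on both axes.
ax.tick_params(axis="both", labelsize=12)

fig.savefig("label_sizes_sketch.png")
```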

Fiddling with which ticks are labeled: sometimes the default choice of which ticks are labeled is awkward, for example when labels at the corners run into each other. You can control this with:

from matplotlib.ticker import MultipleLocator
...
majorLocator = MultipleLocator(0.01) 
ax=subplot(223)
ax.xaxis.set_major_locator( majorLocator )

This puts a label every 0.01 units as in the exposure residual example above, in the Subplots section. In that plot, for consistency I used the same tick locator in multiple subplots.

Math symbols in labels: everyone knows that you can put TeX in the labels, but it was not obvious to me how to quote a string to make this happen correctly. Here's how:

ylabel(r'$D_l D_{ls}/D_s\ (Mpc)$',fontdict={'fontsize':20})

This is exactly the code used in the above plot.

Legends: these are nice when plotting multiple curves, and are set up to work basically automatically. Here is the result of

for i in range(len(priors)):
    plot(z,y[i,:],label='%d%%' % (priors[i]*100))
legend(title='Prior on lens mass')

(Read the whole script if necessary.) A few things to note here:

Each plot command needs to have the optional label argument to build up the information necessary for the legend. Note that by itself adding label to the plot command does nothing! (Similarly, legend() is useless unless you've set the labels.)
The legend is built up in the same order in which you invoked the plot commands. Because the curve corresponding to the 10% prior is on top (followed by successively smaller priors) in the actual data, I wanted the same order in the legend, and I was careful to plot() them in that order. You could also set the order by explicitly giving the order to legend(); see the manual for how to do that.
The title is optional for the legend. If your labels are descriptive enough, you just don't need it.
As always with Python's % string formatting, "%%" is necessary to encode a literal percent symbol.
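For the explicit-ordering option mentioned in the notes above, here is one way it can be done, sketched with made-up curves: ask the axes for the handles and labels it has collected, reorder them, and pass them to legend. The priors and the curves are hypothetical stand-ins for the real data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

z = np.linspace(0.1, 2.0, 50)
priors = [0.10, 0.05, 0.01]  # hypothetical prior values

fig, ax = plt.subplots()
for p in priors:
    # round() guards against %d truncating a float such as 28.999...
    ax.plot(z, p * z, label="%d%%" % round(p * 100))

# Handles and labels come back in plotting order; reverse them for the legend.
handles, labels = ax.get_legend_handles_labels()
leg = ax.legend(handles[::-1], labels[::-1], title="Prior on lens mass")

fig.savefig("legend_order_sketch.png")
```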

Manually drawn lines: often you want to manually draw some lines on the plot, perhaps to indicate where you would impose a cut on your data, or what a certain power law would look like. Use

  
plot((x1,x2),(y1,y2),'k-')

This still doesn't work as nicely as sm's draw command, because it makes matplotlib think these are datapoints, and thus expands the plot if the points you give are outside the range of the other (real) data. So you have to pay more attention to the points you give. Being lazy, I would like a command that says "draw a line from this point to that point, but still use the range of real data to determine the cropping."
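One workaround, sketched here with made-up data: record the axis limits after plotting the real data, draw the line, then put the limits back. I am not claiming this is the only or the official way, but it keeps the cropping determined by the data alone.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up "real" data, confined to the unit square.
rng = np.random.default_rng(3)
xdata = rng.uniform(0, 1, 100)
ydata = rng.uniform(0, 1, 100)

fig, ax = plt.subplots()
ax.plot(xdata, ydata, "o")

# Remember the limits chosen for the real data.
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# Draw a line whose endpoints lie far outside the data range.
ax.plot((-5, 5), (-5, 5), "k-")

# Restore the data-driven limits so the line is simply cropped.
ax.set_xlim(xlim)
ax.set_ylim(ylim)

fig.savefig("manual_line_sketch.png")
```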