Which card to buy?

Note: The information below was current as of late 2007.

I spent a fair bit of time deciding which card to buy. I'll try to summarize the situation as I now see it.

Nvidia have three "brands" related to graphics card computing: Quadro, Tesla and GeForce. Quadro are their high-end video cards for professional workstations; Tesla is a new brand solely for computing (the standard cards in this range don't even have monitor outputs, so they can't be used for graphics); and GeForce is their mainstream gaming brand. Although the higher-end cards can cost five or ten times as much, for GPGPU purposes there is currently little difference in performance across the range. The main advantage of the Quadro cards is their larger on-card memory (up to 1.5 GB as opposed to up to 0.75 GB). As well as single cards, the Tesla range also includes multi-card solutions, among them a rackmount unit.
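To get a feel for what the difference in memory means in practice, here is a rough back-of-envelope sketch of the largest dense n-by-n single-precision matrix that fits in a given amount of card memory. It assumes 4-byte floats, treats a card's quoted "MB" as 2^20 bytes, and ignores any workspace an algorithm needs and any memory the driver reserves, so the real limit will be somewhat lower.

```python
import math

BYTES_PER_FLOAT = 4  # single precision: 4 bytes per matrix element


def max_square_matrix_dim(card_memory_mib: int) -> int:
    """Largest n such that one n-by-n single-precision matrix fits in card memory.

    Upper bound only: ignores algorithm workspace and driver-reserved memory.
    """
    total_bytes = card_memory_mib * 1024 * 1024
    # Need n * n * BYTES_PER_FLOAT <= total_bytes, so n = floor(sqrt(total_bytes / 4))
    return math.isqrt(total_bytes // BYTES_PER_FLOAT)


if __name__ == "__main__":
    for mib in (320, 640, 768, 1536):
        print(f"{mib:5d} MB -> n <= {max_square_matrix_dim(mib)}")
```

Because the matrix grows with n squared, doubling the memory only raises the largest dimension by a factor of about 1.41 (the square root of 2): roughly 14188 for 0.75 GB versus 20066 for 1.5 GB, or 9158 versus 12952 for the 320 MB and 640 MB GeForce cards.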

In the future the Quadro and Tesla cards might become significantly more attractive for computing, because they are likely to be the only ones that support double-precision floating-point arithmetic.

Graphics cards are often grouped according to the graphics processor family they use. The current offerings feature G80-series processors, and only G80 and later processors support CUDA. For the GeForce cards this corresponds to the "GeForce 8" range and above; for the Quadros I'd suggest checking carefully (the FX4600 and FX5600 certainly support CUDA). The key numbers to look for in comparing different G80-series cards are:

(Note that there are multiple clock frequencies that can be quoted for any card: its core clock, its shader clock and its memory clock. The latter two are the most important for CUDA.)
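The shader and memory clocks can be turned into rough upper bounds with two standard formulas: peak memory bandwidth is the effective (DDR) memory clock times the bus width in bytes, and peak single-precision throughput is the number of stream processors times the shader clock times the flops issued per clock. The sketch below counts a multiply-add as 2 flops. The card figures it uses (96 stream processors, 1200 MHz shader clock, 1600 MHz effective memory and a 320-bit bus for the 640 MB 8800 GTS; 128 stream processors, 1500 MHz shader clock, 2160 MHz effective memory and a 384-bit bus for the 8800 Ultra) are commonly published specifications that I am assuming here, so check them against the vendor's data for your exact card.

```python
def peak_bandwidth_gb_s(effective_mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mem_clock_mhz * 1e6 * bytes_per_transfer / 1e9


def peak_gflops(stream_processors: int, shader_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak single-precision GFLOPS; by default a multiply-add counts as 2 flops per clock."""
    return stream_processors * shader_clock_mhz * 1e6 * flops_per_clock / 1e9


if __name__ == "__main__":
    # Assumed published specs for the G80-based 8800 GTS 640 MB and 8800 Ultra
    gts_bw, gts_gf = peak_bandwidth_gb_s(1600, 320), peak_gflops(96, 1200)
    ultra_bw, ultra_gf = peak_bandwidth_gb_s(2160, 384), peak_gflops(128, 1500)
    print(f"8800 GTS:   {gts_bw:.1f} GB/s, {gts_gf:.1f} GFLOPS")
    print(f"2 x GTS:    {2 * gts_bw:.1f} GB/s, {2 * gts_gf:.1f} GFLOPS")
    print(f"8800 Ultra: {ultra_bw:.1f} GB/s, {ultra_gf:.1f} GFLOPS")
```

On these assumed figures a single GTS peaks at 64 GB/s and about 230 GFLOPS, while the Ultra peaks at about 104 GB/s and 384 GFLOPS. Some sources also count the G80's extra multiply (`flops_per_clock=3`), which raises every peak figure by half but leaves the comparison between cards unchanged. Real kernels will of course reach only a fraction of these peaks.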

In the GeForce range of cards the 8800 Ultra is a clear winner but it is of course the most expensive and requires the most power.

One point to note is that the more recent cards (8400, 8500 and 8600) support certain atomic instructions that the original high-end cards don't. These instructions let a thread read, modify and write a location in global memory without interference from other threads, which can considerably simplify the coding of certain programs. Also, these cards are only one slot wide, so in principle one might be able to fit more of them into a given host.
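In CUDA terms, atomic operations on global memory need a card of compute capability 1.1 or higher, while the original G80 cards are compute capability 1.0. A host program can read the capability from the `major` and `minor` fields that `cudaGetDeviceProperties` fills in. The small lookup below is a sketch only: the table values are my reading of NVIDIA's documentation for a few cards, so treat them as assumptions and check the figures for your own card.

```python
# Assumed compute capabilities (major, minor); verify against NVIDIA's documentation
COMPUTE_CAPABILITY = {
    "GeForce 8800 Ultra": (1, 0),
    "GeForce 8800 GTX": (1, 0),
    "GeForce 8800 GTS": (1, 0),  # the original G80 320 MB / 640 MB versions
    "Quadro FX 5600": (1, 0),
    "Quadro FX 4600": (1, 0),
    "GeForce 8600 GTS": (1, 1),
    "GeForce 8500 GT": (1, 1),
    "GeForce 8400 GS": (1, 1),
}


def supports_global_atomics(capability: tuple[int, int]) -> bool:
    """Global-memory atomic functions require compute capability 1.1 or higher."""
    return capability >= (1, 1)  # tuples compare element by element


if __name__ == "__main__":
    for card, cc in COMPUTE_CAPABILITY.items():
        print(f"{card:20s} {cc[0]}.{cc[1]}  global atomics: {supports_global_atomics(cc)}")
```

Comparing `(major, minor)` as a tuple avoids the trap of comparing version numbers as floats or strings.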

With an eye on size and power requirements, I went for two of the 640 MB 8800 GTS cards. (The GTS cards come in two memory sizes, 320 MB and 640 MB; I've been very glad that I chose the larger ones for experimenting with large matrix inversion.) Their combined cost was about the same as one Ultra card, and on paper at least their combined performance is higher than one Ultra's. However, it is rather difficult to write programs that use two cards together. One has to write an explicitly multithreaded host program, using (in Linux) the pthread library rather than OpenMP or MPI, and any communication between the cards currently has to go via the host program rather than directly across the PCIe bus. I think one can use MPI to spread jobs across clusters in which each node has a graphics card attached, but I have not tried this myself. I plan eventually to experiment with pthreads (and MPI on one machine) to get both cards working together on a problem and to see how performance scales.
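The structure of such a host program can be sketched as follows: one host thread per card, each working on its own slice of the data, with the partial results combined on the host because the cards cannot exchange data directly. This is a minimal sketch in Python, whose `threading` module (built on pthreads on Linux) stands in for the pthread calls, and `process_on_card` is a hypothetical placeholder for the real per-card work. With the CUDA runtime of this era, each host thread drives one card, selected with `cudaSetDevice` before any other CUDA call in that thread.

```python
import threading


def process_on_card(device_id: int, chunk: list[float]) -> float:
    """Hypothetical stand-in for the work done on one card.

    In a real CUDA host program this thread would first call cudaSetDevice(device_id),
    then copy `chunk` to the card, launch kernels and copy the result back. Here it
    simply sums the chunk on the CPU so that the sketch runs anywhere.
    """
    return sum(chunk)


def run_on_cards(data: list[float], num_cards: int) -> float:
    """Split `data` across one host thread per card and combine the results on the host."""
    n = len(data)
    # Contiguous slice boundaries, one slice per card
    bounds = [(i * n // num_cards, (i + 1) * n // num_cards) for i in range(num_cards)]
    partial = [0.0] * num_cards

    def worker(device_id: int) -> None:
        lo, hi = bounds[device_id]
        # Each thread writes only its own slot, so no lock is needed
        partial[device_id] = process_on_card(device_id, data[lo:hi])

    threads = [threading.Thread(target=worker, args=(d,)) for d in range(num_cards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The cards cannot talk to each other directly: partial results meet on the host
    return sum(partial)


if __name__ == "__main__":
    values = [float(i) for i in range(1000)]
    print(run_on_cards(values, num_cards=2))
```

Because the placeholder work runs on the CPU under Python's global interpreter lock, the sketch shows the structure (spawn, join, combine on the host) rather than any speedup; measuring how real performance scales needs the actual CUDA calls inside each thread.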

NVIDIA have recently released the mid/high-end 8800 GT card, supported by CUDA toolkit 1.1. Being only a single slot wide and relatively cheap, it might be an ideal starting choice. Alternatively, for the brave/pthread-enabled, it should be possible to fit up to four of these in a suitable machine! There are versions with up to 512 MB of memory. NVIDIA have also released a new card under the 8800 GTS label, the 8800 GTS 512 MB, which is in fact more like a souped-up GT than an original GTS.

I chose cards by Inno3D that were "factory-overclocked", with a core frequency of 570 MHz rather than 500 MHz and a memory speed of 1800 MHz rather than 1600 MHz. (Unfortunately the key shader clock remained at 1200 MHz and was not boosted; the promotional literature for the card said nothing either way about this, so I had hoped it might be raised too.) They have performed flawlessly, and it has been useful to be able to keep one card at home and one at work. Here is a picture of one of the cards: