PCI-Express

Update April 2008: PCIe 2.0 cards and motherboards are now available, offering twice the data transfer rate between host and device.

Like basically all new graphics cards, the Nvidia ones suitable for CUDA connect to a PC motherboard via a PCI-Express connector. The PCI-Express interface is often denoted PCIe or something similar (but not PCI-X, which is something different). PCIe devices communicate over channels or "lanes". More active lanes between two devices mean faster data transfer between them. PCIe cards differ in the maximum number of lanes they can use, typically 1, 4 or 16, and are thus described as x1, x4 or x16 cards respectively. The larger the number, the wider the connector on the card.
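To make the effect of lane count and PCIe generation concrete, here is a minimal sketch that computes theoretical peak bandwidth in one direction. The per-lane figures (250 MB/s for PCIe 1.x, 500 MB/s for PCIe 2.0) are the commonly quoted theoretical rates and are not taken from this page; the function name `peak_bandwidth_mb_s` is made up for illustration. Real host-to-device transfers will be slower than these numbers.

```python
# Assumed theoretical per-lane, per-direction throughput in MB/s.
# These are commonly quoted peak figures, not measurements.
PER_LANE_MB_S = {"1.x": 250, "2.0": 500}


def peak_bandwidth_mb_s(lanes, generation="1.x"):
    """Return the theoretical one-way peak bandwidth of a link in MB/s."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    return lanes * PER_LANE_MB_S[generation]


# Compare the typical card widths across both generations.
for lanes in (1, 4, 16):
    for generation in ("1.x", "2.0"):
        rate = peak_bandwidth_mb_s(lanes, generation)
        print(f"x{lanes} PCIe {generation}: {rate} MB/s")
```

Under these assumptions a x16 link peaks at 4000 MB/s on PCIe 1.x and 8000 MB/s on PCIe 2.0, which is the doubling mentioned in the update note above.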

However, not all PCIe slots are the same, and this is where confusion may come in. Slots have two main properties: physical size and electrical connectivity.

Typical values for either are x1, x4, x8 and x16. For example, a slot might be described as "x16 physical, x8 electrical". The physical size literally determines the size of the slot on the motherboard. You can physically plug a xN card into a xM slot if N<=M. The electrical connectivity determines the number of "lanes" actually wired up. Fewer lanes mean lower potential data transfer speed. However, a x16 device in a x16 physical slot will still work fine even if the slot is only x1 or x8 electrical; it simply transfers data more slowly.
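The two slot properties can be captured in a few lines. This is only a sketch of the rules stated above; the function names `card_fits` and `link_width` are hypothetical, and the model assumes the link simply uses as many lanes as both the card and the slot wiring can supply.

```python
def card_fits(card_lanes, slot_physical):
    """A xN card physically fits a xM slot if N <= M."""
    return card_lanes <= slot_physical


def link_width(card_lanes, slot_physical, slot_electrical):
    """Return the number of lanes the card actually gets in this slot.

    Assumption: the usable width is the smaller of what the card supports
    and what the slot has wired up.
    """
    if not card_fits(card_lanes, slot_physical):
        raise ValueError("card is physically too wide for this slot")
    return min(card_lanes, slot_electrical)


# A x16 graphics card in a slot that is "x16 physical, x8 electrical".
print(link_width(16, 16, 8))
```

With the example slot from the text, the x16 card fits and runs over 8 lanes.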

Now a typical motherboard might be capable of communicating over 20 or 40 actual lanes. A graphics card is x16. So the best we can expect in the latter case is two x16 electrical slots. But what about the remaining 8 lanes? If we are lucky they will go to a third x16 physical slot, wired at x8 electrical. Then we'll be able to fit 3 graphics cards in a machine, two of them communicating with the CPU (and potentially each other) at full speed, and one of them communicating at half speed (but still running at its native clock speed). A total of 3 graphics cards is very useful for a dual-CUDA machine; the third slot can be filled with a regular Nvidia graphics card that takes care of the screen.
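The lane arithmetic above can be sketched as a toy allocation. This is an illustration only: the function `assign_lanes` is hypothetical, it assumes every lane on the board is available to the expansion slots, and it hands each slot the widest standard width that still fits in the remaining budget. Real motherboards divide their lanes however the manufacturer chose, so always check the board's manual.

```python
def assign_lanes(total_lanes, slots_wanted, widths=(16, 8, 4, 1)):
    """Greedily give each slot the widest width that fits the remaining lanes."""
    remaining = total_lanes
    assigned = []
    for _ in range(slots_wanted):
        for width in widths:
            if width <= remaining:
                assigned.append(width)
                remaining -= width
                break
        else:
            # No width fits any more: the lane budget is used up.
            break
    return assigned


# The 40-lane board from the text, with three slots for graphics cards.
print(assign_lanes(40, 3))  # [16, 16, 8]
```

For the 40-lane case this reproduces the split described in the text: two slots at x16 electrical and a third at x8.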

A cheaper motherboard may have one x16 physical/electrical slot and a x4 one. Another possibility is two x16 physical slots, each connected at x8 electrical. You can have difficulties trying to run CUDA applications and graphics on the same card, so one cheap starter option is to get a motherboard with integrated Nvidia graphics and a single x16 expansion slot, and use this slot to plug in the CUDA-capable card.

Note that the graphics processor used to display to the screen (whether integrated onto the motherboard or an actual card) must also be an Nvidia one for everything to work properly. You can use one of the cards you intend to use for GPU computing to control the screen, but then you run into run-time restrictions, so this is not recommended.