For those who are interested, a version of the code is available to see here.
I have gotten the software and hardware working together correctly.
I have gotten non-volume-weighted eternal inflation simulations working correctly on the graphics cards; the performance is stunning, about 500-1000x that of similar code on (one core of) a CPU! This is even better than the O(100x) I hoped for from looking at the number of processors on a card. The difference is presumably down to the fast log/exp/trig operations on the card, which the simulations require, and to the card's large memory bandwidth. The program is generating 80 million history realizations of 128 time steps each in well under 10 seconds.
I am currently adding volume weighting to the code, to look first at unconstrained and later at constrained volume-weighted histories.
While the code is in principle correct, I am experiencing overflow problems due to the sheer scale of expansion in appropriate models of eternal inflation. Double precision support would have been very helpful here! As it is, I am trying to subtract off the "classical" number of efolds before calculating the weighting factor for each chain, but this is not yet working reliably.
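To illustrate the overflow and the subtraction idea, here is a minimal NumPy sketch in single precision. It is not the GPU code itself. It assumes the weighting factor for a chain has the form exp(3N), with N the number of efolds along that chain; the function name `volume_weights` and the sample efold values are made up for the illustration. Since only relative weights matter after normalization, any offset common to all chains cancels. The sketch subtracts the largest log-weight rather than the classical value, because that choice guarantees every exponent is at most zero; subtracting the classical number of efolds works the same way as long as no chain exceeds it by more than about 29 efolds, beyond which exp(3N) no longer fits in single precision.

```python
import numpy as np

def volume_weights(efolds):
    """Normalized weights proportional to exp(3 * N), computed in float32.

    Assumption for this sketch: the volume weight of a chain is exp(3 * N),
    where N is the number of efolds accumulated along that chain.
    """
    n = np.asarray(efolds, dtype=np.float32)
    log_w = np.float32(3.0) * n
    # Subtract a common offset before exponentiating. Using the maximum
    # makes every exponent <= 0, so exp() cannot overflow.
    shifted = log_w - log_w.max()
    w = np.exp(shifted)
    # The common offset cancels in the normalization.
    return w / w.sum()

# Hypothetical efold counts for three chains.
efolds = np.array([100.0, 101.0, 102.0], dtype=np.float32)

# Naive evaluation: exp(300) is far beyond the float32 range and becomes inf.
with np.errstate(over="ignore"):
    naive = np.exp(np.float32(3.0) * efolds)

weights = volume_weights(efolds)
print(naive)
print(weights)
```

The naive weights all come out infinite, while the shifted ones are finite and keep the correct ratio of exp(3) between chains that differ by one efold.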