All the necessary source code is included in the mcs tarball. If you compile the reference implementation with gcc 3.4 or later, you must pass the -fpermissive flag, because the code relies on nonstandard array initialization extensions.

Your GPU implementation should parallelize both conformation generation and scoring. For best performance, it is recommended that multiple thread blocks each attempt a different move set independently, with each block generating and scoring its own trial conformation. That way, if one trial conformation fails the Metropolis test, another may still pass. Once one of them passes, you must discard all of the other generated conformations.

A large portion of the work in this project will be adapting the energy calculations to run efficiently in parallel, and balancing the number of conformations tried at any given time against the way the energy of each one is calculated.

Bonus respect points (not redeemable for real points) will be awarded for running multiple Monte Carlo chains in parallel with each other. While not strictly true to the original algorithm, such an approach may turn out to be the best use of the GPU's power for this problem, and it would allow much broader sampling in a given amount of time.
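
To make the recommended multiple-trial scheme concrete, here is a minimal host-side C++ sketch of its control flow. It is not taken from the mcs tarball: the names Conformation, toy_energy, metropolis_accept, and multi_trial_step are hypothetical, the energy function is a toy stand-in for the real scoring code, and each loop iteration stands in for what one thread block would do on the GPU. Choosing the lowest-numbered passing trial is an arbitrary rule picked for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a conformation: just a list of torsion angles.
struct Conformation {
    std::vector<double> angles;
};

// Toy energy (an assumption, NOT the reference scoring function):
// sum of (1 - cos(angle)), which is lowest when every angle is 0.
double toy_energy(const Conformation& c) {
    double e = 0.0;
    for (double a : c.angles) e += 1.0 - std::cos(a);
    return e;
}

// Metropolis test: always accept a move that does not raise the energy;
// accept an uphill move with probability exp(-dE / kT).
// `u` is a uniform random sample in [0, 1).
bool metropolis_accept(double e_old, double e_new, double kT, double u) {
    assert(kT > 0.0);
    if (e_new <= e_old) return true;
    return u < std::exp(-(e_new - e_old) / kT);
}

// One step of the scheme: build `n_trials` independent trial conformations
// from `current`, score and test each one, keep the lowest-numbered trial
// that passes, and discard all the others.
// Returns true if some trial was accepted, false if every trial failed.
bool multi_trial_step(Conformation& current, int n_trials, double kT,
                      std::mt19937& rng) {
    assert(n_trials > 0);
    assert(!current.angles.empty());

    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_real_distribution<double> delta(-0.5, 0.5);
    std::uniform_int_distribution<std::size_t> pick(0, current.angles.size() - 1);

    const double e_old = toy_energy(current);
    std::vector<Conformation> trials(n_trials, current);
    int winner = -1;

    // On the GPU these iterations would run concurrently, one per block.
    for (int t = 0; t < n_trials; ++t) {
        trials[t].angles[pick(rng)] += delta(rng);       // generate a move
        const double e_new = toy_energy(trials[t]);      // score it
        const bool passed = metropolis_accept(e_old, e_new, kT, unit(rng));
        if (passed && winner < 0) winner = t;
    }

    if (winner < 0) return false;   // every trial failed; keep `current`
    current = trials[winner];       // all other trials are thrown away
    return true;
}
```

A caller would seed a std::mt19937, build an initial Conformation, and invoke multi_trial_step in a loop. In a GPU port, the loop over trials would become the grid of thread blocks, and the choice of winner would need some form of cross-block coordination that this sketch does not attempt to show.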