Yanhua Sun, Gengbin Zheng, Chao Mei, Eric J. Bohm, Terry Jones, Laxmikant V.
Kalé, and James C. Phillips.
Optimizing fine-grained communication in a biomolecular simulation
application on Cray XK6.
In Proceedings of the 2012 ACM/IEEE Conference on
Supercomputing, pp. 1-11, Salt Lake City, Utah, 2012. IEEE Press.
SUN2012-LK
Achieving good scaling for fine-grained, communication-intensive applications on modern
supercomputers remains challenging. In our previous work, we have shown that such an
application, NAMD, scales well on the full Jaguar XT5 without long-range interactions;
yet with them, the speedup falters beyond 64K cores. Although the new Gemini
interconnect on Cray XK6 has improved network performance, the challenges remain, and
are likely to remain for other such networks as well. We analyze communication
bottlenecks in NAMD and its CHARM++ runtime, using the Projections performance
analysis tool. Based on the analysis, we optimize the runtime, built on the uGNI library for
Gemini. We present several techniques to improve the fine-grained communication.
Consequently, the performance of running 92,224-atom Apoa1 with GPUs on TitanDev is
improved by 36%. For 100-million-atom STMV, we improve upon the prior Jaguar XT5
result of 26 ms/step to 13 ms/step using 298,992 cores on Jaguar XK6.