Timing variation during 256-core run on abe.ncsa.uiuc.edu

From: Thomas C. Bishop (bishop_at_tulane.edu)
Date: Mon Feb 21 2011 - 13:52:56 CST

dear namd,

I'm running a 256-CPU NAMD 2.7 (Linux-x86_64-ibverbs)
simulation on Abe at NCSA. The simulation contains 206031 atoms.

I've run many simulations with the same NAMD configuration and consistently
get benchmarks like these:
Info: Benchmark time: 256 CPUs 0.0149822 s/step 0.0867028 days/ns 337.961 MB memory
Info: Benchmark time: 256 CPUs 0.0155425 s/step 0.0899449 days/ns 349.344 MB memory
Info: Benchmark time: 256 CPUs 0.0148334 s/step 0.0858417 days/ns 351.711 MB memory

However, every now and then NAMD slows to a crawl during the run, by up to a factor of 30 (see the TIMING lines below).
The simulations themselves are not crashing (i.e., all energies and the trajectory itself look good).
Interestingly, the simulation speed recovers and returns to the benchmark rate.

Is this symptomatic of a hardware problem, an I/O problem, or a scheduling/load conflict that I should bring up with the sysadmins on Abe?

I'd chalk this up to system load, but that shouldn't happen in a batch environment. Or am I missing something here?

The complete 35 MB NAMD output file is available at
http://dna.ccs.tulane.edu/~bishop/dyn11.out

Thanks for any info.
Tom

TIMING: 5000 CPU: 355.724, 0.071117/step Wall: 357.868, 0.071539/step, 9.83661 hours remaining, 351.710938 MB of memory in use.
TIMING: 10000 CPU: 428.965, 0.0146482/step Wall: 432.858, 0.0149981/step, 2.0414 hours remaining, 351.710938 MB of memory in use.
TIMING: 15000 CPU: 502.386, 0.0146842/step Wall: 507.764, 0.0149811/step, 2.01829 hours remaining, 351.710938 MB of memory in use.
TIMING: 20000 CPU: 917.611, 0.083045/step Wall: 925.339, 0.083515/step, 11.1353 hours remaining, 351.710938 MB of memory in use.
TIMING: 25000 CPU: 994.495, 0.0153769/step Wall: 1006.21, 0.0161739/step, 2.13405 hours remaining, 351.710938 MB of memory in use.
TIMING: 30000 CPU: 1067.95, 0.0146908/step Wall: 1083.06, 0.0153707/step, 2.00673 hours remaining, 351.710938 MB of memory in use.
TIMING: 35000 CPU: 1676.58, 0.121726/step Wall: 1692.43, 0.121873/step, 15.7419 hours remaining, 351.710938 MB of memory in use.
TIMING: 40000 CPU: 2224.61, 0.109606/step Wall: 2242.1, 0.109935/step, 14.0472 hours remaining, 351.710938 MB of memory in use.
TIMING: 45000 CPU: 2752.26, 0.105531/step Wall: 2772.49, 0.106078/step, 13.4071 hours remaining, 351.710938 MB of memory in use.
TIMING: 50000 CPU: 3329.5, 0.115446/step Wall: 3351.42, 0.115786/step, 14.4733 hours remaining, 351.710938 MB of memory in use.
TIMING: 55000 CPU: 4428.86, 0.219874/step Wall: 4452.69, 0.220253/step, 27.2258 hours remaining, 351.710938 MB of memory in use.
TIMING: 60000 CPU: 5495.78, 0.213383/step Wall: 5521.81, 0.213824/step, 26.134 hours remaining, 351.710938 MB of memory in use.
TIMING: 65000 CPU: 7152.37, 0.331318/step Wall: 7180.03, 0.331644/step, 40.0736 hours remaining, 351.710938 MB of memory in use.
TIMING: 70000 CPU: 9351.38, 0.439802/step Wall: 9380.11, 0.440017/step, 52.5575 hours remaining, 351.710938 MB of memory in use.
TIMING: 75000 CPU: 10993.4, 0.328407/step Wall: 11024.3, 0.328832/step, 38.8205 hours remaining, 351.710938 MB of memory in use.
TIMING: 80000 CPU: 11066.3, 0.0145752/step Wall: 11098.6, 0.0148747/step, 1.73539 hours remaining, 351.710938 MB of memory in use.
TIMING: 85000 CPU: 13187.3, 0.424192/step Wall: 13222.4, 0.424758/step, 48.9651 hours remaining, 351.710938 MB of memory in use.
TIMING: 90000 CPU: 14291.9, 0.220935/step Wall: 14329.3, 0.221371/step, 25.2117 hours remaining, 351.710938 MB of memory in use.
TIMING: 95000 CPU: 15932.4, 0.328092/step Wall: 15971.4, 0.328431/step, 36.9484 hours remaining, 351.710938 MB of memory in use.
TIMING: 100000 CPU: 16479.4, 0.109409/step Wall: 16519.7, 0.109659/step, 12.1843 hours remaining, 358.207031 MB of memory in use.
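
For anyone who wants to pick the slow intervals out of a long log like the one above, here is a minimal Python sketch. It is not part of NAMD. The function names, the 0.015 s/step baseline (rounded from the benchmark lines), and the factor-of-2 threshold are all assumptions chosen for illustration. It reads the per-step wall time from each TIMING line and reports the intervals that ran slower than the threshold.

```python
import re

# Captures the step number and the per-step wall time from a NAMD "TIMING:" line.
TIMING_RE = re.compile(
    r"^TIMING:\s+(\d+)\s+CPU:.*?Wall:\s*[\d.eE+-]+,\s*([\d.eE+-]+)/step"
)

def parse_timing(lines):
    """Return a list of (step, wall_seconds_per_step) for every TIMING line."""
    records = []
    for line in lines:
        match = TIMING_RE.match(line)
        if match:
            records.append((int(match.group(1)), float(match.group(2))))
    return records

def flag_slow(records, baseline, factor=2.0):
    """Return (step, wall_per_step, slowdown) for intervals slower than factor * baseline."""
    return [
        (step, wall, wall / baseline)
        for step, wall in records
        if wall > factor * baseline
    ]

# Three lines copied from the log above; in practice, pass an open file instead.
SAMPLE = [
    "TIMING: 10000 CPU: 428.965, 0.0146482/step Wall: 432.858, 0.0149981/step, 2.0414 hours remaining, 351.710938 MB of memory in use.",
    "TIMING: 70000 CPU: 9351.38, 0.439802/step Wall: 9380.11, 0.440017/step, 52.5575 hours remaining, 351.710938 MB of memory in use.",
    "TIMING: 80000 CPU: 11066.3, 0.0145752/step Wall: 11098.6, 0.0148747/step, 1.73539 hours remaining, 351.710938 MB of memory in use.",
]

if __name__ == "__main__":
    # Assumed baseline of 0.015 s/step, taken from the benchmark lines.
    for step, wall, slowdown in flag_slow(parse_timing(SAMPLE), baseline=0.015):
        print(f"step {step}: {wall:.4f} s/step ({slowdown:.1f}x baseline)")
```

Comparing the steps (and wall-clock times) of the flagged intervals against the job's start time may help the sysadmins correlate the slowdowns with other activity on the machine.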

 

*******************************
   Thomas C. Bishop
    Tel: 504-862-3370
    Fax: 504-862-8392
http://dna.ccs.tulane.edu
********************************
