wallclock and cputime

From: Marc Q. Ma (qma_at_oak.njit.edu)
Date: Thu Sep 01 2005 - 16:31:01 CDT

Dear NAMD community,

We are running benchmarking tests on a multinode Linux cluster of
64-bit AMD Opteron machines. Our system contains 76,000 atoms.

However, we found that the wallclock time is always significantly
greater than the CPU time. For example:

wallclock: 15:40, CPU time: 471.9 (2 nodes, ppn=1)

wallclock: 22:36, CPU time: 273.8 (2 nodes, ppn=2)

wallclock: 23:27, CPU time: 155.7 (4 nodes, ppn=2)

wallclock: 3:36:29, CPU time: 178.5 (4 nodes, ppn=2)

There was even a job with wallclock: 10:22:07, CPU time: 102.5!
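
To make the gap concrete, the figures above can be put on a common
scale. The following is only a rough sketch: it assumes the wallclock
values are in mm:ss or h:mm:ss and that the CPU times are in seconds.
The helper names (wallclock_to_seconds, wall_to_cpu_ratio) are made up
for this illustration and are not part of NAMD or charmrun.

```python
def wallclock_to_seconds(text):
    """Convert a 'mm:ss' or 'h:mm:ss' string to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def wall_to_cpu_ratio(wallclock, cpu_seconds):
    """Ratio of wallclock time to CPU time (CPU time assumed in seconds)."""
    return wallclock_to_seconds(wallclock) / cpu_seconds


# (configuration, wallclock, CPU time) copied from the runs listed above.
runs = [
    ("2 nodes, ppn=1", "15:40", 471.9),
    ("2 nodes, ppn=2", "22:36", 273.8),
    ("4 nodes, ppn=2", "23:27", 155.7),
    ("4 nodes, ppn=2", "3:36:29", 178.5),
    ("not stated", "10:22:07", 102.5),
]

for config, wallclock, cpu in runs:
    wall = wallclock_to_seconds(wallclock)
    ratio = wall_to_cpu_ratio(wallclock, cpu)
    print(f"{config:15s} wall={wall:6d} s  cpu={cpu:6.1f} s  wall/cpu={ratio:6.1f}")
```

A ratio near 1 means the process was computing for almost the whole
run; a large ratio means most of the elapsed time was spent on
something other than computation charged to the process.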

Why would our cluster behave this way? CPU time is important, but it
is wallclock time that makes the difference. When we use more
processors, we expect our jobs to finish in less wallclock time. It is
that simple. We cannot afford to wait three and a half hours for a job
that uses only about 3 minutes of CPU time!

In single-processor mode, sequential jobs show consistent wallclock
and CPU times, e.g. wallclock: 14:26, CPU time: 879.8, which are about
the same.
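
As a general illustration of how the two timers can diverge (not a
diagnosis of this cluster), the sketch below times a compute-bound
function and a function that only waits. It uses Python's standard
time.perf_counter (elapsed wallclock) and time.process_time (CPU time
charged to the process). The functions busy and waiting are
hypothetical stand-ins, and the sleep merely imitates a process blocked
on something external.

```python
import time


def measure(func):
    """Return (wallclock seconds, CPU seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def busy():
    """Compute-bound work: wallclock and CPU time should be close."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def waiting():
    """Blocked work: wallclock advances while almost no CPU time is used."""
    time.sleep(0.5)


for name, func in [("busy", busy), ("waiting", waiting)]:
    wall, cpu = measure(func)
    print(f"{name:8s} wall={wall:.3f} s  cpu={cpu:.3f} s")
```

On an otherwise idle machine the busy case should report similar
values for both timers, as in the sequential run above, while the
waiting case reports wallclock time with almost no CPU time.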

In terms of serial performance, the hydra nodes are much better than a
SunBlade2000 machine (dual 1.2 GHz, 2 GB memory): the same job on the
SunBlade machine requires 4.33 times the computing time.

Can someone do some detective work on this matter? I suspect there is
something fishy about our hardware. Or does the problem come from
charmrun, NAMD, or the Myrinet interconnect?

Thanks for your input.

Marc
