50% system CPU usage when parallel running NAMD on Rocks cluster

From: (malrot13_at_gmail.com)
Date: Sat Nov 30 2013 - 21:06:18 CST

Dear all,
I'm tuning NAMD performance on a Rocks cluster with 7 compute nodes. The
problem is that when running NAMD (100,000 atoms) on 32 cores (2 nodes), the
system CPU usage is about 50%. Increasing to 48 cores (3 nodes) raises the
system CPU usage further and decreases speed.
The details of one compute node are shown below:
CPU: 2 * Intel Xeon E5-2670 (8 cores / 2.6 GHz)
Mem: 64G (1600)
HardDrive: 300G (15000)
Network card: Intel Gigabit Ethernet Network Connection
Switch: 3Com Switch 2824 3C16479 (24-port unmanaged gigabit)(a pretty old
switch :| )

Compiling & running:
Charm-6.4.0 was built with ./build charm++ mpi-linux-x86_64 mpicxx -j16
 --with-production. Some errors were ignored when compiling it. For
example:
Fatal Error by charmc in directory
/apps/apps/namd/2.9/charm-6.4.0/mpi-linux-x86_64-mpicxx/tmp
   Command mpif90 -auto -fPIC -I../bin/../include -O -c pup_f.f90 -o
pup_f.o returned error code 1
charmc exiting....
NAMD was compiled with the Linux-x86_64-g++ option. Some warnings were shown
when compiling NAMD.
Open MPI (from the HPC roll of Rocks) was used to run NAMD. The command is:
mpirun -np {number of cores} -machinefile hosts
/apps/apps/namd/2.9/Linux-x86_64-g++/namd2 {configuration file} > {output
file}
SGE (Sun Grid Engine) was also used. The job submission command is:
qsub -pe orte {number of cores} {job submission script}
The job submission script contains:
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
/opt/openmpi/bin/mpirun /apps/apps/namd/2.9/Linux-x86_64-g++/namd2
{configuration file} > {output file}

Performance test:
The test system contains about 100,000 atoms. Running with mpirun, I got the
following benchmark data:
1 node, 16 cores:
Info: Benchmark time: 16 CPUs 0.123755 s/step 0.716176 days/ns 230.922 MB
memory
Info: Benchmark time: 16 CPUs 0.121637 s/step 0.703919 days/ns 230.922 MB
memory
Info: Benchmark time: 16 CPUs 0.122744 s/step 0.710327 days/ns 230.953 MB
memory
CPU usage:
Tasks: 344 total, 17 running, 327 sleeping, 0 stopped, 0 zombie
Cpu0 : 85.0%us, 15.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu1 : 82.0%us, 18.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu2 : 80.0%us, 20.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu3 : 81.0%us, 19.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu4 : 84.8%us, 15.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu5 : 84.8%us, 15.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu6 : 81.2%us, 18.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu7 : 85.0%us, 15.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu8 : 80.0%us, 20.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu9 : 83.8%us, 16.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu10 : 81.2%us, 18.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu11 : 83.0%us, 17.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu12 : 85.0%us, 15.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu13 : 86.0%us, 14.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu14 : 85.0%us, 15.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu15 : 82.0%us, 18.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Mem: 65913992k total, 1698216k used, 64215776k free, 197928k buffers
Swap: 65537156k total, 0k used, 65537156k free, 417972k cached

2 nodes, 32 cores:
Info: Benchmark time: 32 CPUs 0.101423 s/step 0.586941 days/ns 230.512 MB
memory
Info: Benchmark time: 32 CPUs 0.109949 s/step 0.636282 days/ns 230.512 MB
memory
Info: Benchmark time: 32 CPUs 0.109061 s/step 0.631138 days/ns 230.512 MB
memory
CPU usage:
Tasks: 344 total, 9 running, 335 sleeping, 0 stopped, 0 zombie
Cpu0 : 56.3%us, 43.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu1 : 55.1%us, 44.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu2 : 55.3%us, 44.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu3 : 59.6%us, 40.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu4 : 55.8%us, 44.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu5 : 56.1%us, 43.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu6 : 56.3%us, 43.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu7 : 57.0%us, 43.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu8 : 57.7%us, 42.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu9 : 55.0%us, 44.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu10 : 55.1%us, 44.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu11 : 51.7%us, 46.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 1.7%si,
 0.0%st
Cpu12 : 54.0%us, 43.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.3%si,
 0.0%st
Cpu13 : 55.0%us, 43.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.0%si,
 0.0%st
Cpu14 : 56.2%us, 40.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.0%si,
 0.0%st
Cpu15 : 57.1%us, 41.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.7%si,
 0.0%st
Mem: 65913992k total, 1548572k used, 64365420k free, 199024k buffers
Swap: 65537156k total, 0k used, 65537156k free, 386112k cached

3 nodes, 48 cores:
Info: Benchmark time: 48 CPUs 0.125787 s/step 0.727932 days/ns 228.543 MB
memory
Info: Benchmark time: 48 CPUs 0.130151 s/step 0.753191 days/ns 228.543 MB
memory
Info: Benchmark time: 48 CPUs 0.123472 s/step 0.714536 days/ns 228.543 MB
memory
CPU usage:
Tasks: 344 total, 9 running, 335 sleeping, 0 stopped, 0 zombie
Cpu0 : 39.3%us, 60.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu1 : 37.0%us, 63.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu2 : 40.7%us, 59.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu3 : 40.7%us, 59.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu4 : 42.9%us, 57.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu5 : 37.0%us, 63.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu6 : 35.7%us, 60.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.6%si,
 0.0%st
Cpu7 : 42.3%us, 57.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu8 : 35.7%us, 64.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu9 : 40.7%us, 59.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu10 : 33.3%us, 66.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu11 : 38.5%us, 57.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.8%si,
 0.0%st
Cpu12 : 35.7%us, 60.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.6%si,
 0.0%st
Cpu13 : 38.5%us, 57.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.8%si,
 0.0%st
Cpu14 : 35.7%us, 60.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.6%si,
 0.0%st
Cpu15 : 39.3%us, 57.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.6%si,
 0.0%st
Mem: 65913992k total, 1515064k used, 64398928k free, 199628k buffers
Swap: 65537156k total, 0k used, 65537156k free, 385860k cached

The problem is obvious: with 48 cores (3 nodes), the run is slower than
with 16 cores (1 node). Note that the number of running processes varies
while NAMD is running; some processes are sleeping.
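
To compare runs on more than a single line, the benchmark lines of each run can be averaged. A minimal sketch in Python, assuming the NAMD log has been read into a string with one benchmark entry per line (as NAMD prints it, before the mail client wrapped it); BENCH_RE and mean_benchmark are hypothetical names used only for this illustration:

```python
import re

# Matches NAMD log lines such as:
#   Info: Benchmark time: 16 CPUs 0.123755 s/step 0.716176 days/ns 230.922 MB memory
BENCH_RE = re.compile(
    r"Info: Benchmark time:\s+(\d+) CPUs\s+([\d.]+) s/step\s+([\d.]+) days/ns"
)


def mean_benchmark(log_text):
    """Return (cpus, mean s/step, mean days/ns) over all benchmark lines."""
    rows = [(int(c), float(s), float(d)) for c, s, d in BENCH_RE.findall(log_text)]
    if not rows:
        raise ValueError("no 'Info: Benchmark time:' lines found")
    n = len(rows)
    return rows[0][0], sum(r[1] for r in rows) / n, sum(r[2] for r in rows) / n


# The three benchmark lines of the 1-node mpirun run quoted earlier.
sample = """\
Info: Benchmark time: 16 CPUs 0.123755 s/step 0.716176 days/ns 230.922 MB memory
Info: Benchmark time: 16 CPUs 0.121637 s/step 0.703919 days/ns 230.922 MB memory
Info: Benchmark time: 16 CPUs 0.122744 s/step 0.710327 days/ns 230.953 MB memory
"""

cpus, s_per_step, days_per_ns = mean_benchmark(sample)
print(f"{cpus} CPUs: {s_per_step:.6f} s/step, {days_per_ns:.6f} days/ns")
```

Averaging smooths out the step-to-step jitter visible in the 32-core and 48-core runs, where the three lines differ by several percent.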

Other information (48 cores, 3 nodes):
vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
17  0      0 64395660 204864 389380    0    0     0     1     7    1  3  2 95  0  0
17  0      0 64399256 204864 389384    0    0     0     0 11367 2175 37 63  0  0  0
17  0      0 64403612 204864 389384    0    0     0     0 11497 2213 38 62  0  0  0
17  0      0 64397588 204864 389384    0    0     0     0 11424 2215 38 62  0  0  0
17  0      0 64396108 204864 389384    0    0     0     0 11475 2262 37 63  0  0  0
17  0      0 64400460 204868 389384    0    0     0   364 11432 2227 37 63  0  0  0
17  0      0 64401452 204868 389384    0    0     0     0 11439 2204 38 62  0  0  0
17  0      0 64405408 204868 389384    0    0     0     0 11400 2230 37 63  0  0  0
17  0      0 64396108 204868 389384    0    0     0     0 11424 2245 39 61  0  0  0
17  0      0 64395276 204868 389384    0    0     0     0 11396 2289 38 62  0  0  0

mpstat -P ALL 1 10
Average:  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
Average:  all   37.27    0.00   61.80    0.00    0.03    0.90    0.00    0.00  11131.34
Average:    0   38.32    0.00   61.48    0.00    0.00    0.20    0.00    0.00    999.00
Average:    1   36.60    0.00   63.20    0.00    0.00    0.20    0.00    0.00      0.00
Average:    2   38.26    0.00   61.64    0.00    0.00    0.10    0.00    0.00      0.00
Average:    3   36.03    0.00   63.77    0.00    0.00    0.20    0.00    0.00      0.00
Average:    4   38.16    0.00   61.64    0.00    0.00    0.20    0.00    0.00      0.00
Average:    5   38.00    0.00   61.90    0.00    0.00    0.10    0.00    0.00      0.00
Average:    6   37.06    0.00   62.74    0.00    0.00    0.20    0.00    0.00      0.00
Average:    7   38.26    0.00   61.54    0.00    0.00    0.20    0.00    0.00      0.00
Average:    8   36.36    0.00   63.44    0.00    0.00    0.20    0.00    0.00      8.08
Average:    9   36.26    0.00   63.54    0.00    0.00    0.20    0.00    0.00      0.00
Average:   10   38.36    0.00   61.54    0.00    0.00    0.10    0.00    0.00      0.00
Average:   11   35.56    0.00   61.84    0.00    0.10    2.50    0.00    0.00   1678.64
Average:   12   35.66    0.00   61.34    0.00    0.10    2.90    0.00    0.00   1823.35
Average:   13   37.34    0.00   60.36    0.00    0.00    2.30    0.00    0.00   2115.77
Average:   14   36.90    0.00   60.40    0.00    0.10    2.60    0.00    0.00   2790.02
Average:   15   38.96    0.00   58.44    0.00    0.10    2.50    0.00    0.00   1716.67

iostat 1
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 19.00 0.00 200.00 0 200
sda1 19.00 0.00 200.00 0 200
sda2 0.00 0.00 0.00 0 0
sda3 0.00 0.00 0.00 0 0
sda4 0.00 0.00 0.00 0 0
sda5 0.00 0.00 0.00 0 0

avg-cpu: %user %nice %system %iowait %steal %idle
          39.10 0.00 60.90 0.00 0.00 0.00

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sda1 0.00 0.00 0.00 0 0
sda2 0.00 0.00 0.00 0 0
sda3 0.00 0.00 0.00 0 0
sda4 0.00 0.00 0.00 0 0
sda5 0.00 0.00 0.00 0 0

The speed is better when I use SGE (Sun Grid Engine) to submit the NAMD job.
1 node, 16 cores:
Info: Benchmark time: 16 CPUs 0.125926 s/step 0.728737 days/ns 230.543 MB
memory
Info: Benchmark time: 16 CPUs 0.12478 s/step 0.722105 days/ns 230.812 MB
memory
Info: Benchmark time: 16 CPUs 0.12411 s/step 0.718229 days/ns 230.996 MB
memory
CPU usage:
Tasks: 346 total, 11 running, 335 sleeping, 0 stopped, 0 zombie
Cpu0 : 87.5%us, 12.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu1 : 83.3%us, 16.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu2 : 83.3%us, 16.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu3 : 87.5%us, 12.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu4 : 75.0%us, 25.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu5 : 80.0%us, 20.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu6 : 83.3%us, 16.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu7 : 76.0%us, 24.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu8 : 80.0%us, 20.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu9 : 83.3%us, 16.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu10 : 80.0%us, 20.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu11 : 76.0%us, 24.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu12 : 83.3%us, 16.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu13 : 80.0%us, 20.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu14 : 79.2%us, 20.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu15 : 83.3%us, 16.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Mem: 65913992k total, 1734512k used, 64179480k free, 206632k buffers
Swap: 65537156k total, 0k used, 65537156k free, 422752k cached

2 nodes, 32 cores:
Info: Benchmark time: 32 CPUs 0.0742307 s/step 0.429576 days/ns 228.188 MB
memory
Info: Benchmark time: 32 CPUs 0.0730147 s/step 0.422539 days/ns 228.188 MB
memory
Info: Benchmark time: 32 CPUs 0.0741893 s/step 0.429336 days/ns 228.188 MB
memory
CPU usage:
Tasks: 341 total, 8 running, 333 sleeping, 0 stopped, 0 zombie
Cpu0 : 72.0%us, 27.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu1 : 60.3%us, 39.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu2 : 60.0%us, 39.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu3 : 68.7%us, 31.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu4 : 65.3%us, 34.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu5 : 65.4%us, 34.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu6 : 67.7%us, 32.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu7 : 68.1%us, 31.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu8 : 82.3%us, 17.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu9 : 63.5%us, 36.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu10 : 59.8%us, 40.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu11 : 68.8%us, 28.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 2.0%si,
 0.0%st
Cpu12 : 68.9%us, 29.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.0%si,
 0.0%st
Cpu13 : 64.1%us, 33.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.3%si,
 0.0%st
Cpu14 : 71.3%us, 27.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si,
 0.0%st
Cpu15 : 62.5%us, 34.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.0%si,
 0.0%st
Mem: 65913992k total, 2727068k used, 63186924k free, 302436k buffers
Swap: 65537156k total, 0k used, 65537156k free, 1401708k cached

3 nodes, 48 cores:
Info: Benchmark time: 48 CPUs 0.0791372 s/step 0.45797 days/ns 174.879 MB
memory
Info: Benchmark time: 48 CPUs 0.0819765 s/step 0.474401 days/ns 174.879 MB
memory
Info: Benchmark time: 48 CPUs 0.0770723 s/step 0.44602 days/ns 174.879 MB
memory
CPU usage:
Tasks: 324 total, 12 running, 312 sleeping, 0 stopped, 0 zombie
Cpu0 : 45.8%us, 53.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu1 : 45.2%us, 54.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu2 : 47.3%us, 52.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu3 : 45.3%us, 54.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu4 : 42.7%us, 57.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu5 : 48.2%us, 51.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu6 : 47.3%us, 52.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu7 : 47.3%us, 52.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu8 : 46.2%us, 53.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si,
 0.0%st
Cpu9 : 49.0%us, 51.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu10 : 47.0%us, 53.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu11 : 51.5%us, 45.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 3.0%si,
 0.0%st
Cpu12 : 44.2%us, 53.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.3%si,
 0.0%st
Cpu13 : 46.2%us, 51.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.0%si,
 0.0%st
Cpu14 : 47.3%us, 50.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.3%si,
 0.0%st
Cpu15 : 46.7%us, 50.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 3.0%si,
 0.0%st
Mem: 65913992k total, 1720400k used, 64193592k free, 275224k buffers
Swap: 65537156k total, 0k used, 65537156k free, 525152k cached
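
The per-core top snapshots above can be reduced to one average per run. A minimal sketch in Python, assuming the top output has been saved as text; CPU_RE and mean_user_system are hypothetical names, and the sample is just the first two cores of the 1-node mpirun snapshot:

```python
import re

# Matches per-core lines from top such as:
#   Cpu0 : 85.0%us, 15.0%sy, 0.0%ni, ...
# Only the user (%us) and system (%sy) fields are captured, so it does not
# matter whether the mail client wrapped the rest of the line.
CPU_RE = re.compile(r"Cpu\d+\s*:\s*([\d.]+)%us,\s*([\d.]+)%sy")


def mean_user_system(top_text):
    """Return (mean %us, mean %sy) over all per-core lines in top_text."""
    pairs = [(float(us), float(sy)) for us, sy in CPU_RE.findall(top_text)]
    if not pairs:
        raise ValueError("no per-core Cpu lines found")
    n = len(pairs)
    return sum(us for us, _ in pairs) / n, sum(sy for _, sy in pairs) / n


sample = """\
Cpu0 : 85.0%us, 15.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
Cpu1 : 82.0%us, 18.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
 0.0%st
"""

user, system = mean_user_system(sample)
print(f"user {user:.1f}%  system {system:.1f}%")
```

A single top snapshot is only one sample in time, so averages taken this way are rough figures.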

In summary, the benchmark data (first benchmark line of each run) are:
Mpirun:
1 node,  16 cores   0.716176 days/ns   15% system CPU usage
2 nodes, 32 cores   0.586941 days/ns   45% system CPU usage
3 nodes, 48 cores   0.727932 days/ns   60% system CPU usage
SGE:
1 node,  16 cores   0.728737 days/ns   15% system CPU usage
2 nodes, 32 cores   0.429576 days/ns   35% system CPU usage
3 nodes, 48 cores   0.45797 days/ns    50% system CPU usage
The number of running processes varies under both mpirun and SGE. The maximum
data transfer rate is about 200 MB/s in these benchmarks.
As you can see, the scaling is bad; system CPU usage increases as more
cores are used. I don't know why. Maybe it has something to do with our
switch.
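
To put numbers on the scaling, speedup and parallel efficiency relative to the 1-node run can be computed from the days/ns values in the summary. A minimal sketch in Python; RUNS and scaling are names made up for this illustration, and efficiency is taken here simply as speedup divided by the number of nodes:

```python
# days/ns copied from the summary above (lower is faster), keyed by node count.
RUNS = {
    "mpirun": {1: 0.716176, 2: 0.586941, 3: 0.727932},
    "SGE": {1: 0.728737, 2: 0.429576, 3: 0.45797},
}


def scaling(days_per_ns):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node run."""
    base = days_per_ns[1]
    return {n: (base / t, base / t / n) for n, t in sorted(days_per_ns.items())}


for launcher, data in RUNS.items():
    for nodes, (speedup, eff) in scaling(data).items():
        print(f"{launcher:6s} {nodes} node(s): speedup {speedup:.2f}, efficiency {eff:.0%}")
```

With these numbers the 2-node speedup is about 1.22 for mpirun and about 1.70 under SGE, and the 3-node speedup is about 0.98 and 1.59 respectively; ideal scaling would give 2 and 3.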
If you know anything about the problem, please tell me. I really appreciate
your help!

Neil Zhou
School of Life Science, Tsinghua University, Beijing
China
