Errata to: Update to CUDA error in NAMD 2.7: Increase MAX_EXCLUSIONS: problem persists and CPU-only MD scales poorly

From: Pietro Amodeo (pamodeo_at_icmib.na.cnr.it)
Date: Mon Dec 27 2010 - 14:20:03 CST

Hi,

Sorry, but the table in my last post is wrong:
1) the reported ratio is, obviously, Time(1)/Time(N) (i.e. the speedup) and NOT
Time(N)/Time(1);
2) the correct figures are:
 N   Time(1)/Time(N)
 1    1
 2    1.9733511924
 4    3.5960034869
 6    5.1641581203
 8    6.5367137981
10    8.0500773076
12    9.1171710303

16    8.8086727989

20    9.6037249284
22   10.103089676
24   10.6848376171
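
For anyone who wants to look at these figures as parallel efficiency rather than raw speedup, here is a minimal Python sketch. The speedup values are copied from the table above. The names `speedup`, `efficiency` and `karp_flatt` are only illustrative and have nothing to do with NAMD itself. The Karp-Flatt metric is a standard estimate of the experimentally determined serial fraction and is used here as an assumption-laden diagnostic, not as a model of how NAMD actually behaves.

```python
# Speedup S(N) = Time(1)/Time(N), copied from the corrected table above.
speedup = {
    1: 1.0,
    2: 1.9733511924,
    4: 3.5960034869,
    6: 5.1641581203,
    8: 6.5367137981,
    10: 8.0500773076,
    12: 9.1171710303,
    16: 8.8086727989,
    20: 9.6037249284,
    22: 10.103089676,
    24: 10.6848376171,
}


def efficiency(n, s):
    """Parallel efficiency: speedup divided by the number of cores."""
    return s / n


def karp_flatt(n, s):
    """Karp-Flatt estimate of the serial fraction (undefined for n = 1)."""
    if n < 2:
        raise ValueError("Karp-Flatt metric needs at least 2 cores")
    return (1.0 / s - 1.0 / n) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    print(" N   speedup  efficiency  serial fraction")
    for n, s in speedup.items():
        frac = f"{karp_flatt(n, s):.4f}" if n > 1 else "   -"
        print(f"{n:2d}  {s:8.4f}  {efficiency(n, s):10.3f}  {frac}")
```

With these numbers the efficiency is about 0.76 at N = 12 and about 0.45 at N = 24. Since the workstation has two six-core CPUs that are only seen as 24 cores (presumably through Hyper-Threading, which is an assumption on my side), the flattening beyond N = 12 may partly reflect logical cores sharing physical ones rather than the load balancer alone.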

Pietro

On Mon, December 27, 2010, 3:01 pm, Pietro Amodeo wrote:
> Hello,
>
> one month ago I posted a message (title: CUDA error in NAMD 2.7: Increase
> MAX_EXCLUSIONS) about a systematic error I obtain when trying to simulate
> a relatively large (120978 atoms) system on a 2 CPU (Xeon 5650 SixCore)
> workstation (seen as a 24-core machine) equipped with two CUDA boards.
> For details about the errors, including further hardware info and a typical
> output from a failed run, please see my previous post.
>
> In the meantime, I have performed further tests, changing the simulation
> conditions more radically (e.g. by switching to NVT runs), but the number of
> exclusions and the error message didn't change. So, I'd like to know whether
> this limitation can be circumvented or the system is just too large (or
> its composition generates this pathological behaviour).
>
> By switching to pure CPU simulations, jobs run flawlessly. However, when
> submitting test simulations of 10,000 MD steps each on a variable number
> of cores, the results obtained using "new load balancers -- ASB" show a
> scaling that, considering that the simulations were run on a single machine,
> is far from ideal and, in any case, worse than that observed with NAMD
> 2.6 on a 112-core cluster (with dual-Opteron 8-core nodes and InfiniBand
> connection). Unfortunately, although the simulation setup was very
> similar, the systems tested on the cluster were different (smaller), so
> at present I can't align the two sets of benchmarks.
>
> Here is a table of the relative scalings vs. the number of employed cores,
> obtained using NAMD 2.7 on the 24-core workstation:
>
> N Time(N)/Time(1)
> 1 1
> 2 1.8950753798
> 4 3.4533628378
> 6 4.9593143628
> 8 6.277425646
> 10 7.7307594158
> 12 8.7555253315
>
> 16 8.4592641261
>
> 20 9.2227793696
> 22 9.7023360965
> 24 10.261008169
>
> Comments/suggestions about both the errors for the GPU, and the scaling
> for the CPU versions of NAMD 2.7 are welcome.
> Again, I can provide any other information or execute tests that may be
> useful for the resolution of the problem.
>
> Thanks in advance,
> Pietro
>
>

-- 
Dr. Pietro Amodeo
Istituto di Chimica Biomolecolare del CNR
Comprensorio "A. Olivetti", Edificio 70
Via Campi Flegrei 34
I-80078 Pozzuoli (Napoli) - Italy
Phone      +39-0818675072
Fax        +39-0818041770
Email    pamodeo_at_icmib.na.cnr.it
