From: Josh Vermaas (joshua.vermaas_at_gmail.com)
Date: Sun Mar 29 2020 - 17:47:49 CDT
Hi,
Looking at the log, you are using a multicore build, which does not have
the facilities for communicating across nodes (it assumes all the memory
is on one node). To use GPUs across nodes, you'll want an SMP-CUDA
build, not a multicore one.
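You can confirm this from the log by looking for the two Charm++ startup lines quoted further down this thread. The shell function below is a minimal sketch. `check_namd_log` is a hypothetical helper name, and it matches only the exact wording of those two lines, so other Charm++ versions may word them differently.

```shell
# Hypothetical helper: report two Charm++ startup lines from a NAMD log.
check_namd_log() {
  log=$1
  # namd2 prints "standalone mode (not using charmrun)" when it was not
  # started through charmrun.
  if grep -q 'Charm++: standalone mode' "$log"; then
    echo "standalone mode: yes"
  else
    echo "standalone mode: no"
  fi
  # Physical node count as seen by Charm++ (first match only).
  grep -o 'Running on [0-9]* unique compute nodes' "$log" | head -n 1
}
```

For example, `check_namd_log Solution_eq.log` on the log in this thread should report standalone mode and 1 unique compute node.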
-Josh
On 3/29/20 1:22 PM, Adupa Vasista wrote:
> Thanks for the reply.
> I tried what you suggested but ran into a new error:
>
>     FATAL ERROR: Unknown command-line option ++nodelist
>
> Any idea what's wrong?
>
> Thank you.
>
>
> On Sun, Mar 29, 2020 at 6:45 PM Renfro, Michael <Renfro_at_tntech.edu>
> wrote:
>
>     Your log also shows:
>
>       Charm++> Running on 1 unique compute nodes (24-way SMP).
>
>     and I suspect you’ll find one of your two nodes sits completely
>     idle if viewed through ‘top’ or a similar utility.
>
>     At a minimum, I think you’ll need a ++nodelist parameter added to
>     charmrun. https://www.ks.uiuc.edu/Research/namd/2.11/ug/node83.html has
>     one example for this.
>
>     --
>     Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
>     931 372-3601 / Tennessee Technological University
>
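Here is a minimal sketch of the `++nodelist` route with an SMP-CUDA build. The host names `node01` and `node02` are hypothetical. The `+p46 ++ppn 23` split (one process per node with 23 worker threads, leaving one core per node for the communication thread) is an assumed layout for 2 x 24-core nodes, not a tested setting. The script writes the nodelist file and prints the charmrun line instead of running it.

```shell
# Hypothetical host names: replace with the nodes assigned to your job.
HOSTS="node01 node02"

# Charm++ nodelist format: a "group" line (here telling charmrun to use
# ssh), then one "host" line per node.
{
  echo "group main ++shell ssh"
  for h in $HOSTS; do
    echo "host $h"
  done
} > nodelist

# Print, rather than run, the launch line so it can be checked first.
# Assumes namd2 is an SMP-CUDA build: +p46 ++ppn 23 is meant to give one
# process per node with 23 worker threads each. +idlepoll and +devices are
# kept from the original command; device IDs refer to GPUs on each node.
echo charmrun +p46 ++ppn 23 ++nodelist nodelist namd2 +idlepoll \
  +devices 0,1 abeta_eq.conf
```

This only helps with an SMP-CUDA build. A multicore namd2 has no way to span nodes, whatever options are passed to charmrun.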
>>     On Mar 29, 2020, at 6:37 AM, Adupa Vasista
>>     <adupavasista_at_gmail.com> wrote:
>>
>>     Dear NAMD users,
>>
>>     I hope everyone is safe amid the pandemic.
>>
>>     When I run the simulation on multiple GPU nodes, performance drops
>>     compared with running on a single node. I am using charmrun to run
>>     on multiple nodes, but the first line of the log file says
>>     "Charm++: standalone mode (not using charmrun)."
>>
>>     Here are the benchmark results:
>>
>>     Info: Benchmark time: 24 CPUs 0.0420026 s/step 0.486141 days/ns 1518.28 MB memory
>>     Info: Benchmark time: 48 CPUs 0.0556737 s/step 0.644372 days/ns 1799.73 MB memory
>>
>>     I am running on 2 nodes with 24 cores each and 2 GPUs, using the
>>     following command:
>>
>>         charmrun +p48 namd2 +idlepoll +devices 0,1 abeta_eq.conf > Solution_eq.log
>>
>>     I have attached the log file.
>>
>>     Please let me know if I need to change anything in the command.  
>>
>>     Thank you.
>>     <Solution_eq.log>
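You can cross-check the two benchmark lines quoted above with simple arithmetic: days/ns = (s/step) x (steps per ns) / 86400. The sketch below assumes a 1 fs timestep (1,000,000 steps per ns). That value reproduces the days/ns figures in the log, but the timestep itself is not shown in the thread. The sketch also computes the ratio of time per step between the two runs.

```shell
# days/ns = (s/step) * (steps per ns) / (86400 s per day)
# Assumption: 1 fs timestep, i.e. 1000000 steps per ns (inferred because it
# reproduces the days/ns values in the log; the timestep is not shown).
report=$(awk 'BEGIN {
  steps_per_ns = 1000000
  s24 = 0.0420026   # s/step with 24 CPUs
  s48 = 0.0556737   # s/step with 48 CPUs
  printf "24 CPUs: %.6f days/ns\n", s24 * steps_per_ns / 86400
  printf "48 CPUs: %.6f days/ns\n", s48 * steps_per_ns / 86400
  printf "48/24 CPU time-per-step ratio: %.2f\n", s48 / s24
}')
printf '%s\n' "$report"
```

On these numbers, the 48-CPU run takes about 1.33 times as long per step, so adding the second node made the run slower rather than faster.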
This archive was generated by hypermail 2.1.6 : Thu Dec 31 2020 - 23:17:13 CST