Re: Running on GPU multiple nodes

From: Adupa Vasista (adupavasista_at_gmail.com)
Date: Sun Mar 29 2020 - 21:34:14 CDT

Ok, thanks for the clarification.

On Mon, Mar 30, 2020 at 4:17 AM Josh Vermaas <joshua.vermaas_at_gmail.com>
wrote:

> Hi,
>
> Looking at the log, you are using a multicore build, which does not have
> the facilities for communicating across nodes (it assumes all the memory is
> on one node). For using GPUs across nodes, you'll want an SMP-CUDA build,
> not a multicore one.
>
> -Josh
> On 3/29/20 1:22 PM, Adupa Vasista wrote:
>
> Thanks for the reply.
> I tried what you suggested but ran into a new error:
> FATAL ERROR: Unknown command-line option ++nodelist
>
> Any idea what is wrong?
>
> Thank you.
>
>
> On Sun, Mar 29, 2020 at 6:45 PM Renfro, Michael <Renfro_at_tntech.edu> wrote:
>
>> Your log also shows:
>>
>> Charm++> Running on 1 unique compute nodes (24-way SMP).
>>
>> and I suspect you’ll find one of your two nodes sits completely idle if
>> viewed through ‘top’ or a similar utility.
>>
>> At a minimum, I think you’ll need a ++nodelist parameter added to
>> charmrun. https://www.ks.uiuc.edu/Research/namd/2.11/ug/node83.html has
>> one example for this.
>>
>> --
>> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
>> Services
>> 931 372-3601 / Tennessee Technological University
>>
>> On Mar 29, 2020, at 6:37 AM, Adupa Vasista <adupavasista_at_gmail.com>
>> wrote:
>>
>> Dear NAMD users
>>
>> I hope everyone is safe amid the pandemic.
>>
>> When I run the simulation on multiple GPU nodes, I get a drop in
>> performance compared to running on a single node. I am using charmrun
>> to run on multiple nodes, but the first line of the log file says:
>> Charm++: standalone mode (not using charmrun).
>>
>> Here are the benchmark results:
>> Info: Benchmark time: 24 CPUs 0.0420026 s/step 0.486141 days/ns 1518.28 MB memory
>> Info: Benchmark time: 48 CPUs 0.0556737 s/step 0.644372 days/ns 1799.73 MB memory
>>
>> I am running on 2 nodes, with 24 cores each and 2 GPUs, using the
>> following command:
>> charmrun +p48 namd2 +idlepoll +devices 0,1 abeta_eq.conf > Solution_eq.log
>>
>> I have attached the log file.
>>
>> Please let me know if I need to change anything in the command.
>>
>> Thank you.
>> <Solution_eq.log>
>>
>>
>

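For anyone comparing the two benchmark lines quoted above, the slowdown can be quantified with a short script. This is only a sketch: the helper names `parse_benchmark` and `slowdown` and the regular expression are assumptions based on the two "Info: Benchmark time" lines in this thread, not part of NAMD itself.

```python
import re

# Pattern assumed from the two "Info: Benchmark time" lines quoted in this
# thread; other NAMD versions may format the line differently.
BENCH_RE = re.compile(
    r"Info: Benchmark time: (\d+) CPUs ([\d.]+) s/step ([\d.]+) days/ns"
)


def parse_benchmark(line):
    """Return (cpus, seconds_per_step, days_per_ns), or None if no match."""
    m = BENCH_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def slowdown(single, multi):
    """Ratio of s/step values; above 1 means the second run is slower."""
    return multi[1] / single[1]


one_node = parse_benchmark(
    "Info: Benchmark time: 24 CPUs 0.0420026 s/step 0.486141 days/ns 1518.28 MB memory"
)
two_nodes = parse_benchmark(
    "Info: Benchmark time: 48 CPUs 0.0556737 s/step 0.644372 days/ns 1799.73 MB memory"
)

ratio = slowdown(one_node, two_nodes)
print(f"{two_nodes[0]}-CPU run takes {ratio:.2f}x the time per step "
      f"of the {one_node[0]}-CPU run")
```

With the numbers above the ratio comes out at roughly 1.33, so the 48-CPU run is about a third slower per step than the 24-CPU run, which fits a job that oversubscribes a single node instead of spreading across two.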
-- 
A.Vasista
M.Tech, Department of Chemical Engineering,
IIT Guwahati.
