Re: Running NAMD on Linux Cluster

From: 김민재 (kjh950429_at_gmail.com)
Date: Wed Feb 27 2019 - 12:51:05 CST

Thanks for the advice, and sorry for the confusion! I am actually trying to
run NAMD on multiple nodes; I had thought the +p option specified the number
of processors on each node. I am planning to use four nodes, each with 2
CPUs (Intel Xeon E5-2650, Sandy Bridge-EP, 8-core 2.00 GHz) and 4 GPUs
(NVIDIA GeForce GTX 1080). The cluster runs CentOS Linux release 7.2. Should
I then set +p to +p32? Also, are there any modifications I should make to my
bash script or .bashrc environment variables to stop getting the errors I
mentioned?

In addition, would the Intel and CUDA libraries be loaded properly if I used
the NAMD packages distributed on the NAMD website? If not, from where and how
should I load these libraries?
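
For reference, the only NAMD-related lines in my ~/.bashrc right now look
roughly like this (the install path and the commented module names are
placeholders for whatever our cluster actually provides):

export NAMD_HOME=$HOME/NAMD_Linux-x86_64-ibverbs-smp-CUDA   # placeholder path
export PATH=$NAMD_HOME:$PATH
# the prebuilt CUDA package appears to ship its CUDA runtime libraries here
export LD_LIBRARY_PATH=$NAMD_HOME:$LD_LIBRARY_PATH
# module load cuda     # guesses; only if the cluster requires them
# module load intel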

Sorry for asking another question that might seem trivial, but I am fairly
new to this... when running NAMD, should I set the '+cpuaffinity' option if I
were to run the simulation in the environment I described above?
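
For instance, on a single node I was picturing something like this (the
+setcpuaffinity and +devices flags are just my reading of the documentation,
so I am not sure they are the right choice here):

./namd2 +p16 +setcpuaffinity +devices 0,1,2,3 ./1ca2_eq.conf > eq.log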

Thanks
On Thu, Feb 28, 2019 at 3:26 AM, Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
wrote:

> Ok, if you are only going to be running on 1 node at a time, I'd recommend
> starting with the multicore-CUDA version. It's the simplest GPU-supporting
> version to set up and run:
>
> /path/to/namd/namd2 +p8 configfile.namd > logfile.log
>
> The ibverbs smp builds are essential for running across multiple nodes,
> but from the low processor counts in your email, I don't think that is the
> situation you are setting up.
>
> -Josh
>
>
> On 2019-02-27 06:15:39-07:00 owner-namd-l_at_ks.uiuc.edu wrote:
>
>
> ---------- Forwarded message ----------
> From: 김민재 <kjh950429_at_gmail.com>
> Date: Wed, Feb 27, 2019 at 10:18 AM
> Subject: Re: namd-l: Running NAMD on Linux Cluster
> To: Aravinda Munasinghe <aravinda1879_at_gmail.com>
>
> Hi
> I'm new to NAMD, so I am using a precompiled NAMD binary that I downloaded
> from the NAMD website. To be more specific, I used the
> "Linux-x86_64-ibverbs-smp-CUDA" package (I'm trying to make use of NVIDIA
> GPU acceleration.)
> Thanks
>
> On Wed, Feb 27, 2019 at 5:06 AM, Aravinda Munasinghe
> <aravinda1879_at_gmail.com> wrote:
>
>> Hi,
>> If you compiled charm from scratch as well, I recommend checking whether
>> it was compiled properly (try the charm "hello world" test). If it didn't
>> work, then judging from the error you posted, your charm build most likely
>> did not execute properly. What flags did you use when compiling charm?
>> Best,
>> Aravinda Munasinghe
>> On Tue, Feb 26, 2019 at 10:24 AM 김민재 <kjh950429_at_gmail.com> wrote:
>>
>>> Hi
>>> I have been facing several problems while trying to run an MD simulation
>>> on a supercomputer. I have read through the NAMD user guide and followed
>>> its instructions, writing shell scripts to run NAMD. Following the user
>>> guide, I wrote a script called 'mympiexec':
>>> #!/bin/csh
>>> shift; shift; exec ibrun $*
>>> Then I wrote a script called 'runme' (NAMD is the directory that holds
>>> namd2 and mympiexec):
>>> cd NAMD
>>> ./charmrun +p8 ++mpiexec ++remote-shell ./mympiexec ./namd2
>>> ./1ca2_eq.conf
>>> However, I got the following error message (testy):
>>> Warning: Permanently added '[c12]:22554,[192.168.0.112]:22554' (ECDSA)
>>> to the list of known hosts.
>>> ibrun: Command not found.
>>> ibrun: Command not found.
>>> Charmrun> error attaching to node '127.0.0.1':
>>> Timeout waiting for node-program to connect
>>> (I attached the other message to this email)
>>> I also tried another method advised in an HTML guide to NAMD. I wrote a
>>> script called 'runscript':
>>> #!/bin/csh
>>> setenv LD_LIBRARY_PATH "${1:h}:$LD_LIBRARY_PATH"
>>> $*
>>> And then, I wrote and ran the following script 'runCUDA':
>>> #!/bin/csh
>>> cd NAMD
>>> ./charmrun ++runscript ./runscript +p8 ./namd2 +idlepoll ++ppn 1
>>> ./1ca2_eq
>>> In this case I got the following error message (testx):
>>> Warning: Permanently added '[c38]:22554,[192.168.0.138]:22554' (ECDSA)
>>> to the list of known hosts.
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Reconnection attempt 1 of 3
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Reconnection attempt 1 of 3
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Reconnection attempt 2 of 3
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Reconnection attempt 2 of 3
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Reconnection attempt 3 of 3
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Reconnection attempt 3 of 3
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Too many reconnection attempts; bailing out
>>> ssh_exchange_identification: read: Connection reset by peer
>>> Charmrun> Error 255 returned from remote shell (localhost:0)
>>> Charmrun> Too many reconnection attempts; bailing out
>>> I am new to NAMD, and I would really appreciate any help. Thanks
>>>
>>
>> --
>> Aravinda Munasinghe,
>>
>
