Re: Running NAMD on Linux Cluster

From: 김민재 (kjh950429_at_gmail.com)
Date: Wed Feb 27 2019 - 18:45:34 CST

Does that mean that I can simply replace ‘ibrun’ with ‘mpiexec’ and run a
CUDA (NVIDIA GPU) accelerated MD simulation across multiple nodes without
changing anything else in my bashrc or my bash scripts? That is, are there
any setenv commands or LD_LIBRARY_PATH settings I need to add to my bashrc?

Thanks

On Thu, Feb 28, 2019 at 4:57 AM, Jim Phillips <jim_at_ks.uiuc.edu> wrote:

>
> As the error message says, there does not appear to be a command called
> "ibrun" on your cluster. This is not surprising, since the standard
> command is mpiexec. You would only need to write a "mympiexec" script if
> you needed to call a different program to launch parallel jobs. The
> "ibrun" command is the job launch command at TACC used as an example.
>
> Jim
>
>
> On Wed, 27 Feb 2019, 김민재 wrote:
>
> > Hi
> > I have been facing several problems while trying to run an MD simulation
> > on a supercomputer. I have read through the NAMD user guide and
> > followed its instructions, so I have been writing bash scripts to run
> > NAMD. Following the user guide, I wrote a script called 'mympiexec':
> > #!/bin/csh
> > shift; shift; exec ibrun $*
> >
> > Then, I wrote a script called 'runme': (NAMD is the directory that holds
> > namd2 and mympiexec)
> >
> > cd NAMD
> > ./charmrun +p8 ++mpiexec ++remote-shell ./mympiexec ./namd2 ./1ca2_eq.conf
> >
> > However, I got the following error message (testy):
> > Warning: Permanently added '[c12]:22554,[192.168.0.112]:22554' (ECDSA) to
> > the list of known hosts.
> > ibrun: Command not found.
> > ibrun: Command not found.
> > Charmrun> error attaching to node '127.0.0.1':
> > Timeout waiting for node-program to connect
> > (I attached the other message to this email)
> >
> > I also tried another method advised in an HTML guide to NAMD. I wrote
> > 'runscript':
> > #!/bin/csh
> > setenv LD_LIBRARY_PATH "${1:h}:$LD_LIBRARY_PATH"
> > $*
> >
> > And then, I wrote and ran the following script 'runCUDA':
> > #!/bin/csh
> > cd NAMD
> > ./charmrun ++runscript ./runscript +p8 ./namd2 +idlepoll ++ppn 1 ./1ca2_eq
> >
> > In this case I got the following error message (testx):
> > Warning: Permanently added '[c38]:22554,[192.168.0.138]:22554' (ECDSA) to
> > the list of known hosts.
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Reconnection attempt 1 of 3
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Reconnection attempt 1 of 3
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Reconnection attempt 2 of 3
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Reconnection attempt 2 of 3
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Reconnection attempt 3 of 3
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Reconnection attempt 3 of 3
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Too many reconnection attempts; bailing out
> > ssh_exchange_identification: read: Connection reset by peer
> > Charmrun> Error 255 returned from remote shell (localhost:0)
> > Charmrun> Too many reconnection attempts; bailing out
> >
> > I am new to namd and I would really appreciate help. Thanks
> >
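For readers following the thread: the 'mympiexec' wrapper quoted above only discards the first two arguments it receives and hands the rest to the site's launch command. Below is a minimal sketch of that behaviour in POSIX sh (not the csh of the original). It assumes the two dropped arguments are a process-count flag and its value; LAUNCHER and wrapped_command are hypothetical names introduced here, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical variable: the site's parallel launch command.
# Defaults to mpiexec, the standard command named in Jim's reply.
LAUNCHER="${LAUNCHER:-mpiexec}"

# Print the command line a 'mympiexec'-style wrapper would run.
wrapped_command() {
    # Same effect as the csh "shift; shift": discard the first two arguments.
    shift 2
    echo "$LAUNCHER $*"
}

# "-n 8" stands in for the two leading arguments (an assumption);
# with LAUNCHER unset this prints: mpiexec ./namd2 ./1ca2_eq.conf
wrapped_command -n 8 ./namd2 ./1ca2_eq.conf
```

On a cluster where mpiexec is already the launch command, such a wrapper adds nothing, which matches Jim's remark that it is only needed when a different program launches parallel jobs.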

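On the LD_LIBRARY_PATH question: the 'runscript' quoted above uses the csh modifier ${1:h}, which takes the directory part of its first argument (the program being launched), and prepends that directory to LD_LIBRARY_PATH before running the command. Below is a rough POSIX sh sketch of the same path construction, using dirname in place of the csh modifier. The function name prepend_program_dir and the example paths are hypothetical and not taken from the thread.

```shell
#!/bin/sh
# Build a library search path the way 'runscript' does:
# directory of the program first, then any existing path.
prepend_program_dir() {
    # $1: path to the program, $2: existing library path (may be empty)
    program_dir=$(dirname "$1")
    # Append ":$2" only when $2 is non-empty, to avoid a trailing colon.
    echo "${program_dir}${2:+:$2}"
}

# Hypothetical install location and existing path, for illustration only.
prepend_program_dir /opt/NAMD/namd2 /usr/local/cuda/lib64
```

In a real wrapper, the result would be exported as LD_LIBRARY_PATH before running the remaining arguments. Whether that is needed at all depends on where the cluster keeps the libraries that namd2 loads.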