Re: CUDA on GPU Cluster

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Aug 22 2011 - 00:47:16 CDT

Hi,

First, you won't see the change to LD_LIBRARY_PATH when looking from
another ssh/rsh session, because it is only valid for the ssh/rsh session
in which it was set. It is not global for the whole machine. You can check
this by setting a variable, then logging in to the machine from another
console and looking for the variable: it's not there. When you start
charmrun, it starts n new sessions (+p n). Changes made in those sessions,
for example by the runscript, will not be shown by env in your own session.
So maybe there's just something wrong with your runscript.
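
To illustrate the point above, here is a minimal sketch that can be run on
a single machine. DEMO_VAR is a made-up variable name, and "env -i" is only
used to imitate a process that starts with a fresh environment, roughly like
a new login; it is not what charmrun or ssh literally do.

```shell
#!/bin/sh
# DEMO_VAR is a hypothetical variable used only for this demonstration.
export DEMO_VAR=hello

# A child process started from this shell inherits the exported variable.
child_value=$(/bin/sh -c 'echo "$DEMO_VAR"')

# A process started with an empty environment does not see it,
# much like a separate ssh/rsh login would not.
fresh_value=$(env -i /bin/sh -c 'echo "$DEMO_VAR"')

echo "child sees: '$child_value'"
echo "fresh sees: '$fresh_value'"
```

The variable travels only from a process to the processes it starts itself,
so checking env in a different session says nothing about what the sessions
started by charmrun see.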

Secondly, for me it works fine with the ++runscript option. I attach my
runscript so you can try it; just change the path to yours.

Then call charmrun with the path to the provided runscript:

charmrun ++runscript /path/to/runscript
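
The attached runscript is not part of this archive, so the following is
only a sketch of what such a script might look like, not the actual
attachment. It assumes that charmrun invokes the runscript with the program
and its arguments as the script's own arguments, and it uses the NAMD
directory from the original question as a placeholder path.

```shell
#!/bin/sh
# Hypothetical runscript; adjust NAMD_DIR to the directory holding libcudart.
NAMD_DIR=/home/arvind/NAMD-CUDA

# Prepend the directory, avoiding a trailing ':' if the variable was empty.
LD_LIBRARY_PATH="$NAMD_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# Assumption: the program to start and its arguments arrive as "$@".
# Running them from here is what lets them inherit the setting above.
exec "$@"
```

Under that assumption, a runscript that only exports the variable and then
exits would have no visible effect, because the setting disappears together
with the script's own process.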

Good luck

Norman Geist.

-----Original Message-----
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Arvind Kannan
Sent: Saturday, 20 August 2011 00:31
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: CUDA on GPU Cluster

Hello all,

I'm having trouble getting the CUDA version of NAMD to run on my lab's
cluster. In particular, despite adding the path to the libcudart.so.2 file
in the environment variable LD_LIBRARY_PATH, namd is still unable to locate
the file. Calling env tells me that

LD_LIBRARY_PATH=/home/arvind/NAMD-CUDA:/opt/intel/Compiler/11.1/069/lib/intel64:/opt/cuda/lib64

The user's guide suggests that the problem might be that charmrun does not
preserve the value of LD_LIBRARY_PATH when run without the ++local option.
However, the variable remains unchanged after trying to run charmrun, so
that doesn't seem to be the case. Additionally, the other shared libraries
that namd finds successfully don't seem to come from the directories in the
library path:

        libdl.so.2 => /lib64/libdl.so.2 (0x000000312b400000)
        libcudart.so.2 => not found
        libm.so.6 => /lib64/libm.so.6 (0x000000312bc00000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000312c400000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000312d000000)
        libc.so.6 => /lib64/libc.so.6 (0x000000312b000000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000312b800000)
        /lib64/ld-linux-x86-64.so.2 (0x000000312ac00000)

I tried the user's guide's suggestion of putting

export LD_LIBRARY_PATH=/home/arvind/NAMD-CUDA:$LD_LIBRARY_PATH

in a runscript, and running charmrun with the ++runscript option, but it
didn't help. LD_LIBRARY_PATH is unchanged by the runscript when run by
charmrun, even though the script works fine when sourced from the command
line.

Any suggestions would be greatly appreciated.

Thanks,
Arvind

