Compiling NAMD with CUDA on MacOSX Mountain Lion

From: Maximilian Ebert (max.ebert_at_me.com)
Date: Sat Nov 17 2012 - 18:38:51 CST

Dear list,

I have a 2008 MacBook Pro with a GeForce 8600M GT, which according to NVIDIA is CUDA-capable. I want to test a simple NAMD calculation on my computer before moving to our university's GPU cluster. I installed cuda_5.0.36_macos and downloaded NAMD_2.9_Source. In addition, I downloaded tcl8.5.9-macosx-x86_64-threaded and fftw-macosx-x86_64. I untarred both and renamed the folders to tcl-threaded and fftw. First I built charm-6.4.0 as a simple net-darwin-x86_64 build. My megatest output is the following:

Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (2-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Megatest is running on 1 nodes 1 processors.
test 0: initiated [groupring (milind)]
test 0: completed (0.00 sec)
test 1: initiated [nodering (milind)]
test 1: completed (0.00 sec)
test 2: initiated [varsizetest (mjlang)]
varsize: requires at least 2 processors
test 2: completed (0.00 sec)
test 3: initiated [varsizetest2 (phil)]
test 3: completed (0.00 sec)
test 4: initiated [varraystest (milind)]
varraystest: requires at least 2 processors
test 4: completed (0.00 sec)
test 5: initiated [groupcast (mjlang)]
test 5: completed (0.00 sec)
test 6: initiated [groupmulti (gengbin)]
test 6: completed (0.00 sec)
test 7: initiated [groupsectiontest (ebohm)]
groupsectiontest: requires at least 2 processors
test 7: completed (0.00 sec)
test 8: initiated [multisectiontest (ebohm)]
multisectiontest: requires at least 2 processors
test 8: completed (0.00 sec)
test 9: initiated [nodecast (milind)]
test 9: completed (0.00 sec)
test 10: initiated [synctest (mjlang)]
test 10: completed (0.00 sec)
test 11: initiated [fib (jackie)]
test 11: completed (0.00 sec)
test 12: initiated [arrayring (fang)]
test 12: completed (0.00 sec)
test 13: initiated [tempotest (fang)]
test 13: completed (0.00 sec)
test 14: initiated [packtest (fang)]
test 14: completed (0.00 sec)
test 15: initiated [queens (jackie)]
test 15: completed (0.00 sec)
test 16: initiated [migration (jackie)]
migration: requires at least 2 processors.
test 16: completed (0.00 sec)
test 17: initiated [marshall (olawlor)]
test 17: completed (0.02 sec)
test 18: initiated [priomsg (fang)]
test 18: completed (0.00 sec)
test 19: initiated [priotest (mlind)]
test 19: completed (0.00 sec)
test 20: initiated [rotest (milind)]
test 20: completed (0.00 sec)
test 21: initiated [statistics (olawlor)]
test 21: completed (0.00 sec)
test 22: initiated [templates (milind)]
test 22: completed (0.00 sec)
test 23: initiated [inherit (olawlor)]
test 23: completed (0.00 sec)
test 24: initiated [reduction (olawlor)]
test 24: completed (0.00 sec)
test 25: initiated [bitvector (jbooth)]
test 25: completed (0.00 sec)
test 26: initiated [immediatering (gengbin)]
test 26: completed (0.00 sec)
test 27: initiated [callback (olawlor)]
test 27: completed (0.00 sec)
test 28: initiated [inlineem (phil)]
test 28: completed (0.00 sec)
test 29: initiated [completion_test (phil)]
Starting test
Created detector, starting first detection
Started first test
Finished second test
Started third test
test 29: completed (0.00 sec)
test 30: initiated [multi groupring (milind)]
test 30: completed (0.00 sec)
test 31: initiated [multi nodering (milind)]
test 31: completed (0.00 sec)
test 32: initiated [multi varsizetest (mjlang)]
varsize: requires at least 2 processors
varsize: requires at least 2 processors
varsize: requires at least 2 processors
varsize: requires at least 2 processors
varsize: requires at least 2 processors
test 32: completed (0.00 sec)
test 33: initiated [multi varsizetest2 (phil)]
test 33: completed (0.00 sec)
test 34: initiated [multi varraystest (milind)]
varraystest: requires at least 2 processors
varraystest: requires at least 2 processors
varraystest: requires at least 2 processors
varraystest: requires at least 2 processors
varraystest: requires at least 2 processors
test 34: completed (0.00 sec)
test 35: initiated [multi groupcast (mjlang)]
test 35: completed (0.00 sec)
test 36: initiated [multi groupmulti (gengbin)]
test 36: completed (0.00 sec)
test 37: initiated [multi groupsectiontest (ebohm)]
groupsectiontest: requires at least 2 processors
groupsectiontest: requires at least 2 processors
groupsectiontest: requires at least 2 processors
groupsectiontest: requires at least 2 processors
groupsectiontest: requires at least 2 processors
test 37: completed (0.00 sec)
test 38: initiated [multi multisectiontest (ebohm)]
multisectiontest: requires at least 2 processors
multisectiontest: requires at least 2 processors
multisectiontest: requires at least 2 processors
multisectiontest: requires at least 2 processors
multisectiontest: requires at least 2 processors
test 38: completed (0.00 sec)
test 39: initiated [multi nodecast (milind)]
test 39: completed (0.00 sec)
test 40: initiated [multi synctest (mjlang)]
test 40: completed (0.00 sec)
test 41: initiated [multi fib (jackie)]
test 41: completed (0.00 sec)
test 42: initiated [multi arrayring (fang)]
test 42: completed (0.00 sec)
test 43: initiated [multi tempotest (fang)]
test 43: completed (0.00 sec)
test 44: initiated [multi packtest (fang)]
test 44: completed (0.00 sec)
test 45: initiated [multi migration (jackie)]
migration: requires at least 2 processors.
migration: requires at least 2 processors.
migration: requires at least 2 processors.
migration: requires at least 2 processors.
migration: requires at least 2 processors.
test 45: completed (0.00 sec)
test 46: initiated [multi marshall (olawlor)]
test 46: completed (0.10 sec)
test 47: initiated [multi priomsg (fang)]
test 47: completed (0.00 sec)
test 48: initiated [multi priotest (mlind)]
test 48: completed (0.00 sec)
test 49: initiated [multi statistics (olawlor)]
test 49: completed (0.00 sec)
test 50: initiated [multi reduction (olawlor)]
test 50: completed (0.00 sec)
test 51: initiated [multi immediatering (gengbin)]
test 51: completed (0.00 sec)
test 52: initiated [multi callback (olawlor)]
test 52: completed (0.00 sec)
test 53: initiated [all-at-once]
varsize: requires at least 2 processors
varraystest: requires at least 2 processors
groupsectiontest: requires at least 2 processors
multisectiontest: requires at least 2 processors
migration: requires at least 2 processors.
Starting test
Created detector, starting first detection
Started first test
Finished second test
Started third test
test 53: completed (0.02 sec)
All tests completed, exiting
Program finished.

I then configured NAMD as follows: ./config MacOSX-x86_64-g++ --with-cuda --cuda-prefix /usr/local/cuda/ --charm-arch net-darwin-x86_64. After running make, compilation stops here:

g++ -arch x86_64 -fPIC -dynamic -fno-common -D_NO_MALLOC_H -I.rootdir/charm-6.4.0/net-darwin-x86_64/include -DCMK_OPTIMIZE=1 -Isrc -Iinc -Iplugins/include -DSTATIC_PLUGIN -I/Projects/namd2/tcl/tcl8.5.9-macosx-x86_64-threaded/include -DNAMD_TCL -I.rootdir/fftw/include -DNAMD_FFTW -DNAMD_CUDA -I. -I/usr/local/cuda//include -DNAMD_VERSION=\"2.9\" -DNAMD_PLATFORM=\"MacOSX-x86_64-CUDA\" -DREMOVE_PROXYRESULTMSG_EXTRACOPY -DNODEAWARE_PROXY_SPANNINGTREE -DUSE_NODEPATCHMGR -O3 -o obj/ComputeNonbondedCUDA.o -c src/ComputeNonbondedCUDA.C
src/ComputeNonbondedCUDA.C:31: error: thread-local storage not supported for this target

The same error is then reported for many other lines. Any idea why?
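
For reference, GCC emits this diagnostic when a declaration asks for thread-local storage (for example through the `__thread` extension) and the compiler's target has no TLS support. The fragment below is a minimal sketch of such a declaration, not code taken from the NAMD source; the names `calls_on_this_thread` and `note_call` are hypothetical. Compiling it with the same g++ used for the build would be one way to check whether the toolchain, rather than NAMD itself, is rejecting thread-local variables. That the Xcode-era g++ is the cause is an assumption here, not something the log above confirms.

```cpp
// Minimal sketch, NOT taken from the NAMD source: it only illustrates the kind
// of declaration that needs thread-local storage (TLS) support.
#include <cassert>

// `__thread` is a GCC/Clang extension that requests one independent copy of
// the variable per thread. On a target without TLS support, GCC rejects this
// line with "thread-local storage not supported for this target".
static __thread int calls_on_this_thread = 0;

// Hypothetical helper: returns how many times the calling thread has invoked it.
int note_call() {
    assert(calls_on_this_thread >= 0);  // the counter only ever increases
    return ++calls_on_this_thread;
}
```

If this fragment alone fails with the same message, the limitation lies in the compiler and target combination, and a different compiler would be the thing to try next.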

Thank you very much and have a nice weekend.

Max
