How to optimize command parameters for NAMD CUDA GPU MD calculations?

From: Michael Shokhen (michael.shokhen_at_biu.ac.il)
Date: Thu Sep 12 2013 - 14:59:21 CDT

Dear Namd Users,

I wanted to estimate the computational advantage of a GeForce GTX TITAN vs. a GeForce GTX 285 in molecular dynamics (MD) simulations run with the CUDA GPU build of NAMD.

I have installed the NAMD_CVS-2013-06-14_Linux-x86_64-multicore-CUDA and vmd-1.9.1.bin.LINUXAMD64.opengl software packages on my computer.

As a benchmark I used the "Membrane Proteins Tutorial" by Alex Aksimentiev et al., in which a GeForce GTX 285 was used for the MD calculations:

http://www.ks.uiuc.edu/Training/Tutorials/science/membrane/mem-tutorial.pdf

I followed the instructions in Chapter 3, "Running a Simulation of KcsA", page 34.

I ran NAMD in a terminal window with the command, modified for the CUDA GPU:

namd2 +idlepoll kcsa_popcwimineq-01.conf > kcsa_popcwimineq-01.log

The speedup over the tutorial variant is only a factor of 1.87.

See more details below in the LOG files for comparison.

I would appreciate it if somebody could advise me which additional command parameters I should use to make better use of my hardware and get faster MD calculations on the CUDA GPU. Unfortunately, I could not find an answer on the internet.
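One thing the log below shows is "Running on 1 processors", i.e. the multicore-CUDA build is driving the GPU from a single CPU core. A sketch of an invocation that uses more cores, assuming the documented NAMD command-line options +p (worker threads), +idlepoll (poll the GPU instead of sleeping), and +devices (GPU selection); the thread count of 8 is an assumption matching the core count listed for this machine:

```shell
# Hypothetical invocation (sketch, not tested on this system):
# +p8       - request 8 worker threads instead of the default 1
# +idlepoll - keep idle CPU threads polling the GPU for results
# +devices 0 - explicitly bind to CUDA device 0 (the GTX TITAN)
NPROCS=8   # assumption: match the CPU core count reported above
CMD="namd2 +p${NPROCS} +idlepoll +devices 0 kcsa_popcwimineq-01.conf"
echo "$CMD"   # print the command; run it directly on a machine with NAMD installed
```

Whether +p8 or a smaller thread count is optimal would have to be measured, since with a single GPU the CPU threads mainly handle the work NAMD does not offload.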

Thank you,

Michael

My workstation hardware:

8 core Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
cpu MHz : 1200.000
cache size : 10240 KB
1 GeForce GTX TITAN

System:
Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-45-generic x86_64)

NVIDIA-Linux-x86_64-319.32

Information from log files:

My variant:

Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00349903 s
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 binding to CUDA device 0 on quant-lnx: 'GeForce GTX TITAN' Mem: 6143MB Rev: 3.5
Info: 6.47266 MB of memory in use based on /proc/self/stat
Info: Configuration file is kcsa_popcwimineq-01.conf
WallClock: 14751.504883 CPUTime: 14751.504883 Memory: 182.707031 MB
Program finished.

Tutorial variant:

Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0157971 s
Pe 0 physical rank 0 binding to CUDA device 0 on home-lnx: 'GeForce GTX 285' Mem: 1023MB Rev: 1.3
Info: 6.59766 MB of memory in use based on /proc/self/stat
Info: Configuration file is kcsa_popcwimineq-01.conf
WallClock: 27552.734375 CPUTime: 27552.734375 Memory: 129.031250 MB
Program finished.

*****************************
Michael Shokhen, PhD
Associate Professor
Department of Chemistry
Bar Ilan University,
Ramat Gan, 52900
Israel
email: shokhen_at_mail.biu.ac.il

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:43 CST