Performance problem with NAMD-CUDA benchmarks

From: xiaoguang liu (liuxguang_at_gmail.com)
Date: Sun Apr 11 2010 - 06:53:16 CDT

Hi Prof. Biff Forbush (and anybody else interested),

     After reading your introduction to the NAMD-CUDA benchmarks on the NAMD
mailing list (http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/11440.html),
I have also been running some benchmarks on my GPU server this week.

     My server's hardware is:
          Motherboard: FT72 B7015, which has 8 PCI-E x16 slots and onboard VGA
for the console
          CPU: dual Intel Xeon 5520, each with 4 cores (8 cores in total) and
Hyper-Threading support (so the operating system sees 16 logical cores)
          Memory: 8 GB (DDR3 1333 MHz)
          Disk: 1.5 TB
          GPU: eight Nvidia Tesla C1060 (each has 240 streaming processor
cores and 4 GB memory, and uses the GT200 core, the same as your GTX295)
http://www.nvidia.com/object/product_tesla_c1060_us.html
    Software:
          NAMD: version 2.7b2 Linux-x86_64-CUDA
<http://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1009>
          CUDA: version 2.3
          Nvidia driver: 195.36.15
          OS: Red Hat AS 5.3, kernel 2.6.18

    To compare with your results, I used the same benchmarks: DHFR (23,558
atoms), er-gre (36,753 atoms), ApoA1 (92,224 atoms), F1ATPase (327,506 atoms),
and STMV (1,066,628 atoms). They were all downloaded from NAMD's website (
http://www.ks.uiuc.edu/Research/namd/utilities/ and
http://www.ks.uiuc.edu/Research/namd/tutorial/NCSA2001/performance.html).
    For all problems I used 8 processes and all 8 GPUs, for example:
    ./charmrun ++local +p8 ./namd2 +idlepoll ../er-gre/er-gre.namd ++verbose

    When running ApoA1 and F1ATPase, NAMD reported the following error
message:
    Fatal error on PE 2> FATAL ERROR: CUDA-accelerated NAMD does not support
NBFIX terms in parameter file

     For the other three problems, the results are:

  Problem        DHFR        er-gre      STMV
  Size (atoms)   23,558      36,753      1,066,628
  s/step         0.0581146   0.105704    0.588051
  days/ns        0.336311    1.22343     6.80614
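
    As a cross-check on the table, the days/ns row can be recomputed from the
s/step row. The sketch below is only illustrative: the function name is made
up, and the timesteps are an assumption inferred from the reported numbers (2 fs
for DHFR, 1 fs for er-gre and STMV), not values stated in this message.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def days_per_ns(seconds_per_step, timestep_fs):
    """Wall-clock days needed to simulate 1 ns at the given step cost."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


# (s/step taken from the table, ASSUMED timestep in fs)
runs = {
    "DHFR": (0.0581146, 2.0),
    "er-gre": (0.105704, 1.0),
    "STMV": (0.588051, 1.0),
}

for name, (s_per_step, dt_fs) in runs.items():
    print(f"{name}: {days_per_ns(s_per_step, dt_fs):.5f} days/ns")
```

    If the timesteps really do differ between the input files, the s/step row
is the one to compare across benchmarks, since days/ns also depends on the
timestep.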

    These results are much poorer than yours.
    Since my hardware is almost the same as yours, why is the performance so
much worse?
    Am I missing something important?

Many thanks!

   Liuxg

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:55:40 CST