Oh dear, 'namd2 invoked oom-killer' ...

From: Nicholas M Glykos (glykos_at_mbg.duth.gr)
Date: Fri Apr 15 2011 - 09:55:35 CDT

Dear NAMD Developers,

On a single i7 975 box with a GTX295 card, using the pre-built "NAMD
2.8b1 for Linux-x86_64-CUDA" executable (see below), CUDA-enabled runs
wake the oom-killer (non-CUDA runs on the same system are OK). This is a
small (45K atoms) system and we use the AMBER force field.

Portions of the NAMD log and the kernel ring buffer follow:

________________________ HEAD OF NAMD LOG _____________________________

Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.006 seconds.
Info: NAMD 2.8b1 for Linux-x86_64-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Mar 26 11:10:31 CDT 2011 by jim on larissa.ks.uiuc.edu
Info: 1 NAMD 2.8b1 Linux-x86_64-CUDA 4 n0009 tafarli
Info: Running on 4 processors, 4 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0277281 s
Did not find +devices i,j,k,... argument, using all
Pe 3 sharing CUDA device 1 first 1 next 1
Pe 2 sharing CUDA device 0 first 0 next 0
Pe 0 sharing CUDA device 0 first 0 next 2
Pe 1 sharing CUDA device 1 first 1 next 3
Pe 3 physical rank 3 binding to CUDA device 1 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
Pe 2 physical rank 2 binding to CUDA device 0 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
Pe 0 physical rank 0 binding to CUDA device 0 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
Pe 1 physical rank 1 binding to CUDA device 1 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
Info: 1.63521 MB of memory in use based on CmiMemoryUsage
......

____________________________ KERNEL RING _______________________________

......

namd2 invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Pid: 5504, comm: namd2 Tainted: P 2.6.26.5-2.nsa1 #1

Call Trace:
 [<ffffffff80264ea0>] oom_kill_process+0x57/0x1f0
 [<ffffffff80265423>] badness+0x19a/0x1de
 [<ffffffff802655f5>] out_of_memory+0x18e/0x1cd
 [<ffffffff80267eb7>] __alloc_pages_internal+0x364/0x420
 [<ffffffff80270e07>] handle_mm_fault+0x21f/0x68c
 [<ffffffff802340b3>] dequeue_signal+0x8f/0x108
 [<ffffffff8023445c>] get_signal_to_deliver+0xe3/0x30f
 [<ffffffff802181c6>] do_page_fault+0x443/0x835
 [<ffffffff802747b9>] vma_adjust+0x207/0x45a
 [<ffffffff803b7858>] sys_recvfrom+0xc3/0x122
 [<ffffffff80274f67>] vma_merge+0x147/0x260
 [<ffffffff802754e9>] do_brk+0x2b3/0x36b
 [<ffffffff80433559>] error_exit+0x0/0x51

Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
CPU 2: hi: 0, btch: 1 usd: 0
CPU 3: hi: 0, btch: 1 usd: 0
CPU 4: hi: 0, btch: 1 usd: 0
CPU 5: hi: 0, btch: 1 usd: 0
CPU 6: hi: 0, btch: 1 usd: 0
CPU 7: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 162
CPU 1: hi: 186, btch: 31 usd: 73
CPU 2: hi: 186, btch: 31 usd: 124
CPU 3: hi: 186, btch: 31 usd: 8
CPU 4: hi: 186, btch: 31 usd: 0
CPU 5: hi: 186, btch: 31 usd: 165
CPU 6: hi: 186, btch: 31 usd: 0
CPU 7: hi: 186, btch: 31 usd: 114
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 164
CPU 1: hi: 186, btch: 31 usd: 77
CPU 2: hi: 186, btch: 31 usd: 115
CPU 3: hi: 186, btch: 31 usd: 72
CPU 4: hi: 186, btch: 31 usd: 0
CPU 5: hi: 186, btch: 31 usd: 157
CPU 6: hi: 186, btch: 31 usd: 5
CPU 7: hi: 186, btch: 31 usd: 46
Active:1399010 inactive:62487 dirty:0 writeback:0 unstable:0
 free:6396 slab:9926 mapped:13876 pagetables:2982 bounce:0
DMA free:3864kB min:4kB low:4kB high:4kB active:0kB inactive:0kB present:3040kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2989 6019 6019
DMA32 free:16952kB min:4928kB low:6160kB high:7392kB active:2898392kB inactive:36kB present:3060736kB pages_scanned:5228089 all_unreclaimable? yes
lowmem_reserve[]: 0 0 3030 3030
Normal free:4768kB min:4996kB low:6244kB high:7492kB active:2698032kB inactive:249656kB present:3102720kB pages_scanned:4067912 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 2*4kB 4*8kB 5*16kB 5*32kB 2*64kB 3*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3864kB
DMA32: 20*4kB 95*8kB 15*16kB 12*32kB 6*64kB 2*128kB 4*256kB 3*512kB 2*1024kB 3*2048kB 1*4096kB = 16952kB
Normal: 60*4kB 4*8kB 5*16kB 4*32kB 3*64kB 0*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4768kB
97295 total pagecache pages
Swap cache: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
1835007 pages of RAM
313075 reserved pages
14085 pages shared
0 pages swap cached
Out of memory: kill process 5503 (charmrun) score 238393 or a child
Killed process 5504 (namd2)

___________________________________________________________________
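
For readers who want to turn the page counts in the Mem-info block into
sizes, here is a minimal sketch. It assumes 4 kB pages (the usual x86_64
page size, and consistent with the 4kB entries in the buddy lists above).
The function names are illustrative only; the numbers are copied from the
kernel output above.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as is usual on x86_64


def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count from the Mem-info block to megabytes."""
    return pages * page_kb / 1024.0


def buddy_total_kb(line):
    """Sum a buddy-list line such as 'DMA: 2*4kB 4*8kB ... = 3864kB'.

    Returns (computed_kB, reported_kB) so the two can be compared.
    """
    blocks, _, reported = line.partition("=")
    computed = sum(int(n) * int(size)
                   for n, size in re.findall(r"(\d+)\*(\d+)kB", blocks))
    return computed, int(re.search(r"(\d+)kB", reported).group(1))


# Page counts copied from the kernel output above.
counts = {
    "pages of RAM": 1835007,
    "reserved": 313075,
    "active": 1399010,
    "inactive": 62487,
    "free": 6396,
    "pagecache": 97295,
}

for name, pages in counts.items():
    print(f"{name:>13}: {pages:>8} pages = {pages_to_mb(pages):8.1f} MB")

# Cross-check: the per-zone free totals should add up to the global
# 'free' page count (3864 + 16952 + 4768 kB across DMA, DMA32, Normal).
zone_free_kb = 3864 + 16952 + 4768
print("zones free:", zone_free_kb, "kB; free pages:",
      counts["free"] * PAGE_KB, "kB")
```

Read this way, almost all of the RAM is in active pages, only about
25 MB is free, and with "Total swap = 0kB" the kernel has nothing to
page out to.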

-- 
          Dr Nicholas M. Glykos, Department of Molecular Biology
     and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
    Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
