Re: Oh dear, 'namd2 invoked oom-killer' ...

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Fri Apr 15 2011 - 11:34:14 CDT

You don't say how long this ran before running out of memory, but there is
a memory leak in the trajectory/restart output code in NAMD 2.8b1. The bug
has since been fixed, so please try a nightly build.
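
If you need a stopgap until you can switch builds: the leak is in the
trajectory/restart output path, so writing output less often should at
least slow its growth. An untested sketch, with purely illustrative
frequencies (pick values that suit your run):

    # illustrative values only -- not a recommendation
    dcdfreq      5000
    restartfreq  5000

You can also gauge how quickly the process grows by grepping your log for
the "memory in use" lines that NAMD prints.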

-Jim

On Fri, 15 Apr 2011, Nicholas M Glykos wrote:

>
> Dear NAMD Developers,
>
> On a single i7 975 box with a GTX295 card and using the pre-built "NAMD
> 2.8b1 for Linux-x86_64-CUDA" executable (see below), CUDA-enabled runs
> trigger the oom-killer (non-CUDA runs on the same system are OK). This is
> a small (45K atoms) system and we use the Amber force field.
>
> Portions of the NAMD log and the kernel ring buffer follow:
>
>
>
> ________________________ HEAD OF NAMD LOG _____________________________
>
> Charm++> scheduler running in netpoll mode.
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> cpu affinity enabled.
> Charm++> Running on 1 unique compute nodes (8-way SMP).
> Charm++> cpu topology info is gathered in 0.006 seconds.
> Info: NAMD 2.8b1 for Linux-x86_64-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat Mar 26 11:10:31 CDT 2011 by jim on larissa.ks.uiuc.edu
> Info: 1 NAMD 2.8b1 Linux-x86_64-CUDA 4 n0009 tafarli
> Info: Running on 4 processors, 4 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.0277281 s
> Did not find +devices i,j,k,... argument, using all
> Pe 3 sharing CUDA device 1 first 1 next 1
> Pe 2 sharing CUDA device 0 first 0 next 0
> Pe 0 sharing CUDA device 0 first 0 next 2
> Pe 1 sharing CUDA device 1 first 1 next 3
> Pe 3 physical rank 3 binding to CUDA device 1 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
> Pe 2 physical rank 2 binding to CUDA device 0 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
> Pe 0 physical rank 0 binding to CUDA device 0 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
> Pe 1 physical rank 1 binding to CUDA device 1 on n0009: 'GeForce GTX 295' Mem: 895MB Rev: 1.3
> Info: 1.63521 MB of memory in use based on CmiMemoryUsage
> ......
>
>
>
> ____________________________ KERNEL RING _______________________________
>
> ......
>
> namd2 invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
> Pid: 5504, comm: namd2 Tainted: P 2.6.26.5-2.nsa1 #1
>
> Call Trace:
> [<ffffffff80264ea0>] oom_kill_process+0x57/0x1f0
> [<ffffffff80265423>] badness+0x19a/0x1de
> [<ffffffff802655f5>] out_of_memory+0x18e/0x1cd
> [<ffffffff80267eb7>] __alloc_pages_internal+0x364/0x420
> [<ffffffff80270e07>] handle_mm_fault+0x21f/0x68c
> [<ffffffff802340b3>] dequeue_signal+0x8f/0x108
> [<ffffffff8023445c>] get_signal_to_deliver+0xe3/0x30f
> [<ffffffff802181c6>] do_page_fault+0x443/0x835
> [<ffffffff802747b9>] vma_adjust+0x207/0x45a
> [<ffffffff803b7858>] sys_recvfrom+0xc3/0x122
> [<ffffffff80274f67>] vma_merge+0x147/0x260
> [<ffffffff802754e9>] do_brk+0x2b3/0x36b
> [<ffffffff80433559>] error_exit+0x0/0x51
>
> Mem-info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> CPU 1: hi: 0, btch: 1 usd: 0
> CPU 2: hi: 0, btch: 1 usd: 0
> CPU 3: hi: 0, btch: 1 usd: 0
> CPU 4: hi: 0, btch: 1 usd: 0
> CPU 5: hi: 0, btch: 1 usd: 0
> CPU 6: hi: 0, btch: 1 usd: 0
> CPU 7: hi: 0, btch: 1 usd: 0
> DMA32 per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 162
> CPU 1: hi: 186, btch: 31 usd: 73
> CPU 2: hi: 186, btch: 31 usd: 124
> CPU 3: hi: 186, btch: 31 usd: 8
> CPU 4: hi: 186, btch: 31 usd: 0
> CPU 5: hi: 186, btch: 31 usd: 165
> CPU 6: hi: 186, btch: 31 usd: 0
> CPU 7: hi: 186, btch: 31 usd: 114
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 164
> CPU 1: hi: 186, btch: 31 usd: 77
> CPU 2: hi: 186, btch: 31 usd: 115
> CPU 3: hi: 186, btch: 31 usd: 72
> CPU 4: hi: 186, btch: 31 usd: 0
> CPU 5: hi: 186, btch: 31 usd: 157
> CPU 6: hi: 186, btch: 31 usd: 5
> CPU 7: hi: 186, btch: 31 usd: 46
> Active:1399010 inactive:62487 dirty:0 writeback:0 unstable:0
> free:6396 slab:9926 mapped:13876 pagetables:2982 bounce:0
> DMA free:3864kB min:4kB low:4kB high:4kB active:0kB inactive:0kB present:3040kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 2989 6019 6019
> DMA32 free:16952kB min:4928kB low:6160kB high:7392kB active:2898392kB inactive:36kB present:3060736kB pages_scanned:5228089 all_unreclaimable? yes
> lowmem_reserve[]: 0 0 3030 3030
> Normal free:4768kB min:4996kB low:6244kB high:7492kB active:2698032kB inactive:249656kB present:3102720kB pages_scanned:4067912 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> DMA: 2*4kB 4*8kB 5*16kB 5*32kB 2*64kB 3*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3864kB
> DMA32: 20*4kB 95*8kB 15*16kB 12*32kB 6*64kB 2*128kB 4*256kB 3*512kB 2*1024kB 3*2048kB 1*4096kB = 16952kB
> Normal: 60*4kB 4*8kB 5*16kB 4*32kB 3*64kB 0*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4768kB
> 97295 total pagecache pages
> Swap cache: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 1835007 pages of RAM
> 313075 reserved pages
> 14085 pages shared
> 0 pages swap cached
> Out of memory: kill process 5503 (charmrun) score 238393 or a child
> Killed process 5504 (namd2)
>
> ___________________________________________________________________
>
>
>
>
> --
>
>
> Dr Nicholas M. Glykos, Department of Molecular Biology
> and Genetics, Democritus University of Thrace, University Campus,
> Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
> Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
>
