Re: CUDA-NAMD hangs -- check the Northbridge temp!

From: Pu Tian (tianpu_at_mail.nih.gov)
Date: Wed Jan 06 2010 - 09:36:55 CST

Hi Biff,

Thanks for sharing. That's very helpful information for anyone
(including me) who is considering using NAMD/GPU.

Best,

Pu

On Jan 5, 2010, at 10:51 PM, Biff Forbush wrote:

> Hi NAMD & VMD GPU users,
>
> In getting a Nehalem-GTX 295 system up and running I have experienced
> frequent (you could say regular) freezes in NAMD when multiple CPUs
> and GPUs are in use. In reviewing recent discussions, I see I am not
> the first with apparent "GPU overheating problems". But in this case,
> both CPU and GPU core temps were generally in the upper 50s and low
> 60s (C) -- very warm, but that shouldn't be too much for these chips.
> [NVIDIA has the GPU fans running at 40-50% -- turning them up to 100%
> with nvclock lowered the GPU temps 3-5 degrees but did not prevent the
> hangups]. After swearing for a while at the usual (software) suspects,
> I stuck my hand in the case to check the two X58 (Tylersburg
> Northbridge) heatsinks...
>
> ... almost burned myself -- the heatsinks were 88°C under NAMD load
> and 78°C at "idle" (no X, no NAMD) as checked with an IR thermometer.
> Sure enough, directing a cool air gun at the heatsinks dropped the
> heatsink temps to under 50°C (without significantly affecting CPU or
> GPU temps) and COMPLETELY solved the NAMD freezeup problem.
>
> Moral of the story: Check the Northbridge temp, not just the CPU
> and GPU. Apparently this particular board is terribly underdesigned
> in this regard, but I suspect the problem is more general. [This
> board has low-profile X58 heatsinks (aka egg cookers), no fans, and
> no room for much more, since the X58s are underneath two of the
> double PCIe x16 slots... it should be possible to mount a small fan
> to blow horizontally over these; otherwise liquid cooling is needed].
>
> Board: Tyan S7025, dual Xeon Nehalem (3.33GHz, Scythe coolers), dual
> X58s, two GeForce GTX 295 (BFG), one Master Heat Gun (heater off).
> Benchmarks to follow soon.
>
> [It remains a mystery to me why the X58s are running so hot at "idle"].
>
> Regards,
> Biff
>
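
For anyone who wants to keep an eye on board-level temperatures alongside
the CPU and GPU readings, here is a minimal sketch. It assumes a Linux host
whose sensor drivers expose readings under /sys/class/hwmon (the standard
kernel hwmon layout, values in millidegrees Celsius). The function names
read_hwmon_temps and over_threshold are made up for illustration. Whether
the X58 itself shows up there depends on the board and its drivers; if it
does not, an IR thermometer on the heatsink, as in the post above, is still
the way to check.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip, label): degrees_C} for every temp*_input under root.

    Assumes the standard Linux hwmon sysfs layout: one hwmonN directory per
    sensor chip, with tempN_input files holding millidegrees Celsius.
    """
    temps = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            prefix = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip_dir / (prefix + "_label")
            label = label_file.read_text().strip() if label_file.exists() else prefix
            try:
                millideg = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now; skip it
            temps[(chip, label)] = millideg / 1000.0
    return temps


def over_threshold(temps, limit_c):
    """Return only the readings at or above limit_c (degrees Celsius)."""
    return {key: deg for key, deg in temps.items() if deg >= limit_c}


if __name__ == "__main__":
    for (chip, label), deg in sorted(read_hwmon_temps().items()):
        print(f"{chip:12s} {label:16s} {deg:5.1f} C")
```

Running it once at idle and again under NAMD load gives a quick comparison
of every sensor the board reports, not just the CPU and GPU cores.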
