From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue Jan 10 2012 - 14:21:26 CST
On Tue, Jan 10, 2012 at 3:03 PM, Gurunath Katagi
> Dear all,
> I am trying to run a simulation of a solvated protein using NAMD version 2.8
> on an IBM cluster.
> The job starts and terminates immediately. I have pasted the last part
> of the .log file, up to the point where it stopped:
> Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 3.10193e-25 AT 9.94673
> Info: RELATIVE IMPRECISION IN VDWB TABLE FORCE: 1.07087e-15 AT 9.94673
> Info: Startup phase 8 took 0.610009 s, 183.422 MB of memory in use
> Info: Startup phase 9 took 0.000552893 s, 187.547 MB of memory in use
> Info: Finished startup at 9.32463 s, 187.547 MB of memory in use
> and in the .error file, I am getting this error:
> ATTENTION: 0031-408 4 tasks allocated by LoadLeveler, continuing...
> ------------- Processor 2 Exiting: Caught Signal ------------
> ------------- Processor 3 Exiting: Caught Signal ------------
> Signal: 4
> Signal: 4
> ERROR: 0031-250 task 0: Terminated
> ERROR: 0031-250 task 2: Terminated
> ERROR: 0031-250 task 3: Terminated
> ERROR: 0031-250 task 1: Terminated
> The machine configuration is as follows:
> $uname -a
> Linux cnode39 2.6.5-7.244-pseries64 #1 SMP Mon Dec 12 18:32:25 UTC 2005
> ppc64 ppc64 ppc64 GNU/Linux
> and the submission file is as follows:
> # @ error = job1.$(Host).$(Cluster).$(
> # @ output = job1.$(Host).$(Cluster).$(Process).out
> # @ class = ptask64
> # @ job_type = parallel
> # @ total_tasks = 4
> # @ blocking = unlimited
> # @ wall_clock_limit=01:00:00
> # @ queue
> 'md1.conf' -nodes 16 -tasks_per_node 8
> I do not understand why this error occurs (a numerical error, the
> installation, or something else) or how to go about fixing it.
> Can anybody please look into this and let me know?
on a linux machine, signal 4 is SIGILL, i.e. the executable
was trying to execute an illegal instruction, which happens
when it was compiled for a different variant of CPU, e.g.
using SSE4 instructions on a CPU that only supports SSE3.
however, it is not entirely clear whether the program was terminated
by a signal handler or whether signal 4 is a LoadLeveler signal.
in the latter case, you should dig through the LoadLeveler docs.
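
to double-check what a signal number means on the machine itself, a minimal
python sketch like the one below can help. the helper names describe_signal
and signal_from_exit_status are made up for illustration, and the 128 + N
rule is the usual shell convention for a process killed by signal N; a batch
system such as LoadLeveler may report termination differently, so treat this
as a sanity check only.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number does not correspond to a signal known to this platform.
        return f"unknown signal {signum}"


def signal_from_exit_status(status: int):
    """Decode a shell-style exit status (128 + N) into signal number N.

    Returns None when the status does not look like death by signal.
    This follows the common shell convention (an assumption here); it is
    not something a batch scheduler is guaranteed to follow.
    """
    return status - 128 if status > 128 else None


if __name__ == "__main__":
    # On Linux, signal 4 is SIGILL (illegal instruction).
    print(describe_signal(4))
    # A shell would typically report such a crash as exit status 132.
    print(signal_from_exit_status(132))
```

if the name printed for 4 is SIGILL, the next thing to check is whether the
binary was built for the CPU variant of the compute nodes.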
> Thank you
-- Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0 College of Science and Technology Temple University, Philadelphia PA, USA.