From: Vlastimil Zíma (zima_at_karlov.mff.cuni.cz)
Date: Wed Apr 08 2015 - 02:41:23 CDT
The problem found:
Apparently the cron jobs caused an overload that was short, but long enough
to make NAMD get stuck. I usually run NAMD on all of the processors, and cron
runs, among others, the command "apt-get check -f", which takes so much
processor capacity that NAMD doesn't get enough and cannot cope with it.
Interestingly, it affects the CUDA multicore version; I was unable to
reproduce the error with the non-CUDA version.
I worked around the problem by leaving one of the processors unoccupied. This
is not optimal, since that processor sits unused all day just to leave room
for the cron run. I suspect there is another solution that would make NAMD
resistant to a short processor overload.
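One direction that might be worth trying instead of reserving a whole
processor is to run the cron workload at the lowest CPU and I/O priority.
The sketch below is untested against this NAMD setup; the helper name
low_prio is hypothetical, and whether lower priority is enough to keep the
CUDA build from stalling is an assumption.

```shell
#!/bin/sh
# low_prio COMMAND [ARGS...]
# Hypothetical helper: run COMMAND at the lowest CPU priority (nice 19) and,
# when ionice is installed and usable, in the idle I/O scheduling class.
low_prio() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        nice -n 19 ionice -c 3 "$@"
    else
        # ionice missing or not permitted: fall back to CPU niceness only.
        nice -n 19 "$@"
    fi
}
```

For example, the daily jobs could be started as
"low_prio run-parts --report /etc/cron.daily" in place of the plain
run-parts call.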
2015-04-01 14:08 GMT+02:00 Vlastimil Zíma <zima_at_karlov.mff.cuni.cz>:
> In progress:
> I am very unsure about what causes the problem. Removing the "apt" and
> "mlocate" jobs seemed to resolve the issue. On the other hand, I tested
> cron itself by introducing a "sleep 1" cron job, which caused the issue
> when it was accompanied by the rest of the jobs, but the "sleep" job on
> its own (with all other cron jobs removed) didn't trigger the stalling.
> Multiple (8) "sleep 0.2" jobs didn't trigger the problem either.
> I used the following script to trigger the problem (a for loop around
> what I found in /etc/crontab):
> for I in $(seq 100); do
>     echo -n "."
>     test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
> done; echo
> I should also mention that I run NAMD from screen using the following
> command:
> runnamd_cuda 20c_run.namd > 20c_run.output 2> 20c_run.error
> where runnamd_cuda is the following script:
> ulimit -c unlimited
> namd2-mc-cuda +setcpuaffinity +idlepoll +p6 +devices 0 "$@"
> namd2-mc-cuda is a binary of NAMD compiled with multicore and CUDA support.
> 2015-04-01 10:04 GMT+02:00 Vlastimil Zíma <zima_at_karlov.mff.cuni.cz>:
>> Hi everyone,
>> I'm using NAMD 2.9 on Debian machines equipped with GPUs and I repeatedly
>> encounter weird behaviour: namd is still running, but no new output is
>> generated (restart files, dcd, not even the output redirected to a file).
>> I noticed it usually happens at 6:24 in the morning, which led me to
>> discover that the daily cron jobs are run at that time.
>> Here is the list of my system daily crons
>> I'm not yet sure which one of them causes the NAMD output to stall,
>> nor what exactly causes the stalling. I was able to reproduce the
>> problem on two of my machines so far.
>> Has anybody had a similar issue?
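Since the symptom described above is a running process whose output files
stop changing, a small check on file modification time may help to notice
the stall early. This is a minimal sketch, assuming GNU stat (as shipped on
Debian); the function name is_stalled and the 1800-second threshold in the
usage line are hypothetical choices, not values from the thread.

```shell
#!/bin/sh
# is_stalled FILE MAX_AGE_SECONDS
# Returns 0 when FILE was last modified more than MAX_AGE_SECONDS ago,
# 1 when it is fresher than that, and 2 when FILE cannot be examined.
is_stalled() {
    stall_file=$1
    stall_max_age=$2
    stall_now=$(date +%s)
    # %Y prints the modification time as seconds since the epoch (GNU stat).
    stall_mtime=$(stat -c %Y "$stall_file" 2>/dev/null) || return 2
    [ $(( stall_now - stall_mtime )) -gt "$stall_max_age" ]
}
```

It could be called periodically, for example as
is_stalled 20c_run.output 1800 && echo "NAMD output appears stalled"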