Re: NAMD on KRAKEN

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Oct 28 2011 - 04:43:59 CDT

On Oct 28, 2011, at 3:05 AM, PAUL NEWMAN <paulclizana_at_gmail.com> wrote:

> Dear NAMD users,
>
> I am running a Free Energy calculation on KRAKEN and I got the following error. (Sorry, I don't know if it is appropriate to post here.)
>
>
> #############################################################################################################################
> ENERGY: 400 5033.9992 13345.0199 7704.4458 0.0000 -1781977.4449 211831.2980 0.0000 0.0000 301202.1718 -1242860.5102 309.7946 -1544062.6820 -1242026.2916 310.0217 36.8877 67.6739 4928733.5188 28.9856 28.9725
>
> [0] MPICH has run out of unexpected buffer space.
> Try increasing the value of env var MPICH_UNEX_BUFFER_SIZE (cur value is 62914560),
> and/or reducing the size of MPICH_MAX_SHORT_MSG_SIZE (cur value is 128000).
> aborting job:
> out of unexpected buffer space
> FreeEnergy: 500 1.000 Stop 1.00000 0.00000 ( 40.701, 63.777, 113.334) ( 40.718, 63.932, 113.375) 0.161 |
> SMD 500 44.5049 73.3361 141.589 0 0 0.298436
> [NID 10569] 2011-10-27 20:24:16 Apid 7553654: initiated application termination
> Application 7553654 exit codes: 255
> Application 7553654 exit signals: Killed
> Application 7553654 resources: utime 33496, stime 298
> ############################################################################################################################
>
>
> I also added the following lines to the run script after the aprun command, but I still got the same error.
>
> aprun -n \$PBS_NNODES -cc cpu /lustre/scratch/jphillip/NAMD_2.7_CRAY-XT-Kraken/namd2 $CONFFILE >& $LOGFILE
>
> setenv MPICH_PTL_SEND_CREDITS -1
> setenv MPICH_MAX_SHORT_MSG_SIZE 8000
> setenv MPICH_PTL_UNEX_EVENTS 100M
> setenv MPICH_UNEX_BUFFER_SIZE 500M
>
> It seems that the default values are not being changed. Any help would be highly appreciated.
>

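Setting environment variables _after_ the aprun command has no effect
on that run. A process inherits its environment when it is launched,
so by the time your setenv lines execute, the job has already started
(and finished) with the default values. You have to move those setenv
commands above the aprun line.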
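
As a minimal sketch of that reordering, using only the lines from the
quoted script: the setenv commands come first, so aprun and the namd2
processes it launches inherit them. The values are the ones from the
original post, not tuned recommendations, and the backslash before
$PBS_NNODES is dropped on the assumption that this is a plain csh
script rather than text generated by another script.

```csh
# Set the MPICH/Portals variables first, so they are already in the
# environment when aprun starts (values copied from the original post).
setenv MPICH_PTL_SEND_CREDITS -1
setenv MPICH_MAX_SHORT_MSG_SIZE 8000
setenv MPICH_PTL_UNEX_EVENTS 100M
setenv MPICH_UNEX_BUFFER_SIZE 500M

# Launch NAMD only after the environment is in place.
aprun -n $PBS_NNODES -cc cpu /lustre/scratch/jphillip/NAMD_2.7_CRAY-XT-Kraken/namd2 $CONFFILE >& $LOGFILE
```

If the settings are picked up and the abort message appears again, the
"cur value" figures it prints should presumably show the new sizes
rather than 62914560 and 128000, which is a quick way to check.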

Axel

> Thanks
>
> --
> Cheers,
>
> Paul
>
>
