Re: Exit code 127 with QMMM

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Tue Nov 17 2020 - 14:13:29 CST

Hi Marcelo
I'll pass your notes, and ORCA's, to the people at our computing center.

In the recent past, I circumvented the problem by asking for a longer walltime.
Alternatively, when restarting a QM/MM run, how do I pass to ORCA the two standard
flags so that the MOs are read and the renamed .gbw file is used?
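I am only guessing here, but would it be something like the two extra lines below,
added through qmConfigLine in the NAMD configuration? (Untested; the .gbw name is
just a placeholder for the renamed file.)

  qmConfigLine  {! MOREAD}
  qmConfigLine  {%moinp "previous_orbitals.gbw"}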

thanks
francesco

On Tue, Nov 17, 2020, 19:19 Marcelo C. R. Melo <melomcr_at_gmail.com> wrote:

> Hi Francesco,
> Glad to be of help :)
>
> Using multiple nodes for ORCA is challenging. I have had issues on
> supercomputers when trying to do that.
> Theoretically, when you get the nodes file from your scheduler (with the
> list of nodes and cores-per-node that the supercomputer scheduler allocated
> for your job), you should be able to pass that nodes file to ORCA and
> direct it to use cores on other nodes. ORCA has examples for that in its
> documentation. In my experience, this has been complicated, and I only got
> it to work on a few supercomputers, a while ago when testing the interface.
> This is really dependent on the scheduler and environment infrastructure of
> the computer you are using.
>
> Best,
> Marcelo
>
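> As a rough sketch of the scheduler side (assuming SLURM and 36-core nodes;
> the hostfile name is just a placeholder):
>
>   # expand the SLURM allocation into an OpenMPI-style hostfile
>   scontrol show hostnames "$SLURM_JOB_NODELIST" | \
>       awk '{print $0 " slots=36"}' > qmmm_job.nodes
>
> ORCA then has to be told to use that hostfile instead of spawning all of its
> ranks on the local node; the exact way to do that is described in the ORCA
> manual and is where things usually become scheduler-specific.
>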
> On Tue, 17 Nov 2020 at 04:25, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
>> Hi Marcelo
>> You are quite right. On changing from ORCA 4.2.1 to 4.2.2, I forgot to
>> switch to the OpenMPI version against which the latter was compiled. Now it runs
>> correctly, although a single node of 36 cores will probably be insufficient
>> for the nearly 300 QM atoms. I have to learn how to run NAMD QM/MM on
>> multiple nodes.
>>
>> thanks
>> francesco
>>
>> On Mon, Nov 16, 2020 at 6:27 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
>> wrote:
>>
>>> Hi Francesco,
>>> I have not had this problem before, but "error termination in GTOInt"
>>> makes it sound like an issue between ORCA and the MPI environment. Is it
>>> possible that the MPI module in the cluster you are using changed? Or the
>>> version of ORCA?
>>>
>>> Common issues I have had before (and have received reports of) were
>>> related to bad starting geometries, which I solved by minimizing the system
>>> with semi-empirical methods, which tend to be much more "forgiving".
>>> However, this kind of error tends to present itself as an SCF convergence
>>> error, or similar.
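>>>
>>> If you want to try that here, a minimal sketch of the change in the NAMD
>>> configuration (PM3 is just one of the semi-empirical methods ORCA offers,
>>> and the rest of the QM/MM setup is assumed unchanged) would be:
>>>
>>>   qmConfigLine  "! PM3 EnGrad"
>>>   minimize      1000
>>>
>>> and then switch the method line back to PBE0 for the production run.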
>>>
>>> Best,
>>> Marcelo
>>>
>>> On Mon, 16 Nov 2020 at 12:00, Francesco Pietra <chiendarret_at_gmail.com>
>>> wrote:
>>>
>>>> Hello
>>>> I am working on a new protein using namd.conf and namd.job files that proved
>>>> valid before with another protein on one node of the same cluster, using
>>>> ORCA 4.2.1 and NAMD_Git-2020-06-21_Linux-x86_64-multicore.
>>>>
>>>> The command to namd:
>>>>
>>>> /galileo/....../NAMD_Git-2020-06-21_Linux-x86_64-multicore/namd2
>>>> namd-01.conf +p1 +CmiSleepOnIdle > namd-01.log
>>>>
>>>>
>>>>
>>>> File /0/..TmpOut reports:
>>>> INPUT
>>>> NAME = /gpfs/work/IscrC_agelA/ASIC/QMMM/whole_npt15/0/qmmm_0.input
>>>> | 1> ! PBE0 RIJCOSX D3BJ def2-SVP enGrad
>>>> | 2> %pal nproc 34 end
>>>> | 3> %output Printlevel Mini Print[ P_Mulliken ] 1
>>>> Print[P_AtCharges_M] 1 end
>>>> | 4> %pointcharges
>>>> "/gpfs/work/IscrC_agelA/ASIC/QMMM/whole_npt15/0/qmmm_0.input.pntchrg"
>>>> ........................................
>>>>
>>>> * Energy+Gradient Calculation *
>>>> .........................................
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun detected that one or more processes exited with non-zero status,
>>>> thus causing
>>>> the job to be terminated. The first process to do so was:
>>>>
>>>> Process name: [[13485,1],0]
>>>> Exit code: 127
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> ORCA finished by error termination in GTOInt
>>>> Calling Command: mpirun -np 34
>>>> /cineca/prod/opt/applications/orca/4.2.1/binary/bin/orca_gtoint_mpi
>>>> /gpfs/work/IscrC_agelA/ASIC/QMMM/whole_npt15/0/qmmm_0.input.int.tmp
>>>> /gpfs/work/IscrC_agelA/ASIC/QMMM/whole_npt15/0/qmmm_0.input
>>>> [file orca_tools/qcmsg.cpp, line 458]:
>>>> .... aborting the run
>>>> ........................................
>>>> .........................................
>>>>
>>>> An MPI mismatch, or something else? I would be grateful for any hint.
>>>>
>>>> francesco pietra
>>>>
>>>>
>>>>
