Checkpoint on IRIX of namd 2.5 job not MPI enabled

From: Dow_Hurst (dhurst_at_mindspring.com)
Date: Thu Apr 07 2005 - 09:40:55 CDT

Is it possible, as root, to successfully checkpoint a namd job on IRIX? The job is not MPI enabled but is running on 4 processors.

IRIX 6.5.23f

If the job's processes are communicating over UDP network sockets, then I don't think I can checkpoint it, since the cpr man page lists network socket connections as not checkpoint-safe.

From the cpr man page section on what works and what doesn't:
The following system objects are checkpoint-safe:

     o UNIX processes, process groups, terminal control sessions, IRIX
          array sessions, process hierarchies, sproc(2) groups, POSIX pthreads
          (pthread_create(3P)), random process sets, and IRIX jobs

     o all user memory area, including user stack and data regions

     o system states, including process and user information, signal
          disposition and signal mask, scheduling information, owner
          credentials, accounting data, resource limits, current directory,
          root directory, locked memory, and user semaphores

     o system calls, if applications handle return values and error numbers
          correctly, although slow system calls may return partial results

     o undelivered and queued signals are saved at checkpoint and delivered
          at restart

     o open files (including NFS-mounted files), mapped files, file locks,
          and inherited file descriptors

     o special files /dev/tty, /dev/console, /dev/zero, /dev/null,
          ccsync(7M)

     o open pipes, pipeline data and streams pipe read and write message
          modes

     o System V shared memory

     o POSIX semaphores (psema(D3X))

     o semaphore and lock arenas (usinit(3P))

     o jobs started with CHALLENGEarray services, provided they have a
          unique ASH number; see array_services(5)

     o applications using node-lock licenses; see IRIX Checkpoint and
          Restart Operation Guide on what to do for applications using
          floating licenses

     o applications using the prctl() PR_ATTACHADDR option; see prctl(2)

     o applications using blockproc and unblockproc; see blockproc(2)

     o R10000 counters; see libperfex(3C) and perfex(1)

     o capabilities, Mandatory Access Control (MAC) labels, and Access
          Control Lists (ACLs); see capabilities(4), DOMINANCE(5) and acl(4),
          respectively

The following system objects are not checkpoint-safe:

     o network socket connections; see socket(2)

     o X terminals and X11 client sessions

     o special devices such as tape drivers and CDROM

     o files opened with setuid credential that cannot be reestablished

     o System V semaphores and messages; see semop(2) and msgop(2)

     o memory mapped files using the /dev/mmem file; see mmap(2)

     o open directories
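
The distinction the lists above draw between open pipes (checkpoint-safe) and network socket connections (not checkpoint-safe) can be illustrated with a small Python sketch. This is only an illustration, not something from the cpr documentation: the helper names is_socket_fd and socket_fds are hypothetical, and the check inspects descriptors of the calling process only. Finding out which descriptors a running namd process holds would need a separate system tool, which this sketch does not cover.

```python
import os
import socket
import stat


def is_socket_fd(fd: int) -> bool:
    """Return True if the open file descriptor refers to a socket."""
    # fstat reports the file type in st_mode; S_ISSOCK tests for a socket.
    return stat.S_ISSOCK(os.fstat(fd).st_mode)


def socket_fds(fds):
    """Return the descriptors in fds that are sockets.

    Per the cpr man page excerpt, socket connections are the ones
    that are not checkpoint-safe; pipes and regular files are fine.
    """
    return [fd for fd in fds if is_socket_fd(fd)]


if __name__ == "__main__":
    # Hypothetical stand-ins for a job's descriptors: one UDP socket, one pipe.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    read_end, write_end = os.pipe()
    try:
        for fd in (udp.fileno(), read_end, write_end):
            kind = "socket (not checkpoint-safe)" if is_socket_fd(fd) else "not a socket"
            print(fd, kind)
    finally:
        udp.close()
        os.close(read_end)
        os.close(write_end)
```

If a check like this reported a UDP socket among the job's descriptors, that would match the concern above that the job falls under the "not checkpoint-safe" list.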

Thanks,
Dow

No sig.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:40:39 CST