From: Dow_Hurst (dhurst_at_mindspring.com)
Date: Thu Apr 07 2005 - 09:40:55 CDT
Is it possible to successfully checkpoint, as root, a NAMD job on IRIX? The job is not MPI-enabled but is using 4 processors.
IRIX 6.5.23f
If the job is using network sockets to communicate over UDP then I don't think I can checkpoint it.
From the cpr man page section on what works and what doesn't:
The following system objects are checkpoint-safe:
o UNIX processes, process groups, terminal control sessions, IRIX
array sessions, process hierarchies, sproc(2) groups, POSIX pthreads
(pthread_create(3P)), random process sets, and IRIX jobs
o all user memory area, including user stack and data regions
o system states, including process and user information, signal
disposition and signal mask, scheduling information, owner
credentials, accounting data, resource limits, current directory,
root directory, locked memory, and user semaphores
o system calls, if applications handle return values and error numbers
correctly, although slow system calls may return partial results
o undelivered and queued signals are saved at checkpoint and delivered
at restart
o open files (including NFS-mounted files), mapped files, file locks,
and inherited file descriptors
o special files /dev/tty, /dev/console, /dev/zero, /dev/null,
ccsync(7M)
o open pipes, pipeline data and streams pipe read and write message
modes
o System V shared memory
o POSIX semaphores (psema(D3X))
o semaphore and lock arenas (usinit(3P))
o jobs started with CHALLENGEarray services, provided they have a
unique ASH number; see array_services(5)
o applications using node-lock licenses; see IRIX Checkpoint and
Restart Operation Guide on what to do for applications using
floating licenses
o applications using the prctl() PR_ATTACHADDR option; see prctl(2)
o applications using blockproc and unblockproc; see blockproc(2)
o R10000 counters; see libperfex(3C) and perfex(1)
o capabilities, Mandatory Access Control (MAC) labels, and Access
Control Lists (ACLs); see capabilities(4), DOMINANCE(5) and acl(4),
respectively
The following system objects are not checkpoint-safe:
o network socket connections; see socket(2)
o X terminals and X11 client sessions
o special devices such as tape drivers and CDROM
o files opened with setuid credential that cannot be reestablished
o System V semaphores and messages; see semop(2) and msgop(2)
o memory mapped files using the /dev/mmem file; see mmap(2)
o open directories
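
Since network socket connections are on the not-checkpoint-safe list above, the question comes down to whether the job holds any open sockets. As a rough illustration only, the sketch below shows how a process can tell which of its own open file descriptors are sockets, using fstat and S_ISSOCK. The helper names is_socket_fd and socket_fds are hypothetical, the scan limit of 256 descriptors is an arbitrary assumption, and the sketch inspects only the calling process. It says nothing about what NAMD actually opens or about how cpr itself decides.

```python
import os
import socket
import stat


def is_socket_fd(fd):
    """Return True if the open descriptor fd refers to a socket."""
    return stat.S_ISSOCK(os.fstat(fd).st_mode)


def socket_fds(max_fd=256):
    """List this process's open descriptors below max_fd that are sockets.

    max_fd is an assumed scan limit, not a system-defined value.
    """
    found = []
    for fd in range(max_fd):
        try:
            if is_socket_fd(fd):
                found.append(fd)
        except OSError:
            # fd is not open in this process; skip it
            continue
    return found


if __name__ == "__main__":
    # Open a UDP socket, like the communication channel suspected above,
    # and confirm that the scan reports its descriptor.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print("socket descriptors:", socket_fds())
    udp.close()
```

A process for which such a scan comes back empty holds no socket descriptors of its own, though the other items on the not-checkpoint-safe list would still need to be ruled out separately.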
Thanks,
Dow
No sig.