
Kernel Level Checkpointing for SGI Altix

We are developing kernel-level checkpointing for SGI Altix systems. The checkpoint mechanism is being developed from scratch. The tools are tested on an SGI Altix server running a single Linux OS with four Intel Itanium 2 processors.
The package is developed as part of the SGIgrid project.

The package consists of command-line binaries and a kernel module. To add checkpointing functionality to the operating system, the administrator must load the kernel module. The 'ckpnt' application stores a process image, and the 'resume' application restores it.

Package version 1.0a is ready

We have prepared the next version of the package, which adds new features. The package implements the following functionality:

  • binaries for kernel 2.6.16.46-0.12-default - SLES 10 + PP5 installation. You can get the binaries from here.
  • binaries for kernel 2.6.16.21-0.8-default - "clean" SLES 10 installation. You can get the binaries from here.
  • tested with the multithreaded Gaussian03 application
  • works with Torque and PBS Pro; new switches added:
    • -drp (Do not Restore Pipes)
    • -cwd (Change Working Directory)
    You can learn more about how to use checkpointing with Torque or with PBS.
  • fixed multiple bugs
  • support for kernel 2.6.5-7.252-sn2
  • support for multithreaded applications
  • ported to the SUSE LINUX Enterprise Server 9 with SGI ProPack 4 for Linux
  • virtualization of identifiers of resources
  • support for multi process applications
  • emulation of "zombie" processes
  • support for System V IPC
    • semaphores
    • message queues
    • shared memory
  • saves and restores all data (registers, memory segments) needed for reliable work
  • supports system calls
  • supports environment variables
  • stores files used by application and files indicated by the user
  • supports non-contiguous descriptors
  • supports open files, including the offset in the file
  • support for interactive applications
  • supports special files (STDIN, STDOUT, STDERR, /dev/null, /dev/zero, /dev/random)
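
To illustrate what the list items on descriptor numbering and file offsets involve, here is a minimal user-space sketch in Python. It is not how the kernel module works and does not use the 'ckpnt' or 'resume' tools; the helper names snapshot_descriptors, restore_descriptors and demo are hypothetical and exist only for this example. It records the descriptor number and offset of each open file, closes everything, and then reopens each file on its original descriptor number (gap included) at the saved offset.

```python
import os
import tempfile


def snapshot_descriptors(open_files):
    """Record descriptor number, path and current offset for each open file.

    open_files maps a descriptor number to the path it was opened from
    (an assumption of this sketch; a kernel-level tool reads this from
    the process's own file table instead).
    """
    saved = []
    for fd, path in open_files.items():
        offset = os.lseek(fd, 0, os.SEEK_CUR)  # query position without moving it
        saved.append({"fd": fd, "path": path, "offset": offset})
    return saved


def restore_descriptors(saved):
    """Reopen each file on its original descriptor number and offset."""
    for entry in saved:
        tmp = os.open(entry["path"], os.O_RDWR)
        if tmp != entry["fd"]:
            # The kernel handed out the lowest free number; move the open
            # file onto the original number so gaps are preserved.
            os.dup2(tmp, entry["fd"])
            os.close(tmp)
        os.lseek(entry["fd"], entry["offset"], os.SEEK_SET)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, n) for n in ("a.dat", "b.dat", "c.dat")]
        for p in paths:
            with open(p, "wb") as f:
                f.write(b"0123456789")

        fd_a, fd_b, fd_c = [os.open(p, os.O_RDWR) for p in paths]
        os.close(fd_b)                    # leaves a gap between fd_a and fd_c
        os.lseek(fd_a, 4, os.SEEK_SET)    # advance the offsets
        os.lseek(fd_c, 7, os.SEEK_SET)

        saved = snapshot_descriptors({fd_a: paths[0], fd_c: paths[2]})

        # Simulate the process going away: all descriptors are lost.
        os.close(fd_a)
        os.close(fd_c)

        restore_descriptors(saved)

        # Read three bytes from each restored descriptor.
        result = {e["fd"]: os.read(e["fd"], 3) for e in saved}
        for e in saved:
            os.close(e["fd"])
        return (fd_a, fd_c), result


if __name__ == "__main__":
    fds, data = demo()
    print(fds, data)
```

The point of the sketch is that an application may hold descriptor numbers with gaps between them and may have read partway through a file, so a restored process needs both the same numbers and the same offsets to continue correctly.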

The virtualization mechanism was presented at ICCS 2004. The article "Resource Virtualization in Fault Tolerance and Migration Issues" is available in Part III, page 449 of the proceedings: http://www.springeronline.com.

The presentation "Kernel level checkpoint-restart mechanism for Linux on IA64" was presented at CGW 04.

The presentation "Kernel Level Checkpoint Restart Functionality for SGI Altix systems" was presented at the SGIUG 2005 Conference.


Download

Before downloading, please read the Copyright and Licence below and please register.

New! Package version 1.0a
(It is designed for and tested with SuSE Linux Enterprise Server 10 - kernel 2.6.16.21-0.8)

Package version 1.0a
(It is designed for and tested with SGI ProPack 4.0 and SuSE Linux Enterprise Server 9 SP 3 - kernel 2.6.5-7.252-sn2)

Package version 0.8
(It is designed for and tested with SGI ProPack 4.0 and SuSE Linux Enterprise Server 9 SP 3 - kernel 2.6.5-7.252-sn2)

Package version 0.71
(It is designed for and tested with SGI ProPack 4.0 and SuSE Linux Enterprise Server 9 SP 3 - kernel 2.6.5-7.252-sn2)

Package version 0.7
(It is designed for and tested with SGI ProPack 4.0 and SuSE Linux Enterprise Server 9 SP 2)

Release candidate 0.6
(It is designed for and tested with SGI ProPack 4.0 and SuSE Linux Enterprise Server 9 SP 1)

Release candidate 0.5 (It works only with SGI ProPack 3.0)

Release candidate 0.4 (It works only with SGI ProPack 2.1 and 2.3)

Release candidate 0.1 (It works only with SGI ProPack 2.1)


If you find any bugs, please contact us.