AetherSystems Distinguished Lecture Series in Mobile and Wireless Systems

Fast, Low-Cost Checkpointing and Recovery Techniques for Mobile Computing Systems

Mukesh Singhal
National Science Foundation and The Ohio State University

T.B.A.

Lecture Abstract

Checkpointing and failure recovery techniques that have low overhead and recover quickly from failures are integral to the design of fault-tolerant, high-performance mobile computing systems. This talk will present a new approach, called quasi-synchronous checkpointing and failure recovery, for mobile computing systems. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpointing to advance the recovery line, which helps bound rollback propagation during recovery. It thus combines the simplicity and low overhead of asynchronous checkpointing with the recovery-time advantages of synchronous checkpointing. Checkpointing requires no extra messages, and the additional checkpointing overhead is nominal.

The algorithm ensures that, at all times, there exists a recovery line consistent with the latest checkpoint of any process. The recovery algorithm exploits this property to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect: a failed process only needs to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid the domino effect altogether, selective pessimistic message logging is used at the receiver end.
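As an illustration only, the Python sketch below shows one common way an index-based, communication-induced checkpointing rule of the kind described above can work. Each process keeps a checkpoint sequence number, piggybacks it on every application message (so no extra messages are sent), and takes a forced checkpoint before delivering a message whose sequence number is higher than its own. Recovery then picks, for each process, its first checkpoint whose sequence number is at least that of the failed process's latest checkpoint. All names (`Process`, `basic_checkpoint`, `recovery_line`) are hypothetical, and the speaker's algorithm may differ in its details.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One process in a hypothetical index-based quasi-synchronous scheme."""
    pid: int
    sn: int = 0        # sequence number of this process's latest checkpoint
    state: int = 0     # toy application state
    # (sn, saved_state) pairs; sn strictly increases along this list
    checkpoints: list = field(default_factory=list)

    def __post_init__(self):
        # Every process starts with an initial checkpoint whose sn is 0.
        self.checkpoints.append((0, self.state))

    def _take_checkpoint(self, sn):
        self.sn = sn
        self.checkpoints.append((sn, self.state))

    def basic_checkpoint(self):
        # Asynchronous, process-initiated checkpoint: advance the index by one.
        self._take_checkpoint(self.sn + 1)

    def send(self, payload):
        # Piggyback the current sequence number on the application message.
        return (self.sn, payload)

    def receive(self, msg):
        msg_sn, payload = msg
        # Communication-induced rule: if the sender is ahead, take a forced
        # checkpoint with the sender's index BEFORE delivering the message.
        if msg_sn > self.sn:
            self._take_checkpoint(msg_sn)
        self.state += payload


def recovery_line(processes, failed_pid):
    """Map each pid to the sn of the checkpoint to restore, or None if the
    process may keep its current state."""
    failed = next(p for p in processes if p.pid == failed_pid)
    n = failed.checkpoints[-1][0]  # index of the failed process's latest checkpoint
    line = {}
    for p in processes:
        # First checkpoint with index >= n. If none exists, the process has
        # received no message sent after any chosen checkpoint (it would have
        # been forced to checkpoint), so it need not roll back.
        line[p.pid] = next((sn for sn, _ in p.checkpoints if sn >= n), None)
    return line


if __name__ == "__main__":
    p0, p1, p2 = Process(0), Process(1), Process(2)
    p0.basic_checkpoint()           # p0.sn becomes 1
    p0.basic_checkpoint()           # p0.sn becomes 2
    p1.receive(p0.send(5))          # forces a checkpoint at p1 with sn 2
    p2.basic_checkpoint()           # p2.sn becomes 1
    print(recovery_line([p0, p1, p2], failed_pid=0))
```

Running the script prints `{0: 2, 1: 2, 2: None}`: if `p0` fails, it rolls back to its latest checkpoint, `p1` rolls back to its forced checkpoint, and `p2` keeps its current state. Because sequence numbers on each process only increase, no message sent after a chosen checkpoint can be delivered before the receiver's chosen checkpoint, so the selected set has no orphan messages and rollback does not propagate further. The sketch omits the selective pessimistic receiver-side message logging that the abstract mentions, and it does not handle in-transit messages.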

Author's Biography

T.B.A.

For more information contact Dr. Anupam Joshi via e-mail at joshi@csee.umbc.edu or by phone at (410) 455 2590.