Summary  
This chapter covers how to implement a watchdog mechanism for error handling and automatic self-recovery in long-running processes, ensuring crashes trigger restarts and resource cleanup.

General domain of usage  
High-availability background services

When you build a daemon, you need to anticipate failures and handle them gracefully. **Common failure modes for daemons** include crashes caused by programming errors, resource exhaustion (such as running out of memory or file descriptors), unhandled exceptions, and external factors such as missing files or lost network connectivity. If a daemon simply exits on error, it can leave a critical service unavailable or data in an inconsistent state. For this reason, robust daemons often incorporate **self-recovery mechanisms** that automatically restart the daemon after a crash, clean up resources, and attempt to restore normal operation. This reduces downtime and improves reliability, which matters most for background services that users and other systems depend on.
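The resource-cleanup half of self-recovery can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `run_daemon_once` and the `work(stack)` calling convention are hypothetical names chosen for this example. The idea is that the daemon registers each resource on a `contextlib.ExitStack`, so everything is released whether the work finishes or raises, and the function returns an exit code that a supervising process can inspect.

```python
import contextlib
import sys


def run_daemon_once(work):
    """Run work(stack) and return an exit code for the process.

    `work` registers cleanup callbacks on `stack`; they run in reverse
    order on every exit path, including when `work` raises.
    """
    with contextlib.ExitStack() as stack:
        try:
            work(stack)
            return 0
        except Exception as exc:
            # Report the failure, then let the stack unwind and release resources.
            print(f"Daemon failed: {exc}", file=sys.stderr)
            return 1
```

A caller would typically pass the result to `sys.exit`, so that a failure surfaces as a nonzero exit status. Note that this kind of in-process cleanup cannot run if the process is killed outright (for example by `SIGKILL`), so it complements an external restart mechanism and does not replace one.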

    // Simple C watchdog that restarts a daemon process if it exits abnormally
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main() {
        pid_t pid;
        int status;
        while (1) {
            pid = fork();
            if (pid == 0) {
                // Child process: your daemon logic goes here
                // For demonstration, exit with error after some time
                printf("Daemon running (pid: %d)\n", getpid());
                sleep(5);
                exit(1); // Simulate a crash
            } else if (pid > 0) {
                // Parent process: watchdog
                waitpid(pid, &status, 0);
                if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
                    printf("Daemon exited normally. Watchdog stopping.\n");
                    break;
                } else {
                    printf("Daemon crashed or exited abnormally. Restarting...\n");
                }
            } else {
                perror("fork");
                exit(1);
            }
        }
        return 0;
    }

    # Simple Python watchdog: parent process restarts child daemon on failure
    import os
    import sys
    import time

    def daemon_logic():
        print(f"Daemon running (pid: {os.getpid()})")
        time.sleep(5)
        raise Exception("Simulated crash")

    if __name__ == "__main__":
        while True:
            pid = os.fork()
            if pid == 0:
                # Child process: run daemon logic with error handling
                try:
                    daemon_logic()
                    sys.exit(0)
                except Exception as e:
                    print(f"Daemon crashed: {e}")
                    sys.exit(1)
            else:
                # Parent process: watchdog
                _, status = os.waitpid(pid, 0)
                if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
                    print("Daemon exited normally. Watchdog stopping.")
                    break
                else:
                    print("Daemon crashed or exited abnormally. Restarting...")

Both the **C** and **Python** examples above implement a basic **watchdog mechanism**. The parent process acts as a monitor: it forks a child process to run the daemon code, waits for the child to exit, and checks the exit status. If the child exits normally (with status `0`), the watchdog also exits, assuming the work is done. If the child exits with a nonzero status or is terminated by a signal, the watchdog forks a new child, so the daemon keeps running. Because both demonstration children simulate a crash after five seconds, the examples restart indefinitely until you stop the watchdog. **Watchdog logic** is most useful for critical background services that need high availability and cannot stay down after a crash. A persistent error, however, makes the watchdog restart the daemon in an endless loop, so production systems should cap the number of restarts or add a backoff delay between them.
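One way to bound the restart loop is to combine a restart cap with exponential backoff. The sketch below is illustrative only: the names `supervise`, `backoff_delay`, and `describe_exit`, and the defaults (five restarts, a one-second base delay, a 60-second ceiling), are assumptions made for this example, not values taken from the chapter. It also reports whether the child exited with a code or was killed by a signal, using the standard `os.WIFSIGNALED` and `os.WTERMSIG` helpers.

```python
import os
import sys
import time


def describe_exit(status):
    """Turn a raw wait status into a short human-readable reason."""
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"ended with raw status {status}"


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay before restart `attempt` (1-based): base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * 2 ** (attempt - 1))


def supervise(run_child, max_restarts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Run `run_child` in a forked process, restarting it on failure.

    Returns True if a child exits with status 0, or False once
    `max_restarts` restarts have been used up. `run_child` signals
    success by returning normally and failure by raising.
    """
    restarts = 0
    while True:
        pid = os.fork()
        if pid == 0:
            # Child: always leave through os._exit so it never falls back
            # into the watchdog loop. os._exit does not flush stdio buffers,
            # so the child flushes its own message explicitly.
            code = 1
            try:
                run_child()
                code = 0
            except Exception as exc:
                print(f"Daemon crashed: {exc}", file=sys.stderr, flush=True)
            finally:
                os._exit(code)

        # Parent: wait for the child and decide whether to restart it.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            return True
        if restarts >= max_restarts:
            print(f"Giving up after {restarts} restarts", file=sys.stderr)
            return False
        restarts += 1
        delay = backoff_delay(restarts, base, cap)
        print(f"Daemon {describe_exit(status)}; restart {restarts} in {delay:.1f}s",
              file=sys.stderr)
        sleep(delay)
```

Calling `supervise(daemon_logic)` with a function that always raises would start the child six times in total (the first run plus five restarts), waiting 1, 2, 4, 8, and 16 seconds between attempts before returning `False`. The `sleep` parameter exists only so the delay can be replaced in tests. A production watchdog might also reset the restart counter after the daemon has stayed up for some time, which this sketch does not attempt.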

What is the purpose of a watchdog in daemon design?
