Monitoring services, CPU, disk and memory usage has become common practice for IT companies big and small, but Cron jobs are often overlooked. (If you are unfamiliar with Cron, it’s the standard *nix way of running periodic tasks.) In this short post I show one simple way to get you going.
At one of the companies I currently work with, there are several web apps with many Cron jobs. The jobs handle billing, notifications, report processing and many other tasks. Several times, critical tasks stopped running under some circumstances, but there was no way of knowing until it was too late and someone reported a problem.
The simplest solution I could come up with was touching a certain file (updating that file’s access and modification timestamps to the current time) after a task executes successfully. Every Cron job gets its own file. The last piece is Monit watching these files and sending an alert when any of them hasn’t been touched for longer than the period of the corresponding task.
Here’s a sample Rake task:
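As a rough sketch of such a task (the `touch_heartbeat` helper, the empty `run_billing` placeholder and the `jobs:billing` task name are hypothetical stand-ins, not the project’s actual code), it could look something like this:

```ruby
require "rake"
require "fileutils"

# Hypothetical helper: records that a job finished by updating the
# timestamp of <dir>/job_<name>, creating the file if it does not exist.
def touch_heartbeat(name, dir: "log")
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "job_#{name}")
  FileUtils.touch(path)
  path
end

# Placeholder for the real billing logic (an assumption for this sketch).
def run_billing
end

namespace :jobs do
  desc "Run billing, then record that it finished"
  task :billing do
    run_billing                 # if this raises, the file is not touched
    touch_heartbeat("billing")  # reached only after a successful run
  end
end
```

The important detail is the ordering: the touch comes last, so a task that fails partway through leaves the old timestamp in place.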
After the billing task completes, we touch the file log/job_billing. Given that we run this task daily, we can configure Monit to watch this file and alert when it goes stale with these two lines:
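A minimal sketch of such a check, assuming Monit’s `check file` statement with a timestamp test; the absolute path is a made-up example and the 25-hour threshold follows from a daily schedule plus some slack:

```
check file job_billing with path /path/to/app/log/job_billing
  if timestamp > 25 hours then alert
```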
I give it an extra hour just in case, but you can choose the period yourself.
One more tip: keep the Monit configuration in config/monit.rc.erb with the rest of the sources, and generate the actual monit.rc, to be included from /etc/monitrc on the server, with a Capistrano task:
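The rendering step can be sketched in plain Ruby. The `render_monit_config` function is a hypothetical name, and the Capistrano wiring shown in the comments assumes Capistrano 2 conventions (`put`, `shared_path`) and an output location of my choosing, so treat it as an outline rather than the exact task:

```ruby
require "erb"

# Renders a Monit ERB template. The `shared_path` argument is visible to
# the template as a local variable, matching the name Capistrano uses.
def render_monit_config(template_source, shared_path)
  ERB.new(template_source).result(binding)
end

# Possible Capistrano 2 wiring in config/deploy.rb (assumed, not verified
# here); inside a task the same rendering is usually done inline:
#
#   namespace :monit do
#     task :configure, :roles => :app do
#       template = File.read("config/monit.rc.erb")
#       put ERB.new(template).result(binding), "#{shared_path}/monit.rc"
#     end
#   end
```

On the server, /etc/monitrc would then pull in the generated file with an `include` statement pointing at wherever the task uploaded it.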
The tricky part here is knowing the location of the touched files, which live inside the project directory hierarchy (yes, you can put them under /tmp, but I prefer to keep project-related stuff with the project). That’s why I use an ERB template and render it once the shared directory path on the remote server is known. Here’s the template:
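A template along these lines would do the job. It assumes the touched file sits in the log directory under Capistrano’s shared path, which is a guess at the layout rather than a confirmed detail:

```erb
check file job_billing with path <%= shared_path %>/log/job_billing
  if timestamp > 25 hours then alert
```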
Hopefully this’ll make your products more stable!