App Engine Cookbook: On-demand Cron Jobs

Today's post is, by necessity, a brief one. I'm travelling to San Francisco for I/O at the moment, and my flight was delayed so much I missed my connection in Atlanta and had to stay the night; in fact, I'm writing and posting this from the plane, using the onboard WiFi!

In a previous post, I introduced a recipe for high concurrency counters. That recipe relied on a technique that deserves a closer look, since it's useful well beyond counters. I'm calling it "On-demand Cron Jobs".

It's quite common for apps to need periodic updates, where each individual update is small and the time at which it falls due may shift. One example is deleting or modifying any entry that hasn't been modified in the last day. Apps that need to do this often have a cron job like the following in cron.yaml:

cron:
- description: Clean up old data
  url: /tasks/cleanup
  schedule: every 1 minutes

This works, but it can consume a significant amount of resources, because the handler runs every minute whether or not there's anything to clean up. Using the task queue, though, we can avoid running all those unnecessary tasks.
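For concreteness, here's roughly the check a handler at /tasks/cleanup has to perform on every run, sketched as a plain Python function over an in-memory dict rather than a datastore query. The function name `find_stale` and the dict layout are hypothetical, chosen only to illustrate the "not modified in the last day" rule:

```python
from datetime import datetime, timedelta

def find_stale(records, now, max_age=timedelta(days=1)):
  """Returns the keys of records not modified within max_age of now.

  records is a hypothetical dict mapping a record key to its
  last-modified datetime, standing in for a datastore query.
  """
  cutoff = now - max_age
  return sorted(key for key, modified in records.items() if modified < cutoff)

now = datetime(2010, 5, 18, 12, 0)
records = {
    'a': now - timedelta(hours=2),            # recent: kept
    'b': now - timedelta(days=2),             # stale
    'c': now - timedelta(days=1, seconds=1),  # just over a day old: stale
}
print(find_stale(records, now))  # ['b', 'c']
```

Most minutes, in most apps, a check like this comes back empty, which is exactly the wasted work the every-minute cron job pays for.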

What we do is devise a way of naming tasks such that there's exactly one valid name for our cleanup job in each 1 minute interval (or some other interval, if that suits your app better). Whenever we do something that may require a cleanup, such as updating a record, we attempt to enqueue a task with that name, ignoring any errors from name collisions. Because the task queue won't accept two tasks with the same name, at most one cleanup task is enqueued per interval. When the time comes, the task runs and behaves exactly like the original minutely cron job. If something has changed since we enqueued it, no harm is done: it simply finds nothing to do. Either way, tasks only run in intervals where something actually happened, which is a lot less work than running a job every single minute!
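To see why this saves work, here's a small simulation that doesn't touch App Engine at all: a Python set stands in for the task queue's rule that each task name can only be used once, and the name format `cleanup-<interval>-<interval number>` is an assumption for illustration. Each update "tries" to enqueue a task, and we count how many actually get created:

```python
def count_tasks(update_times, interval):
  """Counts the tasks created when every update tries to enqueue a named task.

  update_times are seconds since the epoch. The set stands in for the task
  queue's rule that a task name can only be used once.
  """
  used_names = set()
  for ts in update_times:
    name = 'cleanup-%d-%d' % (interval, ts // interval)
    if name in used_names:
      continue  # The real queue would reject this name as already used.
    used_names.add(name)
  return len(used_names)

updates = [0, 10, 59, 3600, 3630]  # three updates in one minute, two in another
print(count_tasks(updates, 60))     # 2
print(24 * 60)                      # 1440 runs a day for the every-minute cron job
```

Five updates produce two cleanup tasks, while the cron job would have run 1440 times over the same day regardless of activity.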

Here's one way to devise task names, stolen from the previous post:

import time

def get_interval_number(ts, duration):
  """Returns the number of the current interval.

  Args:
    ts: The timestamp to convert (a datetime).
    duration: The length of the interval, in seconds.
  Returns:
    int: Interval number.
  """
  return int(time.mktime(ts.timetuple()) / duration)
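One subtlety: time.mktime interprets the time tuple as local time, so the interval numbers it produces depend on the machine's time zone (on App Engine that's UTC, so this doesn't bite in production). If you want the same numbers everywhere, for example in local tests, a variant using calendar.timegm treats naive datetimes as UTC. The name `get_interval_number_utc` is hypothetical:

```python
import calendar
from datetime import datetime

def get_interval_number_utc(ts, duration):
  """Like get_interval_number, but treats naive datetimes as UTC."""
  return int(calendar.timegm(ts.utctimetuple()) // duration)

a = datetime(2010, 5, 18, 12, 0, 5)
b = datetime(2010, 5, 18, 12, 0, 55)
c = datetime(2010, 5, 18, 12, 1, 0)
print(get_interval_number_utc(a, 60) == get_interval_number_utc(b, 60))  # True
print(get_interval_number_utc(c, 60) - get_interval_number_utc(a, 60))   # 1
```

Any two timestamps in the same minute share an interval number, and hence a task name; the next minute gets a fresh one.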

And here's a function to enqueue the cleanup task:

from google.appengine.api import taskqueue

def add_cron_on_demand(path, name, interval, when):
  """Enqueues an on-demand cron job.

  Args:
    path: The path to the cron job handler.
    name: The name of the cron job handler.
    interval: How often the handler should run, in seconds.
    when: When to run the handler (a datetime).
  """
  interval_num = get_interval_number(when, interval)
  # Task names must be strings, so convert the numeric parts first.
  task_name = '-'.join([name, str(interval), str(interval_num)])
  try:
    taskqueue.add(url=path, method='GET', name=task_name, eta=when)
  except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
    # A task for this interval is already queued or has already run.
    pass
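The function leaves it up to the caller to pick `when`. One reasonable choice, and this is my assumption rather than something the recipe requires, is the start of the next interval: every update within an interval then computes the same `when`, and therefore the same task name, and the single resulting task runs just after that interval closes. A sketch of a hypothetical helper for that, using naive UTC datetimes:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def next_interval_start(ts, duration):
  """Returns the start of the interval following the one containing ts.

  ts is a naive datetime assumed to be in UTC; duration is in seconds.
  """
  seconds = int((ts - EPOCH).total_seconds())
  boundary = (seconds // duration + 1) * duration
  return EPOCH + timedelta(seconds=boundary)

t1 = datetime(2010, 5, 18, 12, 0, 5)
t2 = datetime(2010, 5, 18, 12, 0, 55)
print(next_interval_start(t1, 60))                                  # 2010-05-18 12:01:00
print(next_interval_start(t1, 60) == next_interval_start(t2, 60))  # True
```

A call site might then look like `add_cron_on_demand('/tasks/cleanup', 'cleanup', 60, next_interval_start(datetime.utcnow(), 60))` after saving a record, though the handler path and name here are just the ones from the cron example above.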

Has this recipe helped you reduce the overhead on your own batch updates? Tell us about it in the comments!
