Using Python magic to improve the deferred API

Recently, my attention was drawn, via a blog post, to a Python task queue implementation called Celery. The object of my interest was not so much Celery itself - though it does look both interesting and well written - but the syntax it uses for tasks.

While App Engine's deferred library takes the higher-order function approach - that is, you pass your function and its arguments to the 'defer' function - I've never been entirely happy with it. Celery, in contrast, uses Python's support for decorators (one of my favorite language features) to create what, in my view, is a much neater and more flexible interface. Defining and calling a deferred function looks like this:

from google.appengine.ext.deferred import defer

def my_task_func(some_arg):
  # do something
  pass

defer(my_task_func, 123)

Doing the same in Celery looks like this:

from celery.decorators import task  # the import location varies between Celery versions

@task
def my_task_func(some_arg):
  # do something
  pass

my_task_func.delay(123)

Using a decorator, Celery is able to modify the function it's decorating such that you can now call it on the task queue using a much more intuitive syntax, with the function's original calling convention preserved. Let's take a look at how this works, first, and then explore how we might make use of it in the deferred library.

Functions as objects in Python

In Python, everything is an object. That includes everything from what other languages call 'primitive' values like the number 3, all the way up to and including things such as functions, classes, and modules. This has many implications for the design of the language and what you can do with it. You probably think of functions as things which you can only do one thing to - call them - but because every function in Python is also an object, that's not the case: a Python function is simply a 'callable' object. We can explore the members of a function in Python using the dir() builtin:

>>> def my_func(arg1, arg2=None):
...   return arg2 or arg1
... 
>>> type(my_func)
<type 'function'>
>>> dir(my_func)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__', '__getattribute__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']

As you can see, my_func is an instance of type 'function', and it has a number of members, including __call__, the special method that makes an object callable. It also includes a bunch of standard members, such as __hash__, __class__, and so forth, and a few function-specific ones, which we'll come back to later.
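In fact, the call syntax itself is just sugar for invoking that method:

>>> my_func(1, 2)
2
>>> my_func.__call__(1, 2)
2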

Celery takes advantage of this by modifying the function, adding a new member called 'delay'. You can add members to a function just by assigning to them, as you would with any other object:

>>> my_func.foo = 'bar'
>>> my_func.foo
'bar'

Based on this, we could define a decorator for the deferred library trivially:

def task(func):
  func.defer = lambda *args, **kwargs: defer(func, *args, **kwargs)
  return func

And indeed, this works exactly as expected, allowing us to call my_func.defer(args) instead of defer(my_func, args). There are a couple of missed opportunities here, however. First up, you can call this new function with any set of arguments, and it will happily try to defer a call with those arguments. If you pass an invalid set - for example, four arguments to a function that expects only three - you won't find out until your task runs and fails, making the culprit much harder to track down.

There are a couple of solutions to this. One is to use inspect.getargspec to retrieve the argument specification from the original function and check it against the passed-in arguments. This imposes some additional overhead, however, and requires us to write our own argument-checking code, which is difficult to test for correctness.
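To give a flavour of that first approach, here's a deliberately incomplete sketch that checks only the number of positional arguments; a complete version would also have to validate keyword names, duplicates, and functions that take *args or **kwargs:

import inspect

def task(func):
  arg_names, varargs, varkw, defaults = inspect.getargspec(func)
  max_args = len(arg_names)
  min_args = max_args - len(defaults or ())

  def checked_defer(*args, **kwargs):
    # A crude count-only check, run before anything is enqueued.
    if varargs is None and not min_args <= len(args) + len(kwargs) <= max_args:
      raise TypeError('%s() takes %d to %d arguments'
                      % (func.__name__, min_args, max_args))
    return defer(func, *args, **kwargs)

  func.defer = checked_defer
  return func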

The second approach is slightly kludgier, but provides a better result: dynamically construct the source for a new function and evaluate it using eval(). The generated function has the same argument specification as the original, and its sole job is to call our more permissive function. Fortunately, someone has already done this for us, in the form of the decorator library. With it, our new signature-preserving defer decorator looks like this:

from decorator import decorator

def task(func):
  func.defer = decorator(defer, func)
  return func

Not only is it signature-preserving, it's even simpler! Note that we didn't have to define a wrapper function at all: the decorator function expects its first argument to be a function with the signature func(f, *args, **kwargs), which is exactly the signature of the existing deferred.defer function. Given that function and the function being decorated, decorator returns the decorated function, which we attach as the 'defer' member of the original function instead of replacing it.
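The payoff is that an invalid call now fails at the point of deferral rather than when the task runs. For instance (send_invite is just an invented example):

@task
def send_invite(to_address, message=''):
  # do something
  pass

send_invite.defer('test@example.com')        # enqueued as normal
send_invite.defer('a@b.com', 'hi', 'oops')   # raises TypeError immediately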

The second feature I'd like to take advantage of is the ability to provide arguments to the task queue service, such as ETA and task name, both at definition time and at runtime. The existing defer function handles these using reserved, underscore-prefixed arguments, which seems less than ideal.
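For example, naming a task and delaying it by sixty seconds currently looks like this:

# Task queue options share an argument list with the function's own arguments
defer(my_task_func, 123, _countdown=60, _name='invite-123')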

If we follow the pattern used by Celery, definition-time specification is easy to achieve: we can have our 'task' decorator take arguments, which will serve as defaults for anything that calls 'defer' on the decorated function (there's a sketch of this after the list below). That only handles specification at definition time, though, and we'd like to be able to specify these options when we call our deferred function, too. We've got several, possibly conflicting, goals here:

  1. Make basic invocation as simple as possible. my_func.defer(some_args) should continue to work as expected.
  2. Make it possible to specify arguments for the task queue at runtime, without modifying the signature of the function we're deferring a call to.
  3. Make it possible to bulk-add deferred tasks, rather than having them implicitly added individually, without complicating the interface for goals 1 and 2.
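As promised, here's what the definition-time half might look like. This is only a sketch - the parameterized 'task' decorator is hypothetical - but all it does is fold its keyword arguments into defer's underscore-prefixed options:

def task(**defaults):
  def decorate(func):
    def invoke(f, *args, **kwargs):
      # Merge the definition-time defaults into defer's reserved options.
      options = dict(('_' + k, v) for k, v in defaults.items())
      options.update(kwargs)
      return defer(f, *args, **options)
    func.defer = decorator(invoke, func)
    return func
  return decorate

@task(countdown=60)
def my_task_func(some_arg):
  # do something
  pass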

Satisfying all of these may be a tall order. I'm actually not entirely certain what the best way to handle this will be, yet. Currently, I think the best option may be this:

  1. Define a 'DeferredTaskTemplate' abstract class that encapsulates the arguments to be used in creating a deferred task. Instances of the class are immutable. The class has a method called something like 'options' ('with' would read more naturally, but it's a reserved word in Python), which, when called with task queue options, returns a new instance of itself with those options modified.
  2. Have the task decorator create a new subclass of DeferredTaskTemplate with the __call__ method defined using the decorator trick we discussed above. When called, it returns the taskqueue.Task object for the new deferred task, after optionally enqueueing it.
  3. Add another keyword argument to the options method, 'add', which prevents the task from being enqueued automatically when it's set to False.
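Here's a rough sketch of how those pieces might fit together. To be clear about what's assumed: 'options' stands in for 'with', the signature-preserving subclass trick is omitted for brevity, and I'm leaning on the deferred module's serialize function and default handler URL to construct the task payload:

from google.appengine.api import taskqueue
from google.appengine.ext import deferred

class DeferredTaskTemplate(object):
  """An immutable bundle of task queue options for a deferred function."""

  def __init__(self, func, task_args=None):
    self.func = func
    self.task_args = task_args or {}

  def options(self, **kwargs):
    # Instances are immutable, so return a modified copy instead.
    new_args = dict(self.task_args)
    new_args.update(kwargs)
    return type(self)(self.func, new_args)

  def __call__(self, *args, **kwargs):
    task_args = dict(self.task_args)
    add = task_args.pop('add', True)
    # deferred.serialize pickles the call in the format the handler at
    # the default deferred URL expects; remaining options (name,
    # countdown, eta, ...) go straight to the Task constructor.
    task = taskqueue.Task(
        payload=deferred.serialize(self.func, *args, **kwargs),
        url='/_ah/queue/deferred', **task_args)
    if add:
      task.add()
    return task

def task(**defaults):
  def decorate(func):
    func.defer = DeferredTaskTemplate(func, defaults)
    return func
  return decorate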

With a framework such as that, the calling conventions look like this:

# Regular call
my_func(a, b)

# Standard deferred call
my_func.defer(a, b)

# Deferred call with task name and countdown
my_func.defer.options(name='foo', countdown=60)(a)

# Deferred calls using the batch interface:
tasks = [my_func.defer.options(add=False)(x) for x in items]
taskqueue.Queue('test').add(tasks)

# Shortcut for commonly used options
gone_in_60s = my_func.defer.options(countdown=60)
gone_in_60s(a, b)

This seems like a reasonable interface to me. What do you think? Do you have any ideas on how to improve it further?
