New in 1.3.6: Namespaces

The recently released 1.3.6 update for App Engine introduces a number of exciting new features, including multi-tenancy - the ability to shard your app for multiple independent user groups - using a new Namespaces API. Today, we'll take a look at the Namespaces API and how it works.

One common question from people designing multi-tenant apps is how to charge users based on usage. While I'd normally recommend a simpler charging model, such as per user, that isn't universally applicable, and even when it is, it can be useful to keep track of just how much quota each tenant is consuming. Since multi-tenant apps just got a whole pile easier, we'll use this as an opportunity to explore per-tenant accounting options, too.

First up, let's take a look at the basic setup for namespacing. You can check out this demo for an example of what a fully featured, configurable namespace setup looks like, but presuming we want to use domain names as our namespaces, here's the simplest possible setup, which goes in your app's appengine_config.py:

def namespace_manager_default_namespace_for_request():
  import os
  return os.environ['SERVER_NAME']

That's all there is to it. If we wanted to namespace on Google Apps domain instead, we could do this:

def namespace_manager_default_namespace_for_request():
  from google.appengine.api import namespace_manager
  return namespace_manager.google_apps_namespace()

Edit: Note that this function is broken in 1.3.6, unfortunately.

With that, presto, your app now handles multi-tenancy. On each request, your configuration function is called to determine the current namespace, and that namespace is used to segment calls to the other APIs for the rest of the request.

Of course, your app may not be that simple - it may need to access data that isn't namespace-specific, or even access data across namespaces. In that case, the set_namespace function and recipes like this one are your friends.
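The usual pattern with set_namespace is to save the current namespace, switch, do your work, and restore in a finally block. Here's a sketch of that pattern; the FakeNamespaceManager stand-in is purely illustrative so the example runs anywhere, but on App Engine you'd use the real google.appengine.api.namespace_manager, which has the same get_namespace/set_namespace calls:

```python
class FakeNamespaceManager(object):
  """Illustrative stand-in for google.appengine.api.namespace_manager."""
  def __init__(self):
    self._namespace = ''
  def get_namespace(self):
    return self._namespace
  def set_namespace(self, namespace):
    self._namespace = namespace

namespace_manager = FakeNamespaceManager()

def run_in_namespace(namespace, func, *args, **kwargs):
  """Runs func with the given namespace active, restoring the old one after."""
  old_namespace = namespace_manager.get_namespace()
  try:
    namespace_manager.set_namespace(namespace)
    return func(*args, **kwargs)
  finally:
    # Restore even if func raises, so later API calls aren't misdirected.
    namespace_manager.set_namespace(old_namespace)
```

The try/finally is the important part: if you forget to restore the namespace, every subsequent datastore or memcache call in the request goes to the wrong tenant.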

Moving on to accounting, we'll need a way to record the resources consumed on each request. For that, we'll need to write some middleware, so that we can run before and after each request. Here's how we'll start:

from google.appengine.api import memcache
from google.appengine.api import quota
from google.appengine.ext.deferred import defer

class AccountingMiddleware(object):
  CURRENT_INSTANCE = None

  def __init__(self, application):
    self.application = application

  def __call__(self, environ, start_response):
    self.counters = {}
    self.disabled = False
    AccountingMiddleware.CURRENT_INSTANCE = self
    try:
      ret = list(self.application(environ, start_response))
      self.update_counters(environ, ret)
      self.store_counters()
      return ret
    finally:
      AccountingMiddleware.CURRENT_INSTANCE = None

Our middleware in this case is a callable class. The __call__ method initializes a dict of counters, representing the values we're accounting for, then calls the original application, saving the response. On some WSGI platforms, materializing the response like this would be a problem, but since we know App Engine receives the entire response before sending anything to the client, it's fine here. Then we call a to-be-defined method, update_counters(), to tally the request's usage, and another, store_counters(), to record it. Let's take a look at store_counters first:
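To see the middleware shape in isolation, here's a stripped-down sketch with the accounting replaced by a simple byte count, so it can run outside App Engine (CountingMiddleware and hello_app are illustrative names, not part of the real code):

```python
class CountingMiddleware(object):
  """Minimal WSGI middleware that materializes and measures the response."""
  def __init__(self, application):
    self.application = application
    self.last_bytes_out = 0

  def __call__(self, environ, start_response):
    # Materialize the response so we can measure it before returning it.
    ret = list(self.application(environ, start_response))
    self.last_bytes_out = sum(len(chunk) for chunk in ret)
    return ret

def hello_app(environ, start_response):
  start_response('200 OK', [('Content-Type', 'text/plain')])
  return ['Hello, tenant!']

app = CountingMiddleware(hello_app)
body = app({}, lambda status, headers: None)
# body holds the full response; app.last_bytes_out is its size.
```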

  def store_counters(self):
    if self.disabled: return
    memcache.offset_multi(self.counters, key_prefix='quota:', initial_value=0)
    if memcache.add("quota:_lock", 1):
      defer(write_counters, self.counters.keys())

  def disable_accounting(self):
    """Disables accounting for this request."""
    self.disabled = True

Here's where our first tradeoff becomes apparent. We need to account for the quota usage of every request, but we don't want to impose too much overhead in doing so - for example, by writing a datastore record for every request. Nor can we risk datastore contention by performing a transaction on the quota record for every update.

Fortunately, there's a lightweight way to do this. While we would like our accounting to be as accurate as possible, the slight risk of under-accounting is probably acceptable, so we can use memcache. In the routine above, we do two things: We use the offset_multi function to update the counters for each of the values we've recorded, and we use memcache.add to atomically add a lock record. If the lock record did not already exist, we start off a task to write the counters to the datastore. With a setup like this, every request will record its own usage, and then add a task queue task if one isn't already in progress. If we want to reduce overhead further, at the cost of more risk of under-accounting, we could add a delay to the new task, enforcing a maximum rate at which we will write out quota updates.
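The whole scheme hinges on the semantics of two memcache operations: offset_multi, which atomically increments a batch of counters, and add, which only succeeds if the key doesn't already exist - making it usable as a lock. Here's a toy in-memory model of just those two operations (FakeMemcache is illustrative; real memcache provides the same guarantees atomically across instances):

```python
class FakeMemcache(object):
  """Toy model of the two memcache operations the accounting pattern uses."""
  def __init__(self):
    self.data = {}

  def offset_multi(self, offsets, key_prefix='', initial_value=0):
    # Increment each counter, creating it at initial_value if absent.
    for name, delta in offsets.items():
      key = key_prefix + name
      self.data[key] = self.data.get(key, initial_value) + delta

  def add(self, key, value):
    # Atomic on real memcache: succeeds only if the key doesn't exist yet.
    if key in self.data:
      return False
    self.data[key] = value
    return True

mc = FakeMemcache()
mc.offset_multi({'cpu': 120, 'data_out': 2048}, key_prefix='quota:')
first = mc.add('quota:_lock', 1)   # True: this request enqueues the task
second = mc.add('quota:_lock', 1)  # False: a task is already pending
```

Because add is atomic, only one request "wins" and enqueues the write-back task; everyone else just bumps the counters and moves on.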

It's also worth noting that none of the methods here have an explicit namespace specified. This is because the namespace facility automatically handles this for us. Let's see how the write_counters function works:

def write_counters(counter_names):
  """Writes counters from memcache to the datastore."""
  AccountingMiddleware.CURRENT_INSTANCE.disable_accounting()
  # Get the amounts to increment
  counters = memcache.get_multi(counter_names, key_prefix='quota:')
  counters = dict((k, int(v)) for k, v in counters.items())
  # Subtract the retrieved amounts from the counters
  memcache.offset_multi(dict((k, -v) for k, v in counters.items()),
                        key_prefix='quota:')
  # Remove the lock
  memcache.delete("quota:_lock")
  # Update the datastore
  QuotaUsage.write_quotas(datetime.date.today(), counters)

This is a little more complicated than anything we've seen so far, so let's go through it step by step. The order of operations here is important - otherwise we can introduce undesirable race conditions. The first thing we do is disable accounting for this request - otherwise, we'd create an infinite loop of task queue requests, each recording the quota used by the previous update! Then, we fetch the current values of the counters from memcache, so we know what to write. Next, we subtract the values we just received from the same counters. We have to do this, rather than deleting them, because the values could've been updated by another request between us fetching them and deleting them. Then, we remove the memcache lock, allowing another task to be enqueued, and finally we run the QuotaUsage.write_quotas method, which performs the actual transaction. Let's see that now.
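The subtract-rather-than-delete point is subtle enough to be worth a concrete illustration. In this plain-Python sketch, a concurrent increment lands between the task fetching the counter and writing it back; subtracting the fetched amount preserves it, where a delete would have lost it:

```python
# One shared counter, as it would live in memcache.
counters = {'quota:cpu': 100}

snapshot = dict(counters)            # the task fetches the current values
counters['quota:cpu'] += 30          # another request increments concurrently
for key, value in snapshot.items():  # subtract only what we fetched...
  counters[key] -= value             # ...leaving the concurrent 30 intact
# counters['quota:cpu'] is now 30; deleting the key here would have lost it.
```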

class QuotaUsage(db.Model):
  """Records quota usage for a given namespace in a given time interval."""
  cpu = db.IntegerProperty(required=True, default=0)
  api_cpu = db.IntegerProperty(required=True, default=0)
  data_in = db.IntegerProperty(required=True, default=0)
  data_out = db.IntegerProperty(required=True, default=0)
  
  @classmethod
  def write_quotas(cls, date, quota_dict):
    """Increments quotas for the specified time interval.
    
    Args:
      date: A datetime.date object for the date to increment quotas for.
      quota_dict: A dict mapping quota names to quantities to increment.
    """
    def _tx():
      key_name = date.isoformat()
      quotas = cls.get_by_key_name(key_name)
      if not quotas:
        quotas = cls(key_name=key_name)
      for k, v in quota_dict.items():
        setattr(quotas, k, getattr(quotas, k) + v)
      quotas.put()
    db.run_in_transaction(_tx)

This code, at least, should be fairly straightforward - it's a simple read, modify, write operation. The only complication is that we're making it data-driven, iterating over the keys in the dict to determine which fields we update. Also note that we name the entity after the current date, neatly ensuring we have daily accounting.
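The data-driven update step works on any object whose attribute names match the counter names. Here's the same getattr/setattr idiom in plain Python, outside the transaction (QuotaRecord and apply_quotas are illustrative names standing in for the model and the loop inside _tx):

```python
class QuotaRecord(object):
  """Plain stand-in for the QuotaUsage model's counter fields."""
  def __init__(self):
    self.cpu = 0
    self.api_cpu = 0
    self.data_in = 0
    self.data_out = 0

def apply_quotas(record, quota_dict):
  # Increment only the fields named in the dict; others stay untouched.
  for name, amount in quota_dict.items():
    setattr(record, name, getattr(record, name) + amount)
  return record

record = apply_quotas(QuotaRecord(), {'cpu': 250, 'data_out': 4096})
```

The upside of this approach is that adding a new tracked quota only requires a new model property and a new counter name - the update code doesn't change.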

Finally, let's see update_counters, which is fairly straightforward:

  def update_counters(self, environ, ret):
    self.increment_counter('cpu', quota.get_request_cpu_usage())
    self.increment_counter('api_cpu', quota.get_request_api_cpu_usage())
    self.increment_counter('data_in', int(environ.get('CONTENT_LENGTH') or 0))
    self.increment_counter('data_out', sum(len(x) for x in ret))

  def increment_counter(self, name, value):
    self.counters[name] = self.counters.get(name, 0) + value

Here, we use a couple of sources of data for our quota values: the request and response, which we extract the incoming and outgoing bandwidth from (or rather, a lower bound on those values), and the quota API, which we use to get the CPU time used for both API calls and regular CPU usage.

Here's the magic appengine_config invocation to add our middleware:

def webapp_add_wsgi_middleware(app):
  import accounting
  app = accounting.AccountingMiddleware(app)
  return app

And that's all there is to it! Questions? Comments? Post them below!
