Nick's Blog

Merry Season!

Posted by Nick Johnson | Filed under app-engine

As you've probably guessed by now, I'm not posting over the holiday period. Expect new content in the new year, though, including an all-new series of posts!

In the meantime, enjoy your time with your families, if that's what you're doing. As for myself and my wife, Hayley, we're back in New Zealand, enjoying some quality time with our families.

Read more | Comments | 24 December, 2009

OpenID on App Engine made easy with AEoid

Posted by Nick Johnson | Filed under python, coding, app-engine, tech

I'm pleased to present AEoid, a new App Engine library that aims to make user authentication with OpenID on App Engine simple.

AEoid is extremely easy to install, comprising a single piece of WSGI middleware, and its interface mirrors that of the App Engine Users API. Here's an example of it in action:

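import logging

from google.appengine.ext import db
from google.appengine.ext import webapp

from aeoid import users

class TestData(db.Model):
  user = users.UserProperty()

class TestHandler(webapp.RequestHandler):
  def get(self):
    user = users.get_current_user()
    if not user:
      # Not logged in: send the user to the OpenID login page, then back here.
      self.redirect(users.create_login_url(self.request.url))
      return
    logging.warning("Logged in as %s (%s)", user.nickname(), user.user_id())
    # Store an entity that references the logged-in user.
    data = TestData(user=user)
    data.put()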

As you can see, the interface to AEoid is almost identical to the App Engine Users API. There are a few differences of note:

Users are identified uniquely by their OpenID endpoint.
You can't construct a User object without specifying an OpenID URL.
Nicknames and email addresses are user-supplied, so they're not guaranteed unique or validated.
is_current_user_admin() is not yet implemented.
login: clauses in app.yaml are not affected by AEoid - they still authenticate using the regular Users API.

Installation

Installing AEoid is a simple matter of adding its WSGI middleware to your app ...
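For readers unfamiliar with the term, here is a minimal sketch of what a piece of WSGI middleware looks like in general. It is not AEoid's code: DemoAuthMiddleware, demo_app, call_app, the X-Demo-User header and the "demo.user" environ key are all made up for illustration, and AEoid's actual class and setup steps are covered in the full post.

```python
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
  """A trivial WSGI application (hypothetical, for illustration only)."""
  start_response("200 OK", [("Content-Type", "text/plain")])
  user = environ.get("demo.user", "anonymous")
  return [("Hello, %s" % user).encode("utf-8")]


class DemoAuthMiddleware(object):
  """Wraps a WSGI app and annotates each request before passing it on."""

  def __init__(self, app):
    self.app = app

  def __call__(self, environ, start_response):
    # A real authentication middleware would inspect cookies or sessions here.
    # This sketch just copies a made-up request header into the environ.
    environ["demo.user"] = environ.get("HTTP_X_DEMO_USER", "anonymous")
    return self.app(environ, start_response)


def call_app(app, extra_environ=None):
  """Invokes a WSGI app once and returns (status, body) for inspection."""
  environ = {}
  setup_testing_defaults(environ)
  environ.update(extra_environ or {})
  captured = {}

  def start_response(status, headers):
    captured["status"] = status

  body = b"".join(app(environ, start_response))
  return captured["status"], body


# Installing middleware amounts to wrapping the app object once at startup.
application = DemoAuthMiddleware(demo_app)
```

The point of the pattern is that the wrapped application does not need to change: the middleware sits between the server and the app and sees every request first.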

Read more | Comments | 18 December, 2009

No post today

Posted by Nick Johnson | Filed under app-engine, tech

Unfortunately, I've been caught short today by a flood of important things at work, and didn't get a chance to write up my regular Wednesday blog post. In place of that, I offer you the winners of the App Engine USB drives:

Jeff
Rodrigo Moraes
Olivier Deckmyn

If you are reading this, and you're one of the winners, please send me an email (nick AT notdot DOT net) with your name, address, and favorite Google color (Yellow, Green, Red, or Blue), and I'll get them out to you pronto.

As further tribute, I offer you this adorable picture of a kitten:

(Yes, I'm aware of how paradoxical the subject of this post is)

Read more | Comments | 16 December, 2009

Damn Cool Algorithms: Log structured storage

Posted by Nick Johnson | Filed under tech, damn-cool-algorithms

Typically, if you're designing a storage system - such as a filesystem or a database - one of your major concerns is how to store the data on disk. You have to take care of allocating space for the objects to be stored, as well as storing the indexing data; you have to worry about what happens when you want to extend an existing object (e.g., appending to a file); and you have to take care of fragmentation, which happens when old objects are deleted and new ones take their place. All of this adds up to a lot of complexity, and the solutions are often buggy or inefficient.

Log structured storage is a technique that takes care of all of these issues. It originated as Log Structured File Systems in the 1980s, but more recently it's seeing increasing use as a way to structure storage in database engines. In its original filesystem application, it suffers from some shortcomings that have precluded widespread adoption, but as we'll see, these are less of an issue for database engines, and Log Structured storage brings additional advantages for a database engine over and above easier storage management.
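To give a rough feel for the idea, here is a toy sketch of an append-only store. It is a simplification written for this summary, not the design described in the full post: the AppendOnlyLog class and its record layout are assumptions, the index lives only in memory, and reclaiming the space used by stale records is left out entirely.

```python
import io
import struct


class AppendOnlyLog(object):
  """Toy log-structured key/value store over a file-like object."""

  _HEADER = struct.Struct(">II")  # key length, value length (assumed layout)

  def __init__(self, fileobj=None):
    self.f = fileobj if fileobj is not None else io.BytesIO()
    self.index = {}  # key -> offset of the newest record for that key

  def put(self, key, value):
    # Existing data is never modified: every write goes at the end of the log.
    self.f.seek(0, io.SEEK_END)
    offset = self.f.tell()
    self.f.write(self._HEADER.pack(len(key), len(value)))
    self.f.write(key)
    self.f.write(value)
    # Updating a key only repoints the index; the old record stays in the log.
    self.index[key] = offset

  def get(self, key):
    self.f.seek(self.index[key])
    key_len, value_len = self._HEADER.unpack(self.f.read(self._HEADER.size))
    self.f.seek(key_len, io.SEEK_CUR)  # skip over the stored key
    return self.f.read(value_len)

  def size(self):
    self.f.seek(0, io.SEEK_END)
    return self.f.tell()


log = AppendOnlyLog()
log.put(b"a", b"1")
log.put(b"a", b"three")  # appends a new record rather than overwriting
```

Because nothing is ever overwritten, there is no free-space allocation to manage and growing a value is no harder than writing it the first time; the cost is the stale data left behind in the log.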

The basic organization of a ...

Read more | Comments | 14 December, 2009

'Naked' domains on App Engine

Posted by Nick Johnson | Filed under naked-domains, app-engine, tech, dns

One topic that comes up frequently on the App Engine groups is that of 'naked' domains in App Engine, and how to handle them. A naked domain, for the uninitiated, is one without a service-specific subdomain. For example, "google.com" is naked, while "www.google.com" is not. This post provides an overview of why naked domains are a problem, and what you can do about them.

There are two separate factors that combine to make handling of naked domains a problem in App Engine. The first is the design of DNS, the system for resolving domain names to IP addresses. There are two different types of DNS record we're concerned about here: A records, which specify the IP address for a name, and CNAME records, which act as a "see also", specifying another name for a domain. For example, an A record might say "google.com has the IP 216.239.59.104", while a CNAME record might say "google.com is also known as www.l.google.com".
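The difference between the two record types can be sketched with a toy in-memory resolver. This is only an illustration: the RECORDS table reuses the names and IP from the examples above but is an invented arrangement, not the real zone data, and resolve() is a hypothetical helper that makes no network calls.

```python
# Toy record table: name -> (record type, value). Invented for illustration.
RECORDS = {
  "google.com": ("A", "216.239.59.104"),
  "www.google.com": ("CNAME", "www.l.google.com"),
  "www.l.google.com": ("A", "216.239.59.104"),
}


def resolve(name, records=RECORDS, max_hops=8):
  """Follows CNAME records until an A record is found; returns its IP."""
  for _ in range(max_hops):
    record_type, value = records[name]
    if record_type == "A":
      return value  # An A record answers the question directly.
    name = value  # A CNAME says "look up this other name instead".
  raise ValueError("too many CNAME hops")


ip_for_naked = resolve("google.com")
ip_for_www = resolve("www.google.com")
```

An A record ends the lookup, while a CNAME restarts it under a different name, which is why the two behave so differently when the target's IP addresses change.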

The problem arises with the way CNAME records work. An A record specifies the IP address only for a single record - for example, an A record on google.com specifies ...

Read more | Comments | 11 December, 2009

'Most popular' metrics in App Engine

Posted by Nick Johnson | Filed under python, tech, app-engine, coding, datastore

One useful and common statistic to provide to users is a metric akin to "7 day downloads" or "7 day popularity". This appears across many sites and types of webapp, yet the best way to do this is far from obvious.

The naive approach is to record each download individually, and use something akin to "SELECT count(*) FROM downloads WHERE item_id = 123 AND download_date > seven_days_ago", but this involves counting each download individually - O(n) work with the number of downloads! Caching the count is an option, but still leads to excessive amounts of work at read-time.

Another option is to maintain an array of daily download counts, keeping the last 7. This is an improvement from a workload point of view, but leads to either discontinuities at the start of a new day, or to all counts being updated only once per day.

There is a third option, however, which has the performance of the second option with the responsiveness of the first. To use it, we have to reconsider slightly what we mean by '7 day popularity'. The solution in question is to use an exponential decay process. Each time an event happens, we increase the item's ...
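As a rough illustration of an exponentially decaying counter, here is one common formulation. It is a sketch under stated assumptions, not necessarily the scheme the full post goes on to describe: the DecayingCounter class, the choice of a 7 day half-life, and the use of plain timestamps in seconds are all assumptions made here.

```python
import math

HALF_LIFE_SECONDS = 7 * 24 * 3600.0  # assumed: a score halves every 7 days


class DecayingCounter(object):
  """Popularity score where each event's weight decays exponentially."""

  def __init__(self, half_life=HALF_LIFE_SECONDS):
    self.decay_rate = math.log(2) / half_life
    self.score = 0.0
    self.last_update = None  # timestamp of the most recent event

  def value_at(self, now):
    # Reads are O(1): decay the stored score by the time elapsed since then.
    if self.last_update is None:
      return 0.0
    elapsed = now - self.last_update
    return self.score * math.exp(-self.decay_rate * elapsed)

  def record_event(self, now, weight=1.0):
    # Writes are O(1) too: bring the score up to date, then add the event.
    self.score = self.value_at(now) + weight
    self.last_update = now


counter = DecayingCounter(half_life=10.0)
counter.record_event(0.0)
counter.record_event(10.0)  # first event now counts for half as much
```

Only one number and one timestamp are stored per item, and the score changes smoothly over time rather than jumping at day boundaries.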

Read more | Comments | 09 December, 2009

DIY USB preloading with *nix

Posted by Nick Johnson | Filed under go, coding, tech

Having recently received a large number of USB flash drives, I needed a solution for preloading them in bulk. Dedicated USB preloading/flashing devices are pricey - starting at over 500 euro for a small model - and while the preload services most companies offer (including Memotrek, the company we ordered the drives from) are handy, they add an extra 50c or so to the price of each drive, and the preload is quickly out of date. With that in mind, I decided to go the DIY route. This post documents my attempts and the final (successful) result.

To start, you need a lot of USB ports. I purchased two D-Link DUB-H7 7-port USB hubs, but any hubs ought to do, as long as the spacing between the ports is sufficient to accommodate a flash drive in every port. You won't need the included power bricks, as the power provided by the USB host is sufficient even for 7 USB flash drives.

The general process of bulk flashing goes something like this:

Plug in one of your drives. Wipe it with "dd if=/dev/zero of=/dev/your-drive bs=1M", partition and format it, and write the data you want ...

Read more | Comments | 07 December, 2009

.astronomy wrap-up

Posted by Nick Johnson | Filed under tech, dotastronomy

Wednesday was Hack Day at dot astronomy. I spent the day working on a tool that uses Seadragon Ajax and a modified Python tilecutter to allow people with large astronomical images (tens to hundreds of megapixels) to easily upload them to App Engine for viewing by users. This is useful because many really attractive astronomical images get released to the public, but often only in two versions: 'desktop wallpaper' and 'too big to view'. Ideally, with this tool (which I'm tentatively calling astrozoom), astronomers could make it easy for users to view and zoom the product of their work.

Further extensions would include integration with astrometry.net to automatically locate and annotate uploaded images, and support for clipping out and downloading certain sections of an image, not to mention community features like sharing with friends, comments, and embedding in other pages.

I got the basic upload-and-display functionality done on hack day, but because my Mac doesn't have enough memory to run the tool on a decently sized image, I'm unable to show it off yet.

Other hack-day projects included a large team working on a project called Buried Data, for making datasets available for research that would ...

Read more | Comments | 04 December, 2009

.astronomy so far

Posted by Nick Johnson | Filed under app-engine, tech, dotastronomy

The first 3 days of .astronomy have been busy. So busy, I haven't had the time or energy to write about them until now! Here's a quick summary of what's happened:

I arrived at the conference a bit before 11AM on Monday, having taken the earliest flight available that day (I didn't fly in the night before, as that would've meant missing Video Games Live). When I came in, Robert Hollow was giving a talk on pulse@parkes, a fascinating program he runs to get students into astronomy by giving them real observing time on the Parkes radio telescope in Australia. He gave an engaging presentation, and made me wish I could give it a go myself.

Next up, Arfon Smith and Chris Lintott gave a talk on Galaxy Zoo, which has come a long way since I last looked at it. They described the architecture (Ruby, running on AWS), some of the project's successes, and some of the new projects they're working on, including Galaxy Zoo Mergers, a project dedicated to determining and documenting the details of galaxy mergers.

After lunch, I gave my Python 101 tutorial. Due to a screw-up on ...

Read more | Comments | 02 December, 2009
