When it comes to Bulk Loading, there's currently a shortage of tools for Java users, largely due to the relative newness of the Java platform for App Engine. Not all is doom and gloom, however: Users of App Engine for Java can use the Python bulkloader to load data into their Java App Engine instances! Because App Engine treats each major version of an app as a completely separate entity - but sharing the same datastore - it's possible to upload a Python version specifically for the purpose of Bulk Loading. This won't interfere with serving your Java app to your users.
To follow this guide, you won't need to understand much of the Python platform, though you will need to know a little Python. If your bulkloading needs are straightforward, you won't need to know much at all - it's essentially connect-the-dots - but if your bulkloading needs are a little more complex, you'll need to understand some of the basics of programming in Python - defining and calling functions and methods, basically. Who knows, you ...
This is the fifth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
The bulkloader automatically supports loading data from CSV files, but it's not restricted to that. With a little work, you can make the bulkloader load data from any datasource you can access with Python. The key to this is the generate_records method. This method accepts a filename, and is expected to yield a list of strings for each entity to be uploaded. By overriding this method, we can load from any datasource we wish - for example, a relational database such as MySQL. To make this reusable, let's define a base class we can use to load data from MySQL databases:
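import urlparse

import MySQLdb
from google.appengine.tools import bulkloader


class MySQLLoader(bulkloader.Loader):
    def __init__(self, kind_name, query, converters):
        self.query = query
        bulkloader.Loader.__init__(self, kind_name, converters)

    def initialize(self, filename, loader_opts):
        # loader_opts is parsed as a query string of connection arguments.
        self.connect_args = dict(urlparse.parse_qsl(loader_opts))

    def generate_records(self, filename):
        """Generates records from a MySQL database."""
        # Pass the parsed options to connect() as keyword arguments.
        db = MySQLdb.connect(**self.connect_args)
        cursor = db.cursor()
        cursor.execute(self.query)
        # Call fetchone() repeatedly until it returns None (no more rows).
        return iter(cursor.fetchone, None)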
What we've done here is extended the bulkloader's Loader class to load from a MySQL database instead ...
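The last line of generate_records relies on the two-argument form of Python's built-in iter: iter(callable, sentinel) keeps calling the callable and stops as soon as it returns the sentinel. Here is a minimal, self-contained sketch of that idiom. It is an illustration only: the standard library's sqlite3 module stands in for MySQLdb (both follow the DB-API, where fetchone returns None once the rows run out), and the table and its rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption:
# any DB-API cursor whose fetchone() returns None when exhausted works alike).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE albums (title TEXT, artist TEXT)")
db.executemany(
    "INSERT INTO albums VALUES (?, ?)",
    [("Abbey Road", "The Beatles"), ("Kind of Blue", "Miles Davis")],
)

cursor = db.cursor()
cursor.execute("SELECT title, artist FROM albums ORDER BY title")

# iter(callable, sentinel): call cursor.fetchone() until it returns None.
rows = list(iter(cursor.fetchone, None))
```

Each item produced this way is one row tuple, so a loader built on this idiom hands the bulkloader one record per database row without reading the whole result set into memory first.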
This is the fourth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
As promised, today we're going to discuss Distributed Transactions on App Engine. Distributed transactions are that feature that you didn't know you needed until they were gone: The ability to update any set of entities or records in your database in a single transaction. Due to the requirements of scale, App Engine supports transactions, but only on predefined Entity Groups. For most cases, this is all you need, but occasionally you really have to use a distributed or global transaction.
There are proposals for implementing distributed transactions on App Engine, but they're complicated, and none of them have yet been implemented as a robust library you can simply plug in and use. We're not going to attempt to recreate a general purpose distributed transaction system - at least, not today - instead, we'll address one common use-case for distributed transactions.
The usual example of the need for distributed or global transactions - so common that it's practically canonical - is the 'bank account' example. Suppose you have a set of accounts, defined something like this:
class Account ...
This is the third in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
Customizing Created Entities
Sometimes, it can be useful to modify an entity after it's created by the bulkloader, but before it's uploaded to App Engine. In our file upload example above, for instance, we might want to set the filesize field to the length of the uploaded file, but we don't want to add a new field to the data just to indicate that, since it's extra work, and could rapidly become inaccurate as files change.
Fortunately, the bulkloader provides an easy mechanism for this: The handle_entity method. By overriding this method, we can perform whatever postprocessing we wish on an entity before it's uploaded to the datastore. Here's an example that sets the filesize field:
class ImageLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'DatastoreImage',
                                   [('filename', str),
                                    ('data', file_loader)
                                   ])

    def handle_entity(self, entity):
        entity.filesize = len(entity.data)
        return entity
As an added bonus, the handle_entity method is not restricted to returning a single entity: It may return a list of them. You can use this to generate additional 'computed ...
This is the second in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
The bulk loader facilitates getting data in and out of App Engine, but many people don't realise just how powerful it can be. In this and subsequent posts, we'll explore some of the more advanced things you can do with the bulk loader, including:
- Importing or exporting binary data
- Customizing created entities
- Loading from and exporting to relational databases
Custom Conversion Functions
The most straightforward way of using the bulkloader, as shown in the documentation, is to define a bulkloader.Loader subclass and override the __init__ method, supplying a list of converters for the fields in your input file. Here's the example from the documentation:
class AlbumLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'Album',
                                   [('title', str),
                                    ('artist', str),
                                    ('publication_date',
                                     lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date()),
                                    ('length_in_minutes', int)
                                   ])
Most of the converters look like declarations - title is a str(ing), as is artist; length_in_minutes is an int. publication_date is the odd one out, though, and gives us a hint of the real power behind converters: They can be any function ...
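To see why plain types and lambdas are interchangeable here, it can help to apply a converter list to a row by hand. The following is a rough sketch, not the bulkloader's actual implementation: convert_row is a hypothetical helper, and the sample row is invented for illustration. It simply pairs each column with its converter and calls it.

```python
import datetime

# (name, converter) pairs in the same shape as the list passed to
# bulkloader.Loader.__init__ in the AlbumLoader example.
converters = [
    ("title", str),
    ("artist", str),
    ("publication_date",
     lambda x: datetime.datetime.strptime(x, "%m/%d/%Y").date()),
    ("length_in_minutes", int),
]


def convert_row(row, converters):
    """Hypothetical helper: apply each converter to the matching column."""
    return {name: convert(value)
            for (name, convert), value in zip(converters, row)}


# A made-up CSV row, already split into strings.
album = convert_row(
    ["Abbey Road", "The Beatles", "09/26/1969", "47"], converters)
```

Because str and int are themselves callables that take a string and return a value, they fit the same slot as the lambda: every converter is just a function from the raw column text to the property value.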
This is the first in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
One common pattern in App Engine is using memcache to store entities retrieved from the datastore - either individual ones or a list of them - to avoid re-executing common queries. The natural pattern is something like this:
from google.appengine.api import memcache
from google.appengine.ext import db

entities = memcache.get("somekey")
if not entities:
    entities = MyModel.all().fetch(10)
    memcache.set("somekey", entities)
This has several problems:
- Pickling in general is slow. App Engine doesn't have the more efficient cPickle module, so pickling on App Engine is sloooooow.
- Changes in the Model class between versions risk causing errors when unpickling entities that were created before the update.
- The Model class often contains a cached copy of the internal data representation, so often you're storing everything twice in the pickled data.
Previously, there wasn't really a way around this. You could store some processed form of your data, but this is awkward and creates extra work. Fortunately, a minor change in 1.2.5 now makes efficient memcaching easy.
Two new functions were added to the ...
After a couple of weeks of concerted effort, I'm pleased to announce the initial release of a new service, netboot.me. Netboot.me takes regular netbooting and makes it a whole lot more versatile - now, you can netboot directly into the installers for many popular Linux distros, as well as system tools and even live Linux distributions, all directly over the Internet, and without any local configuration required!
All that's required to set up netboot.me is a spare writable CD, USB key, or floppy disk to write a small (less than 1MB) disk image to. Alternatively, determined geeks can change their DHCP server to allow computers to netboot directly. Once you've done that, booting off the device on any computer with wired ethernet (wifi is a work in progress) will automatically cause the bootloader to download the current version of the menu from netboot.me, from which you can then choose the boot image you want. Selecting it causes the boot image to be downloaded and booted immediately.
Currently on the boot menu:
- Installers for several popular Linux distros (Ubuntu, Fedora, OpenSUSE, Debian).
- The FreeBSD installer.
- Tiny Core and Micro Core Linux ...
I'm pleased to say that BDBDatastore 0.2 is now released. With this release, BDBDatastore is now officially at feature parity with the production App Engine datastore. That is, it ought to be able to do everything the production datastore can, which means you can port your apps off the production datastore without having to change them.
Installation instructions can be found here. The release is numbered 0.2, but if it proves stable enough, it will become the official 1.0 release of BDBDatastore. So treat it as beta at least until it's got a little more testing.
If you try it out, speak up! I'd like to hear what people think of this. In the meantime, I'm going to start working on writing a container for running App Engine apps on Apache and other HTTP servers, as well as doing load testing and profiling of BDBDatastore.