DIY USB preloading with *nix

Having recently received a large number of USB flash drives, I needed a solution for preloading them in bulk. Dedicated USB preloading/flashing devices are pricey - starting at over 500 euro for a small model - and while the preload services most companies offer (including Memotrek, the company we ordered the drives from) are handy, they add an extra 50c or so to the price of each drive, and the preload is quickly out of date. With that in mind, I decided to go the DIY route. This post documents my attempts and the final (successful) result.

To start, you need a lot of USB ports. I purchased two D-Link DUB-H7 7-port USB hubs, but any hubs ought to do, as long as the spacing between the ports is sufficient to accommodate a flash drive in every port. You won't need the included power bricks, as the power provided by the USB host is sufficient even for 7 USB flash drives.

The general process of bulk flashing goes something like this:

  1. Plug in one of your drives. Wipe it with "dd if=/dev/zero of=/dev/your-drive bs=1M", partition and format it, and write the data you want to preload to it. Set the volume label to something appropriate.
  2. Create an image of the drive you just constructed by doing "dd if=/dev/your-drive of=image_file bs=1M". Take care to image the whole drive (e.g. /dev/sdb), not just the partition (e.g. /dev/sdb1).
  3. (Optional) Use a tool to truncate the image file at the last non-zero byte. Because FAT32 packs all files at the beginning of the partition, this allows you to cut the image file down to roughly the size of the total amount of data you want to load, while still flashing a full-sized filesystem to the devices.
  4. Plug in your USB hubs, and insert a USB flash drive into each port. Take note of the device names they get assigned.
  5. Use a flashing tool that supports multiple output devices to write the modified image file to the drives, again taking care to write to the device, not the partition. Take care not to overwrite your hard disk!
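
Step 3 doesn't name a particular tool, so here is a minimal sketch of one in current Go. It assumes the drive was zeroed before formatting (as in step 1), so everything after the last non-zero byte is safe to drop. The program name "trimimage" and the function names are made up for illustration; this is not the tool used for the original preload.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// trimmedLength returns len(data) minus any run of zero bytes at the end.
func trimmedLength(data []byte) int {
	n := len(data)
	for n > 0 && data[n-1] == 0 {
		n--
	}
	return n
}

// lastNonZeroOffset reads r to the end and returns the offset just past
// the last non-zero byte (0 if the input is empty or all zeros).
func lastNonZeroOffset(r io.Reader, chunkSize int) (int64, error) {
	buf := make([]byte, chunkSize)
	var offset, keep int64
	for {
		n, err := r.Read(buf)
		if t := trimmedLength(buf[:n]); t > 0 {
			keep = offset + int64(t)
		}
		offset += int64(n)
		if err == io.EOF {
			return keep, nil
		}
		if err != nil {
			return 0, err
		}
	}
}

// trimImage truncates the file at path in place and returns its new size.
func trimImage(path string) (int64, error) {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	keep, err := lastNonZeroOffset(f, 1<<20)
	if err != nil {
		return 0, err
	}
	return keep, f.Truncate(keep)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: trimimage image_file")
		return
	}
	size, err := trimImage(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "trimimage:", err)
		os.Exit(1)
	}
	fmt.Printf("truncated to %d bytes\n", size)
}
```

Since this modifies the image file in place, it's safest to run it on a copy.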

The last step is, of course, the most interesting one. Initially, I had intended to use a tool called dcfldd, a dd extension that supports multiple output devices. Unfortunately, as I discovered when I tried it, dcfldd writes to multiple output devices serially, waiting for each write to complete before continuing on to the next device. While this creates a very pretty 'chasing' effect as the USB activity lights flash one by one, it means that flashing n devices goes 1/nth as fast as flashing one!

On realising this shortcoming of dcfldd, I set out to write my own multi-device flashing tool. Go, with its excellent concurrency model, seemed like a natural choice here, especially since I've been playing with it anyway.

Writing a program like this in languages such as C would likely involve a lot of complications dealing with non-blocking IO, or complex synchronization primitives for worker threads. In Go, however, the overall plan is quite straightforward:

  1. Start up one 'writer' goroutine per output device. Give each one its own channel.
  2. Open the input device and read blocks in.
  3. For each block read in, send a pointer to it to each writer goroutine.
  4. Once all blocks have been read, send an empty block to each writer to let it know we're done.
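
For readers on a current Go release, the four steps above can be sketched roughly as follows. This is an approximation rather than the program from this post: it writes to in-memory buffers instead of devices so it can be tried without hardware, it adds a sync.WaitGroup (not part of the plan above) so the caller can wait for the writers to finish, and the names fanOut, writeBlocks and copyToAll are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
	"sync"
)

// writeBlocks writes every block it receives to w and stops when it
// receives an empty block.
func writeBlocks(w io.Writer, in <-chan []byte, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		block := <-in
		if len(block) == 0 {
			return
		}
		if _, err := w.Write(block); err != nil {
			log.Fatalf("write failed: %v", err)
		}
	}
}

// fanOut copies r to every writer in ws, blockSize bytes at a time.
func fanOut(r io.Reader, ws []io.Writer, blockSize int) error {
	var wg sync.WaitGroup
	chans := make([]chan []byte, len(ws))
	// Step 1: one writer goroutine and one channel per output.
	for i, w := range ws {
		chans[i] = make(chan []byte)
		wg.Add(1)
		go writeBlocks(w, chans[i], &wg)
	}
	var readErr error
	for {
		// Steps 2 and 3: read a block and hand it to every writer.
		buf := make([]byte, blockSize) // fresh buffer for each block
		n, err := r.Read(buf)
		if n > 0 {
			for _, c := range chans {
				c <- buf[:n]
			}
		}
		if err != nil {
			if err != io.EOF {
				readErr = err
			}
			break
		}
	}
	// Step 4: an empty block tells each writer to stop.
	for _, c := range chans {
		c <- nil
	}
	wg.Wait()
	return readErr
}

// copyToAll runs fanOut against n in-memory buffers and returns what
// each of them received.
func copyToAll(data string, n, blockSize int) []string {
	bufs := make([]*bytes.Buffer, n)
	ws := make([]io.Writer, n)
	for i := range bufs {
		bufs[i] = new(bytes.Buffer)
		ws[i] = bufs[i]
	}
	if err := fanOut(strings.NewReader(data), ws, blockSize); err != nil {
		log.Fatal(err)
	}
	out := make([]string, n)
	for i, b := range bufs {
		out[i] = b.String()
	}
	return out
}

func main() {
	for i, s := range copyToAll("hello, flash drives", 3, 4) {
		fmt.Printf("output %d received %q\n", i, s)
	}
}
```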

Let's start by defining the writer goroutine:

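func write_blocks(outname string, outfile *os.File, in chan []byte) {
  for {
    // Wait for the next block from the reader
    block := <-in;
    if len(block) > 0 {
      if _, err := outfile.Write(block); err != nil {
        log.Exitf("Error writing to %s: %s\n", outname, err);
      }
    } else {
      // An empty block means there is nothing more to write
      outfile.Close();
      return;
    }
  }
}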

This ought to be fairly easy to understand. The function takes the name of the output file (used solely for error reporting), an opened file to write to, and a channel that will be used to send it new blocks. We use Go's syntax for an infinite loop (a 'for' without any conditions). In each iteration, we read a block from the input channel, and write it to the output file. When we receive the final empty block, we close the output file and return.

The reason this function is so simple is that it's doing everything synchronously. It blocks on the input channel, then blocks again on the write function until the data is written. Go's concurrency model means that it will take care of executing other goroutines while it's waiting for data.

Next, we'll define the main function, starting with some flag definitions:

func main() {
  var ok os.Error;

  var if_f *string = flag.String("if", "", "Input file");
  var of *string = flag.String("of", "", "Output file(s)");
  var bs *int = flag.Int("bs", 4096, "Block size in bytes");
  // Parse the command line so the flag variables are populated
  flag.Parse();

  // Output devices are given as a comma-separated list
  outs := strings.Split(*of, ",", -1);
  chans := make([]chan []byte, len(outs));

Go's flag library makes handling command line flags very easy. Here, we define each flag as a pointer variable, initializing it with the appropriate function from the flag package. The arguments to the function denote, in order, the name of the flag, the default value, and the help text. The flag names are based on the ones used in dd, though our program expects them to be prefixed with - or -- (for example --if=image_file), unlike dd. Note that the variable for the input file is called 'if_f' - 'if' is a reserved word in Go. Finally, we split the output file argument on commas into a list of output devices, and create one channel for each output device.
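
The snippets in this post use the Go API as it was at the time of writing. In current Go, the same flag handling might look something like the sketch below. The options struct, the parseArgs function and the program name "multidd" are all hypothetical; only the flag names and defaults come from the post.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"strings"
)

// options holds the parsed command line.
type options struct {
	input     string
	outputs   []string
	blockSize int
}

// parseArgs parses dd-style flag names from args (without the program name).
func parseArgs(args []string) (options, error) {
	fs := flag.NewFlagSet("multidd", flag.ContinueOnError)
	input := fs.String("if", "", "Input file")
	of := fs.String("of", "", "Output file(s), comma-separated")
	bs := fs.Int("bs", 4096, "Block size in bytes")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if *input == "" || *of == "" {
		return options{}, errors.New("both -if and -of are required")
	}
	return options{
		input:     *input,
		outputs:   strings.Split(*of, ","),
		blockSize: *bs,
	}, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

The flag package accepts either one or two dashes, so "-if image.bin" and "--if=image.bin" are equivalent.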

Next, we need to open the input and output files:

  var infile *os.File;
  if infile, ok = os.Open(*if_f, os.O_RDONLY, 0644); ok != nil {
    log.Exitf("Unable to open input file: %s\n", ok);
  }
  for i, out := range outs {
    chans[i] = make(chan []byte);
    if outfile, ok := os.Open(out, os.O_WRONLY | os.O_CREAT, 0644); ok != nil {
      log.Exitf("Unable to open output file %s: %s\n", out, ok);
    } else {
      // Start the writer for this device
      go write_blocks(out, outfile, chans[i]);
    }
  }

Note we open the output file in the parent goroutine, and pass the handle to the opened file to the goroutine that will write to it. This simplifies error handling, allowing us to detect errors opening the output devices and abort immediately if we encounter one.

Now the real work happens: We need to read blocks from the input file and send them to the writer channels. This is complicated slightly by the need for error handling:

  total_bytes := 0;
  for i := 0; true; i++ {
    buf := make([]byte, *bs);
    // Read a block
    var size int;
    switch size, ok = infile.Read(buf); ok {
    case os.EOF:
      // End of input: size is 0, so the check below ends the loop
    case nil:
    default:
      log.Exitf("Error reading block: %s\n", ok);
    }
    total_bytes += size;
    if size == 0 {
      break;
    }
    // Take just the read bytes
    s := buf[0:size];
    // Send it to the channels to be written
    for j := 0; j < len(chans); j++ {
      chans[j] <- s;
    }
  }

First, we create a new buffer of the appropriate size. We need to do this in each iteration of the loop, because it's possible that one of the writer goroutines will be writing a block while we are reading the next block; if we'd used the same block for every iteration, this would cause corruption.
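
To see why reusing one buffer would be a problem, here is a small single-goroutine stand-in for the race described above, written for current Go. It keeps a slice for every block it reads and only looks at the contents afterwards, the way a slow writer would. The function name collectBlocks and its reuse parameter are invented for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// collectBlocks reads src in blockSize pieces, remembers a slice for each
// block, and only converts the slices to strings once reading has finished.
// With reuse set, every block is read into the same buffer.
func collectBlocks(src string, blockSize int, reuse bool) []string {
	r := strings.NewReader(src)
	shared := make([]byte, blockSize)
	var blocks [][]byte
	for {
		buf := shared
		if !reuse {
			buf = make([]byte, blockSize) // fresh buffer per block
		}
		n, err := r.Read(buf)
		if n > 0 {
			blocks = append(blocks, buf[:n])
		}
		if err != nil {
			break // io.EOF once the input is exhausted
		}
	}
	out := make([]string, len(blocks))
	for i, b := range blocks {
		out[i] = string(b)
	}
	return out
}

func main() {
	fmt.Println("fresh buffers:", collectBlocks("aaaabbbbcccc", 4, false))
	fmt.Println("shared buffer:", collectBlocks("aaaabbbbcccc", 4, true))
}
```

With fresh buffers each saved block keeps its own contents. With the shared buffer, every saved slice refers to the same memory, so they all show whatever was read last.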

Next, we read in the data, using a Go switch statement to handle errors. There are three possible cases: ok contains an os.EOF error, in which case we should stop copying; it contains nil, in which case everything went fine; or it contains some other error, in which case we print it and exit. Note that switch statements in Go do not fall through by default, so the 'case nil' block does nothing, rather than falling through to the default case.
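
One related wrinkle is that a plain 'break' inside a switch only leaves the switch, not the enclosing loop. In current Go (where the end-of-file error is io.EOF), a labeled break is one way to deal with that. The sketch below assumes nothing about the original program beyond the three cases described above; countBlocks and countString are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countBlocks reads r in blockSize pieces and reports how many non-empty
// blocks and how many bytes it saw.
func countBlocks(r io.Reader, blockSize int) (blocks, total int, err error) {
	buf := make([]byte, blockSize)
readLoop:
	for {
		n, rerr := r.Read(buf)
		if n > 0 {
			blocks++
			total += n
		}
		switch rerr {
		case nil:
			// Everything went fine; no fall-through into the cases below.
		case io.EOF:
			// A bare break would only leave the switch, so name the loop.
			break readLoop
		default:
			return blocks, total, rerr
		}
	}
	return blocks, total, nil
}

// countString runs countBlocks over an in-memory string.
func countString(data string, blockSize int) (int, int) {
	blocks, total, err := countBlocks(strings.NewReader(data), blockSize)
	if err != nil {
		panic(err)
	}
	return blocks, total
}

func main() {
	blocks, total := countString("abcdefghij", 4)
	fmt.Printf("%d blocks, %d bytes\n", blocks, total)
}
```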

Next, we apply a slicing operator to the input buffer, which returns a slice containing only the bytes that were read in this operation. Like most I/O interfaces, Go's file IO does not guarantee that all reads will be for the full amount of data requested, so we have to make sure the writer goroutines only write as much data as we actually read in. In practice, the only time you should get short reads when reading from a regular file is when you reach the end of the file.
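
If you would rather have every block except the last come out full-sized even when the source returns short reads, current Go's io.ReadFull can do the retrying for you. This is an alternative to the approach in this post, not what the program above does. The sketch uses iotest.OneByteReader from the standard library to simulate a source that returns one byte per read; readBlock and blockSizes are made-up names.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// readBlock fills buf as far as the input allows. It returns io.EOF only
// when no bytes at all could be read.
func readBlock(r io.Reader, buf []byte) (int, error) {
	n, err := io.ReadFull(r, buf)
	if err == io.ErrUnexpectedEOF {
		// The input ended partway through the block: treat the
		// partial block as a valid final block.
		return n, nil
	}
	return n, err
}

// blockSizes reads data through a reader that returns one byte per call
// and reports the size of each block readBlock produced.
func blockSizes(data string, blockSize int) []int {
	r := iotest.OneByteReader(strings.NewReader(data))
	buf := make([]byte, blockSize)
	var sizes []int
	for {
		n, err := readBlock(r, buf)
		if n > 0 {
			sizes = append(sizes, n)
		}
		if err != nil {
			return sizes // io.EOF, or a real error in other settings
		}
	}
}

func main() {
	fmt.Println(blockSizes("abcdefghij", 4))
}
```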

Finally, we iterate over each of the output channels, sending each one the block of data to write. Since s is a slice, which is just a small descriptor referring to the underlying buffer, we're really only sending a reference; all the goroutines will read the block of data from the same chunk of memory, avoiding unnecessary copying.

Once the loop exits due to it encountering an os.EOF error or a 0-byte read, we tidy up and exit:

  // Send empty buffers to all open channels to let them know they should close
  for i := 0; i < len(chans); i++ {
    chans[i] <- []byte{};
  }
}

That's all there is to it. Using this very simple code, I was able to flash 14 drives at a time in no more than the time it takes to flash a single one!

