Penny for your thoughts?

I've been using my spare time lately to build something that's even more trivial and silly than usual. Behold, the advice machine!

Operation is pretty straightforward. You insert whatever quantity of coins you see fit, then request some advice from the machine. The quality of the advice you get reflects the value of the coins you put in - more or less. More money typically results in more in-depth fortunes, and more interesting ones such as quotations, while a small amount gets you platitudes or terrible jokes.

Here's a video of it in action:

More photos can be found here.

All in all, the project was relatively straightforward, after a couple of false starts. The coin acceptor is a standard item commonly found on ebay - this one has model number CH-926 - for around about $20-$30. The device has a moderately complex setup procedure whereby you 'train' it on a set of sample coins, and tell it how many pulses to output for each coin. After that, it will recognize coins similar to the ones it was trained on, outputting the appropriate number of pulses on one of its pins when each is detected.

The LCD is a ...

Damn Cool Algorithms: Cardinality Estimation

Suppose you have a very large dataset - far too large to hold in memory - with duplicate entries. You want to know how many duplicate entries, but your data isn't sorted, and it's big enough that sorting and counting is impractical. How do you estimate how many unique entries the dataset contains? It's easy to see how this could be useful in many applications, such as query planning in a database: the best query plan can depend greatly on not just how many values there are in total, but also on how many unique values there are.

I'd encourage you to give this a bit of thought before reading onwards, because the algorithms we'll discuss today are quite innovative - and while simple, they're far from obvious.

A simple and intuitive cardinality estimator

Let's launch straight in with a simple example. Suppose someone generate a dataset with the following procedure:

  1. Generate n evenly distributed random numbers
  2. Arbitrarily replicate some of those numbers an unspecified number of times
  3. Shuffle the resulting set of numbers arbitrarily

How can we estimate how many unique ...