Monday, June 24, 2013

Bitmap Sort

I was looking through John Bentley's "Programming Pearls" and I found a problem that I encountered before and decided to make note of it here. So you have a large amount of numbers that you're supposed to sort, but there's not enough memory to do that in. Obvious answer is to look for methods of sorting on disk aka  external sorting. There's a few methods known that include taking multiple passes through the input or parallelism. But this article is not about any of those - it should be faster than any of those. Consider that you have a set of 48000 unique numbers in the range of 1..64000. Does this make a difference? Also consider that you have 10K available memory, so you can't load all the numbers in memory. 

The idea here is that we can represent the input as a bit sequence (bit map) - the only thing left is to think of a data type. We could use a number of integers, and map our input to the integers bits. In Java an integer takes 4 bytes or 32 bits, this means we could map 32 input numbers to each integer. So, for example an input number 16 will map to bit 16 of integer 1, where an input number 37 will map to bit 5 of integer 2. 

To represent all possible input numbers we'll need 64000 / 32 integers - 2000 which will take 2000 * 4 = 8K, which fits into our limitations. Second step would be to output the sorted array. Since mapping the integers to a specific bit, makes the sequence sorted, we just need to traverse the bits and output every bit that has been turned on. In the end we get a linear time algorithm that takes O(n) or precisely O(64000) and it looks like this:

- The IO operations are there just to simulate input and output, instead of loading array into memory. Those are trivial, and implementation depends upon language.
- BUCKET_SIZE = 32 in our case

I will go through the code a little to make sure it is clear. So first step is the createBitMap method which creates a bitmap out of the input. On line 11 we decide which bit to set, whereas in line 12 we decide which bucket (integer) this bit should go to. It also does get set on line 12, through an OR operation.

Second step is going through the created bitmap and showing the actual integer values. On line 24 we go through each bit of an integer and check if it's set or not (line 25). If set we convert to the actual integer values on line 27.

No comments:

Post a Comment