Sunday, June 16, 2013

Order Statistics. Kth Smallest or Largest Element in an Un-ordered Array

Finding the largest or smallest element in an un-ordered array in O(n) is a trivial and obvious task, but what about finding the third largest or sixth or abstractly speaking kth? In statistics this is called Order Statistic. One way of solving this is using the QuickSort partitioning technique. To remind you how this works: you choose a pivot element, and then move all elements less than the pivot element to the left and all elements that are larger than the pivot element to the right. After this operation the pivot element is on the right spot and if the k you're looking for equals to the pivot's index, you don't have to sort anymore and just return the pivot.

As an example, let's say I have the following array:
[42, 92, 31, 73, 46, 11, 29, 53, 64, 4]

I chose the first element as the pivot (or we could use a random element) - 42.
Now we iterate from the start and end of the array and swap the elements in-place as they don't conform to our property (less than the pivot on the left and larger on the right). Is 92 less than 42 ? No, whom do we switch it with ? Let's look at the elements at the end of the array, since in the end we want to have an ascending order sorted array (if we need to). Is 4 larger than 42 ? No! Swap this with 92. Is 31 less than 42? Good. 73? No, then swap, and so on.
After the first operation the array should look like this:
[11, 4, 31, 29, 42, 46, 73, 53, 64, 92]

Now, 42 is in the right spot and it's the 5th number (index 4). Let's say we're looking for the 2nd smallest (index 1) so 5 is no good to us. What we do now, is doing the same thing just over the smaller array that we have: 
[11, 4, 31, 29]

Again, chose a pivot - 11, and move all lesser values to the left and larger to the right:
[4, 11, 31, 29]
11 is the element with index number 1 thus it is the second smallest element so we can stop the algorithm here.

Computing the kth largest element is the same task, with the only difference that we have to translate the k value, since the largest elements are located at the end of the array. So we'll have to translate k using this formula: k = array.length - k + 1.
Here's the algorithm in code: