# Understanding Counting Sort

There exist a few specialized sorting algorithms that make use of *extra knowledge about input data*.
This means: they exist, and you should know about them, but they work only in certain specialized circumstances.
The input data for any sorting algorithm is normally defined like this:

- Input: `L` is a list of `n` objects for which some sort of ordering relationship is defined.

Using this you can sort, for example, strings, integers and stereotypes. Counting Sort has a more restricted definition:

- Input: `L` is a list of `n` objects for which some sort of ordering relationship is defined, **and**
- You know in advance the size `m` of the object universe `U` the input items are taken from, **and**
- There exists a relationship `r(x)` that maps an object to be sorted to an index, **and**
- `m` is reasonably small.

Hm. Conditions 1-3 are fairly straightforward and can be defined for most input data. For example, when sorting strings:

- Input: `L` is a list of `n` strings.
- `m` is the size of the object universe `U` of all strings. Assuming your input uses only the 26 letters of the alphabet and a string length of 10, `m` would be 26^10 = 141.167.095.653.376.
- There exists a relationship `r(x)` that maps an object to be sorted to an index: order `U` lexicographically, and then use a string's index in that ordering.

However, for many interpretations of *reasonably*, 141.167.095.653.376 is not a number that is reasonably small. Right?
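To make the third condition concrete, here is a minimal sketch of such an `r(x)` for fixed-length lowercase strings (the function name is my own): treat each string as a base-26 number, which is exactly its index in the lexicographic ordering of `U`.

```python
def string_index(s):
    """Map a lowercase a-z string to its index in the lexicographically
    ordered universe of all strings of the same length."""
    index = 0
    for ch in s:
        # interpret the string as a base-26 number, 'a' = 0 .. 'z' = 25
        index = index * 26 + (ord(ch) - ord("a"))
    return index

print(string_index("aaaaaaaaaa"))  # first string in the universe: 0
print(string_index("zzzzzzzzzz"))  # last string: 26**10 - 1
```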
On the other hand, let's assume you have lots of integers in the range 0..100:

- Input: `L` is a list of `n` small integers.
- `m` is the size of the object universe `U` of integers in that range, which is 100.
- There exists a relationship `r(x)` that maps an object to be sorted to an index: the integer value itself.
- `m` is indeed reasonably small.

Given that "reasonably small" is not an exact definition, you'll have to try and come up with numbers that work on your system, with your data.

## The basic idea behind Counting Sort

- Assume you have `n` integers in the range `0..m-1`.
- Create `m` integer *counters*, starting at 0.
- For each item in the input array: increase the counter associated with the integer value.
- At the end, *recreate the sorted array from the information stored in the counters*.

For example, assume you have input numbers in the range 0..4, like this:

```
[3, 2, 0, 2, 2, 0, 2, 2, 1, 1]
```

If you count the numbers of 0s, 1s, 2s and 3s, you get this:

```
Number of 0s: 2
Number of 1s: 2
Number of 2s: 5
Number of 3s: 1
```

A sorted array can now be *created* using this information:

```
[0, 0, 1, 1, 2, 2, 2, 2, 2, 3]
 ^^^^  ^^^^  ^^^^^^^^^^^^^  ^
```

This points to an important insight: you can use this technique to recreate sorted arrays, but only
if the array *can be recreated* from the counts. For example, you couldn't really use this to sort objects in memory,
unless you have *integer-based keys* for those objects.
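If your objects do carry such integer keys, a stable key-based variant is possible. Below is a minimal sketch (the function name, parameters, and example data are my own); it turns the counts into starting positions via prefix sums, so objects with equal keys keep their input order:

```python
def counting_sort_by_key(items, key, m):
    """Stable counting sort of arbitrary objects, given a function `key`
    that maps each object to an integer in 0..m-1."""
    # count how many items map to each key
    counts = [0] * m
    for item in items:
        counts[key(item)] += 1
    # turn counts into starting positions (prefix sums)
    start = 0
    for k in range(m):
        counts[k], start = start, start + counts[k]
    # place each item at its slot; equal keys keep their input order
    result = [None] * len(items)
    for item in items:
        k = key(item)
        result[counts[k]] = item
        counts[k] += 1
    return result

people = [("carol", 2), ("alice", 0), ("bob", 2), ("dave", 1)]
print(counting_sort_by_key(people, key=lambda p: p[1], m=3))
# [('alice', 0), ('dave', 1), ('carol', 2), ('bob', 2)]
```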

## The algorithm

The above lends itself to a straightforward implementation:

```
def counting_sort(A):
    # create counters initialized to zero
    frequencies = [0] * (max(A) + 1)
    # count each item
    for value in A:
        frequencies[value] += 1
    # recreate the sorted array
    writepos = 0
    for value, count in enumerate(frequencies):
        for i in range(count):
            A[writepos] = value
            writepos += 1
    return A
```
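As a quick sanity check of the idea, here is an alternative sketch (names are my own) that performs the same counting step with Python's `collections.Counter` and rebuilds the array from the counts:

```python
from collections import Counter

def counting_sort_counter(A):
    """Same idea as counting_sort above, using Counter for the counting step."""
    counts = Counter(A)
    result = []
    for value in range(max(A) + 1):
        # emit each value as many times as it was counted
        result.extend([value] * counts[value])
    return result

print(counting_sort_counter([3, 2, 0, 2, 2, 0, 2, 2, 1, 1]))
# [0, 0, 1, 1, 2, 2, 2, 2, 2, 3]
```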

## But how does it perform?

Here are the results of comparing "sort 1 million randomized integers in the range 0..9999":

Method | Python | Javascript | C# | C/C++
---|---|---|---|---
Builtin array-sort | 847 ms | 443 ms | 84 ms | 110 ms
Counting sort | 553 ms | 353 ms | 5 ms | 0 ms

It is somewhat depressing to see how bad Python's performance actually is... I was also surprised by C#; I had expected it to be much worse.
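Timings like these depend heavily on hardware and runtime versions, so treat the table as indicative only. A rough sketch for reproducing the Python comparison (the harness below is my own, with the article's counting sort inlined):

```python
import random
import time

def counting_sort(A):
    frequencies = [0] * (max(A) + 1)
    for value in A:
        frequencies[value] += 1
    writepos = 0
    for value, count in enumerate(frequencies):
        for i in range(count):
            A[writepos] = value
            writepos += 1
    return A

# 1 million random integers in the range 0..9999, as in the benchmark
data = [random.randrange(10000) for _ in range(1_000_000)]

t0 = time.perf_counter()
sorted(data)
t1 = time.perf_counter()
counting_sort(data[:])  # sort a copy, since counting_sort works in place
t2 = time.perf_counter()

print(f"builtin sort:  {(t1 - t0) * 1000:.0f} ms")
print(f"counting sort: {(t2 - t1) * 1000:.0f} ms")
```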