# Understanding Mergesort

Mergesort is a sorting algorithm that is interesting to compare with quicksort.

- Mergesort and Quicksort are both recursive Divide & Conquer algorithms
- Quicksort is randomized, meaning that the number of recursive calls required cannot be determined ahead
- Mergesort has a completely deterministic runtime - but this is typically slower than Quicksort. On the other hand,
Quicksort
*can*have a much worse runtime, if the input data is really bad (for example, quicksort doesn't perform too well on already sorted input). - Mergesort is like a good student at sorting: reliable; whereas Quicksort is more like the mad genius that can be brilliant and completely disappoint at other times.

The basic structure of quicksort was this:

- partition the input data
- recursively sort the lower half
- recursively sort the upper half

The basic structure of mergesort is this:

- recursively sort the lower half
- recursively sort the upper half
- merge the two sorted halfs: because both halfs are already sorted, this is much more simple algorithm than the partition function.

From this description you can see already what quicksort does different: by first looking at the data and
*using partial sorting information gained from looking at the input data* it can perform better - in some cases -
than mergesort can.

Mergesort has one additional drawback: it needs additional memory to store the data. Let's look at two slightly different implementations of Mergesort in Python, and then check the performance compared with Quicksort.

## The Unsurprising Implementation

The first implementation works "without surprises", meaning it doesn't use python-specific list manipulation functions. The merge process looks difficult, but it really isn't:

- create a temporary array of the same size as the input array (or rather: input range)
- Since both halfs of the input are already sorted, we need to look only at a single element in both lists in order to fill the temporary array
- There may be traililng elements in both halfs that need to be added to the end of the temp array
- Finally, write back the temp array to the original array.

This is the code:

```
def merge(A, start, mid, stop):
# create temporary array
width = (stop+1-start)
temp = [None] * width
index_in_lower_half = start
index_in_upper_half = mid+1
write_index = 0
# insert the contents of the already sorted array halfs into the temporary array
while (index_in_lower_half <= mid) and (index_in_upper_half <= stop):
a = A[index_in_lower_half]
b = A[index_in_upper_half]
if a < b:
temp[write_index] = a
index_in_lower_half += 1
else:
temp[write_index] = b
index_in_upper_half += 1
write_index += 1
# insert any trailing elements from the lower halg
if index_in_lower_half <= mid:
while index_in_lower_half <= mid:
temp[write_index] = A[index_in_lower_half]
index_in_lower_half += 1
write_index += 1
# insert any trailing elements from the upper half
if index_in_upper_half <= stop:
while index_in_upper_half <= stop:
temp[write_index] = A[index_in_upper_half]
index_in_upper_half += 1
write_index += 1
# write back temp array to the original input array
write_index -= 1
while write_index >= 0:
A[start+write_index] = temp[write_index]
write_index -= 1
def mergesort_range1(A, start, stop):
if start < stop:
mid = start + (stop-start)/2
mergesort_range1(A, start, mid)
mergesort_range1(A, mid+1, stop)
merge1(A, start, mid, stop)
def mergesort1(A):
if not A:
return
mergesort_range1(A, 0, len(A)-1)
return A
```

## The surprising implementation

Python has lots of builtin list manipulation functions. In Python, allocating temporary lists from builtin functions can be faster than fine-tuned code that doesn't allocate additional memory! The following example is a more pythonesque way of doing mergesort, where the merge-part creates a list dynamically, on the fly.

```
def mergesort2b(array):
size = len(array)
if size < 2:
return array
else:
middle = size / 2
left = mergesort2(array[:middle])
right = mergesort2(array[middle:])
result = []
i, j = 0, 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result += left[i:]
result += right[j:]
return result
```

This code has the same basic structure, but it doesn't pre-allocate an array of n items, instead it
appends the results as it goes along, thereby skipping the need for a write index. And, surprisingly, it turns
out to be faster than the code above. There is a further trick that can be applied: list.pop() is typically
the fastest way to get an element from the end of a list, *and* reduce the list size. The only problem is:
we're taking items from the front of the two halfs. Well, we can change that: by first *reversing the two lists to be merged*,
we can then compare and pop items from the end of those two lists, and finally reverse them back for the rest.
This is how the code looks like, and it was the fastest one I could come up with in Python:

```
def mergesort2(array):
size = len(array)
if size < 2:
return array
middle = size / 2
left = mergesort2(array[:middle])
right = mergesort2(array[middle:])
# reverse both lists, so that we can take items from the back of the list
left.reverse()
right.reverse()
result = []
while left and right:
# note: no read indices, and we're popping items from the back of the list
if left[-1] <= right[-1]:
result.append(left.pop())
else:
result.append(right.pop())
# reverse lists back, so that they can be added to the result list
left.reverse()
right.reverse()
if left:
result.extend(left)
if right:
result.extend(right)
return result
```

But how does it perform compared with Quicksort? Well, actually disappointingly so. For the standard test of 1 million floats, these were the results:

Method | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
---|---|---|---|---|---|

Quicksort Hetland | |||||

Quicksort Pyalda | 6.38 | 6.31 | 6.30 | 6.37 | 6.33 |

Mergesort | 8.10 | 8.08 | 8.08 | 8.09 | 8.01 |

Looking at this, you may wonder why bother with Mergesort at all? Well, do a simple example: try sorting an already sorted list of 1 million floats.
You will find that while mergesort runs as expected (actually, a bit faster), both Quicksort implementations will
*break and throw a RuntimeError: maximum recursion depth exceeded in cmp, because their recursion depth will exceed the maximum.*
You

*cannot*sort a million floats in (our recursive implementation of) quicksort, because it'll essentially require a million-deep stack, and unless you are really rich your machine won't support that.

This is what was mentioned at the top of this page: mergesort is boring, but reliable. quicksort is a genius, but sometimes it'll disappoint you badly.

And now, for a *real* surprise, turn to Heapsort next...