# Heaps

A heap is a common data structure that can be used to maintain a priority queue, implement a sorting mechanism, and generally keep track of extremal (minimal or maximal) values in an efficient manner. Note that the Python Standard Library contains a heap implementation in heapq, but we're going to write our own first and then take a peak inside the reference implementation.

The Wikipedia article on Heaps contains pointers to a couple of specialized heaps with additional features, but we're going to start off easy with a very simple one: a min heap.

## A simple Min Heap implementation

We're going to start with a min heap, and then check how we can generalize it to a MaxHeap. We need a capacity for the heap, and since each level is going to have 2**n items maximum, it makes sense to start off with a capacity that is a power of 2.

```
class MinHeap(object):
def __init__(self, capacity = 2 ** 20):
# the capacity should be a power of two - since all levels in a heap
# are densely populated, and why would you want to waste space?
# the default is 1MB of arbitrary data
self.data = [None] * capacity
self.used = 0
```

Inserting an item is a two-step process:

- Insert the item at the end of the heap
- Push the item up until we've reached a proper insert position

```
class MinHeap(object):
...
def Insert(self, value):
assert self.used < len(self.data)
self.data[self.used] = value
self.used += 1
self.PushUp(self.used-1)
```

That was the easy part. Now for the fun part: the item is at the end of the heap. we need to push it up to a position, where the heap property is satisfied.

```
class MinHeap(object):
...
def PushUp(self, item_position):
assert item_position >= 0
assert item_position < self.used
# the value to be pushed up the tree
item_value = self.data[item_position]
# when does the loop end? when the item has reached
# the top position - there is no place higher than the top!
while item_position > 0:
# get index and value of parent
parent_position = GetIndexOfParent(item_position)
parent_value = self.data[parent_position]
# if the parent is less than this value, we need to push ourselves up
if parent_value > item_value:
self.data[parent_position] = item_value
self.data[item_position] = parent_value
item_position = parent_position
else:
# the parent is larger, so we are in a good position
break
return item_position
```

This code is missing an important function: `GetIndexOfParent`

. Now, in the literature (e.g. Cormen et al) you'll find the following definitions:

`PARENT(i) = i/2`

`LEFT(i) = 2i`

`RIGHT(i) = 2i+1`

This won't work with Python, because Python arrays are zero-based. So `i`

is really `(i+1)`

, and from the result we need
to subtract 1 to get to the actual position. Sounds complicated? Let's try to insert it:

`PARENT(i) = PARENT(n+1) - 1 = (n+1)/2 - 1 = (n+1)/2 - (2/2) = (n+1-2)/2 = (n-1)/2`

`LEFT(i) = LEFT(n+1) - 1 = 2*(n+1) - 1 = 2*n + 2- 1 = 2*n + 1`

`RIGHT(i) = RIGHT(n+1) - 1 = 2*(n+1) - 1 + 1 = 2 * n + 2`

Therefor we have:

```
def GetIndexOfParent(n):
return (n-1)/2
def GetIndexOfLeftChild(n):
return (2*n)+1
def GetIndexOfRightChild(n):
return (2*n)+2
```

Getting the minimum is trivial:

```
class MinHeap(object):
...
def GetMin(self):
assert self.used > 0
return self.data[0]
```

Extracting the minimum is another important operation: we special case for a heap that has only one element (so there is no need to invoke Heapify()).

```
class MinHeap(object):
...
def ExtractMin(self):
assert self.used > 0
result = self.data[0]
if self.used > 1:
# move item from the end of the heap to the top, and heapify it (move it down
# until the heap property is satisified again)
self.data[0] = self.data[self.used-1]
self.used -= 1
self.Heapify()
else:
self.used = 0
return result
def Heapify(self):
# start at the top
index_of_this_node = 0
value_of_this_node = self.data[index_of_this_node]
# loop end criteria: we've reached the last most item in the heap
while index_of_this_node < self.used:
# get left child (or None)
index_of_left_node = GetIndexOfLeftChild(index_of_this_node)
value_of_left_node = value_of_this_node
if index_of_left_node < self.used:
value_of_left_node = self.data[index_of_left_node]
# get right child (or None)
index_of_right_node = GetIndexOfRightChild(index_of_this_node)
value_of_right_node = value_of_this_node
if index_of_right_node < self.used:
value_of_right_node = self.data[index_of_right_node]
# find index of the node that is bigger than ours
index_of_max_node = index_of_left_node
value_of_max_node = value_of_left_node
if value_of_right_node < value_of_max_node:
value_of_max_node = value_of_right_node
index_of_max_node = index_of_right_node
# if there is a match, swap the nodes
if value_of_max_node < value_of_this_node:
self.data[index_of_this_node] = value_of_max_node
self.data[index_of_max_node] = value_of_this_node
index_of_this_node = index_of_max_node
else:
break;
```

Let's add some more helper functions:

```
class MinHeap(object):
...
def IsEmpty(self):
return self.used == 0
def GetCount(self):
return self.used
def GetDepth(self):
if self.used < 2:
return self.used
level = 1
pos = self.used -1
while pos > 0:
pos = GetIndexOfParent(pos)
level += 1
return level
```

## Pretty-Printing a Heap

One of the more interesting functions to write for a heap class is a `repr()`

method to print
the heap as a string. We're going to go for a simple plaintext function. For example, a heap with 15 items
could look like this:

```
3
15 9
17 15 12 34
44 22 40 32 46 23 47
```

The code for this has several steps:

- We enumerate all items
- Convert the items on this level to strings and store them in a list
- As we go along, keep track of the maximum string size needed to represent a single element.
- We expect 2
^{i}items on level i - A new level starts when the items on the current level have reached the expected size.
- We enumerate the list of string lists
*in reverse order* - For each item in the current level:
- Center it around the expected cell size for this level, adding spaces if necessary
- Convert the list of cells to a single string
- Adjust the expected cell size: cell size grows by 2
^{n}

- Finally, enumerate all strings again from the top down: adding newlines to combine all the lines

```
class MinHeap(object):
...
def __repr__(self):
# special case: empty heap
if self.used == 0:
return "<MinHeap: None>"
max_cell_width = 1
# We expect 2<sup>i</sup> items on level i
expected_size_of_this_level = 1
levels = []
reprs_on_this_level = None
# We enumerate all items
for i in range(self.used):
if reprs_on_this_level is None:
reprs_on_this_level = []
levels.append(reprs_on_this_level)
# Convert the items on this level to strings and store them in a list
k = repr(self.data[i])
reprs_on_this_level.append(k)
# As we go along, keep track of the maximum string size needed to represent a single element.
if len(k) > max_cell_width:
max_cell_width = len(k)
if len(reprs_on_this_level) == expected_size_of_this_level:
# We expect 2**i items on level i
expected_size_of_this_level *= 2
reprs_on_this_level = None
max_cell_width += 1
# Center a single word around the expected cell size for this level, adding spaces if necessary
def center(word, max_cell_width):
result = [' '] * max_cell_width
word = list(word)
insert_pos = (max_cell_width - len(word))/2
for c in word:
result[insert_pos] = c
insert_pos += 1
return "".join(result)
# We enumerate the list of string lists in reverse order
for i in range(len(levels)-1,-1,-1):
level = levels[i]
for k in range(len(level)):
level[k] = center(level[k], max_cell_width)
# Convert the list of cells to a single string
levels[i] = "".join(levels[i])
# Adjust the expected cell size: cell size grows by 2**n
max_cell_width *= 2
# Finally, enumerate all strings again from the top down: adding newlines to combine all the lines
return "\n".join(levels)
```

## Comparing our code with the reference implementation

As mentioned in the beginning, the python standard library contains a reference implementation of the heap data structure. You can read the source on the web at http://svn.python.org/projects/python/trunk/Lib/heapq.py. A couple of comments on that code:

- The code starts out with a very interesting and thorough discussion of heaps - much better than I can explain it ;)
- The code doesn't define a class - it imposes the heap structure on items in an existing list. It is vaguely interesting but ultimately pointless to discuss which approach is better for reuse of code...
`heappush`

is essentially the same as the`Insert()`

method above: insert at the end, then move the item to the correct insert position`heappop`

also has the same essential structure, but`_siftup`

(the equivalent of`Heapify()`

above) is much more dense (and probably considerably faster). But note the extensive comment describing the operation