Linked Lists

Linked Lists are a staple data type. In C/C++, they are a quintessential means of keeping track of dynamically allocated objects in a space-efficient manner. And since linked-list related algorithms typically require lots of intricate pointer manipulation, they often do come up in programming interview questions.

Now, in Python, you have a builtin list datatype, so you could ask: why bother? Well, for one you'd want to be prepared for the same type of questions, and secondly it is a nice exercise. So let's get cracking

Single Linked Lists vs. Double Linked Lists

Single Linked Lists operate with a single next pointer, Double Linked Lists have both a next and a previous. Since operations with a single next pointer tend to be more complicated, we're going to focus on single linked lists below.

Core Implementation

How do you implement a pointer in a programming language that ostensively doesn't have pointers? By finding out that of course Python has pointers, it just doesn't call them pointers and thus hopes to fool you into believing that it is any better than plain old C because of that. Actually, having learned programming in assembler I've never quite understood what the fuzz was about pointer algorithms, but well, it is a time honoured tradition to call pointer manipulation difficult.

So, how does a non-pointer pointer in Python look like? It is simply an object reference. That means, the quintessential Python Single Linked Node / List datatypes look like this:

class SingleLinkedNode(object):
    def __init__(self, value, next = None):
        self.value = value
        self.next = next

class SingleLinkedList(object):
    def __init__(self):
        self.first = None

That wasn't too bad, was it? Now, let's get cracking by adding some functionality to it.

Inserting a node at the head

In a Single Linked List, you typically insert nodes at the head. The code borders on the trivial:

class SingleLinkedList(object):
    ...
    def AddValueAtHead(self, value):
        self.first = SingleLinkedNode(value, self.first)

Printing list objects

To be able to visualize a list, it makes sense to overwrite __repr__. And the easiest way to visualize a Single Linked List is to print it as a Python list. So:

class SingleLinkedList(object):
    ...
    def __repr__(self):
        temp = []
        node = self.first
        while node:
            temp.append(node.value)
            node = node.next
        return repr(temp)

Getting the length of a list

To find the length of the list, you must enumerate all items in the list, and keep track of the count as you go along.

class SingleLinkedList(object):
    ...
    def GetLength(self):
        size = 0
        node = self.first
        while node:
            size += 1
            node = node.next
        return size

Accessing the N-th Element in a list

To access the n-th element in a Single Linked List, you basically must do the same: enumerate all items, keep track of the count = index as you go along, and return the list if the count is n.

class SingleLinkedList(object):
    ...
    def GetNthNode(self, n):
        if n == 0:
            return self.first

        if self.first:
            m = 0
            node = self.first
            node = node.next
            while node:
                m += 1
                if m == n:
                    return node
                node = node.next
        return None

Note that this code handles a couple of special cases:

If you ask for the 0-th node, you'll get self.first even if the list is empty: because if it is empty, self.first will be None.
If the list is empty and you ask for the n-th node with n > 0, then you'll always get None.
If the list has n items and you ask for the k-th node with k > n, then you'll always get None.

So far, everything has been pretty gentle. Now let's ask some more difficult questions.

Write a function that merges two sorted lists

This is problem 4.1 in the excellent Elements of Programming Interviews. The first question should be: how can I test this?

We need to create two sorted lists of random elements
We merge those two lists with our algorithm
We verify that the combined list is still sorted

This translates to the following code:

#! -*- Encoding: Latin-1 -*-

def selftest():
    SL1 = SingleLinkedList.CreateRandomSortedList(10)
    SL2 = SingleLinkedList.CreateRandomSortedList(7)
    SL1.MergeSortedList(SL2)
    assert SL1.IsSortedAscending()

if __name__ == "__main__":
    selftest()

Of course, this code leaves most of the work undone ;) There are three functions in this test code that we need to implement. The easiest one is the function to create a list of random, ascending values: create a list of n random values, sort them, reverse them, and then insert them into the list. You may ask why it is necessary to reverse the list: because the Single Linked List mostly uses the AddValueAtHead method, not a AddValueAtTail which would work without the need for reversing the input. Given all that, our code looks like this:

class SingleLinkedList(object):
    ...
    @staticmethod
    def CreateRandomSortedList(number_of_elements):

        values = [random.randrange(1860) for k in range(number_of_elements)]
        values.sort()

        result = SingleLinkedList()
        for value in reversed(values):
            result.AddValueAtHead( value )
        return result

Note that the method is decorated with @staticmethod, so that you can call it to create a new instance of SingleLinkedList, rather than the need to invoke it on an existing instance.

IsSortedAscending is quite easy as well: enumerate all items in a list, and ensure that each consecutive item is >= the last seen item.

class SingleLinkedList(object):
    ...
    def IsSortedAscending(self):
        node = self.first
        if node:
            last_known_value = node.value
            node = node.next
            while node:
                if node.value < last_known_value:
                    return False

                last_known_value = node.value
                node = node.next

        return True

That leaves us with the function to actually merge two sorted lists. And I'm going to warn you: the code is quite ugly (and long, by Python standards). Let's try to see the main logic. Assume we want to merge the two lists A and B into A', both of which are sorted.

    - special case: A is empty -> return A' = a clone of B
    - special case: B is empty -> return A' = A
    - for each B[i] < A[0]: insert B[i] before A[0]
    - if there are nodes left in B (it could be that all B[k] < A[0]):
    ------ while A[i] < B[0]: move i one up
    ------ while B[k] <= A[i]: insert B[k] before A[i]
    ------ if no more A[k]: append remainder of B, done.
    ------ if no more B[k]: done

With all the special cases, you'll note that this algorithm has to take care of a lot of rough edges, making it quite long and ugly (I did warn you, didn't I?)

class SingleLinkedList(object):
    ...
    def MergeSortedList(self, other_list):

        # ensure that the algorithm preconditions are met
        assert other_list.IsSortedAscending()
        assert self.IsSortedAscending()

        # special case 1: A is empty -> return A' = a clone of B
        if self.first is None:
            self.Clone(other_list)

        # special case 1: B is empty -> return A' = A        
        elif other_list.first is None:
            pass # nothing to see here, move along

        else:

            # for each B[i] < A[0]: insert B[i] before A[0]
            this_insert_pos = None
            this_node = self.first

            other_node = other_list.first
            while (other_node is not None) and (other_node.value <= this_node.value):
                if this_insert_pos is None:
                    self.first = SingleLinkedNode(other_node.value, self.first)
                    this_insert_pos = self.first
                else:
                    this_insert_pos.next = SingleLinkedNode(other_node.value, this_insert_pos.next)
                    this_insert_pos = this_insert_pos.next

                other_node = other_node.next

            if other_node is not None:
                # now, start moving our position until we've reached the first node in our list that is higher 
                # than we want it to be
                this_insert_pos = self.first
                this_node = self.first.next
                assert this_insert_pos.next == this_node

                # if no more B[k]: done
                while other_node is not None:

                    assert this_insert_pos.next == this_node

                    # while A[i] < B[0]: move i one up
                    while (this_node is not None) and (this_node.value < other_node.value):
                        this_node = this_node.next
                        this_insert_pos = this_insert_pos.next
                        assert this_insert_pos is not None
                        assert this_insert_pos.next == this_node

                    if this_node is None:

                        # if no more A[k]: append remainder of B, done.
                        assert this_insert_pos is not None
                        assert this_insert_pos.next is None

                        while other_node is not None:
                            this_insert_pos.next = SingleLinkedNode(other_node.value, this_insert_pos.next)
                            this_insert_pos = this_insert_pos.next
                            other_node = other_node.next

                        break

                    # while B[k] <= A[i]: insert B[k] before A[i]
                    while (other_node is not None) and (other_node.value <= this_node.value):
                        this_insert_pos.next = SingleLinkedNode(other_node.value, this_insert_pos.next)
                        other_node = other_node.next
                        this_insert_pos = this_insert_pos.next

Note: if you look at the solution in EPI, you'll find that it is one of the (really few) cases where the C++ code is actually cleaner than the Python code. This is so because the C++ code can indulge on direct pointer manipulation, whereas Python code cannot. So Goal 1:0 for C++ here.

Find out if two lists overlap, and where

This is problem 4.4 in EPI, and it is interesting because there is a trivial solution that uses memory, and another solution that is trivial to understand, but it took me a while to figure it out.. Did I tell you that Elements of Programming Interviews is really rather good? OK, as usual, our first question should be: how can I test this?

We need to create two lists of random elements
We need to overlap them at some random point
We verify that the combined list is still sorted

This translates to the following code:

#! -*- Encoding: Latin-1 -*-

def selftest():
    SL1 = SingleLinkedList.CreateRandomSortedList(10)
    SL2 = SingleLinkedList.CreateRandomSortedList(7)
    SL1.OverlapAtRandomNode(SL2)
    assert SL1.IsOverlapped(SL2)
    assert SL2.IsOverlapped(SL1)

if __name__ == "__main__":
    selftest()

Note: I am reusing CreateRandomSortedList from above. It is not necessary that the lists be sorted, I am just being lazy.

So we are missing two methods here: OverlapAtRandomNode and IsOverlapped. Let's first look at the logic for creating a overlapped list out of two existing lists A and B:

Move to the end of list B.
Find a random node in A.
Add the remainder of A to the end of B

In code:

class SingleLinkedList(object):
    ...
    def OverlapAtRandomNode(self, other_list):
        assert self.first       # cannot work if this is an empty list
        assert other_list.first # cannot work if the other list is empty

        # Move to the end of list B
        node = self.first
        while node and node.next:
            node = node.next

        assert node.next is None

        # Find a random node in A
        N = other_list.GetLength()
        K = random.randrange(N)

        # Add the remainder of A to the end of B
        node.next = other_list.GetNthNode(K)

OK, now for the fun part: how to find out if the two nodes A and B do overlap. The simple solution is to keep track of nodes visited, for example by hashing their ids. In Python, we can use the set datatype for this. The logic will look somewhat like this:

Add the IDs of all items in B to a set
Enumerate A: for each item, check if it is contained in the lookup set.
If it is, we've found the overlap.

In code:

class SingleLinkedList(object):
    ...
    def IsOverlapped(self, other_list):

        # Add the IDs of all items in B to a set
        h = set()
        node = other_list.first
        while node:
            h.add(id(node))
            node = node.next

        # Enumerate A: for each item, check if it is contained in the lookup set.
        node = self.first
        while node:

            # If it is, we've found the overlap.
            if id(node) in h:
                return node

            node = node.next

Note: I have omitted an explicit return None at the end of the list, because Python will add this implicitly.

Now, this was the trivial implementation - but it uses O(M) additional space, where M = len(B). Can we do it without additional memory? Think about what you know about each item in a list. You know:

The item value
If the item has a successor or not
The value of the item successor

Given two nodes from two distinct lists, comparing either condition 1 or 3 would require comparing it to all other nodes in the other list. But condition 2 can be checked without comparing it to the other list. And if you think about it: when you have a merged list, the end node must be the same - by definition. So what you can do to find out if the two lists are merged is this:

Find the last node in A, A_last.
Find the last node in B, B_last.
If A_last == B_last, the list is merged.
- Now, you only need to find exactly where the two lists merge

How do you solve that problem?

List A has n items
List B has m items
The combined list has either m or n items, and to be more precise: it has max(m,n) items.
Assume for now that A is the longer list, so m < n. How many items of B are in A, at max? Well, all of them. That means the first possible merge position if n-m.
This implies the following logic:
- Set i = position in A = m-n
- Set k = position in B = 0
- while A_i is not B_k: increase both i and k

If you think about it, this algorithm is easy to understand, and probably faster because it doesn't need either memory allocation nor set lookup. The only drawback: the code is a bit longer.

class SingleLinkedList(object):
    ...
    def IsOverlapped(self, B):
        A = self

        # find last node in A, element count in A
        items_in_A = 0
        last_in_A = self.first
        while last_in_A:
            items_in_A += 1
            if not last_in_A.next:
                break
            last_in_A = last_in_A.next

        # find last node in B, element count in B
        items_in_B = 0
        last_in_B = B.first
        while last_in_B:
            items_in_B += 1
            if not last_in_B.next:
                break
            last_in_B = last_in_B.next

        # last items are different - lists cannot be merged
        if last_in_A != last_in_B:
            return None
        
        # if necessary, swap the lists, so that len(A) > len(B)
        if items_in_B > items_in_A:
            A, B = B, A
            items_in_A, items_in_B = items_in_B, items_in_A

        i = items_in_A - items_in_B

        # move node_in_A to first possible insert position
        node_in_A = A.first
        while i > 0:
            assert node_in_A is not None
            node_in_A = node_in_A.next
            i -= 1

        node_in_B = B.first
        while node_in_A != node_in_B:
            assert node_in_A is not None
            assert node_in_B is not None

            # move both nodes one up:
            node_in_A = node_in_A.next
            node_in_B = node_in_B.next

        return node_in_B

Zip a list

OK, one final problem from EPI: 4.11. You have a list: returned a zipped list. What this means is the following:

Input L = [ L₀, L₁, ..., L_n ]
Output L' = [ L₀, L_n-1, L₁, L_n-2... ]

And of course you should do this with constant additional memory. Test code is trivial: just create a list of random items, zip it, print both lists and visually compare them.

#! -*- Encoding: Latin-1 -*-

def selftest():
    SL1 = SingleLinkedList.CreateRandomSortedList(20)
    print "BEFORE: %r" % (SL1, )
    SL1.ZipSelf()
    print " AFTER: %r" % (SL1, )

if __name__ == "__main__":
    selftest()

So let's think about the algorithm:

The zipped list must have the same number of items as the input list
Given that the list has n items, we have to remove n/2 items from the end and interleave them at the front.
This translates into the following:
- Find n by enumerating all items and counting them
- Start again, and move to n/2.
- Create a temporary list that contains all items from n/2 up to n, in reversed order. This is actually trivial, since consecutive insertion at the front will result in a reversed list.
- You now have two lists: L', which has only n/2 items, and temp, which also has n/2 items (well, maybe one less or so).
- Now you can enumerate both lists and remove items from the top of temp, inserting them in L'.

In python code:

class SingleLinkedList(object):
    ...
    def ZipSelf(self):

        # find n
        n = self.GetLength()

        # find the n/2th node
        mid = n/2-1
        m = self.GetNthNode(mid)

        # split at m, and create a reversed list in temp
        temp = SingleLinkedList()
        k = m.next
        m.next = None
        while k:
            next_k = k.next
            k.next = temp.first
            temp.first = k
            k = next_k

        # merge the two lists, interleaving items as we go along
        node = self.first
        insert = True
        while node:

            if insert:
                k = node.next
                assert temp.first is not None
                node.next = temp.first
                temp.first = temp.first.next
                node.next.next = k
                insert = False
                
            else:
                insert = True
            
            # first node in this list is kept as-is
            if not node.next:
                
                # last node in this list reached: insert remainder of other list (if exists)
                node.next = temp.first
                break
                
            # skip this node
            node = node.next