Finding the maximum profit in a shareprice array

The following is a classic programming problem: find the best time to buy / sell a share, given an array of past share prices.

Well, there is one obvious answer: buy low, sell high. The question is: given a (potentially huge) array of share prices over time, at which point should you have bought, at which point should you have sold.

Ask yourself: Do I understand the problem?

Setting up a test harness

As usual, our first step is to prepare a testbed. Our code strategy for the test will be this:

Therefor, our test code looks like this:

#! -*- Encoding: Latin-1 -*-

def selftest():
    import random
    import time

    A = [random.random() for i in range(5000)]
    expected_result = None

    methods = { 'using brute force' : fmax_brute_force,
                'using divide and conquer' : fmax_dandq,
                'using one-pass sweep' : fmax_sequential,
                }

    for method_name in methods.keys():
        method = methods[method_name]
        t0 = time.time()
        result = method(A)
        print "%25s: %r after %r" % (method_name, result, time.time() - t0, )
        if expected_result is None:
            expected_result = result[0]

        else:
            assert result[0] == expected_result

if __name__ == "__main__":
    selftest()

Brute-Force approach

Let's try brute-force first:

This clearly separates into two logical functions:

# given a strategy that buys the share at position index, what is the maximum profit that can be achieved?
def calculate_profit_when_buying_now(A, index):
    buying_price = A[index]
    max_profit = 0
    sell_at = index
    for i in range(index+1, len(A)):
        selling_price = A[i]
        profit = selling_price - buying_price
        if profit > max_profit:
            max_profit = profit
            sell_at = i

    return max_profit, sell_at
    
# check all possible buying times
def fmax_brute_force(A):
    max_profit = None
    buy = None
    sell = None
    
    for index, item in enumerate(A):
        profit, sell_at = calculate_profit_when_buying_now(A, index)
        if (max_profit is None) or (profit > max_profit):
            max_profit = profit
            buy = index
            sell = sell_at
            
    return max_profit, buy, sell

There is a small pythonesque optimization that we can do with calculate_profit_when_buying_now, by using the builtin max/index functions on arrays, like this:

def calculate_profit_when_buying_now(A, index):
    sell_value = max(A[index:])
    sell_at = A.index(sell_value, index)
    return sell_value - A[index], sell_at

Some notes on this implementation:

Using Divide & Conquer

This strategy proceeds by dividing the problem into smaller subproblems, that can be solved more easy. Let's think about how we could do this:

The code for this is much more complex: we need to keep track of edge cases, ensure that the recursion terminates, and handle the "cross-border" selling part. And in Python, complex code often equals ugly code. But here we go:

def fmax_dandq_recursive(A, start, stop):
    n = stop - start
    
    # edge case 1: start == stop: buy and sell immediately = no profit at all
    if n == 0:
        return 0, start, start

    # edge case 2: two consequtive timestamps. buy on the first day, sell on the second.
    # this actually can result in a loss, so maybe buying and selling on the same day would be
    # preferable (becaue at least profit would be nil, rather than a loss), but this will sort
    # itself out further down the road, so we can keep the code simple here:
    if n == 1:
        return A[stop] - A[start], start, stop
        
    mid = start + n/2

    # the "divide" part in Divide & Conquer: try both halfs of the array
    max_profit1, buy1, sell1 = fmax_dandq_recursive(A, start, mid-1)
    max_profit2, buy2, sell2 = fmax_dandq_recursive(A, mid, stop)

    # the "cross-border sales strategy" part: find the minium buy price in the lower half,
    # and the maximum sell price in the upper half
    mp_buy_idx = start
    mp_buy_val = A[start]
    for k in range(start+1, mid):
        if A[k] < mp_buy_val:
            mp_buy_val = A[k]
            mp_buy_idx = k
            
    mp_sell_idx = mid
    mp_sell_val = A[mid]
    for k in range(mid+1, stop+1):
        if A[k] > mp_sell_val:
            mp_sell_val = A[k]
            mp_sell_idx = k

    # those two points generate the maximum cross border profit
    max_profit_cross_border = mp_sell_val - mp_buy_val

    # and now compare our three options and find the best one
    if max_profit2 > max_profit1:
        if max_profit_cross_border > max_profit2:
            return max_profit_cross_border, mp_buy_idx, mp_sell_idx
        else:
            return max_profit2, buy2, sell2
    else:
        if max_profit_cross_border > max_profit1:
            return max_profit_cross_border, mp_buy_idx, mp_sell_idx
        else:
            return max_profit1, buy1, sell1

def fmax_dandq(A):
    return fmax_dandq_recursive(A, 0, len(A)-1)

I did tell you the code is ugly, didn't I?

Some notes on this implementation:

Using One-Pass Sweep

Once you've reached a N-Log-N-ish time complexity, often you can call it a day. But for this particular problem, there actually exists a more beautiful algorithm - that is also much faster: O(N). What could it be? Well, O(N) gives you a hint: it means we somehow will have to visit every item in the array once. So the basic structure will be something like this:

def fmax_one_pass_sweep(A):
    max_profit = None
    buy = None
    sell = None

    for index, item in enumerate(A):
        # ... MAGICALLY COMPUTE THE BEST BUY/SELL DATES HERE ... 

    return max_profit, buy, sell

When do we buy? When prices are low. That means, we're always going to buy at the lowest price we've seen so far. So we can update the logic like this:

def fmax_one_pass_sweep(A):

    # minimum price seen so far
    min = None

    max_profit = None
    buy = None
    sell = None

    for index, item in enumerate(A):

        # sell at the minimum - update minimum seen so far
        if (min is None) or (item < min):
            min = item
            buy = index
            
        else:
            # ... MAGICALLY COMPUTE THE BEST SELL DATES HERE ...
            
    return max_profit, buy, sell

When do we sell? When prices are high. That means, we're always going to buy at the maximum profit possible so far. This completes our logic:

def fmax_one_pass_sweep(A):

    # minimum price seen so far
    min = None

    max_profit = None
    buy = None
    sell = None

    for index, item in enumerate(A):

        # sell at the minimum - update minimum seen so far
        if (min is None) or (item < min):
            min = item
            buy = index

        # buy at the maximum - update maximum seen so far
        elif (max_profit is None) or (item - min) > max_profit:
            max_profit = item - min
            sell = index

    return max_profit, buy, sell

Beautiful and fast.

Other strategies

Let's briefly revisit other common strategys and see if they could help with our problem.

But how does it actually perform?

The results on my machine for an array of 10.000 integers values were this:

        using brute force: (9995, 315, 8336) after 1.7940 secs
 using divide and conquer: (9995, 315, 8336) after 0.0200 secs
     using one-pass sweep: (9995, 315, 8336) after 0.0010 secs

Abstraction time: from int to arbitrary objects

Replacing integer values by floats or decimals is trivial and works the same way. But could we relax the input conditions even further? I honestly think we ought to sit down calmly, take a stress pill, and think things over.

Think about the following modification of the original problem: you have an array of complex numbers, and you are to compute the maximum distance between two complex numbers.

The brute-force algorithm we settled on above, using our python implementation, won't work:

def calculate_profit_when_buying_now(A, index):

    # won't work, because there is no maximum defined for complex numbers    
    sell_at = A.index(sell_value, index)
    return sell_value - A[index], sell_at

But the one we presented at the very beginning can work, with a slight modification:

# find best combination starting at index
def calculate_profit_when_buying_now(A, index):
    buying_price = A[index]
    max_profit = 0
    sell_at = index
    for i in range(index+1, len(A)):
        selling_price = A[i]

        # won't work, because the result of the subtraction of two complex numbers is another complex number,
        # and you cannot say if two complex numbers are bigger or smaller
        # profit = selling_price - buying_price
        # if profit > max_profit:
        
        # will work: you look at the absolute distance between two complex numbers
        profit = abs(selling_price - buying_price)
        if profit > max_profit:
        
            max_profit = profit
            sell_at = i

    return max_profit, sell_at
    
# check all possible buying times
def fmax_brute_force(A):
    max_profit = None
    buy = None
    sell = None
    
    for index, item in enumerate(A):
        profit, sell_at = calculate_profit_when_buying_now(A, index)
        if (max_profit is None) or (profit > max_profit):
            max_profit = profit
            buy = index
            sell = sell_at
            
    return max_profit, buy, sell

This tells us that we really need is two functions:

  1. a function that ranks possible solutions, and
  2. a function that tells us if a rank is better than another. For example, so far we've implied that the bigger the rank, the better. But what if we want to compute the minimum distance between two (distinct) points?

So, a generalized brute-force solution looks like this:

def get_best_range_from_starting_pos(A, starting_pos, get_rank, evaluate_rank):
    best_rank = None
    ending_pos = None
    for current_pos in range(starting_pos+1, len(A)):
        current_rank = get_rank(A, starting_pos, current_pos)
        if (best_rank is None) or evaluate_rank(current_rank, best_rank):
            best_rank = current_rank
            ending_pos = current_pos

    return best_rank, ending_pos
    
def get_best_range_by_rank_brute_force(A, get_rank, evaluate_rank):
    best_rank = None
    best_starting_pos = None
    best_ending_pos = None

    for starting_pos in range(len(A)):
        this_rank, ending_pos = get_best_range_from_starting_pos(A, starting_pos, get_rank, evaluate_rank)
        if (best_rank is None) or (this_rank > best_rank):
            best_rank = this_rank
            best_starting_pos = starting_pos
            best_ending_pos = ending_pos

    return best_rank, best_starting_pos, best_ending_pos

For example, to find the two complex numbers closest together, you'd use something like this:

def rank_distance(A, starting_pos, current_pos):
    return abs(A[current_pos] - A[starting_pos])
    
def get_min_distance(current_rank, best_rank):
    return current_rank < best_rank

And to find the maximum distance of two complex numbers, you'd keep the rank and simply change the rank evaluation function to

def get_max_distance(current_rank, best_rank):
    return current_rank > best_rank

Generalizing the One-Pass Sweep function

Now, can we do this for the one-pass sweep? Let's revisit the code:

def fmax_one_pass_sweep(A):

    # minimum price seen so far
    min = None

    max_profit = None
    buy = None
    sell = None

    for index, item in enumerate(A):

        # sell at the minimum - update minimum seen so far
        if (min is None) or (item < min):
            min = item
            buy = index

        # buy at the maximum - update maximum seen so far
        elif (max_profit is None) or (item - min) > max_profit:
            max_profit = item - min
            sell = index

    return max_profit, buy, sell

For example, you could think: why not find the optimum starting position by comparing the distance to a point "at infinity". Doesn't help: imagine that your point at infinity is (0, infinity), and your data lies on (-10, 0) and (10, 0): both have the same rank relative to infinity, and it gives you nil information about the distance between the two points, and whether one or the other is better.


GK, April 28, 2013