Most people look up an algorithm's time and memory complexity, and find something like O(n) or O(n log(n)). But what does this actually mean? There are quite a few finer points at play. Let's look at some definitions in this blog post to discover those finer points - and refresh some basics along the way!

Recap On Big O Notation

Quick recap on the definition of the Big O notation that is used in this post. For computer science, Wikipedia mentions:

Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity
So, f(n) = O(g(n)) if there exist a positive real number M and an integer n0 such that f(n) <= M·g(n) for all n >= n0.
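
As a quick sanity check of the definition (my own example, not from the Wikipedia article), here it is worked through for a concrete function:

```latex
% f(n) = 3n + 5 is O(n): choose M = 4 and n_0 = 5.
% For all n \geq n_0 we have 5 \leq n, so
f(n) = 3n + 5 \;\leq\; 3n + n \;=\; 4n = M \cdot g(n), \qquad \text{with } g(n) = n .
```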

Computational Complexity Theory

Just before diving into the post, let's define this, from Wikipedia:

Computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by a computer. A computational problem is solvable by mechanical application of mathematical steps, such as an algorithm.

Of note:

  • the mention of resource usage - in this blog post we will focus on time usage.
  • the use of the word "class" - later in the link, examples of classes such as P and NP are given, which we won't go into here, other than for the next point!
  • the mention of the P vs NP problem in the above link as belonging to the field of computational complexity. It is regarded as the "biggest unsolved problem in computer science", with a lot of online info that we will not get into here.

So this all sounds very.... "complex".... is it really needed to understand what we mean when we say an algorithm has time complexity O(n)? I think it is - because that O(n) is usually the average or amortized result - where "average" and "amortized" are each a method of analysing an algorithm's computational complexity. These are the main focus of this article - but first - another definition!

Time Complexity

Again, before getting into too much detail, let's keep defining things. Wikipedia defines time complexity as:

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform.

Things of note here:

  • the mention of "computer time" rather than just time - so, the time is an estimate of an algorithm running in the real world on a computer.
  • the mention of a "fixed amount of time", which simplifies things: in theory we can assume each elementary operation takes constant time, to get a theoretical value for the real world practical "computer time". We could still estimate the time if operations were not fixed time - but the assumption simplifies the "counting", i.e. counting the operations and summing the time they take.
  • the mention of it being "commonly estimated" by counting things. This is interesting - as there are a lot of ways to analyse time complexity - worst case, average case, best case and amortized, for example.
  • the omission of any mention of "the number of inputs". Mentioning this would likely lead to defining how something is estimated (e.g. by looking at the possible inputs and how likely they are to occur), rather than just saying that the time complexity is "commonly estimated" by counting operations.

So let's look at what stuff might be counted, and what ways there are to get time complexity.
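
To make "counting elementary operations" concrete, here is a minimal sketch (my own illustration, in Python for brevity) that counts the operations performed while summing a list - the count grows linearly with the input size, which is what gets summarised as O(n):

```python
def sum_list_with_count(values):
    """Sum a list while counting elementary operations.

    Assumes each addition and each loop step takes a fixed amount of time
    (one "time unit"), as in the simplification discussed above.
    """
    operations = 0
    total = 0
    for value in values:
        total += value    # one addition...
        operations += 2   # ...counted together with its loop step
    return total, operations


for n in (10, 100, 1000):
    _, ops = sum_list_with_count(list(range(n)))
    print(f"n={n}: {ops} elementary operations")  # grows proportionally to n, i.e. O(n)
```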

All of the Complexities

Methods for analysing an algorithm's complexity include:

  • worst case
  • best case
  • average case - here, as noted later, you need to make some assumptions about the inputs. Note that in casual language the "average" is sometimes taken to mean the answer when all inputs are equally likely to occur, with the "expected case" then being the answer that takes the actual likelihood of each input into account - though both are the average case, just with different probability distributions over the inputs.
  • amortized - the cost per operation, evaluated over a sequence of operations (i.e. the sum of the costs of all operations in the sequence, divided by the number of operations) - here you have to choose a sequence and sum its costs. Probability is not involved - it guarantees the average performance of each operation in the worst case.

Let's use an example of an algorithm that:

  • could only act on 3 inputs
  • in time 1x, 2x, and 3x respectively

Then:

  • the worst case is 3x
  • the best case is 1x
  • the average case - where the inputs have likelihoods of 0.1, 0.6 and 0.3 respectively of occurring - is 0.1*1x + 0.6*2x + 0.3*3x = 2.2x (a weighted average - there is no further division by 3, since the probabilities already sum to 1)
  • the average case - where all inputs are equally likely to occur - is (1x + 2x + 3x)/3 = 2x
  • the amortized case - taking the sequence to be the 3 inputs, one after the other - is (1x + 2x + 3x)/3 = 2x, which in this simple example happens to equal the average case where all inputs are equally likely (a quick numeric check follows below)
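
As a quick numeric check of the above (my own sketch, with the 1x, 2x and 3x running times hard-coded as 1, 2 and 3 time units):

```python
costs = [1, 2, 3]                # running times of the 3 possible inputs, in units of x
probabilities = [0.1, 0.6, 0.3]  # assumed likelihood of each input occurring

worst_case = max(costs)          # 3x
best_case = min(costs)           # 1x

# Average (expected) case: weight each cost by how likely its input is.
average_weighted = sum(p * c for p, c in zip(probabilities, costs))  # 2.2x

# Average case when all inputs are equally likely.
average_uniform = sum(costs) / len(costs)  # 2x

# Amortized: total cost of a chosen sequence (here, each input once) over its length.
sequence = [1, 2, 3]
amortized = sum(sequence) / len(sequence)  # 2x

print(worst_case, best_case, average_weighted, average_uniform, amortized)
```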

The worst and best case are quite intuitive, but the average and amortized cases have some finer points that are worth understanding via their actual definitions. Let's define those and see why we might use them.

Amortized Analysis

As mentioned, amortized analysis is a method for analysing an algorithm's complexity (and there are generally three methods of amortized analysis - the aggregate method, the accounting method and the potential method - which we will not get into here).

Of note in the above link:

  • The motivation for amortized analysis is that looking at the worst-case run time per operation, rather than per algorithm, can be too pessimistic.
  • While certain operations for a given algorithm may have a significant cost in resources, other operations may not be as costly. The amortized analysis considers both the costly and less costly operations together over the whole series of operations of the algorithm. This may include accounting for different types of input, length of the input, and other factors that affect its performance.

The 1985 paper Amortized Computational Complexity addressed:

"the need for a more useful form of analysis than the common probabilistic methods used. Amortization was initially used for very specific types of algorithms, particularly those involving binary trees and union operations. However, it is now ubiquitous and comes into play when analyzing many other algorithms as well."

From the paper itself:

  • We shall adapt this term ("to amortize") to computational complexity, meaning by it "to average over time" or, more precisely, "to average the running times of operations in a sequence over the sequence."
  • The following observation motivates our study of amortization: In many uses of data structures, a sequence of operations, rather than just a single operation, is performed, and we are interested in the total time of the sequence, rather than in the times of the individual operations.
  • A worst-case analysis, in which we sum the worst-case times of the individual operations, may be unduly pessimistic, because it ignores correlated effects of the operations on the data structure.
  • On the other hand, an average-case analysis may be inaccurate, since the probabilistic assumptions needed to carry out the analysis may be false.
  • In such a situation, an amortized analysis, in which we average the running time per operation over a (worst-case) sequence of operations, can yield an answer that is both realistic and robust.

So recapping on why this might be useful in the real world - in algorithms where a list is resized, or a binary tree is re-balanced, there may be one step in a sequence of steps that is expensive - but the other steps may be very cheap. It is in such cases that amortized analysis is useful.

Amortized Analysis Example with a Stack

This all sounds great. But how exactly is this done? The paper goes on to give an example which is:

  • Consider the manipulation of a stack by a sequence of operations composed of two kinds of unit-time primitives: push, which adds a new item to the top of the stack, and pop, which removes and returns the top item on the stack.
  • We wish to analyze the running time of a sequence of operations, each composed of zero or more pops followed by a push.
  • Assume we start with an empty stack and carry out m such operations.

So for example, with m=4 operations we may have:

  • 1st operation: 0 pops, 1 push - 1 time unit -  1 in stack
  • 2nd operation: 0 pops, 1 push - 1 time unit -  2 in stack
  • 3rd operation: 0 pops, 1 push - 1 time unit - 3 in stack
  • 4th operation: 3 pops, 1 push - 4 time units - 1 in stack

Back to the paper:

  • A single operation in the sequence can take up to m time units, as happens if each of the first m-1 operations performs no pops and the last operation performs m-1 pops.
  • However, altogether the m operations can perform at most 2m pushes and pops, since there are only m pushes altogether and each pop must correspond to an earlier push.

This example shows how bounds come from context - here the context of a stack, where there can only be so many pops, because each pop must correspond to an earlier push. Reflecting on this:

  • the worst case per operation is m time units.
  • the amortized time analysis here is that, when we "average the running times of operations in a sequence over the sequence", the total running time of the whole sequence is at most 2m pushes and pops, i.e. 2m time units, so the average per operation is at most 2m/m = 2, which is O(1).
  • This "calculation", as mentioned, depends on the context of a stack - the small simulation below illustrates the numbers.

Amortized Analysis Example with Adding to a List

Another classic real world example is adding to a variable length list, say in C#, which behind the scenes is implemented as a fixed length array. If the allocated length is all used up, then "behind the scenes" the algorithm needs to allocate a new, longer fixed length array and copy the existing elements into it.

For adding an element to a list of length n which needs to be resized at sizes that are powers of 2:

  • the worst case is O(n), which happens when the backing array is full (a new array of length say 2n needs to be allocated, and the n existing elements copied into it)

Considering the sequence of these worst cases (resizes happen at lengths that are powers of 2, and a resize at length n costs n time units), then for:

  • length 1: the cost is 1 time unit
  • length 2: the cost is 2 time units - as the array needs to be copied to resize it
  • length 4: the cost is 4 time units - as the array needs to be copied to resize it

The trick with amortized analysis is in the summing of the sequence - as above with the context of the stack, we knew there could be at most 2m pushes and pops - here with the array, we sum the resize costs and use the formula for the sum of a geometric progression:

  • so the amortized analysis is the sum of the resize costs over the n additions, which is (1 + 2 + 4 + ... + 2^k)/n = (2*2^k - 1)/n <= (2n - 1)/n < 2, which is O(1).

It is interesting that we have summed over all the worst cases - and got O(1). It is because the expensive cases only occur at powers of 2, and can therefore be summed as a geometric progression, that we were able to do this.
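
To see this behaviour concretely, here is a minimal sketch (my own illustration - it mimics the doubling strategy with plain Python counters rather than using C#'s actual List<T> implementation, and assumes each append costs 1 unit and a resize at capacity c costs c copy units):

```python
def append_with_copy_count(n):
    """Append n elements to a doubling dynamic array, returning the total cost."""
    capacity = 1
    size = 0
    total_cost = 0
    for _ in range(n):
        if size == capacity:    # backing array is full: resize
            total_cost += size  # copy all existing elements to the new array
            capacity *= 2
        size += 1               # the append itself
        total_cost += 1
    return total_cost


for n in (10, 1_000, 1_000_000):
    cost = append_with_copy_count(n)
    print(n, cost, cost / n)  # cost per append stays below 3, i.e. O(1) amortized
```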

Average Analysis

Having delved into sequences and averaging over those, the question remains - what would an average analysis be? It should be easy, right - it is "just the average". But it also has a context - which inputs there are, and how likely each of those inputs is to occur - so it is also not super simple.

To define it, Wikipedia states:

In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs.

It also notes:

Average-case analysis requires a notion of an "average" input to an algorithm, which leads to the problem of devising a probability distribution over inputs. Alternatively, a randomized algorithm can be used. The analysis of such algorithms leads to the related notion of an expected complexity.

So to compute the average time complexity, you first have to decide what an "average" input is - in other words, choose a probability distribution over the inputs.

The answers for average complexity can be the same as for amortized complexity. But there are some examples where this is not the case.
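
As a small worked example of an average case analysis (my own illustration): the number of comparisons for a successful linear search, averaged under the assumption that the target is equally likely to be at any position:

```python
def linear_search_comparisons(values, target):
    """Return the number of comparisons a linear search makes to find target."""
    comparisons = 0
    for value in values:
        comparisons += 1
        if value == target:
            break
    return comparisons


values = list(range(100))  # n = 100 distinct values
n = len(values)

# Average case under the assumed uniform distribution: each target equally likely.
average = sum(linear_search_comparisons(values, target) for target in values) / n
print(average)  # (n + 1) / 2 = 50.5 comparisons, i.e. O(n) on average
```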

Memory Complexity

This article focussed on time complexity - but of course the same kinds of analysis can also be done for an algorithm's memory usage.

Application Example

Now that a few things have been defined, where would we see these terms? One example is documentation - like here, where a table of the algorithmic complexity of operations on collections in C# is given. The heading "amortized" is used on the left for the non-immutable collections - like the list example above. Contrast this with the heading of just "complexity" for the immutable collections on the right - why would the heading not be "amortized" here? Well, immutable collections are just created once - there is no sequence of operations to average over, so it makes sense not to quote an amortized figure.