
Section 1.2 Algorithms and Running Time

The focus of this course is on developing methods for solving discrete optimization problems as quickly as possible. The “methods” that we are referring to are known as algorithms. Of course, in order to compare the speed of two algorithms, we need a way of estimating their running time. A standard approach is to estimate the running time in terms of the number of “elementary operations” performed by the algorithm in the “worst case” on an input of a certain size.
Let us unpack this statement and illustrate it with some simple examples. For the purposes of these notes, an elementary operation is one of the following:
  1. any arithmetic operation (addition, subtraction, multiplication or division),
  2. the comparing of two numbers (to determine which of them is larger),
  3. a branching instruction (e.g. “go back to Step 2”).
Typically, the first two of these operations will be the most important to keep track of, as it is somewhat rare for an algorithm to have a number of branching instructions that far exceeds the number of arithmetic operations or comparisons.
Note that, while we will be treating each of these elementary operations as taking the same amount of time (one step), in practice, different operations may take different amounts of time to perform. For example, comparing the relative sizes of two 3-digit numbers will naturally take less time than multiplying two 3000-digit numbers. Nevertheless, for convenience, we will assume that all elementary operations take the same amount of time.
Let us now provide an example of an algorithm and analyze its running time.

Example 1.2.1. InsertionSort.

Consider the problem of sorting a list \(x_1,\dots,x_n\) of numbers in increasing order; this is equivalent to the problem of sorting words in alphabetical order which was discussed in the previous section. Here is a simple algorithm for achieving this, known as InsertionSort. For each \(i\geq1\text{,}\) the \(i\)th step of the algorithm will start with a list \(L=(y_1,\dots,y_{i-1})\) consisting of all of the numbers \(x_1,\dots,x_{i-1}\) in the correct order, and it will insert \(x_i\) into the appropriate location in \(L\text{.}\) So, when \(i=1\text{,}\) we start with the empty list \(L=()\) and insert \(x_1\) to yield the list \(L=(x_1)=(y_1)\text{.}\) In general, for \(i\geq2\text{,}\) we find the smallest index \(j\) such that \(y_j\geq x_i\) and insert \(x_i\) immediately before \(y_j\text{;}\) if no such index exists, then we insert \(x_i\) at the end of the list. In the end, our output will be the list \((y_1,\dots,y_n)\) which will consist of all of the numbers \(x_1,\dots,x_n\) in increasing order.
Now, in order to judge the speed of this algorithm, we need to estimate the maximum number of elementary operations it will take. Clearly, the algorithm does not use any arithmetic operations. Also, the number of branching instructions is \(n\text{,}\) since it needs to “go back to the start” exactly \(n-1\) times and the final step of the algorithm consists of the branching instruction which stops the algorithm and outputs the result. On the \(i\)th step of the algorithm, the number \(x_i\) is compared with at most \(i-1\) other numbers in the list. So, the total number of comparisons is at most
\begin{equation*} \sum_{i=1}^n(i-1) = \frac{n(n-1)}{2}. \end{equation*}
So, the total number of elementary operations is \(\frac{n(n-1)}{2}+n = \frac{n(n+1)}{2}\text{.}\)
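The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text; the counter `comparisons` is included only to check the analysis, and the names are our own.

```python
def insertion_sort(xs):
    """Sort a list by inserting each element into an already-sorted prefix.

    Returns the sorted list together with the number of comparisons made,
    so the bound n(n-1)/2 from the analysis can be checked directly.
    """
    L = []  # the sorted prefix (y_1, ..., y_{i-1})
    comparisons = 0
    for x in xs:
        # find the smallest index j with L[j] >= x
        j = 0
        while j < len(L):
            comparisons += 1
            if L[j] >= x:
                break
            j += 1
        L.insert(j, x)  # insert x immediately before L[j], or at the end
    return L, comparisons
```

For instance, `insertion_sort([3, 1, 2])` returns the sorted list `[1, 2, 3]`, and an already-sorted input of length \(n\) realizes the worst case of \(n(n-1)/2\) comparisons, since each \(x_i\) is compared with all \(i-1\) elements of the prefix.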

Subsection 1.2.1 Big O Notation

The analysis of the running time in Example 1.2.1 was fairly precise. This was easy to do because the algorithm that we were analyzing was very simple. Often, when dealing with more complex algorithms, it is preferable to give a rougher estimate of the running time which suppresses constant factors and lower order asymptotic terms. The general philosophy here is that, while constant factors and lower order terms may have a large effect on the running time for small inputs, they do not affect the way that the running time “scales up” as the size of the input grows. For this, it is useful to use “Big O” notation.

Definition 1.2.2. Big O Notation.

Let \(f,g:\mathbb{N}\to\mathbb{N}\text{.}\) We write
  • \(f=O(g)\) to mean that there exists \(C>0\) such that \(f(n)\leq Cg(n)\) for all \(n\in \mathbb{N}\text{,}\)
  • \(f=\Omega(g)\) to mean that there exists \(c>0\) such that \(f(n)\geq cg(n)\) for all \(n\in \mathbb{N}\text{,}\)
  • \(f=\Theta(g)\) to mean that there exist \(C,c>0\) such that \(cg(n)\leq f(n)\leq Cg(n)\) for all \(n\in\mathbb{N}\text{,}\)
  • \(f=o(g)\) or \(f\ll g\) to mean that \(\lim_{n\to\infty}\frac{f(n)}{g(n)}=0\text{,}\)
  • \(f=\omega(g)\) or \(f\gg g\) to mean that \(\lim_{n\to\infty}\frac{f(n)}{g(n)}=\infty\text{,}\)
  • \(f=(1+o(1))g\) or \(f\sim g\) to mean that \(\lim_{n\to\infty}\frac{f(n)}{g(n)}=1\text{.}\)
In the language of Big O notation, we would say that the InsertionSort algorithm has running time \(O(n^2)\) since the “leading” term of the function \(\frac{n(n+1)}{2}=\frac{1}{2}n^2 + \frac{1}{2}n\) is quadratic. Of course, we could also express the running time (less precisely) as \(o(n^3)\) since \(\lim_{n\to\infty}\frac{n(n+1)/2}{n^3}=0\) or (more precisely) as \((1+o(1))n^2/2\) since \(\lim_{n\to\infty}\frac{n(n+1)/2}{n^2/2}=1\text{.}\) Throughout this course, we will often speak in the language of asymptotic notation, and so you will need to become comfortable with it.
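The two limits just quoted can be checked numerically. The following sketch (our own illustration; `exact_count` is a hypothetical name for the operation count \(\frac{n(n+1)}{2}\) derived in Example 1.2.1) shows the ratio against \(n^2/2\) tending to \(1\) and the ratio against \(n^3\) tending to \(0\):

```python
def exact_count(n):
    # total number of elementary operations of InsertionSort
    # from Example 1.2.1: n(n+1)/2
    return n * (n + 1) // 2

for n in (10, 1_000, 1_000_000):
    f = exact_count(n)
    # (n+1)/n -> 1,  and  (n+1)/(2n^2) -> 0
    print(n, f / (n**2 / 2), f / n**3)
```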
To get more practice with Big O notation as a tool for estimating running time, let us present and analyze another simple sorting algorithm.

Example 1.2.3. MergeSort.

As in Example 1.2.1, our goal is to put a list \(x_1,\dots,x_n\) into increasing order. The way that the MergeSort algorithm achieves this is as follows. If \(n\leq 1\text{,}\) then the list is already sorted and so there is nothing to do. Otherwise, split the list into two sublists \(x_1,\dots,x_{\lfloor n/2\rfloor}\) and \(x_{\lfloor n/2\rfloor+1},\dots,x_n\text{.}\) Apply the MergeSort algorithm on each of the sublists to obtain ordered sequences \(y_1,\dots,y_{\lfloor n/2\rfloor}\) and \(y_{\lfloor n/2\rfloor+1},\dots,y_n\text{.}\) Finally, we “merge” these sequences to obtain the final ordered list. To do this, we initialize \(i_1=1\) and \(i_2=\lfloor n/2\rfloor+1\text{.}\) Then, we repeatedly compare \(y_{i_1}\) with \(y_{i_2}\text{:}\) if \(y_{i_1}\leq y_{i_2}\text{,}\) we list the element \(y_{i_1}\) and increase the index \(i_1\) by one; otherwise, we list the element \(y_{i_2}\) and increase the index \(i_2\) by one. Once one of the two sequences is exhausted, we list the remaining elements of the other in order.
Now, let us analyze the running time of this algorithm. If \(T(n)\) is the amount of time required to run the algorithm on an input of size \(n\text{,}\) then (ignoring branching instructions), we have
\begin{equation*} T(n) \leq T(\lfloor n/2\rfloor) + T(\lceil n/2\rceil) + n \end{equation*}
since the number of comparisons performed when merging the two sequences is at most \(n\text{.}\) Now, we claim that the above inequality leads to a running time of \(O(n\log(n))\text{.}\) Indeed, the number of times that we can divide \(n\) by two before reaching a value that is less than or equal to 1 is \(\lceil\log_2(n)\rceil\text{.}\) So, this means that the “depth” of the recursion in this algorithm is \(O(\log(n))\text{.}\) Each time we recurse, we accumulate an additional term of \(n\text{.}\) For example,
\begin{equation*} T(n) \leq T(\lfloor n/2\rfloor) + T(\lceil n/2\rceil) + n \end{equation*}
\begin{equation*} \leq \left(T(\lfloor\lfloor n/2\rfloor/2\rfloor) +T(\lceil\lfloor n/2\rfloor/2\rceil) + \lfloor n/2\rfloor\right) + \left(T(\lfloor\lceil n/2\rceil/2\rfloor) + T(\lceil\lceil n/2\rceil/2\rceil) + \lceil n/2\rceil\right) + n \end{equation*}
\begin{equation*} = T(\lfloor\lfloor n/2\rfloor/2\rfloor) +T(\lceil\lfloor n/2\rfloor/2\rceil) + T(\lfloor\lceil n/2\rceil/2\rfloor) + T(\lceil\lceil n/2\rceil/2\rceil) + 2n \end{equation*}
and so on. So, putting this together, we get that the running time is \(O(n\log(n))\text{.}\) Note that this is superior to the InsertionSort algorithm described earlier, since \(n\log(n)=o(n^2)\text{.}\)
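The recursive description above translates directly into code. The following is a minimal sketch in Python (our own illustration, mirroring the splitting and merging steps of Example 1.2.3):

```python
def merge_sort(xs):
    """Divide-and-conquer sort: split, recurse on each half, then merge."""
    n = len(xs)
    if n <= 1:
        return list(xs)  # a list of length at most 1 is already sorted
    # recurse on the two halves of sizes floor(n/2) and ceil(n/2)
    left = merge_sort(xs[: n // 2])
    right = merge_sort(xs[n // 2 :])
    # merge the two sorted halves, using at most n - 1 comparisons
    merged, i1, i2 = [], 0, 0
    while i1 < len(left) and i2 < len(right):
        if left[i1] <= right[i2]:
            merged.append(left[i1])
            i1 += 1
        else:
            merged.append(right[i2])
            i2 += 1
    merged.extend(left[i1:])   # exactly one of these two slices
    merged.extend(right[i2:])  # is nonempty: the leftover elements
    return merged
```

The recursion tree has depth \(O(\log n)\) and each level performs at most \(n\) comparisons in its merges, matching the \(O(n\log n)\) bound derived above.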
MergeSort is often referred to as a “divide-and-conquer” algorithm due to the fact that it is based on splitting the data into pieces, solving the problem on each piece separately, and combining the solutions. The idea of divide-and-conquer will come up frequently when we talk about dynamic programming in Section 5.1.