
Section 9.1 Grover’s Algorithm

Let us consider a very simple “search problem.” Suppose that \(f:\{0,\dots,N-1\}\to \{0,1\}\) is a function such that there is exactly one element of the domain, say \(\omega\text{,}\) such that \(f(\omega)=1\text{,}\) and our goal is to locate this element \(\omega\text{.}\) How many times do you need to evaluate the function \(f\) before you can guarantee that you find the value of \(\omega\text{?}\) Well, it is quite clear that, in the worst case, it is going to take \(N\) evaluations of the function. Moreover, even if you test the inputs in a random order, it will still take you roughly \(N/2\) evaluations on average. It seems absurd that one could hope for better performance than this.
In this section, we will explain how, as absurd as it may sound, if \(f\) satisfies a mild assumption (that is typically satisfied in practice) and if we have access to a quantum computer, then we can actually solve this problem with high probability in time \(O(\sqrt{N})\text{.}\) The method for achieving this is known as Grover’s Algorithm. Just as, throughout this course, we explain classical algorithms without explaining how the innards of a modern computer work, we will attempt to explain this algorithm without explaining the quantum mechanics behind it. We will simply describe some of the “rules” that an algorithm needs to satisfy and show that there is an algorithm that follows those rules which solves the problem quickly with high probability. For the sake of full disclosure, the author would like to confess that he knows almost nothing about quantum mechanics or quantum computers, and so he would not be able to explain the underlying physics if he wanted to.
At certain points, it will be useful for \(N\) to be a power of 2, which we can always achieve by simply extending the domain of \(f\) to \(\left\{0,\dots,2^{\lceil \log_2(N)\rceil}-1\right\}\) and setting \(f(x):=0\) for any \(x\geq N\text{.}\) So, we let \(n\geq1\) be an integer so that \(N=2^n\text{.}\) Consider the vector space \(\mathbb{C}^N\text{.}\) We associate each point \(x\in\{0,\dots,N-1\}\) with a vector of an orthonormal basis denoted by \(\left|x\right\rangle\text{.}\) This notation (called “bra-ket” notation) is useful in quantum mechanics, but we will not need to understand much about how it works for this discussion. We remark that many sources on this topic choose to express the elements of the domain \(\{0,\dots,N-1\}\) as binary vectors; this is just a stylistic choice and it does not affect the underlying mathematics.
Now, let us use our imagination. Inside a quantum computer, there is some kind of a “physical system” where, at any given time, the “state” of that system is represented by a unit vector in \(\mathbb{C}^N\text{.}\) Since the vectors \(\left|x\right\rangle\) for \(0\leq x\leq N-1\) form an orthonormal basis, any possible state of the system can be written as \(\sum_{x=0}^{N-1} \alpha_x\left|x\right\rangle\) where \(\alpha_x\in \mathbb{C}\) and \(\sum_{x=0}^{N-1}|\alpha_x|^2 = 1\text{.}\) This is called a superposition of the states \(\left|x\right\rangle\) for \(x\in \{0,\dots,N-1\}\text{.}\)
Now, here is how a quantum computer works (sort of, I think). At any point, the physical system inside of the quantum computer is in a certain state, and we can manipulate that state by performing certain operations, each of which is treated as taking time \(O(1)\text{.}\) However, the catch is that, unlike a normal computer, we cannot “read out” the state of the system whenever we want. In fact, whenever we measure the state of the system, it immediately collapses to one of the basis vectors \(\left|x\right\rangle\) and we will be unable to recover the original state. The way that the “collapsing” works is probabilistic in nature; if the current state is \(\sum_{x=0}^{N-1} \alpha_x\left|x\right\rangle\text{,}\) then the probability that a measurement causes it to collapse to a given basis vector \(\left|x\right\rangle\) is precisely \(|\alpha_x|^2\text{.}\) Also, after the system collapses, you cannot get it back. That is, if you repeatedly measure the state of the system, the output on every subsequent measurement is no longer random; it simply outputs the same basis vector \(\left|x\right\rangle\) over and over. So, if we want our algorithm to output some specific value, say the unique value \(\omega\) in the domain with \(f(\omega)=1\) with high probability, then we need to ensure that our manipulations eventually push the state of the physical system to have the property that the coefficient of \(\left|\omega\right\rangle\) has modulus close to \(1\text{.}\) Moreover, these manipulations are done “blindfolded” in the sense that you cannot see the state of the system at any point (because, if you measure it, it’ll collapse!).
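To make the measurement rule concrete, here is a minimal numpy sketch of the collapse rule (the state vector and the helper `measure` are our own illustrative constructions, not part of any quantum computing API):

```python
import numpy as np

def measure(state, rng=np.random.default_rng()):
    # Collapse: basis vector |x> is observed with probability |alpha_x|^2.
    probs = np.abs(state) ** 2
    return rng.choice(len(state), p=probs)

# A unit vector in C^4, i.e. a superposition of |0>, |1>, |2>, |3>.
state = np.array([0.5, 0.5j, -0.5, 0.5])
assert np.isclose(np.sum(np.abs(state) ** 2), 1.0)

# Each call simulates measuring a fresh copy of the state; a real
# quantum system would stay collapsed after the first measurement.
print([measure(state) for _ in range(10)])
```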
What are these “manipulations” that you are allowed to do? To explain this, we need a few definitions. For a square complex matrix \(U\text{,}\) the adjoint of \(U\text{,}\) denoted \(U^*\text{,}\) is obtained by taking the transpose of \(U\) and then taking the complex conjugate of every entry. A matrix \(U\) is said to be unitary if \(U^*\) is the inverse of \(U\text{;}\) that is, if \(UU^*=U^*U=I\text{.}\) These are basically the square matrices that map unit vectors in \(\mathbb{C}^N\) bijectively to other unit vectors (i.e. they are complex isometries). Let \(U(N)\) be the group of all unitary \(N\times N\) matrices. The elements of \(U(N)\) represent the possible quantum gates and the “allowed manipulations” that we referred to earlier consist of multiplying the current state of the system by an element of \(U(N)\text{.}\)

Example 9.1.1.

A very simple example of a unitary matrix is any diagonal matrix \(U\) such that each diagonal entry is a complex number of modulus one. It is easy to see that any such matrix satisfies \(UU^*=U^*U=I\) and so such a matrix is, indeed, unitary. For any \(x\in\{0,\dots,N-1\}\text{,}\) we define \(U_x\) to be a matrix of this type where the \((x+1)\)th diagonal entry is \(-1\) and the rest are \(1\text{.}\) Multiplying by this matrix has the effect of flipping the vector across the hyperplane perpendicular to \(\left|x\right\rangle\text{.}\)
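As a quick sanity check, we can build \(U_x\) in numpy and confirm both of these properties (the helper name `U` is ours):

```python
import numpy as np

def U(x, N):
    # Diagonal unitary that flips the sign of the coefficient of |x>.
    D = np.eye(N, dtype=complex)
    D[x, x] = -1
    return D

N = 8
Ux = U(5, N)
# Unitary: U U* = I, where U* is the conjugate transpose.
assert np.allclose(Ux @ Ux.conj().T, np.eye(N))
# Reflection: the |5> coordinate is negated, all others are fixed.
v = np.ones(N) / np.sqrt(N)
print(Ux @ v)
```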
The next example introduces a particularly important quantum gate.

Example 9.1.2.

Let us consider the simple case that \(n=1\) and so \(N=2\text{.}\) In this case, the states are all of the superpositions of \(\left|0\right\rangle\) and \(\left|1\right\rangle\text{.}\) The Hadamard gate is represented by the following matrix:
\begin{equation*} H_1=\frac{1}{\sqrt{2}}\begin{bmatrix}1 \amp 1\\ 1 \amp -1\end{bmatrix}. \end{equation*}
Note that this matrix is unitary because \(H_1^*=H_1\) and one can check that \(H_1^2 = I\text{.}\)
Now, given a superposition \(\alpha\left|0\right\rangle + \beta\left|1\right\rangle\) with \(|\alpha|^2+|\beta|^2=1\text{,}\) multiplying by \(H_1\) yields the new superposition \(\frac{\alpha+\beta}{\sqrt{2}}\left|0\right\rangle + \frac{\alpha-\beta}{\sqrt{2}}\left|1\right\rangle\text{.}\) In particular, applying it to \(\left|0\right\rangle\) yields \(\frac{1}{\sqrt{2}}\left|0\right\rangle+ \frac{1}{\sqrt{2}}\left|1\right\rangle\text{,}\) which is a “balanced” superposition of the two basis vectors.
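For instance, here is a minimal numpy check of these two computations (the amplitudes chosen are an arbitrary illustration):

```python
import numpy as np

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

alpha, beta = 0.6, 0.8j            # example amplitudes with |a|^2 + |b|^2 = 1
state = np.array([alpha, beta])
print(H1 @ state)                  # [(a+b)/sqrt(2), (a-b)/sqrt(2)]
print(H1 @ np.array([1, 0]))       # balanced superposition of |0> and |1>
assert np.allclose(H1 @ H1, np.eye(2))   # H_1^2 = I
```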
Let us now start to prepare to describe Grover’s Algorithm. When first faced with this search problem, we have no idea which of the inputs gives an output of \(1\text{,}\) and so we would like to start with a state that is as “unbiased” as possible. That is, we want to start with the state \(\frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}\left|x\right\rangle\) which is a balanced superposition of all of our basis vectors. Recall that we are assuming that \(N=2^n\text{.}\) When we start our algorithm, we assume that our orthonormal basis is chosen so that the initial state of the system is \(\left|0\right\rangle\text{.}\) Then, as is proved in Exercise 2, the state \(\frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}\left|x\right\rangle\) can be reached by applying a generalization of the Hadamard gate which is represented by a unitary matrix \(H_n\text{.}\) This matrix has the properties \(H_n^*=H_n\) and \(H_n^2=I\) just as the matrix \(H_1\) from Example 9.1.2 does.
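One standard construction of \(H_n\) with these properties is the \(n\)-fold Kronecker power of \(H_1\text{.}\) Here is a small numpy sketch assuming that construction:

```python
import numpy as np
from functools import reduce

def hadamard(n):
    # H_n as the n-fold Kronecker power of H_1 (one standard construction).
    H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    return reduce(np.kron, [H1] * n)

n, N = 3, 8
Hn = hadamard(n)
ket0 = np.zeros(N)
ket0[0] = 1                              # the initial state |0>
print(Hn @ ket0)                         # every entry is 1/sqrt(8): balanced
assert np.allclose(Hn @ Hn, np.eye(N))   # H_n^2 = I (and H_n* = H_n)
```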
Okay, so, after applying \(H_n\text{,}\) we find ourselves in the uniform superposition \(\frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}\left|x\right\rangle\) which represents our current “best guess” for the element \(\omega\) of the domain such that \(f(\omega)=1\) (i.e., as far as we know, it is equally likely to be any element of the domain). Our goal is to apply quantum gates (i.e. multiply by unitary operators) to eventually reach a state where the coefficient of \(\left|\omega\right\rangle\) is very close to \(1\text{.}\) At the start of this discussion, we mentioned that we would need a mild assumption on the function \(f\text{.}\) The assumption that we require is that, if you input a value \(x\) into \(f\text{,}\) then there is a reasonably efficient algorithm for evaluating the output \(f(x)\text{.}\) For example, if you take any problem in NP (e.g. the 3-colouring problem for graphs) where the elements of \(\{0,\dots,N-1\}\) represent potential certificates (e.g. the mappings of \(V(G)\) to \(\{1,2,3\}\)) and let \(f\) be a function where \(f(x)=1\) if \(x\) is a valid certificate and \(f(x)=0\) otherwise, then, by definition of NP, there is an efficient algorithm for computing the value of \(f(x)\text{.}\) This algorithm can be written down as an explicit small circuit of logic gates (e.g. for the 3-colouring problem, this would be a for loop that checks the colours of the endpoints of each edge). By representing each of these logic gates as unitary matrices in the right way, and using the fact that there is a unique \(\omega\) with \(f(\omega)=1\text{,}\) one can get a representation of the matrix \(U_\omega\) (as in Example 9.1.1) as a product of a small number of unitary matrices. Crucially, this does not require us to know what \(\omega\) is! It simply requires us to know how the function \(f\) is computed for a general input \(x\) and to have unitary matrices which simulate the steps of that algorithm. Note that, even in the classical setting, the running time for a brute force algorithm will depend on how long it takes to evaluate \(f(x)\) and so this should not be seen as a limitation of quantum computing.
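In a classical simulation we can only mimic the oracle \(U_\omega\) by evaluating \(f\) on every input, which is exactly the enumeration a genuine quantum implementation avoids; still, the following sketch (with an illustrative \(f\)) shows that the matrix is determined by \(f\) alone, with \(\omega\) never referenced directly:

```python
import numpy as np

def oracle(f, N):
    # Classical simulation of U_omega: diagonal entries (-1)^f(x).
    # Built only from evaluations of f; omega itself is never referenced.
    return np.diag([(-1.0) ** f(x) for x in range(N)])

# Illustrative f with a single marked element (omega = 5 here).
f = lambda x: 1 if x == 5 else 0
U_omega = oracle(f, 8)
print(np.diag(U_omega))   # [ 1.  1.  1.  1.  1. -1.  1.  1.]
```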
We now have enough background to describe Grover’s Algorithm. During initialization, we set \(\theta:=\arcsin\left(\frac{1}{\sqrt{N}}\right)\) and let \(r\) be the nearest integer to \(\left(\frac{1}{2\theta}\right)\left(\frac{\pi}{2}-\theta\right)\text{,}\) so that \(\theta+2\theta r\) is as close as possible to \(\pi/2\text{,}\) and we assume that the system starts in the state \(\left|0\right\rangle\text{.}\) The algorithm then proceeds as follows. First, apply \(H_n\) to reach the balanced superposition \(\left|s_0\right\rangle:=\frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}\left|x\right\rangle\text{.}\) Next, for each \(i\) from \(1\) to \(r\text{,}\) obtain \(\left|s_i\right\rangle:=H_n(-U_0)H_nU_\omega\left|s_{i-1}\right\rangle\) by multiplying the current state by \(U_\omega\text{,}\) then \(H_n\text{,}\) then \(-U_0\text{,}\) then \(H_n\text{.}\) Finally, measure the state of the system and output the element \(x\) corresponding to the basis vector \(\left|x\right\rangle\) that it collapses to.
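The following is a minimal end-to-end numpy simulation of the algorithm (the helper names and the choices \(n=10\) and \(\omega=137\) are ours, and of course a real quantum computer would not store the state vector explicitly; this sketch just tracks the linear algebra):

```python
import numpy as np
from functools import reduce

def grover(f, n, rng=np.random.default_rng(0)):
    N = 2 ** n
    H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = reduce(np.kron, [H1] * n)                 # H_n as a Kronecker power
    U_omega = np.diag([(-1.0) ** f(x) for x in range(N)])  # oracle built from f
    U0 = np.diag([-1.0 if x == 0 else 1.0 for x in range(N)])

    theta = np.arcsin(1 / np.sqrt(N))              # initialization
    r = round((np.pi / 2 - theta) / (2 * theta))

    state = np.zeros(N)
    state[0] = 1.0                                 # start in |0>
    state = Hn @ state                             # |s_0>: balanced superposition
    for _ in range(r):                             # r Grover iterations
        state = U_omega @ state
        state = Hn @ state
        state = -(U0 @ state)
        state = Hn @ state
    probs = np.abs(state) ** 2                     # measurement collapse
    return rng.choice(N, p=probs / probs.sum())

omega = 137
f = lambda x: 1 if x == omega else 0
print(grover(f, 10))   # outputs 137 with probability close to 1
```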
Analyzing the running time is easy. The Taylor series of \(\sin(\theta)\) around \(\theta=0\) shows us that \(\sin(\theta)\approx \theta\) when \(\theta\) is close to zero. Since \(\arcsin(y)\) is the inverse of \(\sin(\theta)\) on the interval \([-1,1]\text{,}\) it must also hold that \(\arcsin(y)\approx y\) when \(y\) is close to zero. So, when \(N\) is large, the quantity \(\theta=\arcsin\left(\frac{1}{\sqrt{N}}\right)\) defined during the initialization of the algorithm is approximately \(\frac{1}{\sqrt{N}}\text{.}\) Therefore, for large \(N\text{,}\)
\begin{equation*} r\approx \left(\frac{1}{2\theta}\right)\left(\frac{\pi}{2}-\theta\right) \approx \frac{\pi}{4}\sqrt{N}. \end{equation*}
So, if we treat each multiplication by a unitary operator as taking time \(O(1)\text{,}\) then the total running time is \(O(\sqrt{N})\text{.}\)
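As a quick numeric check of this approximation (a sketch; the values of \(n\) are arbitrary):

```python
import numpy as np

for n in (10, 20, 30):
    N = 2 ** n
    theta = np.arcsin(1 / np.sqrt(N))
    r = round((np.pi / 2 - theta) / (2 * theta))
    # e.g. n = 10: r = 25, while (pi/4)*sqrt(N) = 25.13...
    print(N, r, np.pi * np.sqrt(N) / 4)
```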
Now, we need to justify that this algorithm solves the problem that we set out to solve with high probability when \(N\) is large and that it does so efficiently. A rough idea of this that is useful for building intuition can be found in the video included at the end of this section; here, we will give the details. To this end, we need to get a handle on the effect of multiplying a state \(\left|s_{i-1}\right\rangle\) by \(H_n(-U_0)H_nU_\omega\) for \(i\geq1\text{.}\) One simple fact that we will need in this analysis is that, for any \(x\in\{0,\dots,N-1\}\text{,}\) the coefficient of \(\left|0\right\rangle\) in \(H_n\left|x\right\rangle\) is \(\frac{1}{\sqrt{N}}\text{.}\) We also need the fact that \(H_n^2=I\text{,}\) which was mentioned earlier. Let’s start by analyzing the first step. The effect of multiplying \(\left|s_{0}\right\rangle\) on the left by these four matrices, in order, is as follows:
\begin{align*} \left|s_{0}\right\rangle\amp\stackrel{\text{multiply by }U_\omega}{\longrightarrow}\amp \left|s_{0}\right\rangle - \frac{2}{\sqrt{N}}\left|\omega\right\rangle\\ \amp\stackrel{\text{multiply by }H_n}{\longrightarrow}\amp \left|0\right\rangle - \frac{2}{\sqrt{N}}H_n\left|\omega\right\rangle\\ \amp\stackrel{\text{multiply by }-U_0}{\longrightarrow}\amp \left|0\right\rangle + \frac{2}{\sqrt{N}}H_n\left|\omega\right\rangle - \frac{4}{N}\left|0\right\rangle\\ \amp\stackrel{\text{multiply by }H_n}{\longrightarrow}\amp \left|s_0\right\rangle + \frac{2}{\sqrt{N}}\left|\omega\right\rangle - \frac{4}{N}\left|s_0\right\rangle\amp=\left(1-\frac{4}{N}\right)\left|s_0\right\rangle + \frac{2}{\sqrt{N}}\left|\omega\right\rangle \end{align*}
Therefore, \(\left|s_1\right\rangle = \left(1-\frac{4}{N}\right)\left|s_0\right\rangle + \frac{2}{\sqrt{N}}\left|\omega\right\rangle\) is a vector which points slightly more strongly in the direction of \(\left|\omega\right\rangle\) than our original “best guess” \(\left|s_0\right\rangle\text{,}\) which means that we are making progress! If we simply observed the state at this point, then the probability of collapsing to the state \(\left|\omega\right\rangle\) would be \(\left(\left(1-\frac{4}{N}\right)\frac{1}{\sqrt{N}} + \frac{2}{\sqrt{N}}\right)^2\approx \frac{9}{N}\) and so we have increased our chance of success by a factor of \(9\text{.}\) Still, if \(N\) is large, then our odds are pretty bad, and so we need to keep improving them.
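This first-step calculation can be confirmed numerically (a sketch reusing the same illustrative constructions as above, with \(n=6\) and \(\omega=5\)):

```python
import numpy as np
from functools import reduce

n, omega = 6, 5
N = 2 ** n
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = reduce(np.kron, [H1] * n)
U_omega = np.diag([-1.0 if x == omega else 1.0 for x in range(N)])
U0 = np.diag([-1.0 if x == 0 else 1.0 for x in range(N)])

s0 = np.full(N, 1 / np.sqrt(N))                   # balanced superposition
s1 = Hn @ (-(U0 @ (Hn @ (U_omega @ s0))))         # one Grover iteration
# Predicted: (1 - 4/N)|s_0> + (2/sqrt(N))|omega>
pred = (1 - 4 / N) * s0
pred[omega] += 2 / np.sqrt(N)
assert np.allclose(s1, pred)
```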
To analyze the later steps of the algorithm, it is useful to show that the plane spanned by \(\left|s_0\right\rangle\) and \(\left|\omega\right\rangle\) is invariant under multiplication by \(H_n(-U_0)H_nU_\omega\text{.}\) Above, we already saw that applying these matrices to \(\left|s_0\right\rangle\) produces a vector in this plane, and so we just need to show the same for \(\left|\omega\right\rangle\text{.}\) We have
\begin{align*} \left|\omega\right\rangle\amp\stackrel{\text{multiply by }U_\omega}{\longrightarrow}\amp -\left|\omega\right\rangle\\ \amp\stackrel{\text{multiply by }H_n}{\longrightarrow}\amp -H_n\left|\omega\right\rangle\\ \amp\stackrel{\text{multiply by }-U_0}{\longrightarrow}\amp -\frac{2}{\sqrt{N}}\left|0\right\rangle + H_n\left|\omega\right\rangle\\ \amp\stackrel{\text{multiply by }H_n}{\longrightarrow}\amp -\frac{2}{\sqrt{N}}\left|s_0\right\rangle + \left|\omega\right\rangle \end{align*}
Thus, indeed, the plane spanned by \(\left|s_0\right\rangle\) and \(\left|\omega\right\rangle\) is invariant under multiplication by \(H_n(-U_0)H_nU_\omega\text{.}\) This is rather convenient as it implies that all of the vectors \(\left|s_i\right\rangle\) for \(i\geq1\) lie in this plane and so we can reduce everything down to a two-dimensional problem. Taking the vectors \(\left|s_0\right\rangle\) and \(\left|\omega\right\rangle\) to be a basis for the plane that they span, the above calculations allow us to represent the restriction of \(H_n(-U_0)H_nU_\omega\) to that plane by the matrix
\begin{equation*} \begin{bmatrix}1-\frac{4}{N}\amp -\frac{2}{\sqrt{N}}\\ \frac{2}{\sqrt{N}}\amp 1\end{bmatrix}, \end{equation*}
which can be expressed in the form \(MJM^{-1}\) where
\begin{equation*} M=\begin{bmatrix}e^{-i\theta}\amp e^{i\theta}\\ i \amp -i\end{bmatrix}, \end{equation*}
\begin{equation*} M^{-1}=\frac{1}{-ie^{-i\theta}-ie^{i\theta}}\begin{bmatrix}-i\amp -e^{i\theta}\\ -i \amp e^{-i\theta}\end{bmatrix}=\frac{1}{2i\cos(\theta)}\begin{bmatrix}i\amp e^{i\theta}\\ i \amp -e^{-i\theta}\end{bmatrix}, \text{ and} \end{equation*}
\begin{equation*} J=\begin{bmatrix}e^{-2i\theta}\amp 0\\0\amp e^{2i\theta}\end{bmatrix}. \end{equation*}
This decomposition will be verified in Exercise 3. So, the \(r\)th power is
\begin{equation*} M\begin{bmatrix}e^{-2i\theta r}\amp 0\\0\amp e^{2i\theta r}\end{bmatrix} M^{-1}. \end{equation*}
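A quick numeric spot-check of the decomposition, complementing Exercise 3 (with \(N=64\) as an arbitrary illustration):

```python
import numpy as np

N = 64
theta = np.arcsin(1 / np.sqrt(N))
A = np.array([[1 - 4 / N, -2 / np.sqrt(N)],
              [2 / np.sqrt(N), 1]])
M = np.array([[np.exp(-1j * theta), np.exp(1j * theta)],
              [1j, -1j]])
J = np.diag([np.exp(-2j * theta), np.exp(2j * theta)])
# Verify A = M J M^{-1} (up to floating-point error).
assert np.allclose(A, M @ J @ np.linalg.inv(M))
```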
Now, let us see what happens when we apply the \(r\)th power of the matrix written above to the initial vector \(\begin{bmatrix}1\\0\end{bmatrix}\text{,}\) where this vector is viewed in terms of our basis \(\left|s_0\right\rangle\) and \(\left|\omega\right\rangle\) of the plane and therefore represents our initial balanced superposition \(\left|s_0\right\rangle\text{.}\) Writing the \(r\)th power as \(MJ^rM^{-1}\text{,}\) we have
\begin{align*} \begin{bmatrix}1\\0\end{bmatrix}\amp\stackrel{\text{multiply by }M^{-1}}{\longrightarrow}\amp \begin{bmatrix}\frac{1}{2\cos(\theta)}\\\frac{1}{2\cos(\theta)}\end{bmatrix}\amp=\frac{1}{2\cos(\theta)}\begin{bmatrix}1\\1\end{bmatrix}\\ \amp\stackrel{\text{multiply by }J^r}{\longrightarrow}\amp \frac{1}{2\cos(\theta)}\begin{bmatrix}e^{-2i\theta r}\\e^{2i\theta r}\end{bmatrix}\\ \amp\stackrel{\text{multiply by }M}{\longrightarrow}\amp \frac{1}{2\cos(\theta)}\begin{bmatrix}e^{-i\theta-2i\theta r} +e^{i\theta +2i\theta r}\\ ie^{-2i\theta r} -ie^{2i\theta r}\end{bmatrix}\amp = \frac{1}{2\cos(\theta)}\begin{bmatrix}2\cos(\theta+2\theta r)\\ 2\sin(2\theta r)\end{bmatrix} \end{align*}
At this point, we recall that we chose \(r\) so that \(\theta+2\theta r\) is very close to \(\pi/2\text{.}\) Also, recall that \(\theta\approx \frac{1}{\sqrt{N}}\text{.}\) So, the above vector is very close to \(\begin{bmatrix}0\\1\end{bmatrix}\text{,}\) which means that the coefficient of \(\left|\omega\right\rangle\) in \(\left|s_r\right\rangle\) is close to \(1\text{.}\) So, when we observe the state \(\left|s_r\right\rangle\text{,}\) the probability of collapsing to \(\left|\omega\right\rangle\) is close to \(1\text{.}\) Thus, Grover’s Algorithm gives us an algorithm with running time \(O(\sqrt{N})\) which has a very high probability of success. If you want to increase the probability further, then you may simply run the algorithm several times and choose the value of \(\omega\) which the algorithm outputs most frequently.
Here are two excellent videos by Grant Sanderson at 3Blue1Brown on quantum algorithms and Grover’s Algorithm. The writing of this section was heavily inspired by his exposition in these videos.