Abstract
We investigate the sample complexity of Hamiltonian simulation: how many copies of an unknown quantum state are required to simulate a Hamiltonian encoded by the density matrix of that state? We show that the procedure proposed by Lloyd, Mohseni, and Rebentrost [Nat. Phys., 10(9):631–633, 2014] is optimal for this task. We further extend their method to the case of multiple input states, showing how to simulate any Hermitian polynomial of the states provided. As applications, we derive optimal algorithms for commutator simulation and orthogonality testing, and we give a protocol for creating a coherent superposition of pure states when given sample access to those states. We also show that this sample-based Hamiltonian simulation can be used as the basis of a universal model of quantum computation that requires only partial swap operations and simple single-qubit states.
Introduction
Much work has been done on the time and query complexity of Hamiltonian simulation when given a classical description or black box description of the Hamiltonian. Lloyd provided the first formal results on simulation, considering Hamiltonians that consist of sums of noncommuting terms.^{1} Other lines of research have focused on simulating sparse Hamiltonians, with a long sequence of work recently culminating in an optimal algorithm^{2} (see ref. 3 for a more complete history of work in this field).
In this work, we approach the problem of Hamiltonian simulation from a slightly different perspective. Rather than being given a classical description of, or black-box access to, a Hamiltonian H, we consider the problem of simulating H when given many copies of a quantum state ρ that encodes the Hamiltonian to be simulated. In particular, we assume that
\[\rho =\frac{H+c{\mathbb{1}}}{{\rm{Tr}}(H+c{\mathbb{1}})}\tag{1}\]
for some constant \(c\in {\mathbb{R}}\) such that \(H + c {\mathbb{1}} \) is positive semidefinite and nonzero. In that case, ρ itself is positive semidefinite and Tr ρ = 1, so ρ is a valid density matrix. Note that the Hamiltonian dynamics of H and ρ are equivalent up to an overall phase and time scaling. Moreover, since the Hamiltonian H in Eq. (1) can be arbitrary, any unitary can in fact be expressed as e ^{−iρt} for an appropriately chosen state ρ and time t.
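The equivalence of the two dynamics can be written out in one line. The derivation below assumes the normalization \(\rho =(H+c{\mathbb{1}})/\nu \) with \(\nu :={\rm{Tr}}(H+c{\mathbb{1}}) > 0\); the symbol \(\nu \) is introduced here only for illustration.

\[{e}^{-i\rho t}={e}^{-i(H+c{\mathbb{1}})t/\nu }={e}^{-ict/\nu }\,{e}^{-iH(t/\nu )}.\]

Evolving under ρ for time t therefore equals evolving under H for the rescaled time t/ν, up to the global phase \({e}^{-ict/\nu }\). Conversely, to obtain \({e}^{-iH\tau }\) one would choose t = ντ.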
This modified version of the original Hamiltonian simulation problem is what we call sample-based Hamiltonian simulation: given one copy of an unknown state σ and n copies of an unknown state ρ, implement the following map:
\[\sigma \otimes {\rho }^{\otimes n}\mapsto {e}^{-i\rho t}\sigma {e}^{i\rho t},\tag{2}\]
where t is the desired evolution time. We also allow for some error in the final state: we denote by δ the trace distance^{4} between the state that is output by the protocol and the ideal state e ^{−iρt} σe ^{iρt}. This problem was first considered in ref. 5, where the authors give a simple protocol, which we call the LMR protocol (after the authors' initials: Lloyd, Mohseni, and Rebentrost), for approximately implementing the unitary e ^{−iρt} using many copies of ρ. Their protocol is based on a partial swap operation that can also be considered as a finite-dimensional analog of a beamsplitter.^{6} An interesting feature of the LMR protocol is that it is agnostic with regard to ρ. In the spirit of refs 7, 8, this suggests interpreting ρ as a “quantum software state”.
The main motivation for sample-based Hamiltonian simulation in ref. 5 is to perform principal component analysis of ρ. The authors do this by performing phase estimation on the unitary e ^{−iρ}. (We note in Supplementary Information Section C that a slightly more careful analysis gives a polynomial improvement in sample complexity over the complexity given in ref. 5 for performing phase estimation, which is a subroutine for principal component analysis.) The LMR protocol has applications to many problems in machine learning, e.g., refs 5, 9, 10, 11.
In this paper, we ask the following question: given t and δ, what is the minimum n (number of copies of ρ) necessary to implement the unitary e ^{−iρt} on an unknown state σ to trace distance at most δ? We call this the sample complexity of Hamiltonian simulation.
While the LMR protocol acts with each copy of ρ sequentially, perhaps one could achieve better performance by acting with a global operation?^{12} For example, recent near-optimal tomographic protocols have relied on performing global operations (like the Schur transform) on many copies of the unknown state.^{13, 14} Along those lines, perhaps one could do better than LMR by applying tomographic protocols to get an estimate \(\hat{\rho }\) of ρ from the n copies of ρ, and then evolving according to \({e}^{-i\hat{\rho }t}\).
On the contrary, we show that LMR performs asymptotically better than any tomographic strategy (‘LMR protocol vs. state tomography’), and performs sample-based Hamiltonian simulation with asymptotic optimality in both t and δ simultaneously (‘LMR protocol is optimal’). We additionally show LMR is optimal in δ when restricting to pure states, and provide a sample-optimal algorithm for a variant of Grover’s search. In ‘Generalized LMR for simulation of Hermitian polynomials’, we discuss the sample complexity of more complex Hamiltonians that depend on multiple states. For example, we show how to simulate the Hamiltonians given by any Hermitian polynomial (i.e. any element of the Jordan–Lie algebra^{15}) generated by states ρ _{1},…, ρ _{ K } to which we are given sample access. As applications of this result, we show how to simulate the commutator i[ρ _{1}, ρ _{2}] and anticommutator {ρ _{1}, ρ _{2}} of two states ρ _{1} and ρ _{2}, and how to simulate any real linear combination of states ρ _{1},…, ρ _{ K } when given access to many copies of those states. We prove the optimality of the commutator, anticommutator, and linear combinations protocols. In ‘Applications of commutator simulation’, we give applications of commutator simulation to orthogonality testing and quantum state addition. In ‘Universality of LMR’, we show how to use sample-based Hamiltonian simulation to implement a universal model of quantum computation using only partial swaps and a stream of input qubits initialized in |0〉 and |+〉.
In ‘Discussion’, we discuss the results and suggest open problems. Finally, in ‘Methods’ we give proofs for two of the main results in the paper: the lower bound on the LMR protocol, and the protocol for simulating Hamiltonians given by Hermitian polynomials of the input states.
Notation
We use \( {\mathcal H} \) to denote a finite-dimensional Hilbert space, and \({\rm{D}}( {\mathcal H} )\) to represent the set of positive semidefinite operators with trace 1 on \( {\mathcal H} \) (i.e. the set of valid quantum states).
The trace distance between \(\rho ,\sigma \in {\rm{D}}( {\mathcal H} )\) is given by \(\frac {1}{2}{\Vert \rho -\sigma \Vert }_{1}\), where \({\Vert A\Vert }_{1}:={\rm{Tr}}(\sqrt{A{A}^{\dagger }})\). The trace distance between ρ and σ gives the maximum difference in probability of any measurement on the two states.^{4} For two quantum channels \({ {\mathcal E} }_{1}\) and \({ {\mathcal E} }_{2}\) that act on \({\rm{D}}( {\mathcal H} )\), their trace norm distance is defined as
The diamond norm distance is defined as
where \( {\mathcal I} \) is the identity channel on a k-dimensional space \({ {\mathcal H} }_{k}\). Note \(\parallel { {\mathcal E} }_{1}-{ {\mathcal E} }_{2}{\parallel }_{\diamond }\ge \parallel { {\mathcal E} }_{1}-{ {\mathcal E} }_{2}{\parallel }_{{\rm{tr}}}\).
We use \({\mathbb 1}_{\rm A}\) to mean the identity matrix acting on subsystem A, but if clear from context, we will drop the subscript. We use |+〉 := (|0〉 + |1〉)/2^{1/2} and denote single-qubit Pauli operators as X, Y, and Z.
Results
LMR protocol vs. state tomography
Lloyd, Mohseni, and Rebentrost^{5} gave a simple method for approximating the transformation in Eq. (2). The number of copies of ρ required by their procedure is independent not only of σ and ρ, but also of the dimension and rank of ρ. We state their result in a slightly more general form, where σ has two registers and e ^{−iρt} is applied only to one of them.
Theorem 1
(ref. 5). Let \(\rho \in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})\) and \(\sigma \in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})\) be two unknown quantum states and \(t\in {\mathbb{R}}\) (can be either positive or negative). Then there exists a quantum algorithm that transforms \({\sigma }_{{\rm{AB}}}\otimes {\rho }_{{{\rm{A}}}_{1}}\otimes \cdots \otimes {\rho }_{{{\rm{A}}}_{n}}\) into \({\tilde{\sigma }}_{{\rm{AB}}}\) such that
as long as the number of copies of ρ is n = O(t ^{2}/δ). In other words, this quantum algorithm implements the unitary e ^{−iρt} up to error δ in the diamond norm, using O(t ^{2}/δ) copies of ρ.
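We will give a sketch of the proof because many of our more general simulation techniques build on their ideas; for the full proof see Supplementary Information Section A. For simplicity we assume ρ and σ have the same dimension. Using a Taylor series expansion, the target state is
\[{e}^{-i\rho t}\sigma {e}^{i\rho t}=\sigma -i[\rho ,\sigma ]t-\frac{1}{2!}[\rho ,[\rho ,\sigma ]]{t}^{2}+\cdots .\tag{6}\]
We note that for very small evolution times Δ, we have the following direct calculation:
\[{{\rm{Tr}}}_{2}\left[{e}^{-iS\Delta }(\sigma \otimes \rho ){e}^{iS\Delta }\right]=\sigma -i\Delta [\rho ,\sigma ]+O({\Delta }^{2}),\tag{7}\]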
where by Tr_{ i } we mean taking the partial trace of the ith subsystem, and S is the swap operator between the two registers. If we take Δ = δ/t and repeat this procedure O(t ^{2}/δ) times, we end up implementing the operator e ^{−iρt} up to error O(Δ^{2} · t ^{2}/δ) = O(δ).
Thus the LMR protocol uses O(t ^{2}/δ) copies of ρ to implement the unitary e ^{−iρt} up to error δ in trace norm. (While not noted explicitly in ref. 5, the LMR protocol can be implemented efficiently, i.e. using \(O(\mathrm{log}\,D\cdot {t}^{2}/\delta )\) single-qubit and Fredkin (controlled-swap) gates, where \(D=\dim ({ {\mathcal H} }_{{\rm{A}}})\), by applying the linear combination of unitaries algorithm (see, e.g., ref. 3 or ref. 16, Theorem 2.4). For more information, see Supplementary Information Section A.) To obtain the result for the diamond norm, simply replace σ by σ _{AB} and perform the partial swap operation e ^{−iSΔ} only between the A registers of σ _{AB} and ρ _{A}, and then discard the last register that was originally holding ρ _{A}.
Additionally, the LMR protocol can be modified to implement the controlled-e ^{−iρt} operation, which will be important if one wants to implement phase estimation on e ^{−iρt}. A method for implementing controlled-e ^{−iρt} is stated without proof in ref. 5; we prove this method works and provide an additional approach in the Supplementary Information Section A.1.
An alternative method to LMR for sample-based Hamiltonian simulation would be to perform tomography on the copies of ρ to get an estimate \(\hat{\rho }\) of ρ, and then implement \({e}^{-i\hat{\rho }t}\). In Supplementary Information Section B, we show that the number of samples needed if using this strategy is
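The 1/n error scaling of the sketch above can be checked numerically on a single qubit. The following is a minimal sketch, not the authors' implementation: it uses the closed form of one partial-swap step, which follows from \({e}^{-iS\Delta }=\cos \Delta \,{\mathbb{1}}-i\sin \Delta \,S\), and it assumes a diagonal ρ = diag(p, 1 − p) so that the exact evolution is easy to write down. All function names and the parameter values p = 0.8, t = 1 are illustrative choices.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lmr_step(sigma, rho, delta):
    """One partial swap e^{-i S delta} followed by discarding the rho register.

    Closed form: cos^2(d) sigma + sin^2(d) rho - i sin(d) cos(d) [rho, sigma].
    """
    c, s = math.cos(delta), math.sin(delta)
    rs, sr = matmul(rho, sigma), matmul(sigma, rho)
    return [[c * c * sigma[i][j] + s * s * rho[i][j]
             - 1j * c * s * (rs[i][j] - sr[i][j])
             for j in range(2)] for i in range(2)]


def lmr_simulate(sigma, rho, t, n):
    """Consume n copies of rho, each with a partial swap of angle t / n."""
    for _ in range(n):
        sigma = lmr_step(sigma, rho, t / n)
    return sigma


def exact_diagonal(sigma, p, t):
    """Ideal state e^{-i rho t} sigma e^{i rho t} for rho = diag(p, 1 - p)."""
    u = [cmath.exp(-1j * p * t), cmath.exp(-1j * (1 - p) * t)]
    return [[u[i] * sigma[i][j] * u[j].conjugate() for j in range(2)]
            for i in range(2)]


def trace_distance(a, b):
    """(1/2)||a - b||_1 for 2x2 Hermitian matrices with equal trace."""
    d00 = (a[0][0] - b[0][0]).real
    d01 = a[0][1] - b[0][1]
    return math.sqrt(d00 ** 2 + abs(d01) ** 2)


# Illustrative inputs (assumptions): rho = diag(0.8, 0.2), sigma = |+><+|, t = 1.
p, t = 0.8, 1.0
rho = [[p, 0.0], [0.0, 1.0 - p]]
sigma = [[0.5, 0.5], [0.5, 0.5]]
target = exact_diagonal(sigma, p, t)

errors = {n: trace_distance(lmr_simulate(sigma, rho, t, n), target)
          for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, err)
```

Increasing n by a factor of ten should reduce the printed trace distance by roughly the same factor, in line with n = O(t ^{2}/δ).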
where d is the dimension of ρ, r is the rank of ρ, and t and δ are as in Theorem 1.
Comparing with Theorem 1, since LMR does not have any dependence on d or r, we immediately see that for large d or r, LMR does significantly better. Furthermore, even fixing d and r, we see that LMR provides a square-root improvement in sample complexity over tomography in terms of δ.
LMR protocol is optimal
To prove the LMR protocol is in fact asymptotically optimal, we first give a lower bound on the sample complexity of distinguishing two specific states. Next, we assume we have a protocol that simulates e ^{−iρt} to trace norm (which is a weaker assumption than using diamond norm) δ using f(t, δ) samples of ρ for some function f. Then we show that using such a protocol one can distinguish these two states. However, if f = o(t ^{2}/δ), we would violate our lower bound on state discrimination.
Theorem 2
Let f(t, δ) be the number of copies of ρ required to implement the unitary e ^{−iρt} up to error δ in trace norm. Then as long as δ ≤ 1/6 and δ/t ≤ 1/(6π), it holds that f(t,δ) = Θ(t ^{2}/δ).
The proof of Theorem 2 can be found in ‘Methods’. The proof uses mixed states, so it could be possible that simulating \(\exp (-i|\psi \rangle \langle \psi |t)\) for a pure state |ψ〉 could be done more efficiently. This relates to a practically relevant question, since the LMR protocol together with certain pure states as resources yields a universal model for quantum computation (see ‘Universality of LMR’). However, we can show that LMR is also optimal for pure states in the δ error parameter. We cannot expect to prove a meaningful lower bound on the t dependence in pure state LMR. The reason is that, given any state ρ and promised that \(\exp (-i\rho t)\) is periodic with period T (i.e. \(\exp (-i\rho {t}_{1})=\exp (-i\rho {t}_{2})\) for any t _{2} = t _{1} + kT for integer k and real number T ), we can always simulate the Hamiltonian ρ for an equivalent time t′∈[0,T ) instead. Therefore asymptotic scaling in terms of large t is meaningless. For pure states, we immediately know the period, namely 2π.
To prove that the LMR protocol is optimal for pure states, we employ variants of Grover’s search. While Grover’s search^{17} is a well-known quantum mechanical task, it is not often stated in its form as a decision problem, and very rarely^{18} as a metrological decision problem, where the inputs are unitaries and the output depends on a property that those unitaries either possess or do not possess. This guise is useful for our purposes, however, because the LMR protocol allows us to turn metrology problems on states into metrology problems on quantum operations.
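The 2π period for pure states follows from a one-line calculation that uses only the fact that \(|\psi \rangle \langle \psi |\) is a projector:

\[{e}^{-i|\psi \rangle \langle \psi |t}=\sum _{k\ge 0}\frac{{(-it)}^{k}}{k!}{(|\psi \rangle \langle \psi |)}^{k}={\mathbb{1}}+({e}^{-it}-1)\,|\psi \rangle \langle \psi |.\]

The right-hand side is unchanged under t → t + 2π, so any evolution time can be reduced modulo 2π.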
In the metrological view, Grover’s search, or perhaps more precisely amplitude amplification,^{19} is the following problem of parameter estimation. Let \({\mathcal{T}}\) be a subspace of \({{\mathbb{C}}}^{{2}^{q}}\). We call \({\mathcal{T}}\) the target subspace. Let \({{\mathcal{U}}}_{{\mathcal{T}}}\) be a unitary acting on q + 1 qubits such that
In this problem, and in the following variations, we will assume that access to \({{\mathcal{U}}}_{{\mathcal{T}}}\) and \({{\mathcal{U}}}_{{\mathcal{T}}}^{\dagger }\) is free. For a q-qubit unitary V, define
Then in Grover’s search, the task is to decide whether λ ≥ w (for w > 0) or λ = 0, while using V and V ^{†} as few times as possible. In other words, if we call |s〉 := V|0〉^{⊗q} the start state, we would like to determine whether the start state has substantial probability mass in the target subspace or none, promised one is the case. If we solve this problem using Grover’s search and count the number of uses of V and V ^{†} required to succeed with probability 1 − ε, we get the standard complexity \(\Theta (\log (1/\varepsilon )/\sqrt{w})\).^{20, 21}
One simple modification of metrological Grover’s search is to replace the circuit description of V with copies of the start state |s〉 instead. The problem is now to determine whether \(\lambda :={\left\Vert \left({\mathbb 1}\otimes \left\langle 1\right|\right){{\mathcal{U}}}_{{\mathcal{T}}}\left|s\right\rangle \otimes \left|0\right\rangle \right\Vert }^{2}\) is at least w > 0 or equal to zero, promised one is the case, given copies of |s〉 and unlimited access to \({{\mathcal{U}}}_{{\mathcal{T}}}\) and \({{\mathcal{U}}}_{{\mathcal{T}}}^{\dagger }\). We call this sample-based Grover’s search.
A second variant of metrological Grover’s search is to replace both V and \({{\mathcal{U}}}_{{\mathcal{T}}}\) with quantum states. In this form, the problem becomes: given copies of q-qubit states |s〉 and |t〉, determine whether \(\lambda ={|\langle s|t\rangle |}^{2}\) is at least w > 0 or equal to zero, promised one is the case. We call this variant orthogonality testing. We address optimal orthogonality testing in ‘Applications of commutator simulation’.
In Supplementary Information Section D.2, we first prove that for sample-based Grover’s search, we lose the square-root advantage of the regular Grover’s search:
Lemma 3
Sample-based Grover’s search with success probability 1 − ε uses \(\Theta (\log (1/\varepsilon )/w)\) copies of |s〉.
Using this result, we prove:
Theorem 4
The number of copies of an unknown pure state ρ required for any algorithm to simulate e ^{−iρt} to trace norm δ is Ω(1/δ).
The main idea of the proof, which can be found in Supplementary Information Section D.2, is that the reflections V and V ^{†} needed for Grover’s algorithm can be implemented by sample-based Hamiltonian simulation using many copies of the state |s〉. This gives us a way to reduce sample-based Grover’s search to sample-based Hamiltonian simulation. Then we apply the lower bound of Lemma 3.
Generalized LMR for simulation of Hermitian polynomials
We show that the sample-based Hamiltonian simulation of Eq. (2) can be further generalized. Instead of evolution of σ by a single state ρ, the target Hamiltonian H could be encoded by some combination of multiple states ρ _{1},ρ _{2},…,ρ _{ K }. For example, we might want to implement the map
where H = f(ρ _{1},ρ _{2},…,ρ _{ K }) is some Hermitian polynomial function of the input states. In fact, we prove it is possible to simulate arbitrary Hermitian multinomial functions:
Theorem 5
Let \({\rho }_{1},\ldots ,{\rho }_{K}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})\) and \({\sigma }_{{\rm{AB}}}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})\) be unknown quantum states, and let
be a Hermitian polynomial in ρ _{1},…,ρ _{ K } , where R is a finite set of strings over the alphabet {1,2,…,K}. Using n samples from the states {ρ _{1},…,ρ _{ K }}, a quantum algorithm can transform σ _{AB} into \({\tilde{\sigma }}_{{\rm{AB}}}\) such that
if n = O(Lc ^{2} t ^{2}/δ) where c := ∑_{ r∈R }|c _{ r }| and \(L:={\max }_{r\in R}|r|\) is the multinomial degree of H. Moreover, on average, the number of copies of ρ _{ j } consumed is n _{ j } = O(κ _{ j } c ^{2} t ^{2}/δ) where κ _{ j } = ∑_{ r∈R } v _{ j }(r)|c _{ r }|/c, and v _{ j }(r) = |{s : r _{ s } = j}|.
As corollaries of Theorem 5, we have the following simulation results for linear combinations of states (Corollary 6) and for the commutator and anticommutator of two states (Corollary 7):
Corollary 6
Let \({\rho }_{1},\ldots ,{\rho }_{K}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})\) and \({\sigma }_{{\rm{AB}}}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})\) be unknown quantum states, and let \({c}_{1},\ldots ,{c}_{K}\in {\mathbb{R}}\) . Using n samples from the states {ρ _{1},…,ρ _{ K }}, a quantum algorithm can transform σ _{AB} into \({\tilde{\sigma }}_{{\rm{AB}}}\) such that
if n = O(c ^{2} t ^{2}/δ) where \(c:={\sum }_{j=1}^{K}|{c}_{j}|\). Moreover, on average, the number of copies of ρ _{ j } consumed is n _{ j } = O(|c _{ j }|ct ^{2}/δ).
Corollary 7
Let \({\rho }_{1},{\rho }_{2}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})\) and \({\sigma }_{{\rm{AB}}}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})\) be unknown quantum states, and ϕ∈[0, 2π). Using n samples each of ρ _{1} and ρ _{2} , a quantum algorithm can transform σ _{AB} into \({\tilde{\sigma }}_{{\rm{AB}}}\) such that
if \(n=O({t}^{2}/\delta )\).
Note from Eq. (16) that
so by choosing ϕ = 0, we recover the anticommutator Hamiltonian {ρ _{1},ρ _{2}}/2, and choosing ϕ = π/2 we recover the commutator Hamiltonian i[ρ _{1},ρ _{2}]/2.
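The two special cases can be checked numerically. The sketch below assumes a Hamiltonian of the form H(ϕ) = (e ^{iϕ} ρ _{1} ρ _{2} + e ^{−iϕ} ρ _{2} ρ _{1})/2; this form is an assumption chosen because it is Hermitian and reduces to the two cases just stated, and the function names and example states are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def combine(alpha, a, beta, b):
    """Entrywise alpha * a + beta * b for 2x2 matrices."""
    return [[alpha * a[i][j] + beta * b[i][j] for j in range(2)]
            for i in range(2)]


def h_phi(rho1, rho2, phi):
    """Assumed form: (e^{i phi} rho1 rho2 + e^{-i phi} rho2 rho1) / 2."""
    return combine(cmath.exp(1j * phi) / 2, matmul(rho1, rho2),
                   cmath.exp(-1j * phi) / 2, matmul(rho2, rho1))


def max_abs_diff(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


# Example states (illustrative): rho1 = |0><0|, rho2 = |+><+|.
rho1 = [[1.0, 0.0], [0.0, 0.0]]
rho2 = [[0.5, 0.5], [0.5, 0.5]]

p12, p21 = matmul(rho1, rho2), matmul(rho2, rho1)
anticommutator_half = combine(0.5, p12, 0.5, p21)   # {rho1, rho2} / 2
commutator_half = combine(0.5j, p12, -0.5j, p21)    # i [rho1, rho2] / 2

# Both deviations should be at the level of floating-point rounding.
print(max_abs_diff(h_phi(rho1, rho2, 0.0), anticommutator_half))
print(max_abs_diff(h_phi(rho1, rho2, math.pi / 2), commutator_half))
```

Intermediate values of ϕ give cos ϕ · {ρ _{1},ρ _{2}}/2 + sin ϕ · i[ρ _{1},ρ _{2}]/2 under the same assumed form.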
Furthermore, Corollary 6 and Corollary 7 are both optimal:
Theorem 8
Let {c _{1},…,c _{ K }} be a set of K real numbers. Then there exist ρ _{1},…,ρ _{ K } such that to simulate \(H={\sum }_{j=1}^{K}{c}_{j}{\rho }_{j}\) for time t and to error δ in trace norm requires Ω(c ^{2} t ^{2}/δ) copies of states in {ρ _{1},…,ρ _{ K } }, where c := ∑_{ j }|c _{ j }|, as long as δ and δ/(ct) are smaller than some constants.
Theorem 9
To simulate H = i[ρ _{1}, ρ _{2}] for time t and to trace norm error δ requires Ω(t ^{2}/δ) copies each of the states ρ _{1} and ρ _{2} , as long as δ and δ/t are smaller than some constants.
While Theorem 9 only applies to commutators, it is easy to see that the simulation from Corollary 7 of the anticommutator {ρ _{1}, ρ _{2}} has optimal scaling in t and δ, because in the qubit case, we can always choose \(\rho_2 = {\mathbb 1}/2\) so that {ρ _{1}, ρ _{2}} = ρ _{1} and we can apply the lower bound from Theorem 2. The proofs for results in this section can be found in Supplementary Information Section D.3.
Applications of commutator simulation
We now describe how one can use commutator simulation to perform tasks such as orthogonality testing and coherent addition of two pure states.
We first give a protocol for coherent state addition: given many copies of unknown pure states |ψ _{1}〉 and |ψ _{2}〉, the task is to obtain a state of the form
for some \(a,b\in {\mathbb{R}}\). Note that the target state is sensitive to the global phases of the two input states (in particular, the relative phase between |ψ _{1}〉 and |ψ _{2}〉), which have no physical meaning. To make the task well-defined, we instead demand the target state to be of the form
for some \(a,b\in {\mathbb{R}}\), which is unique (up to a global phase) even when the global phases of the two input states have not been specified. Note that we can always recover Eq. (18) from Eq. (19) by fixing the global phases of the two input states appropriately (i.e. such that 〈ψ _{2}|ψ _{1}〉 > 0).
Theorem 10
Let |ψ _{1}〉 and |ψ _{2}〉 be unknown pure states of the same dimension. Promised that the angle between the two states is \(\Delta :=\arccos |\langle {\psi }_{1}|{\psi }_{2}\rangle |\) and Δ∉{0,π/2}, it is possible to create the state
to trace distance δ using \(O(\frac{{\chi }^{2}}{\delta {\sin }^{2}2\Delta })\) copies of |ψ _{1}〉 and |ψ _{2}〉, where e ^{iϕ} := 〈ψ _{2}|ψ _{1}〉/|〈ψ _{2}|ψ _{1}〉| is an unimportant phase factor that can be ignored by appropriately adjusting the global phases of the two states.
A similar protocol has been independently discovered in ref. 22, Theorem 2, and recently implemented experimentally.^{23} While our protocol involves only the two input states |ψ _{1}〉 and |ψ _{2}〉, the protocol of ref. 22 requires an additional reference state |χ〉. Another difference between the two results is that we consume several copies of the input states to obtain an approximation of the target state, while the authors of ref. 23 consume only a single copy but rely on a probabilistic postselection.
The proof of Theorem 10 (which can be found in Supplementary Information Section F) is based on commutator simulation and effectively implements a rotation in the two-dimensional subspace spanned by |ψ _{1}〉 and |ψ _{2}〉. Indeed, note from Eq. (20) that |ψ(0)〉 = |ψ _{1}〉 and |ψ(Δ)〉 = e ^{iϕ}|ψ _{2}〉, while intermediate values of χ produce states that interpolate between these two. (If one does not care about the relative phase \({e}^{i\varphi }\), one can always exchange the two states and replace \(\chi \) by \(\Delta -\chi \), which would improve the complexity by a constant factor when \(\chi >\Delta /2\).) As a consequence, the target state in Eq. (19) has real coefficients a and b. One can also achieve complex coefficients using a more sophisticated Hamiltonian that includes terms proportional to |ψ _{1}〉〈ψ _{1}| and |ψ _{2}〉〈ψ _{2}|, but we do not consider this case here for the sake of simplicity.
Our protocol requires a very large number of samples when the states |ψ _{1}〉 and |ψ _{2}〉 have either very small or very large overlap (i.e. in cases when \({\sin }^{2}2\Delta \) is very small). This is because we use commutator simulation to effectively implement a rotation in the two-dimensional subspace spanned by |ψ _{1}〉 and |ψ _{2}〉, and in the special cases when |ψ _{1}〉⊥|ψ _{2}〉 or |ψ _{1}〉 = e ^{iϕ}|ψ _{2}〉 the commutator vanishes and hence our protocol fails (in the second case the task is trivial though).
Interestingly, by choosing χ = Δ/2 in Eq. (20) it is possible to coherently add two states, i.e. create a state proportional to |ψ _{1}〉 + |ψ _{2}〉 (we are ignoring the relative phase between the two states). However, to determine Δ one needs to estimate the inner product between the two states, which can be done by running phase estimation on the commutator.
We note that the commutator of orthogonal states is zero, while the commutator of nonorthogonal states is nonzero (as long as the states are not identical). Using this fact, and by performing phase estimation on the unitary generated by the commutator of two pure states, we can create a test for orthogonality (for proof, see Supplementary Information Section F).
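The vanishing and non-vanishing cases can be read off from the explicit form of the commutator of two rank-one projectors, which follows directly from multiplying them out:

\[\left[\,|{\psi }_{1}\rangle \langle {\psi }_{1}|,\;|{\psi }_{2}\rangle \langle {\psi }_{2}|\,\right]=\langle {\psi }_{1}|{\psi }_{2}\rangle \,|{\psi }_{1}\rangle \langle {\psi }_{2}|-\langle {\psi }_{2}|{\psi }_{1}\rangle \,|{\psi }_{2}\rangle \langle {\psi }_{1}|.\]

Writing \(\cos \Delta =|\langle {\psi }_{1}|{\psi }_{2}\rangle |\), the Hermitian operator \(i[\,|{\psi }_{1}\rangle \langle {\psi }_{1}|,|{\psi }_{2}\rangle \langle {\psi }_{2}|\,]\) has nonzero eigenvalues

\[\pm \cos \Delta \,\sin \Delta =\pm \tfrac{1}{2}\sin 2\Delta ,\]

which vanish exactly when the states are orthogonal (Δ = π/2) or identical up to a phase (Δ = 0). This is the quantity that phase estimation on the generated unitary is sensitive to.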
Theorem 11
Let |ψ _{1}〉 and |ψ _{2}〉 be unknown pure states of the same dimension. Promised that either 〈ψ _{1}|ψ _{2}〉 = 0 or |〈ψ _{1}|ψ _{2}〉| ≥ w, deciding which with probability \(1-\epsilon \) uses \(\Theta (\log (1/\epsilon )/w)\) copies of |ψ _{1}〉 and |ψ _{2}〉.
Universality of LMR
In many solid-state implementations of quantum computers, such as quantum dots,^{24} donor pairs,^{25} and electron spins,^{26} the Heisenberg exchange is the natural coupling interaction between qubits. More specifically, the Heisenberg interaction between qubits i and j is given by
where X ^{i}, Y ^{i}, and Z ^{i} are the Pauli matrices acting on qubit i. Up to an overall scaling, this is the same as the swap interaction S used in the LMR protocol, see Eq. (7). In the solid state systems mentioned above, the Heisenberg interaction typically can be turned on and off for pairs of qubits for any desired length of time, and the operations induced by these interactions are usually fast and reliable.
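The relation between the exchange coupling and the swap operator can be made explicit through a standard identity for the two-qubit swap S:

\[S=\tfrac{1}{2}\left({\mathbb{1}}\otimes {\mathbb{1}}+X\otimes X+Y\otimes Y+Z\otimes Z\right).\]

Since \({S}^{2}={\mathbb{1}}\), the partial swap has the closed form

\[{e}^{-iS\Delta }=\cos \Delta \,{\mathbb{1}}-i\sin \Delta \,S={e}^{-i\Delta /2}\,\exp \left[-i\tfrac{\Delta }{2}\left(X\otimes X+Y\otimes Y+Z\otimes Z\right)\right],\]

so evolving under the sum of the three Pauli products implements a partial swap up to a global phase. Any overall prefactor in the definition of the exchange Hamiltonian only rescales the evolution time.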
While it is beneficial to create computing models that take advantage of the Heisenberg exchange interaction, this interaction is not universal for spin-1/2 systems.^{27} Several schemes have overcome this limitation by using encoded logical qubits and decoherence-free subsystems.^{28, 29}
In this section, we use the LMR protocol to design a universal model for quantum computation that does not use encoded qubits, but which requires only the Heisenberg interaction, as well as the ability to prepare the states |0〉 and |+〉 on a single qubit. Our scheme thus requires n + 1 physical qubits to perform computations on n qubits, in contrast to encoded schemes, of which the simplest require 2 or 3 times the number of physical qubits.^{28, 29} Furthermore, there has been much research in the field of quantum dots on how to quickly and reliably prepare a fixed qubit state, e.g., in refs 30, 31, 32, 33. These schemes could be applied to produce the single-qubit states |0〉 and |+〉 needed for our protocol.
We consider a connectivity graph of the qubits as in Fig. 1 (different connectivity graphs lead to different scalings depending on which costs one would like to optimize). We assume exchange interactions can be applied between connected qubits in the form of unitaries \(\exp (-it{H}_{ij})\) for arbitrary t. The qubit q _{*} is where the states |0〉 and |+〉 are prepared.
Recall that arbitrary single-qubit gates combined with any entangling two-qubit gate are sufficient for universal quantum computation.^{34} Since we do not have encoded qubits, the exchange interaction itself immediately gives us an entangling gate. Now for universal quantum computation we need to show how to perform arbitrary single-qubit gates.
Let \({X}_{\theta }:=\exp [-i\theta X]\) and let \({Z}_{\theta }:=\exp [-i\theta Z]\) for the Pauli operators X and Z. Then any single-qubit rotation can be written as X _{ ϕ } Z _{ θ } X _{ ξ } for some angles ϕ, θ, and ξ.^{4} Therefore, it is sufficient to show how to perform X and Z rotations.
If qubit i needs to have a single-qubit gate performed on it, we use swap gates (implemented with the Heisenberg interaction) to move that qubit to position 0 of Fig. 1. We now show how to perform Z _{ ϕ } and X _{ θ } on the qubit in position 0. Using LMR, given n copies of the state |0〉 input at qubit q _{*}, using only partial swap operations on qubits q _{0} and q _{*} (i.e. applying the Heisenberg interaction between qubits q _{0} and q _{*}), we can apply the unitary
(up to a global phase) to accuracy O(n ^{−1}). Likewise, using the LMR protocol, given n copies of the state |+〉, using only partial swap interactions between qubits q _{0} and q _{*}, we can apply the unitary
(up to a global phase) to accuracy O(n ^{−1}).
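The link between these two resource states and the X and Z rotations follows from writing the projectors in terms of Pauli operators, \(|0\rangle \langle 0|=({\mathbb{1}}+Z)/2\) and \(|+\rangle \langle +|=({\mathbb{1}}+X)/2\):

\[{e}^{-i|0\rangle \langle 0|t}={e}^{-it/2}\,{e}^{-i(t/2)Z},\qquad {e}^{-i|+\rangle \langle +|t}={e}^{-it/2}\,{e}^{-i(t/2)X}.\]

Up to the global phase \({e}^{-it/2}\), simulating the state |0〉 for time t is therefore a Z rotation by angle t/2, and simulating |+〉 for time t is an X rotation by angle t/2. The sign convention of the rotation angle here is an assumption made for this illustration.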
To apply an arbitrary single-qubit rotation to accuracy ε, we need O(ε ^{−1}) resource states |0〉 and |+〉 (this construction is reminiscent of ideas in ref. 35). Suppose that over the course of an algorithm, one must apply M single-qubit gates and M′ CNOT gates. A CNOT gate requires a constant number of single-qubit gates as well as a constant number of partial swap gates.^{34} Then to bound the error over the course of the algorithm, we require accuracy of O((M + M′)^{−1}) for each single-qubit gate. Therefore, we require O((M + M′)^{2}) resource states |0〉 and |+〉 in total. Additionally, using the connectivity graph of Fig. 1, to move qubits into proximity with one another to perform any single-qubit or two-qubit gate requires O(N) swap operations, where N is the number of qubits. Thus the total number of operations scales as O(N(M + M′)^{2}).
The states |0〉 and |+〉 need not be prepared perfectly for our protocol to work. For example, given depolarized versions of these states, we would need to increase the number of rounds in the LMR protocol by a constant factor. In fact, two arbitrary states (other than |0〉 and |+〉) could be used, as long as they are well characterized and not diagonal in the same basis.
Our model produces a polynomial (in particular squared) blowup in the number of operations, which still allows for universal quantum computation. However, it would be impossible to obtain a speedup for problems such as Grover’s search. We hope it is a useful model for systems where the Heisenberg exchange is a natural operation. It may even be useful in non-solid-state systems such as cold, trapped atoms, where it was shown that partial swaps could be implemented using Rydberg interactions or through coupling to a cavity.^{36}
Discussion
We have shown that the LMR protocol is optimal for the problem of simulating unknown Hamiltonians encoded as quantum states. Moreover, the protocol and its generalizations also turn out to be optimal for a variety of other tasks, such as discriminating between pure states and Hamiltonian evolution under the commutators of unknown states. We hope that this study will motivate the discovery of other possible applications of this versatile protocol.
We have not shown the optimality of our protocol for simulating the evolution by the multinomials in Eq. (12). It would be interesting to investigate whether it is optimal, or whether better algorithms can be found.
Another interesting aspect is the role of ancilla qubits in our protocol. While the original LMR protocol for Hamiltonian simulation is based on partial swaps and hence does not require ancilla qubits, the use of ancillas seems to be essential in our more general simulation protocol (see Fig. 2 in ‘Methods’). We wonder whether the use of ancillas is necessary in our protocol or, for example, whether it can instead be implemented using the continuous permutations introduced in ref. 12. These continuous permutations generalize the partial swap operation and do not require ancillas.
Another possible direction is to investigate distributed versions of our protocols in the context of multiparty communication. Reference 37 considers a protocol for simulating distributed unitaries over multiple remote parties using shared entanglement and a limited amount of quantum communication, and the techniques they use are reminiscent of those of the LMR protocol. It would be interesting to investigate the connections of ref. 37 with the protocols in our work.
Finally, the LMR protocol can be seen as allowing the encoding of the operation e ^{−iρt} into multiple copies of a quantum state ρ. As discussed in ‘LMR protocol vs. state tomography’, having access to O(t ^{2}/δ) copies of ρ allows a user to perform the operation e ^{−iρt}, but may be insufficient for the user to determine what ρ is through tomography. It is an intriguing question whether other quantum operations could be encoded into states in this way, so that a user could perform the quantum operation but learn little else about what operation is being performed. This could be seen as a form of quantum copy-protection.^{38} See ref. 39 for some progress in this direction, and ref. 40 for negative results when the encoding is required to be a circuit and not a state.
Methods
In this section, we give proofs for two of the main results in the paper: Theorem 2 (optimality of the LMR protocol), and Theorem 5 (the protocol for simulating arbitrary Hermitian polynomials of the input states). Many of the other proofs in this paper are similar, and can be found in the Supplementary Information.
Proof of Theorem 2
The upper bound holds by the LMR protocol, Theorem 1, so we only prove the lower bound. Because the trace norm lower-bounds the diamond norm, a tight lower bound stated in terms of the trace norm is a stronger result than one stated in terms of the diamond norm. Let
\[\rho (x):=x\,|0\rangle \langle 0|+(1-x)\,|1\rangle \langle 1|.\]
Then, given many copies of an unknown state ρ, suppose we want to distinguish between the cases \({\rho }_{1}:=\rho (1/2)\) and \({\rho }_{2}:=\rho (\frac{1}{2}+\epsilon )\), with \(0\, < \, \epsilon \le 1/2\), promised that ρ is one of the two. One way of doing this is to consider the single-qubit unitary operator \({\mathcal{U}}(\rho ,t):=\exp (i\rho t)\). Then for \({t}_{\epsilon }:=\pi /(2\epsilon )\) the operators \({\mathcal{U}}({\rho }_{i},{t}_{\epsilon })\) become orthogonal, namely,
\[{\mathcal{U}}({\rho }_{1},{t}_{\epsilon })\propto {\mathbb{1}},\qquad {\mathcal{U}}({\rho }_{2},{t}_{\epsilon })\propto {\sigma }_{z},\]
where ∝ indicates that we have hidden an unimportant phase factor. Consequently, applying \({\mathcal{U}}(\rho ,{t}_{\epsilon })\) to \(|+\rangle \) and measuring in the X-basis will distinguish \({\rho }_{1}\) from \({\rho }_{2}\) with certainty.
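The orthogonality claim can be checked numerically. The following is a minimal sketch, assuming the diagonal one-parameter family ρ(x) = diag(x, 1 − x), which is consistent with the X-basis measurement described above; the helper names `rho`, `evolve` and `prob_plus` are illustrative and not taken from the paper.

```python
import numpy as np

def rho(x):
    """Assumed one-parameter family: the diagonal qubit state diag(x, 1 - x)."""
    return np.diag([x, 1.0 - x]).astype(complex)

def evolve(state, t):
    """exp(i * state * t) for a diagonal state (exponentiate the diagonal)."""
    return np.diag(np.exp(1j * t * np.diag(state)))

def prob_plus(U):
    """Probability of outcome |+> when U|+> is measured in the X basis."""
    plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
    return abs(plus.conj() @ U @ plus) ** 2

eps = 0.1                       # any 0 < eps <= 1/2 works
t_eps = np.pi / (2.0 * eps)     # evolution time used in the proof

U1 = evolve(rho(0.5), t_eps)         # proportional to the identity
U2 = evolve(rho(0.5 + eps), t_eps)   # proportional to sigma_z

# Hilbert-Schmidt overlap of the two unitaries; orthogonal means it vanishes.
overlap = abs(np.trace(U1.conj().T @ U2))
p1, p2 = prob_plus(U1), prob_plus(U2)
print(overlap, p1, p2)
```

Up to floating-point rounding, the overlap vanishes and the outcome |+〉 occurs with probability 1 for ρ_1 and 0 for ρ_2, which is what makes the two cases perfectly distinguishable.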
Thus, we can distinguish between ρ = \({\rho }_{1}\) and ρ = \({\rho }_{2}\) with probability at least 2/3 using no more than \(f({t}_{\epsilon },1/3)\) copies of ρ, by implementing a map that differs from \({\mathcal{U}}(\rho ,{t}_{\epsilon })\) by at most 1/3 in trace norm. However, Lemma D1 in the Supplementary Information tells us that \({C}_{\eta }/{\epsilon }^{2}\) samples of ρ are required if ε < η ≤ 1/2. Therefore
\[f({t}_{\epsilon },1/3)\ge \frac{{C}_{\eta }}{{\epsilon }^{2}}=C{t}_{\epsilon }^{2}\qquad (26)\]
using the definition of \({t}_{\epsilon }\), where \(C:=4{C}_{\eta }/{\pi }^{2}\) is a positive constant. Eq. (26) holds whenever \({t}_{\epsilon }\ge \pi \), since ε ≤ 1/2 and so \({t}_{\epsilon }=\tfrac{\pi }{2}\cdot \tfrac{1}{\epsilon }\ge \pi \).
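Now suppose instead we have arbitrary δ and t satisfying δ ≤ 1/6 and t/δ ≥ 6π, as assumed in the theorem statement. We note the following inequality for any \(t\in {\mathbb{R}}\) and any integer m ≥ 0:
\[f(mt,m\delta )\le m\,f(t,\delta ),\]
which holds because one way of simulating \(\exp (i\rho mt)\) up to error mδ is to run m times a simulation of \(\exp (i\rho t)\) up to error δ. Taking \(m=\lceil 1/(6\delta )\rceil \), we have
\[\begin{aligned}f(t,\delta )&\ge \frac{1}{m}\,f(mt,m\delta )\\ &\ge \frac{1}{m}\,C{(mt)}^{2}=Cm{t}^{2}\ge \frac{C{t}^{2}}{6\delta },\end{aligned}\]
where the second inequality, Eq. (29), holds because mδ ≤ 1/6 + δ ≤ 1/3 and mt ≥ t/(6δ) ≥ π, so Eq. (26) applies (a smaller error budget cannot require fewer samples). The last step uses m ≥ 1/(6δ).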
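The two side conditions used in this reduction are simple arithmetic, and a short script can confirm them on sample values. This is only an illustrative check, not part of the proof; the function names and the tolerance `tol` are assumptions introduced here to absorb floating-point rounding.

```python
import math

def amplification_factor(delta):
    """m = ceil(1 / (6 * delta)), the repetition count used in the proof."""
    return math.ceil(1.0 / (6.0 * delta))

def check_reduction(t, delta, tol=1e-12):
    """Return (m, m*delta, m*t) after checking the two facts the proof needs."""
    # Hypotheses of the theorem: delta <= 1/6 and t/delta >= 6*pi.
    assert 0 < delta <= 1.0 / 6.0
    assert t / delta >= 6.0 * math.pi - tol
    m = amplification_factor(delta)
    assert m * delta <= 1.0 / 3.0 + tol   # total error stays within 1/3
    assert m * t >= math.pi - tol         # total time reaches the regime t >= pi
    return m, m * delta, m * t

for delta in (1 / 6, 0.1, 0.01, 1e-3):
    t = 6.0 * math.pi * delta             # smallest t allowed by t/delta >= 6*pi
    print(check_reduction(t, delta))
```

Choosing t at the boundary t = 6πδ is the tightest case: the amplified time mt then sits just at or above π, which is exactly where Eq. (26) starts to apply.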
We now give a proof of Theorem 5. One key tool in the proof will be the following lemma, which lets us simulate a Hamiltonian given by the difference of two subnormalized states:
Lemma 12
Let \(\rho ^{\prime} \in {\rm{D}}({{\mathbb{C}}}^{2}\otimes { {\mathcal H} }_{{\rm{A}}})\) be a quantum state of the form \(\rho ^{\prime} =|0\rangle \langle 0|\otimes {\rho }_{+}+|1\rangle \langle 1|\otimes {\rho }_{-}\), where \({\rho }_{+},{\rho }_{-}\) are unknown subnormalized states with \({\rm{Tr}}\,{\rho }_{+}+{\rm{Tr}}\,{\rho }_{-}=1\). Using n samples of ρ′, a quantum algorithm can transform \({\sigma }_{{\rm{AB}}}\) into \({\tilde{\sigma }}_{{\rm{AB}}}\) such that
if n = O(t ^{2}/δ).
The idea is to use the first qubit of ρ′ as a control that determines whether one applies a positive or negative time evolution of partial swap between the second register of ρ′ and the target state. The rest of the proof (found in Supplementary Information Section E.2) proceeds as in the proof sketch of the standard LMR protocol.
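A single step of this controlled construction can be simulated directly for small dimensions. The sketch below is an illustration under stated assumptions rather than the construction from the Supplementary Information: it takes the target σ to live on a single register B (no extra reference system), uses qubit registers, and picks an arbitrary weight `p` and step size `step`. It checks that, to first order in the step, the target evolves under the Hamiltonian ρ_+ − ρ_−.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2          # dimension of registers A and B (illustrative choice)
step = 1e-3    # small partial-swap angle (illustrative choice)

def random_state(dim):
    """Random density matrix (positive semidefinite, unit trace)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = a @ a.conj().T
    return m / np.trace(m).real

# Swap operator on C^d (x) C^d: S|i, j> = |j, i>.
swap = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        swap[j * d + i, i * d + j] = 1.0

def partial_swap(angle):
    """exp(-i * angle * S) = cos(angle) I - i sin(angle) S, since S^2 = I."""
    return np.cos(angle) * np.eye(d * d) - 1j * np.sin(angle) * swap

# rho' = |0><0| (x) rho_plus + |1><1| (x) rho_minus, with total trace 1.
p = 0.7
rho_plus, rho_minus = p * random_state(d), (1 - p) * random_state(d)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
rho_prime = np.kron(P0, rho_plus) + np.kron(P1, rho_minus)
sigma = random_state(d)

# The control qubit selects a forward (|0>) or backward (|1>) partial swap.
controlled = np.kron(P0, partial_swap(step)) + np.kron(P1, partial_swap(-step))
total = np.kron(rho_prime, sigma)                  # ordering: control, A, B
out = controlled @ total @ controlled.conj().T

# Trace out the control qubit and register A, keeping register B.
sigma_out = np.einsum("cabcad->bd", out.reshape(2, d, d, 2, d, d))

# First-order prediction: sigma - i * step * [rho_plus - rho_minus, sigma].
h = rho_plus - rho_minus
predicted = sigma - 1j * step * (h @ sigma - sigma @ h)
error = np.linalg.norm(sigma_out - predicted)
print(error)
```

The residual `error` is second order in `step`, which is the same mechanism that gives the standard LMR protocol its O(t²/δ) sample count when many such steps are concatenated.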
Proof of Theorem 5
We first consider a term \({H}_{r}\) with r = (1,2,…,k), for some k such that 2 ≤ k ≤ K. (More general r will follow easily from this special case.) Let \({S}_{k}\) be the cyclic permutation of k copies of \({ {\mathcal H} }_{{\rm{A}}}\) that acts as follows: \({S}_{k}|{j}_{1},{j}_{2},\ldots ,{j}_{k}\rangle =|{j}_{k},{j}_{1},\ldots ,{j}_{k-1}\rangle \). In other words,
\[{S}_{k}=\sum _{{j}_{1},\ldots ,{j}_{k}}|{j}_{k},{j}_{1},\ldots ,{j}_{k-1}\rangle \langle {j}_{1},{j}_{2},\ldots ,{j}_{k}|.\]
Consider the circuit in Fig. 2. The output is of the form \({\rho ^{\prime} }^{(r)}=|0\rangle \langle 0|\otimes {\rho }_{+}^{(r)}+|1\rangle \langle 1|\otimes {\rho }_{-}^{(r)}\), where
When we choose \(a{b}^{\ast }={e}^{i{\phi }_{r}}/2\), we find
To deal with arbitrary r with |r| = k, simply supply the appropriate input states \({\rho }_{j}\) in Fig. 2.
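The cyclic permutation S_k defined above is easy to build explicitly for small systems. The following sketch constructs it as a permutation matrix and checks a standard identity that explains why it is useful for products of states: Tr[S_k(ρ_1 ⊗ … ⊗ ρ_k)] = Tr[ρ_k ⋯ ρ_1] for the shift direction used here. The function names and the choice d = 2, k = 3 are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def cyclic_shift(d, k):
    """Permutation matrix S_k with S_k|j_1,...,j_k> = |j_k, j_1,...,j_{k-1}>."""
    dims = (d,) * k
    S = np.zeros((d ** k, d ** k))
    for j in np.ndindex(*dims):
        shifted = (j[-1],) + j[:-1]
        row = np.ravel_multi_index(shifted, dims)
        col = np.ravel_multi_index(j, dims)
        S[row, col] = 1.0
    return S

def random_state(dim, rng):
    """Random density matrix (positive semidefinite, unit trace)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = a @ a.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(1)
d, k = 2, 3
states = [random_state(d, rng) for _ in range(k)]
S = cyclic_shift(d, k)

lhs = np.trace(S @ reduce(np.kron, states))           # Tr[S_k (rho_1 x ... x rho_k)]
rhs = np.trace(reduce(np.matmul, reversed(states)))   # Tr[rho_k ... rho_1]
print(abs(lhs - rhs))
```

Applying the shift k times returns every basis state to itself, so S_k to the k-th power is the identity; this gives a quick sanity check on the construction.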
Now without loss of generality, we can assume c _{ r } ≥ 0 for all r, since the sign can be absorbed into the phase ϕ _{ r }. Therefore by sampling from r∈R with probability c _{ r }/c and creating ρ′^{(r)}, we obtain the state
By Lemma 12, we can therefore simulate the Hamiltonian
for the desired time and precision using \(O({c}^{2}{t}^{2}/\delta )\) copies of ρ′. Since each copy of ρ′ requires a sample of a state ρ′^{(r)}, and each of these states requires at most \(L={\max }_{r\in R}|r|\) copies of states in {ρ _{1},…,ρ _{ K }}, we obtain the stated total sample complexity.
To calculate the average number of uses of ρ _{ j }, we note that ρ _{ j } is used v _{ j }(r) times to create the state ρ′^{(r)}, and to create the state ρ′, the state ρ′^{(r)} is chosen with probability c _{ r }/c. Thus ρ _{ j } is used on average κ _{ j } = ∑_{ r∈R } v _{ j }(r)c _{ r }/c times to create a single ρ′. Then since \(O({c}^{2}{t}^{2}/\delta )\) copies of ρ′ are used in the simulation, we obtain the stated complexity.
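The counting argument can be made concrete with a small calculation. The sketch below is a hypothetical example: the set of terms and their coefficients are invented for illustration, each term r is written as a tuple of state indices, and c is taken to be the sum of the non-negative coefficients c_r, as the sampling probabilities c_r/c require.

```python
from collections import Counter

def average_uses(terms):
    """kappa_j = sum_r v_j(r) * c_r / c, with c taken to be sum_r c_r."""
    c = sum(terms.values())
    kappa = Counter()
    for r, c_r in terms.items():
        for j, v in Counter(r).items():   # v = v_j(r), multiplicity of j in r
            kappa[j] += v * c_r / c
    return dict(kappa), c

# Hypothetical terms: index tuple r -> coefficient c_r >= 0.
terms = {(1, 2): 2.0, (1, 2, 3): 1.0, (3, 1, 3): 1.0}
kappa, c = average_uses(terms)
L = max(len(r) for r in terms)            # L = max_r |r|

print(kappa, c, L)
```

In this example the expected number of input states consumed per copy of ρ′, the sum of the κ_j, is 2.5, below the worst-case value L = 3; this is the gap between the average-use bound and the total sample complexity stated in terms of L.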
References
1. Lloyd, S. Universal quantum simulators. Science 273, 1073–1078 (1996).
2. Low, G. H. & Chuang, I. L. Optimal Hamiltonian simulation by quantum signal processing. Phys. Rev. Lett. 118, 010501 (2017).
3. Berry, D. W., Childs, A. M. & Kothari, R. Hamiltonian simulation with nearly optimal dependence on all parameters. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, 792–809 (IEEE, 2015).
4. Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2010). https://books.google.com/books?id=s4DEy7oa0C.
5. Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum principal component analysis. Nat. Phys. 10, 631–633 (2014).
6. Audenaert, K., Datta, N. & Ozols, M. Entropy power inequalities for qudits. J. Math. Phys. 57, 052202 (2016).
7. Preskill, J. Plug-in quantum software. Nature 402, 357–358 (1999).
8. Gottesman, D. & Chuang, I. L. Demonstrating the viability of universal quantum computation using teleportation and single-qubit operations. Nature 402, 390–393 (1999).
9. Wang, G. Quantum algorithms for curve fitting. arXiv preprint arXiv:1402.0660 (2014).
10. Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014).
11. Cong, I. & Duan, L. Quantum discriminant analysis for dimensionality reduction and classification. New J. Phys. 18, 073011 (2016).
12. Ozols, M. How to combine three quantum states. arXiv preprint arXiv:1508.0086 (2015).
13. Haah, J., Harrow, A. W., Ji, Z., Wu, X. & Yu, N. Sample-optimal tomography of quantum states. In Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, 913–925 (ACM, 2016).
14. O’Donnell, R. & Wright, J. Efficient quantum tomography. In Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, 899–912 (ACM, 2016).
15. Emch, G. G. Mathematical and Conceptual Foundations of 20th-Century Physics. North-Holland Mathematics Studies (Elsevier Science, 1984). https://books.google.com/books?id=eYQHIjkaEroCpg=PA306.
16. Kothari, R. Efficient algorithms in quantum query complexity. Ph.D. thesis, University of Waterloo (2014). http://hdl.handle.net/10012/8625.
17. Grover, L. K. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, 212–219 (ACM, 1996).
18. Demkowicz-Dobrzański, R. & Markiewicz, M. Quantum computation speedup limits from quantum metrological precision bounds. Phys. Rev. A 91, 062322 (2015).
19. Brassard, G., Høyer, P., Mosca, M. & Tapp, A. Quantum amplitude amplification and estimation. Contemp. Math. 305, 53–74 (2002).
20. Boyer, M., Brassard, G., Høyer, P. & Tapp, A. Tight bounds on quantum searching. Fortschritte der Physik 46, 493–505 (1998).
21. Buhrman, H., Cleve, R., de Wolf, R. & Zalka, C. Bounds for small-error and zero-error quantum algorithms. In Foundations of Computer Science, 1999. 40th Annual Symposium on, 358–368 (IEEE, 1999).
22. Oszmaniec, M., Grudka, A., Horodecki, M. & Wójcik, A. Creating a superposition of unknown quantum states. Phys. Rev. Lett. 116, 110403 (2016).
23. Li, K. et al. Experimentally superposing two pure states with partial prior knowledge. arXiv preprint arXiv:1608.04349 (2016).
24. Loss, D. & DiVincenzo, D. P. Quantum computation with quantum dots. Phys. Rev. A 57, 120–126 (1998).
25. Kane, B. E. A silicon-based nuclear spin quantum computer. Nature 393, 133–137 (1998).
26. Vrijen, R. et al. Electron-spin-resonance transistors for quantum computing in silicon-germanium heterostructures. Phys. Rev. A 62, 012306 (2000).
27. Barenco, A. et al. Elementary gates for quantum computation. Phys. Rev. A 52, 3457–3467 (1995).
28. DiVincenzo, D. P., Bacon, D., Kempe, J., Burkard, G. & Whaley, K. B. Universal quantum computation with the exchange interaction. Nature 408, 339–342 (2000).
29. Levy, J. Universal quantum computation with spin-1/2 pairs and Heisenberg exchange. Phys. Rev. Lett. 89, 147902 (2002).
30. Costache, M. V. & Valenzuela, S. O. Experimental spin ratchet. Science 330, 1645–1648 (2010).
31. Folk, J. A., Potok, R. M., Marcus, C. M. & Umansky, V. A gate-controlled bidirectional spin filter using quantum coherence. Science 299, 679–682 (2003).
32. Hanson, R. et al. Semiconductor few-electron quantum dot operated as a bipolar spin filter. Phys. Rev. B 70, 241304 (2004).
33. Recher, P., Sukhorukov, E. V. & Loss, D. Quantum dot as spin filter and spin memory. Phys. Rev. Lett. 85, 1962–1965 (2000).
34. Bremner, M. J. et al. Practical scheme for quantum computation with any two-qubit entangling gate. Phys. Rev. Lett. 89, 247902 (2002).
35. Marvian, I. & Mann, R. B. Building all time evolutions with rotationally invariant Hamiltonians. Phys. Rev. A 78, 022304 (2008).
36. Pichler, H., Zhu, G., Seif, A., Zoller, P. & Hafezi, M. A measurement protocol for the entanglement spectrum of cold atoms. arXiv preprint arXiv:1605.08624 (2016).
37. Harrow, A. W. & Leung, D. W. A communication-efficient nonlocal measurement with application to communication complexity and bipartite gate capacities. IEEE Trans. Inform. Theory 57, 5504–5508 (2011).
38. Aaronson, S. Quantum copy-protection and quantum money. In Computational Complexity (CCC), 2009 IEEE 24th Annual Conference on, 229–242 (IEEE, 2009).
39. Marvian, I. & Lloyd, S. Universal quantum emulator. arXiv preprint arXiv:1606.02734 (2016).
40. Alagic, G. & Fefferman, B. On quantum obfuscation. arXiv preprint arXiv:1602.01771 (2016).
41. Ekert, A. K. et al. Direct estimations of linear and nonlinear functionals of a quantum state. Phys. Rev. Lett. 88, 217901 (2002).
Acknowledgements
We thank Andrew Childs for suggesting the proof idea of Theorem 4; Aram Harrow, Stephen Jordan, Seth Lloyd, Iman Marvian, Ronald de Wolf, Michael Gullans, and Henry Yuen for useful discussions; and Michał Oszmaniec for pointing out ref. 22. Part of this work was done while M.O. was visiting the University of Maryland and MIT, and he thanks both institutions for their hospitality. S.K. and C.Y.L. are funded by the Department of Defense. G.H.L. is funded by the NSF CCR and the ARO quantum computing projects. M.O. acknowledges Leverhulme Trust Early Career Fellowship (ECF2015256) and European Union project QALGO (Grant Agreement No. 600700) for financial support. T.J.Y. thanks the DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. The authors are grateful to the University of Maryland Libraries’ Open Access Publishing Fund and the Massachusetts Institute of Technology Open Access Publishing Fund for partial funding for open access.
Author information
Contributions
All authors contributed equally.
Ethics declarations
Competing interests
The authors declare no competing financial interest.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Kimmel, S., Lin, CY., Low, G. et al. Hamiltonian simulation with optimal sample complexity. npj Quantum Inf 3, 13 (2017). https://doi.org/10.1038/s41534-017-0013-7