Part: control Week 13 Published lqr.py test_lqr.py

Linear-Quadratic Regulation: The Exact Dynamic Program

The linear-quadratic regulator as exact dynamic programming: a quadratic value function, the Riccati recursion as Chapter 1's Bellman optimality equation in coordinates, the linear-feedback optimal policy, the infinite-horizon algebraic Riccati equation, and the LQG separation principle — with Doyle's warning that optimal output feedback carries no guaranteed stability margins.

On this page
  1. The LQR problem
  2. LQR is exact dynamic programming
  3. Infinite horizon: the algebraic Riccati equation
  4. The bridge to Chapter 1
  5. LQG: optimal output feedback, and its fragility
  6. What’s next
  7. Exercises
  8. Companion code

Linear-Quadratic Regulation: The Exact Dynamic Program

Where we are. Weeks 11–12 built the linear model and the structural properties — stability, controllability, observability — that say what control is possible. Now we put a cost on the model and ask for the best control. The linear-quadratic regulator (LQR) is the one optimal-control problem dynamic programming solves in closed form, and it is exactly Chapter 1’s Bellman optimality equation specialized to linear dynamics and quadratic cost. The “DP bridge” promised since Chapter 1 is paid here: the value function is a quadratic, value iteration becomes the Riccati recursion, and the optimal policy is linear state feedback u=Khu = -\lqrgain\statevec. Bellman, Riccati, and feedback control turn out to be one calculation.

The LQR problem

Definition 13.1 (Linear-quadratic regulator).

For the discrete-time linear system hk+1=Ahk+Buk\statevec_{k+1} = \statemat\statevec_k + \inputmat u_k, the finite-horizon LQR problem is to choose u0,,uN1u_0,\dots,u_{\horizon-1} minimizing the quadratic cost

J=k=0N1(hkQhk+ukRuk)+hNQNhN,\costtogo = \sum_{k=0}^{\horizon-1}\big(\statevec_k^\top Q\,\statevec_k + u_k^\top R\,u_k\big) + \statevec_\horizon^\top Q_\horizon\,\statevec_\horizon,

with state-cost Q0Q \succeq 0, control-cost R0R \succ 0, and terminal cost QN0Q_\horizon \succeq 0. The infinite-horizon problem takes N\horizon\to\infty with no terminal term, minimizing k=0(hkQhk+ukRuk)\sum_{k=0}^{\infty}(\statevec_k^\top Q\,\statevec_k + u_k^\top R\,u_k).

R0R \succ 0 makes every control expensive, so the minimizer is unique and finite; Q0Q \succeq 0 penalizes leaving the origin. This is a Markov decision process (Chapter 1) with a known deterministic linear kernel and a quadratic cost in place of a reward.

LQR is exact dynamic programming

Theorem 13.1 (LQR via dynamic programming).

The optimal cost-to-go of Definition 13.1 is the quadratic vk(h)=hPkh\valuefn_k^*(\statevec) = \statevec^\top \riccati_k\statevec, where Pk\riccati_k runs the backward Riccati recursion

PN=QN,Pk=Q+APk+1AAPk+1B(R+BPk+1B)1BPk+1A,\riccati_\horizon = Q_\horizon, \qquad \riccati_k = Q + \statemat^\top \riccati_{k+1}\statemat - \statemat^\top \riccati_{k+1}\inputmat\,(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat,

and the optimal policy is the linear state feedback uk=Kkhku_k = -\lqrgain_k\statevec_k with time-varying gain

Kk=(R+BPk+1B)1BPk+1A.\lqrgain_k = (R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat.
Proof.

Backward induction on the Bellman optimality equation vk(h)=minu[hQh+uRu+vk+1(Ah+Bu)]\valuefn_k^*(\statevec) = \min_u\big[\statevec^\top Q\statevec + u^\top R u + \valuefn_{k+1}^*(\statemat\statevec + \inputmat u)\big].

Base. vN(h)=hQNh\valuefn_\horizon^*(\statevec) = \statevec^\top Q_\horizon\statevec, so PN=QN\riccati_\horizon = Q_\horizon.

Step. Assume vk+1(h)=hPk+1h\valuefn_{k+1}^*(\statevec) = \statevec^\top \riccati_{k+1}\statevec. Substituting and expanding the quadratic in uu,

hQh+uRu+(Ah+Bu)Pk+1(Ah+Bu)=u ⁣(R+BPk+1B)0u+2uBPk+1Ah+h(Q+APk+1A)h.\begin{aligned} \statevec^\top Q\statevec + u^\top R u + (\statemat\statevec + \inputmat u)^\top \riccati_{k+1}(\statemat\statevec + \inputmat u) &= u^\top\!\underbrace{(R + \inputmat^\top \riccati_{k+1}\inputmat)}_{\textstyle \succ 0}\,u + 2\,u^\top \inputmat^\top \riccati_{k+1}\statemat\,\statevec \\ &\quad + \statevec^\top(Q + \statemat^\top \riccati_{k+1}\statemat)\statevec . \end{aligned}

The Hessian in uu is R+BPk+1B0R + \inputmat^\top \riccati_{k+1}\inputmat \succ 0 (since R0R\succ0, Pk+10\riccati_{k+1}\succeq0), so the minimizer is the stationary point

u=(R+BPk+1B)1BPk+1Ah=Kkh.u^* = -(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat\,\statevec = -\lqrgain_k\statevec .

Substituting uu^* back (completing the square) leaves a pure quadratic in h\statevec, vk(h)=hPkh\valuefn_k^*(\statevec) = \statevec^\top \riccati_k\statevec, whose matrix is exactly the Riccati recursion above. The quadratic form is preserved, closing the induction. \qquad\blacksquare

The proof is Chapter 1’s value iteration run on a quadratic ansatz.

Every step is one application of the Bellman optimality operator; the minimization is exact (a quadratic in uu), not sampled or approximated, because the model is known and the cost is quadratic. The optimal value, the optimal policy, and the optimal cost v0(h0)=h0P0h0\valuefn_0^*(\statevec_0) = \statevec_0^\top \riccati_0\statevec_0 all fall out of one backward sweep.

Infinite horizon: the algebraic Riccati equation

Run the recursion backward from a far horizon and Pk\riccati_k settles to a constant.

Theorem 13.2 (Infinite-horizon LQR).

If (A,B)(\statemat,\inputmat) is stabilizable and (A,Q1/2)(\statemat, Q^{1/2}) is detectable, then as N\horizon\to\infty the Riccati iterates converge, PkP\riccati_k\to\riccati, to the unique symmetric positive-semidefinite solution of the discrete algebraic Riccati equation (DARE)

P=Q+APAAPB(R+BPB)1BPA.\riccati = Q + \statemat^\top \riccati\,\statemat - \statemat^\top \riccati\inputmat\,(R + \inputmat^\top \riccati\inputmat)^{-1}\inputmat^\top \riccati\,\statemat .

The optimal policy is the stationary feedback u=Khu = -\lqrgain\statevec with K=(R+BPB)1BPA\lqrgain = (R + \inputmat^\top \riccati\inputmat)^{-1}\inputmat^\top \riccati\statemat, and the closed loop ABK\statemat - \inputmat\lqrgain is asymptotically stable (Schur).

The hypotheses are exactly Week 12’s structural properties: stabilizability (every uncontrollable mode already stable) guarantees a finite-cost policy exists, and detectability (every unobservable-through-QQ mode already stable) guarantees the stabilizing solution is unique.

The standard proof shows the Riccati map is a monotone contraction on the positive-semidefinite cone; see Lewis et al. (2012) and Bertsekas (2017) .

Continuous time. The same logic in continuous time replaces the recursion by the Hamilton–Jacobi–Bellman equation minuH=0\min_u \hamiltonian = 0, whose Hamiltonian H=hQh+uRu+(hv)(Ah+Bu)\hamiltonian = \statevec^\top Q\statevec + u^\top R u + (\nabla_{\statevec}\valuefn^*)^\top(\statemat\statevec + \inputmat u) adds the stage cost to the cost-to-go’s rate of change along the dynamics. Its quadratic solution gives the continuous algebraic Riccati equation AP+PAPBR1BP+Q=0\statemat^\top \riccati + \riccati\statemat - \riccati\inputmat R^{-1}\inputmat^\top \riccati + Q = 0, with gain K=R1BP\lqrgain = R^{-1}\inputmat^\top \riccati and Hurwitz closed loop ABK\statemat - \inputmat\lqrgain. The HJB equation is the continuous-time Bellman equation; Chapter 1’s HJB aside lands exactly here. Note the kinship with Week 12: drop the control term PBR1BP\riccati\inputmat R^{-1}\inputmat^\top \riccati and the ARE is the Lyapunov equation — optimal control is stability analysis with a cost-shaping term.

The bridge to Chapter 1

This is the chapter the curriculum has been pointing at. Lay the two equations side by side:

  • Chapter 1 (general MDP). v(s)=maxa[r(s,a)+γsp(ss,a)v(s)]\optvaluefn(s) = \max_a\big[\reward(s,a) + \discount\sum_{s'}\transition(s'\mid s,a)\,\optvaluefn(s')\big], solved by value iteration, which converges because T\bellmanopt is a γ\discount-contraction.
  • Chapter 13 (linear-quadratic). v(h)=minu[hQh+uRu+v(Ah+Bu)]\valuefn^*(\statevec) = \min_u\big[\statevec^\top Q\statevec + u^\top R u + \valuefn^*(\statemat\statevec + \inputmat u)\big], solved by the Riccati recursion, which converges because the Riccati map contracts on the PSD cone.

They are the same equation — minimize immediate cost plus optimal cost-to-go — under one specialization: linear dynamics and quadratic cost keep v\valuefn^* quadratic, collapsing the infinite-dimensional functional fixed point to the finite matrix P\riccati. Value iteration \leftrightarrow Riccati recursion; the Bellman operator \leftrightarrow the Riccati map; the γ\discount-contraction that gave Chapter 1 its convergence \leftrightarrow the stabilizability/detectability that gives the ARE its unique stabilizing solution. Control theory reached the Riccati equation from the calculus of variations and Pontryagin’s principle Kalman (1960) ; reinforcement learning reached value iteration from the Bellman equation; LQR is where the two derivations meet on one object.

LQG: optimal output feedback, and its fragility

Real systems are noisy and only partially measured: hk+1=Ahk+Buk+wk\statevec_{k+1} = \statemat\statevec_k + \inputmat u_k + w_k, yk=Chk+vky_k = \outputmat\statevec_k + v_k with Gaussian w,vw,v. The linear-quadratic-Gaussian (LQG) problem minimizes the expected quadratic cost. Its solution is the separation principle: run a Kalman filter Kalman (1960) to produce the optimal state estimate h^\hat{\statevec}, then apply the LQR gain to the estimate, u=Kh^u = -\lqrgain\hat{\statevec} — estimator and regulator designed independently and combined.

By W12 duality the filter gain solves the dual Riccati equation, so LQG is two Riccati solves on dual systems.

The catch is robustness. Doyle (1978) — a one-page paper — shows LQG has no guaranteed stability margins: there are LQG designs an arbitrarily small gain perturbation destabilizes.

Proposition 13.1 (LQR margins vs. LQG fragility).

The full-state LQR loop has a guaranteed gain margin [12,)[\tfrac12,\infty) and at least 6060^\circ phase margin: scaling the optimal gain by any β12\beta\geq\tfrac12 leaves AβBK\statemat - \beta\inputmat\lqrgain stable. The LQG (output-feedback) loop has no such guarantee — its gain margin can be made arbitrarily small.

The companion reproduces this: the LQR loop stays stable across a wide gain scaling (β12\beta\geq\tfrac12), while the LQG loop on the same plant is stable only at the nominal β=1\beta = 1 — an arbitrarily small gain error on either side destabilizes it. The lesson is foundational for the rest of the curriculum: optimality on the nominal model does not imply robustness. The separation principle is optimal and fragile at once — the gap that robust control, and later robust/risk-aware RL, exist to close.

What’s next

  • Week 14 (nonlinear control). Lyapunov design and feedback linearization for systems where the linear model is only a local picture. The Lyapunov function of Week 12 becomes a design tool rather than an analysis certificate, and the quadratic value function of this chapter becomes a local approximation to a nonlinear cost-to-go — the entry to nonlinear optimal control and, eventually, model predictive control (Week 15).

Exercises

  1. (Derive) Starting from the Bellman optimality equation with vk+1(h)=hPk+1h\valuefn_{k+1}^*(\statevec) = \statevec^\top \riccati_{k+1}\statevec, complete the square in uu to derive the gain Kk\lqrgain_k and the Riccati recursion for Pk\riccati_k.

    Solution

    Expanding gives u(R+BPk+1B)u+2uBPk+1Ah+h(Q+APk+1A)hu^\top(R + \inputmat^\top \riccati_{k+1}\inputmat)u + 2u^\top \inputmat^\top \riccati_{k+1}\statemat\statevec + \statevec^\top(Q + \statemat^\top \riccati_{k+1}\statemat)\statevec. The uu-Hessian R+BPk+1B0R + \inputmat^\top \riccati_{k+1}\inputmat \succ 0, so the minimizer is u=(R+BPk+1B)1BPk+1Ah=Kkhu^* = -(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat\statevec = -\lqrgain_k\statevec. Back-substitution yields vk=hPkh\valuefn_k^* = \statevec^\top \riccati_k\statevec with Pk=Q+APk+1AAPk+1B(R+BPk+1B)1BPk+1A\riccati_k = Q + \statemat^\top \riccati_{k+1}\statemat - \statemat^\top \riccati_{k+1}\inputmat(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat.

  2. (Compute) Solve the scalar infinite-horizon LQR: A=a\statemat = a, B=b\inputmat = b, Q=qQ = q, R=rR = r (all scalars). Write the DARE for P=p\riccati = p and solve it.

    Solution

    The DARE is p=q+a2pa2b2p2r+b2pp = q + a^2 p - \dfrac{a^2 b^2 p^2}{r + b^2 p}, a quadratic in pp. Clearing the denominator gives b2p2(b2q+(a21)r)pqr=0b^2 p^2 - (b^2 q + (a^2-1)r)\,p - qr = 0; the positive root is the stabilizing p>0p > 0, and K=abpr+b2p\lqrgain = \dfrac{ab\,p}{r + b^2 p}. As a check, with a=0a=0 the state already dies in one step and p=qp = q, K=0\lqrgain = 0.

  3. (Prove) Show the optimal LQR cost from h0\statevec_0 is exactly h0P0h0\statevec_0^\top \riccati_0\statevec_0, and that under the stationary gain the cost-to-go hkPhk\statevec_k^\top \riccati\statevec_k is non-increasing along the closed-loop trajectory.

    Solution

    By Theorem 13.1, v0(h0)=h0P0h0\valuefn_0^*(\statevec_0) = \statevec_0^\top \riccati_0\statevec_0 is the minimum cost. Along the closed loop hk+1=(ABK)hk\statevec_{k+1} = (\statemat - \inputmat\lqrgain)\statevec_k, the DARE rearranges to hkPhkhk+1Phk+1=hk(Q+KRK)hk0\statevec_k^\top \riccati\statevec_k - \statevec_{k+1}^\top \riccati\statevec_{k+1} = \statevec_k^\top(Q + \lqrgain^\top R\lqrgain)\statevec_k \geq 0, so hkPhk\statevec_k^\top \riccati\statevec_k decreases by exactly the stage cost each step — hPh\statevec^\top \riccati\statevec is a Lyapunov function for the optimal closed loop, tying back to Week 12.

  4. (Implement) In the companion, verify the DARE residual is zero, the gain matches control.dlqr, the finite-horizon Riccati converges to the ARE solution as N\horizon grows, and the simulated closed-loop cost equals h0Ph0\statevec_0^\top \riccati\statevec_0.

    Solution

    See experiments/python/week13/test_lqr.py: dare_gain solves the DARE (scipy.linalg.solve_discrete_are) with residual 0\approx 0 and gain agreeing with control.dlqr; the backward recursion’s P0\riccati_0 converges to that P\riccati as N\horizon\to\infty; and the summed stage cost under Kh-\lqrgain\statevec matches h0Ph0\statevec_0^\top \riccati\statevec_0.

  5. (Extend) Reproduce the LQG margin failure. Build the LQR loop and the LQG loop (LQR gain on a Kalman estimate) for the same plant, scale the loop gain by β\beta, and compare the stable range of β\beta.

    Solution

    See the companion’s Doyle example: the full-state loop AβBK\statemat - \beta\inputmat\lqrgain stays stable for all β12\beta\geq\tfrac12 (Prop. 13.1), but the output-feedback closed loop [AβBKLCABKLC]\big[\begin{smallmatrix}\statemat & -\beta\inputmat\lqrgain \\ L\outputmat & \statemat - \inputmat\lqrgain - L\outputmat\end{smallmatrix}\big] is stable only at β=1\beta = 1 — an arbitrarily small perturbation either way destabilizes it (the gain error β\beta multiplies only the control reaching the plant, not the commanded control the filter propagates). Optimality of the estimator-plus-regulator does not transfer to a robustness guarantee.

  6. (Extend) Using Week-12 duality, show the steady-state Kalman filter gain solves the same algebraic Riccati equation as LQR on the dual pair (A,C)(\statemat^\top,\outputmat^\top). When does the separation principle stop being optimal (as opposed to merely stable)?

    Solution

    The filter error covariance solves Σ=AΣAAΣC(CΣC+V)1CΣA+W\Sigma = \statemat\Sigma\statemat^\top - \statemat\Sigma\outputmat^\top(\outputmat\Sigma\outputmat^\top + V)^{-1}\outputmat\Sigma\statemat^\top + W, which is the DARE of Theorem 13.2 with (A,B,Q,R)(A,C,W,V)(\statemat,\inputmat,Q,R)\mapsto(\statemat^\top,\outputmat^\top,W,V). Separation is optimal precisely for linear dynamics, Gaussian noise, and quadratic cost; it loses optimality once the dynamics are nonlinear, the noise non-Gaussian, or the cost non-quadratic — where estimation and control no longer decouple (a recurring theme in model-based RL).

Companion code

The Week-13 companion lives at experiments/python/week13/ (Python, on scipy.linalg + numpy, cross-checked against python-control).

  • lqr.py — the finite-horizon backward Riccati recursion; the infinite-horizon gain via the discrete and continuous algebraic Riccati equations (solve_discrete_are, solve_continuous_are); a closed-loop cost simulator; and the Doyle LQG margin-failure example (LQR gain margin vs. the output-feedback loop’s vanishing margin).
  • test_lqr.py — mathematical-correctness tests: the DARE/CARE residuals vanish; the gain equals control.dlqr / control.lqr; the finite-horizon P0\riccati_0 converges to the ARE solution as N\horizon\to\infty; the simulated optimal cost equals h0Ph0\statevec_0^\top \riccati\statevec_0; the closed loop is Schur/Hurwitz; and the LQR gain margin contains [12,)[\tfrac12,\infty) while the LQG loop is destabilized near β=1\beta = 1.
# LQR/LQG algorithms + correctness tests (scipy + python-control cross-check)
PYTHONPATH=. pytest experiments/python/week13/test_lqr.py -q

# worked finite/infinite-horizon LQR + the Doyle LQG margin demonstration
PYTHONPATH=. python experiments/python/week13/lqr.py --doyle