Linear-Quadratic Regulation: The Exact Dynamic Program

The linear-quadratic regulator as exact dynamic programming: a quadratic value function, the Riccati recursion as Chapter 1's Bellman optimality equation in coordinates, the linear-feedback optimal policy, the infinite-horizon algebraic Riccati equation, and the LQG separation principle — with Doyle's warning that optimal output feedback carries no guaranteed stability margins.

Where we are. Weeks 11–12 built the linear model and the structural properties — stability, controllability, observability — that say what control is possible. Now we put a cost on the model and ask for the best control. The linear-quadratic regulator (LQR) is the one optimal-control problem dynamic programming solves in closed form, and it is exactly Chapter 1’s Bellman optimality equation specialized to linear dynamics and quadratic cost. The “DP bridge” promised since Chapter 1 is paid here: the value function is a quadratic, value iteration becomes the Riccati recursion, and the optimal policy is linear state feedback $u = -\lqrgain\statevec$ . Bellman, Riccati, and feedback control turn out to be one calculation.

Chapter 13 — at a glance

Goal. Derive LQR as exact dynamic programming — quadratic cost-to-go, the Riccati recursion, and the gain $\lqrgain$ ; pass to the infinite-horizon algebraic Riccati equation; and see why LQG (LQR on a Kalman-filter estimate) forfeits LQR’s stability margins.

Reading time. ~55 minutes; ~90 with the proofs and exercises.

Key insight: the DP bridge (the payoff). The Riccati equation is the Bellman optimality equation for a linear-quadratic problem. Chapter 1 solved $\optvaluefn = \bellmanopt\optvaluefn$ by iterating a $\discount$-contraction on the whole value function; here the value stays quadratic, $\valuefn^*(\statevec) = \statevec^\top \riccati\statevec$, so the functional fixed point collapses to a matrix fixed point, that of the Riccati map on $\riccati$. The same solver, applied to the dual system of Week 12, yields the optimal state estimator (the Kalman filter). This is the Rosetta Stone of the curriculum.

The LQR problem

Definition 13.1 (Linear-quadratic regulator).

For the discrete-time linear system $\statevec_{k+1} = \statemat\statevec_k + \inputmat u_k$ , the finite-horizon LQR problem is to choose $u_0,\dots,u_{\horizon-1}$ minimizing the quadratic cost

$$\costtogo = \sum_{k=0}^{\horizon-1}\big(\statevec_k^\top Q\,\statevec_k + u_k^\top R\,u_k\big) + \statevec_\horizon^\top Q_\horizon\,\statevec_\horizon,$$

with state-cost $Q \succeq 0$ , control-cost $R \succ 0$ , and terminal cost $Q_\horizon \succeq 0$ . The infinite-horizon problem takes $\horizon\to\infty$ with no terminal term, minimizing $\sum_{k=0}^{\infty}(\statevec_k^\top Q\,\statevec_k + u_k^\top R\,u_k)$ .

$R \succ 0$ makes every nonzero control strictly costly, so each stage minimization is strictly convex and has a unique, finite minimizer; $Q \succeq 0$ penalizes deviation from the origin. This is a Markov decision process (Chapter 1) with a known deterministic linear transition kernel and a quadratic cost in place of a reward.

LQR is exact dynamic programming

Theorem 13.1 (LQR via dynamic programming).

The optimal cost-to-go of Definition 13.1 is the quadratic $\valuefn_k^*(\statevec) = \statevec^\top \riccati_k\statevec$ , where $\riccati_k$ runs the backward Riccati recursion

$$\riccati_\horizon = Q_\horizon, \qquad \riccati_k = Q + \statemat^\top \riccati_{k+1}\statemat - \statemat^\top \riccati_{k+1}\inputmat\,(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat,$$

and the optimal policy is the linear state feedback $u_k = -\lqrgain_k\statevec_k$ with time-varying gain

$$\lqrgain_k = (R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat.$$

Proof.

Backward induction on the Bellman optimality equation $\valuefn_k^*(\statevec) = \min_u\big[\statevec^\top Q\statevec + u^\top R u + \valuefn_{k+1}^*(\statemat\statevec + \inputmat u)\big]$ .

Base. $\valuefn_\horizon^*(\statevec) = \statevec^\top Q_\horizon\statevec$ , so $\riccati_\horizon = Q_\horizon$ .

Step. Assume $\valuefn_{k+1}^*(\statevec) = \statevec^\top \riccati_{k+1}\statevec$ . Substituting and expanding the quadratic in $u$ ,

$$\begin{aligned} \statevec^\top Q\statevec + u^\top R u + (\statemat\statevec + \inputmat u)^\top \riccati_{k+1}(\statemat\statevec + \inputmat u) &= u^\top\!\underbrace{(R + \inputmat^\top \riccati_{k+1}\inputmat)}_{\textstyle \succ 0}\,u + 2\,u^\top \inputmat^\top \riccati_{k+1}\statemat\,\statevec \\ &\quad + \statevec^\top(Q + \statemat^\top \riccati_{k+1}\statemat)\statevec . \end{aligned}$$

The Hessian in $u$ is $2(R + \inputmat^\top \riccati_{k+1}\inputmat) \succ 0$ (since $R\succ0$ and $\riccati_{k+1}\succeq0$, the latter because a cost-to-go built from nonnegative stage costs is itself nonnegative), so the minimizer is the stationary point

$$u^* = -(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat\,\statevec = -\lqrgain_k\statevec .$$

Substituting $u^*$ back (completing the square) leaves a pure quadratic in $\statevec$ , $\valuefn_k^*(\statevec) = \statevec^\top \riccati_k\statevec$ , whose matrix is exactly the Riccati recursion above. The quadratic form is preserved, closing the induction. $\qquad\blacksquare$
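The back-substitution step can be written out explicitly. The shorthand $M = R + \inputmat^\top \riccati_{k+1}\inputmat$ and $N = \inputmat^\top \riccati_{k+1}\statemat$ is introduced here only for this display and is not notation used elsewhere in the chapter:

$$u^\top M u + 2\,u^\top N\statevec = (u + M^{-1}N\statevec)^\top M\,(u + M^{-1}N\statevec) - \statevec^\top N^\top M^{-1}N\,\statevec .$$

Since $M \succ 0$, the first term is nonnegative and vanishes only at $u^* = -M^{-1}N\statevec$. What remains is $\statevec^\top(Q + \statemat^\top \riccati_{k+1}\statemat - N^\top M^{-1}N)\statevec$, and $N^\top M^{-1}N = \statemat^\top \riccati_{k+1}\inputmat\,(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat$ is the term subtracted in the Riccati recursion.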

The proof is Chapter 1’s value iteration run on a quadratic ansatz.

Every step is one application of the Bellman optimality operator; the minimization is exact (a quadratic in $u$), not sampled or approximated, because the model is known and the cost is quadratic. The optimal value, the optimal policy, and the optimal cost $\valuefn_0^*(\statevec_0) = \statevec_0^\top \riccati_0\statevec_0$ all fall out of one backward sweep.
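The backward sweep of Theorem 13.1 can be sketched in a few lines of numpy. This is a minimal illustration, not the companion's `lqr.py`: the function names `riccati_recursion` and `rollout_cost`, and the plant (a double integrator sampled at 0.1 s with made-up weights), are assumptions chosen for the example.

```python
import numpy as np

def riccati_recursion(A, B, Q, R, Q_N, N):
    """Backward Riccati sweep. Returns P[0..N] and gains K[0..N-1]."""
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = Q_N
    for k in range(N - 1, -1, -1):
        BtP = B.T @ P[k + 1]
        K[k] = np.linalg.solve(R + BtP @ B, BtP @ A)   # gain of Theorem 13.1
        P[k] = Q + A.T @ P[k + 1] @ A - A.T @ P[k + 1] @ B @ K[k]
    return P, K

def rollout_cost(A, B, Q, R, Q_N, gains, x0):
    """Simulate u_k = -K_k x_k and sum the quadratic cost of Definition 13.1."""
    x, cost = x0, 0.0
    for K_k in gains:
        u = -K_k @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost + x @ Q_N @ x

# Hypothetical plant and weights, for illustration only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R, Q_N = np.eye(2), np.array([[0.1]]), np.eye(2)
x0 = np.array([1.0, 0.0])

P, K = riccati_recursion(A, B, Q, R, Q_N, N=50)
predicted = x0 @ P[0] @ x0                       # x0' P_0 x0 from the sweep
simulated = rollout_cost(A, B, Q, R, Q_N, K, x0)
detuned = rollout_cost(A, B, Q, R, Q_N, [0.8 * K_k for K_k in K], x0)
print(f"predicted {predicted:.6f}  simulated {simulated:.6f}  detuned {detuned:.6f}")
```

The simulated cost under the computed gains should agree with $\statevec_0^\top \riccati_0\statevec_0$ to rounding error, and any other gain sequence (here the gains scaled by 0.8) should cost strictly more.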

Infinite horizon: the algebraic Riccati equation

Run the recursion backward from a far horizon and $\riccati_k$ settles to a constant.

Theorem 13.2 (Infinite-horizon LQR).

If $(\statemat,\inputmat)$ is stabilizable and $(\statemat, Q^{1/2})$ is detectable, then as $\horizon\to\infty$ the Riccati iterates converge, $\riccati_k\to\riccati$ , to the unique symmetric positive-semidefinite solution of the discrete algebraic Riccati equation (DARE)

$$\riccati = Q + \statemat^\top \riccati\,\statemat - \statemat^\top \riccati\inputmat\,(R + \inputmat^\top \riccati\inputmat)^{-1}\inputmat^\top \riccati\,\statemat .$$

The optimal policy is the stationary feedback $u = -\lqrgain\statevec$ with $\lqrgain = (R + \inputmat^\top \riccati\inputmat)^{-1}\inputmat^\top \riccati\statemat$ , and the closed loop $\statemat - \inputmat\lqrgain$ is asymptotically stable (Schur).

The hypotheses are exactly Week 12’s structural properties: stabilizability (every uncontrollable mode already stable) guarantees a finite-cost policy exists, and detectability (every unobservable-through-$Q$ mode already stable) guarantees that the positive-semidefinite solution is unique and that its closed loop is stable.

The standard proof uses the monotonicity of the Riccati map on the positive-semidefinite cone: stabilizability bounds the iterates, and detectability forces the limit to be stabilizing; see Lewis et al. (2012) and Bertsekas (2017).
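The claims of Theorem 13.2 can be checked numerically with `scipy.linalg.solve_discrete_are`. The sketch below reuses a hypothetical sampled double integrator (the plant and weights are assumptions, not the companion's test case) and verifies three things: the DARE residual is near zero, the Riccati recursion started from zero terminal cost approaches the DARE solution, and the closed loop is Schur.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical plant and weights, for illustration only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])

# Stationary solution and gain.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
dare_residual = Q + A.T @ P @ A - A.T @ P @ B @ K - P

# Riccati recursion from zero terminal cost: each pass is one more
# step of horizon, so gaps[i] is the distance after i + 1 steps.
P_k = np.zeros((2, 2))
gaps = []
for _ in range(200):
    G = np.linalg.solve(R + B.T @ P_k @ B, B.T @ P_k @ A)
    P_k = Q + A.T @ P_k @ A - A.T @ P_k @ B @ G
    gaps.append(np.linalg.norm(P_k - P))

spectral_radius = np.abs(np.linalg.eigvals(A - B @ K)).max()
print("residual norm:", np.linalg.norm(dare_residual))
print("gap after 1, 50, 200 steps:", gaps[0], gaps[49], gaps[-1])
print("closed-loop spectral radius:", spectral_radius)
```

A spectral radius below 1 is the Schur condition from the theorem; the shrinking gap is the convergence $\riccati_k\to\riccati$ as the horizon grows.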

Continuous time. The same logic in continuous time replaces the recursion by the Hamilton–Jacobi–Bellman equation $\min_u \hamiltonian = 0$ , whose Hamiltonian $\hamiltonian = \statevec^\top Q\statevec + u^\top R u + (\nabla_{\statevec}\valuefn^*)^\top(\statemat\statevec + \inputmat u)$ adds the stage cost to the cost-to-go’s rate of change along the dynamics. Its quadratic solution gives the continuous algebraic Riccati equation $\statemat^\top \riccati + \riccati\statemat - \riccati\inputmat R^{-1}\inputmat^\top \riccati + Q = 0$ , with gain $\lqrgain = R^{-1}\inputmat^\top \riccati$ and Hurwitz closed loop $\statemat - \inputmat\lqrgain$ . The HJB equation is the continuous-time Bellman equation; Chapter 1’s HJB aside lands exactly here. Note the kinship with Week 12: drop the control term $\riccati\inputmat R^{-1}\inputmat^\top \riccati$ and the ARE is the Lyapunov equation — optimal control is stability analysis with a cost-shaping term.
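The continuous-time statements can be checked the same way with `scipy.linalg.solve_continuous_are`. The sketch below uses a double integrator with unit weights (an assumed example, chosen because its solution is known in closed form: $\lqrgain = [1, \sqrt3]$) and also verifies the Lyapunov kinship: along the closed loop, $\riccati$ satisfies a Lyapunov equation with right-hand side $-(Q + \lqrgain^\top R\lqrgain)$.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed example: continuous-time double integrator, unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)                  # K = R^{-1} B' P
care_residual = A.T @ P + P @ A - P @ B @ K + Q

A_cl = A - B @ K
poles = np.linalg.eigvals(A_cl)

# Lyapunov form of the CARE along the closed loop:
# A_cl' P + P A_cl + (Q + K' R K) = 0
lyap_residual = A_cl.T @ P + P @ A_cl + Q + K.T @ R @ K

print("K =", K)
print("closed-loop poles:", poles)
print("CARE residual norm:", np.linalg.norm(care_residual))
print("Lyapunov residual norm:", np.linalg.norm(lyap_residual))
```

All closed-loop poles should have negative real part (Hurwitz), and both residuals should vanish to rounding error.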

The bridge to Chapter 1

This is the chapter the curriculum has been pointing at. Lay the two equations side by side:

Chapter 1 (general MDP). $\optvaluefn(s) = \max_a\big[\reward(s,a) + \discount\sum_{s'}\transition(s'\mid s,a)\,\optvaluefn(s')\big]$ , solved by value iteration, which converges because $\bellmanopt$ is a $\discount$ -contraction.
Chapter 13 (linear-quadratic). $\valuefn^*(\statevec) = \min_u\big[\statevec^\top Q\statevec + u^\top R u + \valuefn^*(\statemat\statevec + \inputmat u)\big]$, solved by the Riccati recursion, which converges because the Riccati map is monotone on the PSD cone and, under stabilizability and detectability, has a unique PSD fixed point.

They are the same equation (minimize immediate cost plus optimal cost-to-go) under one specialization: linear dynamics and quadratic cost keep $\valuefn^*$ quadratic, collapsing the infinite-dimensional functional fixed point to the finite matrix $\riccati$. Value iteration $\leftrightarrow$ Riccati recursion; the Bellman operator $\leftrightarrow$ the Riccati map; the $\discount$-contraction that gave Chapter 1 its convergence $\leftrightarrow$ the stabilizability/detectability that gives the ARE its unique stabilizing solution. Control theory reached the Riccati equation from the calculus of variations and Pontryagin’s principle (Kalman, 1960); reinforcement learning reached value iteration from the Bellman equation; LQR is where the two derivations meet on one object.
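The correspondence can be made concrete on a scalar system: perform one Bellman backup by brute-force numerical minimization over $u$, as a generic value-iteration step would, and compare it with one step of the Riccati recursion. The numbers below are hypothetical, and `bellman_backup` is a helper defined only for this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical scalar problem: x' = a x + b u, stage cost q x^2 + r u^2.
a, b, q, r = 1.2, 0.5, 1.0, 0.3
p_next = 2.0                      # assumed next-step value V_{k+1}(x) = p_next * x^2

def bellman_backup(x):
    """min over u of [q x^2 + r u^2 + V_{k+1}(a x + b u)], solved numerically."""
    objective = lambda u: q * x**2 + r * u**2 + p_next * (a * x + b * u) ** 2
    result = minimize_scalar(objective)
    return result.fun, result.x

# The same backup in closed form: one step of the Riccati recursion.
gain = a * b * p_next / (r + b**2 * p_next)
p_k = q + a**2 * p_next - (a * b * p_next) ** 2 / (r + b**2 * p_next)

xs = np.linspace(-2.0, 2.0, 9)
numeric = np.array([bellman_backup(x) for x in xs])   # columns: value, argmin
print("max value mismatch: ", np.abs(numeric[:, 0] - p_k * xs**2).max())
print("max policy mismatch:", np.abs(numeric[:, 1] + gain * xs).max())
```

Both mismatches should be at the level of the optimizer's tolerance: the backed-up value is again a quadratic $p_k x^2$ and the minimizing control is linear in $x$, which is the collapse from a functional fixed point to a matrix one.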

LQG: optimal output feedback, and its fragility

Real systems are noisy and only partially measured: $\statevec_{k+1} = \statemat\statevec_k + \inputmat u_k + w_k$, $y_k = \outputmat\statevec_k + v_k$ with zero-mean Gaussian white noise $w,v$. The linear-quadratic-Gaussian (LQG) problem minimizes the expected quadratic cost. Its solution is the separation principle: run a Kalman filter (Kalman, 1960) to produce the optimal state estimate $\hat{\statevec}$, then apply the LQR gain to the estimate, $u = -\lqrgain\hat{\statevec}$, with estimator and regulator designed independently and then combined.

By Week 12 duality the filter gain solves the Riccati equation of the dual pair $(\statemat^\top,\outputmat^\top)$, so LQG is two Riccati solves on dual systems.
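The "two Riccati solves" can be sketched directly. The example below assumes a hypothetical plant with position-only measurement and made-up noise covariances `W` and `V`; the filter is written in one-step-predictor form, $\hat{\statevec}_{k+1} = \statemat\hat{\statevec}_k + \inputmat u_k + L(y_k - \outputmat\hat{\statevec}_k)$, which is one common convention and may differ from the companion's. The final check illustrates separation: the closed-loop characteristic polynomial factors into the regulator's and the estimator's.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical plant, weights, and noise covariances (illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[0.1]])
W, V = 0.01 * np.eye(2), np.array([[0.1]])

# Regulator: DARE on (A, B, Q, R).
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Estimator: the same solver on the dual data (A', C', W, V).
Sigma = solve_discrete_are(A.T, C.T, W, V)
L = A @ Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + V)   # predictor-form gain

# Closed loop in (x, x_hat) coordinates with u = -K x_hat.
closed_loop = np.block([[A, -B @ K], [L @ C, A - B @ K - L @ C]])
poly_joint = np.poly(closed_loop)
poly_split = np.convolve(np.poly(A - B @ K), np.poly(A - L @ C))

print("regulator spectral radius:", np.abs(np.linalg.eigvals(A - B @ K)).max())
print("estimator spectral radius:", np.abs(np.linalg.eigvals(A - L @ C)).max())
print("polynomials agree:", np.allclose(poly_joint, poly_split, atol=1e-8))
```

The filter gain `L` is the transpose of the LQR gain computed on the dual system, which is the duality statement in matrix form.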

The catch is robustness. Doyle (1978), in a one-page paper, shows that LQG has no guaranteed stability margins: for any tolerance, however small, there is an LQG design that a gain perturbation within that tolerance destabilizes.

Proposition 13.1 (LQR margins vs. LQG fragility).

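The continuous-time full-state LQR loop (gain $\lqrgain = R^{-1}\inputmat^\top \riccati$ from the continuous algebraic Riccati equation) has a guaranteed gain margin $[\tfrac12,\infty)$ and at least $60^\circ$ phase margin: scaling the optimal gain by any $\beta\geq\tfrac12$ leaves $\statemat - \beta\inputmat\lqrgain$ Hurwitz. The discrete-time loop of Theorem 13.2 keeps only smaller, problem-dependent margins; in particular its upper gain margin is finite. The LQG (output-feedback) loop has no such guarantee: its gain margin can be made arbitrarily small.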

The companion reproduces this: the LQR loop stays stable across a wide gain scaling ($\beta\geq\tfrac12$), while the LQG loop on the same plant is stable only in a narrow band around the nominal $\beta = 1$, so a small gain error on either side destabilizes it. The lesson is foundational for the rest of the curriculum: optimality on the nominal model does not imply robustness. The separation principle is optimal and fragile at once, and that is the gap that robust control, and later robust/risk-aware RL, exist to close.
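A margin sweep of this kind can be sketched as follows. The plant and weights are the continuous-time two-state example commonly reproduced from Doyle (1978), with design parameters `q` and `sigma` picked here for illustration; treat the specific numbers as an assumption, since the companion's own example may differ. For this plant the closed-loop determinant works out to $1 + (1-\beta)\,d f$, where $f$ and $d$ are the regulator and filter gain magnitudes, so the upper margin shrinks like $1/(d f)$.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Two-state continuous-time example (assumed to follow Doyle, 1978).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
ones = np.ones((2, 2))
q, sigma = 1000.0, 500.0          # illustrative design weights
R, V = np.array([[1.0]]), np.array([[1.0]])

P = solve_continuous_are(A, B, q * ones, R)
K = np.linalg.solve(R, B.T @ P)                      # LQR gain
S = solve_continuous_are(A.T, C.T, sigma * ones, V)
L = S @ C.T @ np.linalg.inv(V)                       # Kalman filter gain

def lqr_abscissa(beta):
    """Largest real part of eig(A - beta B K): full-state loop."""
    return np.linalg.eigvals(A - beta * B @ K).real.max()

def lqg_abscissa(beta):
    """Largest real part for the output-feedback loop; beta scales only
    the control reaching the plant, not the control the filter propagates."""
    M = np.block([[A, -beta * B @ K], [L @ C, A - B @ K - L @ C]])
    return np.linalg.eigvals(M).real.max()

for beta in [0.5, 0.9, 1.0, 1.01, 2.0, 10.0]:
    print(f"beta={beta:5.2f}  LQR stable: {lqr_abscissa(beta) < 0}"
          f"  LQG stable: {lqg_abscissa(beta) < 0}")
```

With these weights the full-state loop should remain stable at every $\beta$ in the sweep, while the output-feedback loop should be stable at $\beta = 1$ and unstable at both $0.9$ and $1.01$. Increasing `q` and `sigma` narrows the stable band further.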

What’s next

Week 14 (nonlinear control). Lyapunov design and feedback linearization for systems where the linear model is only a local picture. The Lyapunov function of Week 12 becomes a design tool rather than an analysis certificate, and the quadratic value function of this chapter becomes a local approximation to a nonlinear cost-to-go — the entry to nonlinear optimal control and, eventually, model predictive control (Week 15).

Exercises

(Derive) Starting from the Bellman optimality equation with $\valuefn_{k+1}^*(\statevec) = \statevec^\top \riccati_{k+1}\statevec$ , complete the square in $u$ to derive the gain $\lqrgain_k$ and the Riccati recursion for $\riccati_k$ .

Solution
Expanding gives $u^\top(R + \inputmat^\top \riccati_{k+1}\inputmat)u + 2u^\top \inputmat^\top \riccati_{k+1}\statemat\statevec + \statevec^\top(Q + \statemat^\top \riccati_{k+1}\statemat)\statevec$. The $u$-Hessian is $2(R + \inputmat^\top \riccati_{k+1}\inputmat) \succ 0$, so the minimizer is $u^* = -(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat\statevec = -\lqrgain_k\statevec$. Back-substitution yields $\valuefn_k^* = \statevec^\top \riccati_k\statevec$ with $\riccati_k = Q + \statemat^\top \riccati_{k+1}\statemat - \statemat^\top \riccati_{k+1}\inputmat(R + \inputmat^\top \riccati_{k+1}\inputmat)^{-1}\inputmat^\top \riccati_{k+1}\statemat$.
(Compute) Solve the scalar infinite-horizon LQR: $\statemat = a$ , $\inputmat = b$ , $Q = q$ , $R = r$ (all scalars). Write the DARE for $\riccati = p$ and solve it.

Solution
The DARE is $p = q + a^2 p - \dfrac{a^2 b^2 p^2}{r + b^2 p}$ , a quadratic in $p$ . Clearing the denominator gives $b^2 p^2 - (b^2 q + (a^2-1)r)\,p - qr = 0$ ; the positive root is the stabilizing $p > 0$ , and $\lqrgain = \dfrac{ab\,p}{r + b^2 p}$ . As a check, with $a=0$ the state already dies in one step and $p = q$ , $\lqrgain = 0$ .
(Prove) Show the optimal LQR cost from $\statevec_0$ is exactly $\statevec_0^\top \riccati_0\statevec_0$ , and that under the stationary gain the cost-to-go $\statevec_k^\top \riccati\statevec_k$ is non-increasing along the closed-loop trajectory.

Solution
By Theorem 13.1, $\valuefn_0^*(\statevec_0) = \statevec_0^\top \riccati_0\statevec_0$ is the minimum cost. Along the closed loop $\statevec_{k+1} = (\statemat - \inputmat\lqrgain)\statevec_k$ , the DARE rearranges to $\statevec_k^\top \riccati\statevec_k - \statevec_{k+1}^\top \riccati\statevec_{k+1} = \statevec_k^\top(Q + \lqrgain^\top R\lqrgain)\statevec_k \geq 0$ , so $\statevec_k^\top \riccati\statevec_k$ decreases by exactly the stage cost each step — $\statevec^\top \riccati\statevec$ is a Lyapunov function for the optimal closed loop, tying back to Week 12.
(Implement) In the companion, verify the DARE residual is zero, the gain matches control.dlqr, the finite-horizon Riccati converges to the ARE solution as $\horizon$ grows, and the simulated closed-loop cost equals $\statevec_0^\top \riccati\statevec_0$ .

Solution
See experiments/python/week13/test_lqr.py: dare_gain solves the DARE (scipy.linalg.solve_discrete_are) with residual $\approx 0$ and gain agreeing with control.dlqr; the backward recursion’s $\riccati_0$ converges to that $\riccati$ as $\horizon\to\infty$ ; and the summed stage cost under $-\lqrgain\statevec$ matches $\statevec_0^\top \riccati\statevec_0$ .
(Extend) Reproduce the LQG margin failure. Build the LQR loop and the LQG loop (LQR gain on a Kalman estimate) for the same plant, scale the loop gain by $\beta$ , and compare the stable range of $\beta$ .

Solution
See the companion’s Doyle example: the full-state loop $\statemat - \beta\inputmat\lqrgain$ stays stable for all $\beta\geq\tfrac12$ (Prop. 13.1), but the output-feedback closed loop $\big[\begin{smallmatrix}\statemat & -\beta\inputmat\lqrgain \\ L\outputmat & \statemat - \inputmat\lqrgain - L\outputmat\end{smallmatrix}\big]$ is stable only in a narrow band around $\beta = 1$, and that band can be made arbitrarily narrow by the choice of design weights (the gain error $\beta$ multiplies only the control reaching the plant, not the commanded control the filter propagates). Optimality of the estimator-plus-regulator does not transfer to a robustness guarantee.
(Extend) Using Week-12 duality, show the steady-state Kalman filter gain solves the same algebraic Riccati equation as LQR on the dual pair $(\statemat^\top,\outputmat^\top)$ . When does the separation principle stop being optimal (as opposed to merely stable)?

Solution
The filter error covariance solves $\Sigma = \statemat\Sigma\statemat^\top - \statemat\Sigma\outputmat^\top(\outputmat\Sigma\outputmat^\top + V)^{-1}\outputmat\Sigma\statemat^\top + W$ , which is the DARE of Theorem 13.2 with $(\statemat,\inputmat,Q,R)\mapsto(\statemat^\top,\outputmat^\top,W,V)$ . Separation is optimal precisely for linear dynamics, Gaussian noise, and quadratic cost; it loses optimality once the dynamics are nonlinear, the noise non-Gaussian, or the cost non-quadratic — where estimation and control no longer decouple (a recurring theme in model-based RL).
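The scalar (Compute) exercise and the Lyapunov-decrease (Prove) exercise above lend themselves to a quick numerical check. The sketch below uses hypothetical scalar values, takes the positive root of the quadratic from the solution, compares it with `scipy.linalg.solve_discrete_are`, and then confirms that $p\,x_k^2$ drops by exactly the stage cost at every closed-loop step.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical scalar data with an unstable open loop (|a| > 1).
a, b, q, r = 1.2, 0.5, 1.0, 0.3

# Positive root of b^2 p^2 - (b^2 q + (a^2 - 1) r) p - q r = 0.
c1 = b**2 * q + (a**2 - 1.0) * r
p = (c1 + np.sqrt(c1**2 + 4.0 * b**2 * q * r)) / (2.0 * b**2)
k = a * b * p / (r + b**2 * p)

# Cross-check against the general-purpose solver.
as_matrix = lambda value: np.array([[value]])
p_scipy = solve_discrete_are(as_matrix(a), as_matrix(b), as_matrix(q), as_matrix(r))[0, 0]

# Lyapunov decrease: p x_k^2 - p x_{k+1}^2 equals the stage cost (q + r k^2) x_k^2.
x, drops, stage_costs = 1.0, [], []
for _ in range(10):
    x_next = (a - b * k) * x
    drops.append(p * x**2 - p * x_next**2)
    stage_costs.append((q + r * k**2) * x**2)
    x = x_next

print("closed-form p:", p, " scipy p:", p_scipy)
print("closed-loop pole:", a - b * k)
print("max |drop - stage cost|:", np.max(np.abs(np.array(drops) - np.array(stage_costs))))
```

The closed-loop pole $a - b\,k$ should lie strictly inside the unit interval even though the open loop is unstable.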

Companion code

The Week-13 companion lives at experiments/python/week13/ (Python, on scipy.linalg + numpy, cross-checked against python-control).

lqr.py — the finite-horizon backward Riccati recursion; the infinite-horizon gain via the discrete and continuous algebraic Riccati equations (solve_discrete_are, solve_continuous_are); a closed-loop cost simulator; and the Doyle LQG margin-failure example (LQR gain margin vs. the output-feedback loop’s vanishing margin).
test_lqr.py — mathematical-correctness tests: the DARE/CARE residuals vanish; the gain equals control.dlqr / control.lqr; the finite-horizon $\riccati_0$ converges to the ARE solution as $\horizon\to\infty$ ; the simulated optimal cost equals $\statevec_0^\top \riccati\statevec_0$ ; the closed loop is Schur/Hurwitz; and the LQR gain margin contains $[\tfrac12,\infty)$ while the LQG loop is destabilized near $\beta = 1$ .

# LQR/LQG algorithms + correctness tests (scipy + python-control cross-check)
PYTHONPATH=. pytest experiments/python/week13/test_lqr.py -q

# worked finite/infinite-horizon LQR + the Doyle LQG margin demonstration
PYTHONPATH=. python experiments/python/week13/lqr.py --doyle