Linear-Quadratic Regulation: The Exact Dynamic Program
The linear-quadratic regulator as exact dynamic programming: a quadratic value function, the Riccati recursion as Chapter 1's Bellman optimality equation in coordinates, the linear-feedback optimal policy, the infinite-horizon algebraic Riccati equation, and the LQG separation principle — with Doyle's warning that optimal output feedback carries no guaranteed stability margins.
On this page
Linear-Quadratic Regulation: The Exact Dynamic Program
Where we are. Weeks 11–12 built the linear model and the structural properties — stability, controllability, observability — that say what control is possible. Now we put a cost on the model and ask for the best control. The linear-quadratic regulator (LQR) is the one optimal-control problem dynamic programming solves in closed form, and it is exactly Chapter 1’s Bellman optimality equation specialized to linear dynamics and quadratic cost. The “DP bridge” promised since Chapter 1 is paid here: the value function is a quadratic, value iteration becomes the Riccati recursion, and the optimal policy is linear state feedback . Bellman, Riccati, and feedback control turn out to be one calculation.
The LQR problem
For the discrete-time linear system , the finite-horizon LQR problem is to choose minimizing the quadratic cost
with state-cost , control-cost , and terminal cost . The infinite-horizon problem takes with no terminal term, minimizing .
makes every control expensive, so the minimizer is unique and finite; penalizes leaving the origin. This is a Markov decision process (Chapter 1) with a known deterministic linear kernel and a quadratic cost in place of a reward.
LQR is exact dynamic programming
The optimal cost-to-go of Definition 13.1 is the quadratic , where runs the backward Riccati recursion
and the optimal policy is the linear state feedback with time-varying gain
Backward induction on the Bellman optimality equation .
Base. , so .
Step. Assume . Substituting and expanding the quadratic in ,
The Hessian in is (since , ), so the minimizer is the stationary point
Substituting back (completing the square) leaves a pure quadratic in , , whose matrix is exactly the Riccati recursion above. The quadratic form is preserved, closing the induction.
The proof is Chapter 1’s value iteration run on a quadratic ansatz. Every step is one application of the Bellman optimality operator; the minimization is exact (a quadratic in ), not sampled or approximated, because the model is known and the cost is quadratic. The optimal value, the optimal policy, and the optimal cost all fall out of one backward sweep.
Infinite horizon: the algebraic Riccati equation
Run the recursion backward from a far horizon and settles to a constant.
If is stabilizable and is detectable, then as the Riccati iterates converge, , to the unique symmetric positive-semidefinite solution of the discrete algebraic Riccati equation (DARE)
The optimal policy is the stationary feedback with , and the closed loop is asymptotically stable (Schur).
The hypotheses are exactly Week 12’s structural properties: stabilizability (every uncontrollable mode already stable) guarantees a finite-cost policy exists, and detectability (every unobservable-through- mode already stable) guarantees the stabilizing solution is unique. The standard proof shows the Riccati map is a monotone contraction on the positive-semidefinite cone; see Lewis et al. (2012) and Bertsekas (2017) .
Continuous time. The same logic in continuous time replaces the recursion by the Hamilton–Jacobi–Bellman equation , whose Hamiltonian adds the stage cost to the cost-to-go’s rate of change along the dynamics. Its quadratic solution gives the continuous algebraic Riccati equation , with gain and Hurwitz closed loop . The HJB equation is the continuous-time Bellman equation; Chapter 1’s HJB aside lands exactly here. Note the kinship with Week 12: drop the control term and the ARE is the Lyapunov equation — optimal control is stability analysis with a cost-shaping term.
The bridge to Chapter 1
This is the chapter the curriculum has been pointing at. Lay the two equations side by side:
- Chapter 1 (general MDP). , solved by value iteration, which converges because is a -contraction.
- Chapter 13 (linear-quadratic). , solved by the Riccati recursion, which converges because the Riccati map contracts on the PSD cone.
They are the same equation — minimize immediate cost plus optimal cost-to-go — under one specialization: linear dynamics and quadratic cost keep quadratic, collapsing the infinite-dimensional functional fixed point to the finite matrix . Value iteration Riccati recursion; the Bellman operator the Riccati map; the -contraction that gave Chapter 1 its convergence the stabilizability/detectability that gives the ARE its unique stabilizing solution. Control theory reached the Riccati equation from the calculus of variations and Pontryagin’s principle Kalman (1960) ; reinforcement learning reached value iteration from the Bellman equation; LQR is where the two derivations meet on one object.
LQG: optimal output feedback, and its fragility
Real systems are noisy and only partially measured: , with Gaussian . The linear-quadratic-Gaussian (LQG) problem minimizes the expected quadratic cost. Its solution is the separation principle: run a Kalman filter Kalman (1960) to produce the optimal state estimate , then apply the LQR gain to the estimate, — estimator and regulator designed independently and combined. By W12 duality the filter gain solves the dual Riccati equation, so LQG is two Riccati solves on dual systems.
The catch is robustness. Doyle (1978) — a one-page paper — shows LQG has no guaranteed stability margins: there are LQG designs an arbitrarily small gain perturbation destabilizes.
The full-state LQR loop has a guaranteed gain margin and at least phase margin: scaling the optimal gain by any leaves stable. The LQG (output-feedback) loop has no such guarantee — its gain margin can be made arbitrarily small.
The companion reproduces this: the LQR loop stays stable across a wide gain scaling (), while the LQG loop on the same plant is stable only at the nominal — an arbitrarily small gain error on either side destabilizes it. The lesson is foundational for the rest of the curriculum: optimality on the nominal model does not imply robustness. The separation principle is optimal and fragile at once — the gap that robust control, and later robust/risk-aware RL, exist to close.
What’s next
- Week 14 (nonlinear control). Lyapunov design and feedback linearization for systems where the linear model is only a local picture. The Lyapunov function of Week 12 becomes a design tool rather than an analysis certificate, and the quadratic value function of this chapter becomes a local approximation to a nonlinear cost-to-go — the entry to nonlinear optimal control and, eventually, model predictive control (Week 15).
Exercises
-
(Derive) Starting from the Bellman optimality equation with , complete the square in to derive the gain and the Riccati recursion for .
Solution
Expanding gives . The -Hessian , so the minimizer is . Back-substitution yields with .
-
(Compute) Solve the scalar infinite-horizon LQR: , , , (all scalars). Write the DARE for and solve it.
Solution
The DARE is , a quadratic in . Clearing the denominator gives ; the positive root is the stabilizing , and . As a check, with the state already dies in one step and , .
-
(Prove) Show the optimal LQR cost from is exactly , and that under the stationary gain the cost-to-go is non-increasing along the closed-loop trajectory.
Solution
By Theorem 13.1, is the minimum cost. Along the closed loop , the DARE rearranges to , so decreases by exactly the stage cost each step — is a Lyapunov function for the optimal closed loop, tying back to Week 12.
-
(Implement) In the companion, verify the DARE residual is zero, the gain matches
control.dlqr, the finite-horizon Riccati converges to the ARE solution as grows, and the simulated closed-loop cost equals .Solution
See
experiments/python/week13/test_lqr.py:dare_gainsolves the DARE (scipy.linalg.solve_discrete_are) with residual and gain agreeing withcontrol.dlqr; the backward recursion’s converges to that as ; and the summed stage cost under matches . -
(Extend) Reproduce the LQG margin failure. Build the LQR loop and the LQG loop (LQR gain on a Kalman estimate) for the same plant, scale the loop gain by , and compare the stable range of .
Solution
See the companion’s Doyle example: the full-state loop stays stable for all (Prop. 13.1), but the output-feedback closed loop is stable only at — an arbitrarily small perturbation either way destabilizes it (the gain error multiplies only the control reaching the plant, not the commanded control the filter propagates). Optimality of the estimator-plus-regulator does not transfer to a robustness guarantee.
-
(Extend) Using Week-12 duality, show the steady-state Kalman filter gain solves the same algebraic Riccati equation as LQR on the dual pair . When does the separation principle stop being optimal (as opposed to merely stable)?
Solution
The filter error covariance solves , which is the DARE of Theorem 13.2 with . Separation is optimal precisely for linear dynamics, Gaussian noise, and quadratic cost; it loses optimality once the dynamics are nonlinear, the noise non-Gaussian, or the cost non-quadratic — where estimation and control no longer decouple (a recurring theme in model-based RL).
Companion code
The Week-13 companion lives at experiments/python/week13/ (Python, on scipy.linalg +
numpy, cross-checked against python-control).
lqr.py— the finite-horizon backward Riccati recursion; the infinite-horizon gain via the discrete and continuous algebraic Riccati equations (solve_discrete_are,solve_continuous_are); a closed-loop cost simulator; and the Doyle LQG margin-failure example (LQR gain margin vs. the output-feedback loop’s vanishing margin).test_lqr.py— mathematical-correctness tests: the DARE/CARE residuals vanish; the gain equalscontrol.dlqr/control.lqr; the finite-horizon converges to the ARE solution as ; the simulated optimal cost equals ; the closed loop is Schur/Hurwitz; and the LQR gain margin contains while the LQG loop is destabilized near .
# LQR/LQG algorithms + correctness tests (scipy + python-control cross-check)
PYTHONPATH=. pytest experiments/python/week13/test_lqr.py -q
# worked finite/infinite-horizon LQR + the Doyle LQG margin demonstration
PYTHONPATH=. python experiments/python/week13/lqr.py --doyle