Part: control Week 14 Published nonlinear.py test_nonlinear.py

Nonlinear Control: Lyapunov Design, Feedback Linearization, and Sliding Modes

When the plant is nonlinear, eigenvalues and Riccati equations describe only a local picture. This chapter builds the tools that replace them: Lyapunov's direct method and LaSalle's invariance principle as global stability certificates, feedback linearization that cancels the nonlinearity by coordinate change, sliding-mode control that enforces a surface in finite time and is robust to matched uncertainty, and input-to-state stability and backstepping as the constructive bridge to robust and adaptive design — with the Lyapunov function read as the control-theoretic cousin of the reinforcement-learning value function.

Where we are. Weeks 11–13 lived in the linear world: a state-space model, the structural tests of controllability and observability, and the linear-quadratic regulator that solved the optimal-control problem in closed form. Every one of those tools is, at bottom, an eigenvalue statement: stability is the spectrum of $A$, and the LQR closed loop is Schur because the Riccati matrix makes it so. But the linear model is only the tangent picture at one operating point. A pendulum, a quadrotor, a robot arm, a power grid: linearize and you get a local certificate that says nothing about the basin of attraction, the swing-up, or behavior far from equilibrium. This chapter asks the Week-14 question: what replaces eigenvalues and Riccati equations when the plant is genuinely nonlinear? The answer is a shift from spectra to energy, from “where are the poles” to “does a scalar certificate decrease along every trajectory.”

Why the linear tools run out

A nonlinear system $\dot{\mathbf{h}} = f(\mathbf{h}, u)$ with $f(0,0)=0$, linearized at the origin, gives $\dot{\mathbf{h}} \approx A\mathbf{h} + Bu$ with $A = \partial f/\partial\mathbf{h}$. Lyapunov’s indirect method says: if the closed-loop $A$ is Hurwitz, the origin is locally asymptotically stable, and that is all it says. It is silent on how large the basin is, blind to multiple equilibria and limit cycles, and useless when the linearization is marginal (eigenvalues on the imaginary axis), which is exactly when nonlinear terms decide stability. The pendulum makes this concrete: about the hanging equilibrium the linearization is a lightly damped oscillator, but the global behavior, every release angle spiraling to rest, is a nonlinear energy statement the eigenvalues cannot certify. We need a tool that sees the whole state space at once.
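To see how local the indirect method is, the sketch below linearizes a damped hanging pendulum at its two equilibria and prints the eigenvalues of each Jacobian. The parameter values and the helper names (`pendulum`, `numeric_jacobian`) are assumptions made for this illustration; they are not taken from the companion code.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch).
G_OVER_L = 9.81   # g / l
DAMPING = 0.5     # viscous damping coefficient d


def pendulum(state):
    """Hanging-pendulum vector field; state = (phi, phi_dot)."""
    phi, phi_dot = state
    return np.array([phi_dot, -G_OVER_L * np.sin(phi) - DAMPING * phi_dot])


def numeric_jacobian(f, x0, eps=1e-6):
    """Central-difference Jacobian of f at x0."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (f(x0 + step) - f(x0 - step)) / (2 * eps)
    return jac


A_hanging = numeric_jacobian(pendulum, [0.0, 0.0])
A_upright = numeric_jacobian(pendulum, [np.pi, 0.0])
print("hanging eigenvalues:", np.linalg.eigvals(A_hanging))
print("upright eigenvalues:", np.linalg.eigvals(A_upright))
```

The hanging equilibrium yields a Hurwitz Jacobian and the upright one does not, yet neither spectrum says how far from its equilibrium the verdict remains valid.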

Lyapunov’s direct method

The idea is mechanical energy made abstract. Find a scalar $V(\mathbf{h}) \geq 0$ that is zero only at the equilibrium and decreases along trajectories; then trajectories cannot escape its sublevel sets, and if the decrease is strict they slide down to the equilibrium. No solution of the differential equation is required: the certificate is checked by differentiation alone.

Definition 14.1 (Stability of an equilibrium).

The origin of $\dot{\mathbf{h}} = f(\mathbf{h})$, $f(0)=0$, is stable (in the sense of Lyapunov) if for every $\varepsilon > 0$ there is a $\delta > 0$ such that $\lVert\mathbf{h}(0)\rVert < \delta$ implies $\lVert\mathbf{h}(t)\rVert < \varepsilon$ for all $t \geq 0$; asymptotically stable if it is stable and $\mathbf{h}(t) \to 0$ for all $\mathbf{h}(0)$ near the origin; and globally asymptotically stable if that holds for every $\mathbf{h}(0)$.

Theorem 14.1 (Lyapunov's direct method).

Let $V : \mathcal{D} \to \mathbb{R}$ be continuously differentiable on a neighborhood $\mathcal{D}$ of the origin, positive definite ($V(0)=0$ and $V(\mathbf{h})>0$ for $\mathbf{h} \neq 0$), with derivative along the flow $\dot V(\mathbf{h}) = \nabla V(\mathbf{h})^\top f(\mathbf{h})$. If $\dot V(\mathbf{h}) \leq 0$ on $\mathcal{D}$ the origin is stable; if $\dot V(\mathbf{h}) < 0$ for $\mathbf{h} \neq 0$ it is asymptotically stable; and if additionally $\mathcal{D} = \mathbb{R}^n$ and $V$ is radially unbounded ($V(\mathbf{h})\to\infty$ as $\lVert\mathbf{h}\rVert\to\infty$) it is globally asymptotically stable.

Proof.

Fix $\varepsilon>0$ and a ball $B_\varepsilon \subset \mathcal{D}$. Let $m = \min_{\lVert\mathbf{h}\rVert=\varepsilon}V(\mathbf{h}) > 0$ by positive definiteness, and choose $\delta$ so that $V(\mathbf{h}) < m$ on $\lVert\mathbf{h}\rVert<\delta$. Since $\dot V\leq 0$, $V(\mathbf{h}(t))$ is non-increasing, so a trajectory starting inside $\lVert\mathbf{h}\rVert<\delta$ has $V(\mathbf{h}(t)) < m$ for all $t$ and can never reach the shell $\lVert\mathbf{h}\rVert=\varepsilon$: stability. If $\dot V<0$ strictly, $V(\mathbf{h}(t))$ is a decreasing function bounded below by $0$, hence convergent; its limit must be a point where $\dot V=0$, which is only the origin: asymptotic stability. Radial unboundedness makes every sublevel set compact, so the argument is global. $\blacksquare$

The pendulum is the worked example.

Take the energy $V(\phi,\dot\phi) = \tfrac12\dot\phi^2 + (g/\ell)(1-\cos\phi)$, positive definite about the hanging equilibrium. Along the damped dynamics its rate is $\dot V = -d\,\dot\phi^2 \leq 0$: negative semidefinite, not definite, because it vanishes on the whole line $\dot\phi = 0$, not just at the origin. Theorem 14.1 then gives only stability, not asymptotic stability, even though we know every trajectory decays. The gap is closed by an invariance argument (Khalil, 2002).
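The claim about the energy rate can be checked by differentiation alone, which is the whole point of the direct method. The sketch below compares a finite-difference evaluation of the gradient of the energy dotted with the vector field against the closed form for the rate. Parameter values and function names are assumptions chosen for this example.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch).
G_OVER_L = 9.81  # g / l
DAMPING = 0.5    # damping coefficient d


def energy(state):
    """Pendulum energy V = 0.5 * phi_dot^2 + (g/l) * (1 - cos(phi))."""
    phi, phi_dot = state
    return 0.5 * phi_dot**2 + G_OVER_L * (1.0 - np.cos(phi))


def pendulum(state):
    """Damped hanging pendulum vector field f(phi, phi_dot)."""
    phi, phi_dot = state
    return np.array([phi_dot, -G_OVER_L * np.sin(phi) - DAMPING * phi_dot])


def energy_rate_numeric(state, eps=1e-6):
    """grad(V)^T f, with the gradient taken by central differences."""
    state = np.asarray(state, dtype=float)
    grad = np.zeros(2)
    for j in range(2):
        step = np.zeros(2)
        step[j] = eps
        grad[j] = (energy(state + step) - energy(state - step)) / (2 * eps)
    return float(grad @ pendulum(state))


def energy_rate_analytic(state):
    """Closed form V_dot = -d * phi_dot^2."""
    return -DAMPING * state[1] ** 2


rng = np.random.default_rng(0)
samples = rng.uniform(-3.0, 3.0, size=(200, 2))
max_gap = max(abs(energy_rate_numeric(s) - energy_rate_analytic(s)) for s in samples)
print("largest |numeric - analytic| over samples:", max_gap)
print("V_dot at (phi, phi_dot) = (1, 0):", energy_rate_analytic([1.0, 0.0]))
```

The second printout is the semidefiniteness in action: at a state with zero velocity but nonzero angle the rate is zero even though the state is not the equilibrium.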

Theorem 14.2 (LaSalle's invariance principle).

Let $\Omega$ be a compact set, positively invariant under $\dot{\mathbf{h}}=f(\mathbf{h})$, and let $V$ be continuously differentiable with $\dot V\leq 0$ on $\Omega$. Then every trajectory starting in $\Omega$ converges to the largest invariant set contained in $\{\mathbf{h} \in \Omega : \dot V(\mathbf{h})=0\}$.

For the damped pendulum $\{\dot V=0\}$ is $\{\dot\phi=0\}$. Take $\Omega$ to be a compact sublevel set of $V$ around the hanging equilibrium with $V < 2g/\ell$, so that the upright position $\phi=\pm\pi$ is excluded. The only complete trajectory that stays in $\{\dot\phi=0\}$ is then the equilibrium (if $\dot\phi\equiv 0$ then $\ddot\phi\equiv 0$ forces $\sin\phi=0$, hence $\phi=0$ inside $\Omega$), so LaSalle upgrades stability to asymptotic stability with a semidefinite $\dot V$. The companion confirms it: energy decreases monotonically and the state converges to the origin.
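A minimal simulation in the spirit of that companion check, assuming illustrative parameters and a hand-rolled RK4 integrator (the companion's actual function names and settings are not shown in the text, so everything named here is hypothetical):

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch).
G_OVER_L, DAMPING = 9.81, 0.5


def pendulum(state):
    """Damped hanging pendulum; state = (phi, phi_dot)."""
    phi, phi_dot = state
    return np.array([phi_dot, -G_OVER_L * np.sin(phi) - DAMPING * phi_dot])


def energy(state):
    phi, phi_dot = state
    return 0.5 * phi_dot**2 + G_OVER_L * (1.0 - np.cos(phi))


def rk4_step(f, x, dt):
    """One classical Runge-Kutta step for x_dot = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def simulate(x0, dt=0.01, steps=4000):
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(rk4_step(pendulum, traj[-1], dt))
    return np.array(traj)


trajectory = simulate([2.0, 0.0])          # released from rest at 2 rad
energies = np.array([energy(x) for x in trajectory])
print("largest energy increase between steps:", np.diff(energies).max())
print("final state norm:", np.linalg.norm(trajectory[-1]))
```

The initial energy here is below $2g/\ell$, so the trajectory starts inside a sublevel set that excludes the upright equilibrium, which is the setting in which the LaSalle argument applies.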

The bridge to Chapters 1 and 13. A Lyapunov function is a value function read backwards. Chapter 1’s $v_\pi$ satisfies the Bellman equation and decreases in expectation under an improving policy; a Lyapunov $V$ decreases along every deterministic trajectory, $\dot V<0$, the autonomous, worst-case analog of “the value goes down.” Chapter 13 fused the two: there we proved $\mathbf{h}^\top P\mathbf{h}$ drops by exactly the stage cost each step, so the optimal cost-to-go is a Lyapunov function. The difference is where the certificate comes from. Optimal control solves the whole Hamilton–Jacobi–Bellman problem and gets $V$ as a byproduct; Lyapunov design guesses $V$ (often a physical energy) and only checks a derivative, which is cheaper, structure-exploiting, and not tied to optimality.

Feedback linearization

Lyapunov’s method certifies; feedback linearization constructs a controller by canceling the nonlinearity outright. For an input-affine system $\dot{\mathbf{h}} = f(\mathbf{h}) + g(\mathbf{h})u$ with output $y = h(\mathbf{h})$, differentiate $y$ until the input appears.

Definition 14.2 (Relative degree).

The system has relative degree $r$ at $\mathbf{h}_0$ if $L_g L_f^{k}h(\mathbf{h})=0$ for $k < r-1$ near $\mathbf{h}_0$ and $L_g L_f^{r-1}h(\mathbf{h}_0)\neq 0$, where the Lie derivative $L_f h = \nabla h^\top f$ is the rate of change of $h$ along $f$. Equivalently, $r$ is the number of differentiations of $y$ before the input $u$ appears explicitly.

When the relative degree equals the state dimension, the change of coordinates $(h, L_f h, \dots, L_f^{r-1}h)$ turns the system into a chain of integrators, and the control

$$u = \frac{1}{L_g L_f^{r-1}h(\mathbf{h})}\big(-L_f^{r}h(\mathbf{h}) + v\big)$$

makes the input–output map exactly linear, $y^{(r)} = v$; then place poles with a linear $v$ (Sastry, 1999). The pendulum is the textbook case of computed torque: with $\ddot\theta = a\sin\theta - d\dot\theta + b\,u$, choosing $u = \tfrac1b(-a\sin\theta + d\dot\theta + v)$ cancels gravity and damping, leaving $\ddot\theta = v$; then $v = -k_1\theta - k_2\dot\theta$ gives a chosen second-order linear closed loop (Slotine & Li, 1991).
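As a worked instance of Definition 14.2, the Lie-derivative bookkeeping behind the pendulum's computed-torque law can be written out explicitly. The output choice $y = \theta$ and the state ordering $\mathbf{h} = (\theta, \dot\theta)$ are assumptions made here for illustration, and $b \neq 0$ is required for the division.

```latex
\begin{aligned}
f(\mathbf{h}) &= \begin{pmatrix}\dot\theta \\ a\sin\theta - d\dot\theta\end{pmatrix},\qquad
g(\mathbf{h}) = \begin{pmatrix}0 \\ b\end{pmatrix},\qquad
h(\mathbf{h}) = \theta,\\
L_g h &= \nabla h^\top g = 0,\qquad
L_f h = \nabla h^\top f = \dot\theta,\\
L_g L_f h &= b \neq 0 \;\Longrightarrow\; r = 2 = n,\qquad
L_f^2 h = a\sin\theta - d\dot\theta,\\
u &= \frac{1}{L_g L_f h}\big(-L_f^2 h + v\big)
   = \tfrac1b\big(-a\sin\theta + d\dot\theta + v\big).
\end{aligned}
```

The last line reproduces the computed-torque law quoted in the text, so the general formula and the pendulum special case agree term by term.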

Proposition 14.1 (Computed-torque exactness).

Under the computed-torque law above, with $\mathbf{h} = (\theta, \dot\theta)$, the nonlinear closed loop equals the linear system $\dot{\mathbf{h}} = \big[\begin{smallmatrix}0 & 1\\ -k_1 & -k_2\end{smallmatrix}\big]\mathbf{h}$ exactly, with closed-loop poles the roots of $\lambda^2 + k_2\lambda + k_1$. Stability is by design (choose $k_1,k_2>0$), not by linearization.

The companion integrates the true nonlinear plant under this law and the target linear system from the same initial state and finds the trajectories identical to numerical precision: the nonlinearity is gone, not merely small. The price should be stated honestly: feedback linearization needs an accurate model (it cancels exact terms), it can demand large control authority, and it is only valid where the relative degree is well-defined and the internal dynamics (the unobserved part when $r < n$) are stable. Cancel a nonlinearity you do not know precisely and the cancellation leaves a residual, which motivates a method that does not depend on exact cancellation.
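The exactness claim can be reproduced in a few lines. This is a sketch under assumed plant parameters and gains, with hypothetical function names; it is not the companion's implementation.

```python
import numpy as np

# Illustrative plant and gains (assumptions for this sketch).
A_GRAV, DAMPING, B_IN = 9.81, 0.5, 1.0   # a, d, b in theta_ddot = a sin(theta) - d theta_dot + b u
K1, K2 = 2.0, 3.0                         # target polynomial lambda^2 + 3 lambda + 2


def computed_torque(state):
    """u = (1/b) * (-a sin(theta) + d theta_dot + v), v = -k1 theta - k2 theta_dot."""
    theta, theta_dot = state
    v = -K1 * theta - K2 * theta_dot
    return (-A_GRAV * np.sin(theta) + DAMPING * theta_dot + v) / B_IN


def nonlinear_closed_loop(state):
    """True nonlinear plant driven by the computed-torque law."""
    theta, theta_dot = state
    u = computed_torque(state)
    return np.array([theta_dot, A_GRAV * np.sin(theta) - DAMPING * theta_dot + B_IN * u])


A_TARGET = np.array([[0.0, 1.0], [-K1, -K2]])


def linear_target(state):
    return A_TARGET @ state


def rk4_step(f, x, dt):
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def simulate(f, x0, dt=0.01, steps=1000):
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(rk4_step(f, traj[-1], dt))
    return np.array(traj)


x0 = [2.5, -1.0]   # far from the small-angle regime
gap = np.max(np.abs(simulate(nonlinear_closed_loop, x0) - simulate(linear_target, x0)))
poles = np.linalg.eigvals(A_TARGET)
print("max trajectory gap:", gap)
print("closed-loop poles:", poles)
```

The gap is at the level of floating-point rounding even from a large initial angle, which is what distinguishes cancellation from small-angle approximation.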

Sliding-mode control

Sliding-mode control trades smoothness for robustness. Instead of canceling the dynamics, it forces the state onto a designer-chosen surface and keeps it there despite model error, as long as the uncertainty enters through the same channel as the control (matched uncertainty).

Proposition 14.2 (Finite-time reaching).

Let $s(\mathbf{h})$ define a sliding surface $s=0$ on which the reduced dynamics are stable, and suppose the control enforces the reaching law $\dot s = -\eta\,\mathrm{sign}(s)$ with $\eta>0$. Then $\tfrac{d}{dt}\tfrac12 s^2 = -\eta\,\lvert s\rvert \leq -\eta\sqrt{2}\,\big(\tfrac12 s^2\big)^{1/2}$, so $\lvert s\rvert$ reaches $0$ in finite time bounded by $\lvert s(0)\rvert/\eta$, after which the trajectory stays on $s=0$ and the reduced dynamics carry it to the origin.

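For the pendulum, take $s = \dot\theta + \lambda\theta$ ($\lambda>0$); on $s=0$ the reduced dynamics are $\dot\theta = -\lambda\theta$, which decays. The control $u = \tfrac1b\big(-a\sin\theta + (d-\lambda)\dot\theta - \eta\,\mathrm{sat}(s/\Phi)\big)$ drives $s$ into a boundary layer of width $\Phi$. Substituting it into $\dot s = \ddot\theta + \lambda\dot\theta$ gives $\dot s = -\eta\,\mathrm{sat}(s/\Phi)$, which is the reaching law of Proposition 14.2 with the sign function smoothed inside $\lvert s\rvert \leq \Phi$.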

The robustness is the selling point: a wrong gravity estimate still drives $s\to 0$, because the gravity term enters through the same input channel as $u$ and is dominated by a large enough $\eta$ (Slotine & Li, 1991). The companion verifies all three claims: finite-time reaching within the $\lvert s(0)\rvert/\eta$ bound, the surface maintained inside the boundary layer thereafter, and convergence under a deliberately mismatched model.
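The three claims can be sketched numerically as follows. All numbers and names are assumptions for this example rather than the companion's settings. With a gravity estimate $\hat a$ in the controller, the sliding variable obeys $\dot s = (a-\hat a)\sin\theta - \eta\,\mathrm{sat}(s/\Phi)$, so outside the layer the reaching rate is at least $\eta - \lvert a - \hat a\rvert$; the mismatched reaching bound used below follows from that inequality and is slightly larger than the nominal $\lvert s(0)\rvert/\eta$.

```python
import numpy as np

# Illustrative values (assumptions for this sketch).
A_TRUE = 9.81                        # true gravity coefficient a of the plant
A_WRONG = 8.0                        # deliberately wrong estimate used by the controller
DAMPING, B_IN = 0.5, 1.0             # d and b
LAMBDA, ETA, PHI = 2.0, 5.0, 0.05    # surface slope, reaching gain, boundary-layer width


def surface(state):
    """Sliding variable s = theta_dot + lambda * theta."""
    return state[1] + LAMBDA * state[0]


def make_closed_loop(a_model):
    """True plant under the boundary-layer law built with the estimate a_model."""
    def closed_loop(state):
        theta, theta_dot = state
        sat = np.clip(surface(state) / PHI, -1.0, 1.0)
        u = (-a_model * np.sin(theta) + (DAMPING - LAMBDA) * theta_dot - ETA * sat) / B_IN
        return np.array([theta_dot, A_TRUE * np.sin(theta) - DAMPING * theta_dot + B_IN * u])
    return closed_loop


def rk4_step(f, x, dt):
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def run(a_model, x0=(1.0, 0.0), dt=2e-3, t_final=6.0):
    """Return (entry time into |s| <= PHI, max |s| after entry, final state norm)."""
    f = make_closed_loop(a_model)
    x = np.asarray(x0, dtype=float)
    entry_time, max_s_after = None, 0.0
    for k in range(int(round(t_final / dt)) + 1):
        s = abs(surface(x))
        if entry_time is None and s <= PHI:
            entry_time = k * dt
        if entry_time is not None:
            max_s_after = max(max_s_after, s)
        x = rk4_step(f, x, dt)
    return entry_time, max_s_after, float(np.linalg.norm(x))


s0 = abs(surface(np.array([1.0, 0.0])))
nominal = run(a_model=A_TRUE)
mismatched = run(a_model=A_WRONG)
print("nominal    (entry, max |s| after entry, final norm):", nominal)
print("mismatched (entry, max |s| after entry, final norm):", mismatched)
print("nominal bound |s(0)|/eta:", s0 / ETA)
print("mismatched bound |s(0)|/(eta - |a - a_hat|):", s0 / (ETA - abs(A_TRUE - A_WRONG)))
```

The gain $\eta$ was chosen larger than the gravity mismatch; with a smaller $\eta$ the domination argument, and hence the guarantee, would no longer apply.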

Lyapunov is doing the work both times. Computed torque imposes a stable linear closed loop, which carries a quadratic Lyapunov function; sliding mode uses $\tfrac12 s^2$ as a Lyapunov function for the surface. Each design is a recipe for a certificate, which is exactly what the next two ideas systematize.

Input-to-state stability and backstepping

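Real systems have disturbances. Input-to-state stability (ISS) is the right generalization of asymptotic stability to forced systems, and it is stated with comparison functions: a class-$\mathcal{K}$ function is continuous, strictly increasing, and zero at zero; a class-$\mathcal{KL}$ function $\beta(r,t)$ is class-$\mathcal{K}$ in $r$ for each fixed $t$ and decreases to zero as $t\to\infty$ for each fixed $r$.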

Definition 14.3 (Input-to-state stability).

The system $\dot{\mathbf{h}} = f(\mathbf{h}, u)$ is input-to-state stable if there exist a class-$\mathcal{KL}$ function $\beta$ and a class-$\mathcal{K}$ function $\gamma$ such that for every initial state and every bounded input,

$$\lVert\mathbf{h}(t)\rVert \;\leq\; \beta\big(\lVert\mathbf{h}(0)\rVert,\,t\big) \;+\; \gamma\Big(\sup_{0\leq\tau\leq t}\lVert u(\tau)\rVert\Big).$$

The state is eventually bounded by a gain $\gamma$ of the input size, and decays to zero when the input does.
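A scalar example makes the definition concrete. For $\dot x = -x + u$, variation of constants gives $\lvert x(t)\rvert \leq e^{-t}\lvert x(0)\rvert + \sup_{\tau\leq t}\lvert u(\tau)\rvert$, so $\beta(r,t) = r e^{-t}$ and $\gamma(r) = r$ satisfy Definition 14.3. The system, the input signal, and the names below are assumptions chosen for illustration; the sketch checks the bound along one simulated trajectory.

```python
import numpy as np


def input_signal(t):
    """A bounded disturbance chosen for illustration."""
    return 0.5 * np.sin(3.0 * t)


def dynamics(t, x):
    """Scalar forced system x_dot = -x + u(t)."""
    return -x + input_signal(t)


def rk4_step(t, x, dt):
    k1 = dynamics(t, x)
    k2 = dynamics(t + 0.5 * dt, x + 0.5 * dt * k1)
    k3 = dynamics(t + 0.5 * dt, x + 0.5 * dt * k2)
    k4 = dynamics(t + dt, x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


dt, steps, x0 = 0.01, 1000, 2.0
times = dt * np.arange(steps + 1)
states = np.empty(steps + 1)
states[0] = x0
for k in range(steps):
    states[k + 1] = rk4_step(times[k], states[k], dt)

# beta(r, t) = r * exp(-t) and gamma(r) = r for this particular system.
running_sup = np.maximum.accumulate(np.abs(input_signal(times)))
iss_bound = np.exp(-times) * abs(x0) + running_sup
margin = iss_bound - np.abs(states)
print("smallest margin (bound - |x(t)|):", margin.min())
print("final |x|:", abs(states[-1]), " vs sup |u|:", running_sup[-1])
```

The margin is zero at the initial time, where the bound is tight by construction, and positive afterwards.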

ISS, due to Sontag, makes “small disturbance, small deviation” precise and composable: a cascade of ISS subsystems is ISS, which is what lets large nonlinear designs be built and certified piece by piece (Sontag, 1998). Backstepping is the constructive engine that exploits this. For systems in strict-feedback (cascade) form, it builds the controller and a Lyapunov function recursively: stabilize the first subsystem treating the next state as a virtual control, define the error between that state and its desired value, augment the Lyapunov function with a quadratic in the error, and step inward until the real input appears (Kellett & Braun, 2023). The output is a controller and a Lyapunov certificate delivered together: Lyapunov design turned into an algorithm, and the classical counterpart of the “value function as a learnable object” that model-based RL will pursue in Part III.
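The recursion can be sketched on a two-state strict-feedback system, $\dot x_1 = x_1^2 + x_2$, $\dot x_2 = u$. This system is a common textbook illustration chosen here as an assumption; it does not come from the chapter or its companion. The virtual control $\alpha(x_1) = -x_1 - x_1^2$ stabilizes the first equation, the error is $z = x_2 - \alpha(x_1)$, and with $V = \tfrac12 x_1^2 + \tfrac12 z^2$ the input $u = \dot\alpha - x_1 - kz$ gives $\dot V = -x_1^2 - k z^2$.

```python
import numpy as np

K_GAIN = 1.0   # error-feedback gain k > 0 (illustrative choice)


def alpha(x1):
    """Virtual control: with x2 = alpha(x1), x1_dot = x1^2 + x2 becomes -x1."""
    return -x1 - x1**2


def control(state):
    """Backstepping law u = alpha_dot - x1 - k z."""
    x1, x2 = state
    z = x2 - alpha(x1)                               # error to the virtual control
    alpha_dot = (-1.0 - 2.0 * x1) * (x1**2 + x2)     # (d alpha / d x1) * x1_dot
    return alpha_dot - x1 - K_GAIN * z


def closed_loop(state):
    x1, x2 = state
    return np.array([x1**2 + x2, control(state)])


def lyapunov(state):
    """Augmented certificate V = 0.5 x1^2 + 0.5 z^2."""
    x1, x2 = state
    z = x2 - alpha(x1)
    return 0.5 * x1**2 + 0.5 * z**2


def rk4_step(f, x, dt):
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


trajectory = [np.array([1.5, -1.0])]
for _ in range(2400):                                # 12 time units at dt = 0.005
    trajectory.append(rk4_step(closed_loop, trajectory[-1], 0.005))
trajectory = np.array(trajectory)
values = np.array([lyapunov(x) for x in trajectory])
print("V decreases at every step:", bool(np.all(np.diff(values) < 0)))
print("final state norm:", np.linalg.norm(trajectory[-1]))
```

The controller and the certificate come out of the same two-step construction, which is the sense in which backstepping delivers them together.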

Nonlinear control versus deep RL

Lay the chapter beside the reinforcement-learning half of the curriculum. Both seek a feedback policy and a scalar certificate of good behavior; they differ in what they assume and what they pay.

  • Nonlinear control assumes a model with structure: input-affine form, a known relative degree, matched uncertainty, a physical energy. Given that structure, it returns a controller with a stability proof and no sampling: computed torque, a sliding surface, a backstepping Lyapunov function. The certificate $V$ is designed, not learned.
  • Deep RL assumes samples, not structure. It learns $v^*$ and $\pi$ from interaction, paying in data and variance, and buys the ability to handle dynamics no one can write down (contact, friction, pixels), where relative degree and clean cancellations do not exist.

The dividing line is whether the structure is available and trustworthy. When it is, control wins on guarantees and sample cost; when the structure is absent or too hard to obtain, sampling wins on reach. Part III is the synthesis: model predictive control (Week 15) turns a model into a policy by online optimization, and the convergence weeks graft learned value functions onto controllers with Lyapunov-style guarantees — RL’s reach with control’s certificates.

What’s next

  • Week 15 (model predictive control). Rather than design one feedback law, re-solve a finite-horizon optimal control problem at every step and apply the first move. MPC is online approximate dynamic programming: Chapter 13’s Riccati value function becomes a horizon-$N$ optimization, the Lyapunov certificate of this chapter reappears as a terminal cost, and constraints, which neither LQR nor feedback linearization handle, become first-class.

Exercises

  1. (Derive) For the hanging pendulum with energy $V = \tfrac12\dot\phi^2 + (g/\ell)(1-\cos\phi)$ and dynamics $\ddot\phi = -(g/\ell)\sin\phi - d\dot\phi$, compute $\dot V = \nabla V^\top f$ and show the gravity terms cancel, leaving $\dot V = -d\dot\phi^2$.

    Solution

    $\nabla V = \big((g/\ell)\sin\phi,\ \dot\phi\big)$ and $f = \big(\dot\phi,\ -(g/\ell)\sin\phi - d\dot\phi\big)$. Their inner product is $(g/\ell)\sin\phi\,\dot\phi + \dot\phi\big(-(g/\ell)\sin\phi - d\dot\phi\big) = -d\dot\phi^2$. The cross terms cancel; only dissipation remains. For $d>0$ this is negative semidefinite, so Theorem 14.1 gives stability and LaSalle (Theorem 14.2) upgrades it to asymptotic stability.

  2. (Prove) Show that $\dot V<0$ for $\mathbf{h}\neq 0$ implies the limit of $V(\mathbf{h}(t))$ is a value at which $\dot V=0$, and conclude the origin is asymptotically stable.

    Solution

    $V(\mathbf{h}(t))$ is monotonically decreasing and bounded below by $0$, hence convergent to some $V_\infty\geq 0$. If $V_\infty>0$ the trajectory stays in the compact shell $\{c_1\leq V\leq V(\mathbf{h}(0))\}$ with $c_1 = V_\infty$. The shell excludes the origin, so by continuity $\dot V\leq -\mu<0$ on it, forcing $V\to-\infty$: a contradiction. So $V_\infty=0$, and by positive definiteness $\mathbf{h}(t)\to0$. (This is the strict-decrease half of Theorem 14.1; LaSalle handles the semidefinite case.)

  3. (Compute) A system has relative degree $r=2$: $\dot{x}_1 = x_2$, $\dot{x}_2 = \sin x_1 + u$, $y=x_1$. Find the feedback-linearizing control that imposes $\ddot y = -k_1 y - k_2\dot y$ and give the closed-loop poles.

    Solution

    $\ddot y = \dot x_2 = \sin x_1 + u$, so $u = -\sin x_1 + v$ with $v = -k_1 x_1 - k_2 x_2$ yields $\ddot y = -k_1 y - k_2\dot y$. The Lie-derivative bookkeeping: $L_f h = x_2$, $L_f^2 h = \sin x_1$, $L_g L_f h = 1\neq0$ (relative degree $2$). Poles are the roots of $\lambda^2 + k_2\lambda + k_1$; pick $k_1,k_2>0$ for a Hurwitz pair.

  4. (Prove) For the reaching law $\dot s = -\eta\,\mathrm{sign}(s)$, show $\lvert s(t)\rvert$ hits zero by time $\lvert s(0)\rvert/\eta$. Why does matched uncertainty not change this bound?

    Solution

    With $W=\tfrac12 s^2$, $\dot W = s\dot s = -\eta\lvert s\rvert = -\eta\sqrt{2W}$. Separating, $\sqrt{2W}$ decreases at constant rate $\eta$, so $\lvert s\rvert=\sqrt{2W}$ reaches $0$ in time $\lvert s(0)\rvert/\eta$. A matched disturbance $\delta$ enters as $\dot s = -\eta\,\mathrm{sign}(s) + \delta$; choosing $\eta > \sup\lvert\delta\rvert$ keeps $s\dot s \leq -(\eta-\sup\lvert\delta\rvert)\lvert s\rvert < 0$, so the surface is still reached, in time at most $\lvert s(0)\rvert/(\eta-\sup\lvert\delta\rvert)$. Raising the gain by $\sup\lvert\delta\rvert$ therefore recovers the original bound: the gain $\eta$ dominates the uncertainty rather than canceling it.

  5. (Implement) In the companion, verify the three sliding-mode claims: finite-time reaching within $\lvert s(0)\rvert/\eta$, the surface maintained in the boundary layer afterward, and convergence under a mismatched model.

    Solution

    See experiments/python/week14/test_nonlinear.py: test_sliding_mode_reaches_surface_in_bounded_time checks the reaching bound and boundary-layer maintenance; test_sliding_mode_robust_to_matched_parameter_error builds the controller with a wrong length (so a wrong gravity coefficient) yet still reaches the surface and converges — matched-uncertainty robustness.

  6. (Extend) Argue why the quadratic LQR cost-to-go $\mathbf{h}^\top P\mathbf{h}$ of Chapter 13 is a Lyapunov function for the optimal closed loop, and use this to connect Lyapunov design to value functions.

    Solution

    From Chapter 13, along the optimal closed loop $\mathbf{h}_k^\top P\mathbf{h}_k - \mathbf{h}_{k+1}^\top P\mathbf{h}_{k+1} = \mathbf{h}_k^\top(Q + K^\top R K)\mathbf{h}_k \geq 0$, so $V=\mathbf{h}^\top P\mathbf{h}$ is positive definite with $\Delta V\leq 0$: a Lyapunov function that also equals the optimal value. Lyapunov design chooses such a certificate directly (energy, $\tfrac12 s^2$, a backstepping sum) instead of solving the optimal-control problem for it; it is the same object, reached without the full Hamilton–Jacobi–Bellman computation. This is the seam model-based RL works along in Part III.
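The identity in the last solution can be checked numerically. The sketch below assumes a discretized double integrator, unit weights, the sign convention $u = -K\mathbf{h}$, and a plain fixed-point Riccati iteration; none of these choices is taken from Chapter 13, and the names are hypothetical.

```python
import numpy as np

# Illustrative discrete-time double integrator (an assumption for this sketch).
DT = 0.1
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([[0.0], [DT]])
Q = np.eye(2)
R = np.array([[1.0]])


def solve_riccati(A, B, Q, R, iterations=2000):
    """Fixed-point iteration of the discrete-time Riccati recursion."""
    P = Q.copy()
    for _ in range(iterations):
        gain = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ gain)
    return P


P = solve_riccati(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # feedback u = -K h
A_CL = A - B @ K                                      # optimal closed loop

rng = np.random.default_rng(1)
worst = 0.0
for h in rng.normal(size=(100, 2)):
    h_next = A_CL @ h
    drop = h @ P @ h - h_next @ P @ h_next           # decrease of V = h^T P h
    stage_cost = h @ (Q + K.T @ R @ K) @ h           # h^T (Q + K^T R K) h
    worst = max(worst, abs(drop - stage_cost))

print("largest |drop - stage cost|:", worst)
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A_CL))))
```

The drop in the quadratic cost-to-go matches the stage cost at every sampled state, and the spectral radius below one is the Schur property the certificate implies.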

Companion code

The Week-14 companion lives at experiments/python/week14/ (pure numpy, RK4 integration with the controller evaluated at each stage for a faithful continuous-time closed loop).

  • nonlinear.py: the pendulum model (hanging and upright conventions); the energy Lyapunov function with its analytic and numeric $\dot V$; a certificate check that passes for the damped pendulum and fails for the anti-damped one; the computed-torque feedback-linearizing law and its target linear system; and the boundary-layer sliding-mode law with the sliding surface and reaching time.
  • test_nonlinear.py: mathematical-correctness tests. $\dot V=-d\dot\phi^2$ matches the numeric $\nabla V^\top f$; the certificate discriminates damped from anti-damped; the damped pendulum’s energy is monotone and the state converges (LaSalle); feedback linearization reproduces the target linear system exactly with the prescribed poles; and sliding mode reaches the surface in bounded time, stays in the boundary layer, and is robust to matched parameter error.

```shell
# nonlinear-control algorithms + correctness tests
PYTHONPATH=. pytest experiments/python/week14/test_nonlinear.py -q

# worked Lyapunov / feedback-linearization / sliding-mode demonstrations on the pendulum
PYTHONPATH=. python experiments/python/week14/nonlinear.py
```