References

Every paper, book, and software citation in this book, sorted alphabetically by first-author surname. Click an entry’s anchor to share a deep link, or follow the arXiv / DOI / URL for the source.

[akkaya2019rubik] Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, and others (2019). Solving Rubik's Cube with a Robot Hand. arXiv preprint arXiv:1910.07113. link.
[astrom2021feedback] Karl Johan Åström and Richard M. Murray (2021). Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press. link.
[azar2013sample] Mohammad Gheshlaghi Azar, Rémi Munos, and Hilbert J. Kappen (2013). On the Sample Complexity of Reinforcement Learning with a Generative Model. Machine Learning. Vol. 91, no. 3. pp. 325-349. link.
[baird1995residual] Leemon Baird (1995). Residual Algorithms: Reinforcement Learning with Function Approximation. Machine Learning Proceedings 1995. pp. 30-37. link.
[barto1995rtdp] Andrew G. Barto, Steven J. Bradtke, and Satinder P. Singh (1995). Learning to Act Using Real-Time Dynamic Programming. Artificial Intelligence. Vol. 72, no. 1–2. pp. 81-138. link.
[bellemare2013ale] Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling (2013). The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research. Vol. 47. pp. 253-279. link.
[bellman1957dp] Richard E. Bellman (1957). Dynamic Programming. Princeton University Press. link.
[bertsekas2017dpvol1] Dimitri P. Bertsekas (2017). Dynamic Programming and Optimal Control, Vol. I. Athena Scientific. link.
[degrave2022tokamak] Jonas Degrave, Federico Felici, Jonas Buchli, and others (2022). Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning. Nature. Vol. 602, no. 7897. pp. 414-419. link.
[doyle1978lqg] John C. Doyle (1978). Guaranteed Margins for LQG Regulators. IEEE Transactions on Automatic Control. Vol. 23, no. 4. pp. 756-757. link.
[engstrom2020implementation] Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, and Aleksander Madry (2020). Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO. International Conference on Learning Representations (ICLR). link.
[fujimoto2018td3] Scott Fujimoto, Herke van Hoof, and David Meger (2018). Addressing Function Approximation Error in Actor-Critic Methods. International Conference on Machine Learning (ICML). link.
[haarnoja2018sac] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. International Conference on Machine Learning (ICML). link.
[haarnoja2019sacapps] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine (2019). Soft Actor-Critic Algorithms and Applications. arXiv preprint arXiv:1812.05905. link.
[vanhasselt2016double] Hado van Hasselt, Arthur Guez, and David Silver (2016). Deep Reinforcement Learning with Double Q-Learning. AAAI Conference on Artificial Intelligence. link.
[henderson2018deeprl] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger (2018). Deep Reinforcement Learning that Matters. AAAI Conference on Artificial Intelligence. link.
[hessel2018rainbow] Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver (2018). Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI Conference on Artificial Intelligence. link.
[howard1960dp] Ronald A. Howard (1960). Dynamic Programming and Markov Processes. MIT Press. link.
[kalman1961control] Rudolf E. Kalman (1960). Contributions to the Theory of Optimal Control. Boletín de la Sociedad Matemática Mexicana. Vol. 5. pp. 102-119. link.
[kalman1960new] Rudolf E. Kalman (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering. Vol. 82, no. 1. pp. 35-45. link.
[kaufmann2023drone] Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza (2023). Champion-Level Drone Racing Using Deep Reinforcement Learning. Nature. Vol. 620, no. 7976. pp. 982-987. link.
[kellett2023nonlinear] Christopher M. Kellett and Philipp Braun (2023). Introduction to Nonlinear Control: Stability, Control Design, and Estimation. Princeton University Press. link.
[khalil2002nonlinear] Hassan K. Khalil (2002). Nonlinear Systems. Prentice Hall. link.
[konda2000actor] Vijay R. Konda and John N. Tsitsiklis (2000). Actor-Critic Algorithms. Advances in Neural Information Processing Systems (NIPS). link.
[lee2020quadruped] Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter (2020). Learning Quadrupedal Locomotion over Challenging Terrain. Science Robotics. Vol. 5, no. 47. link.
[lewis2012optimal] Frank L. Lewis, Draguna Vrabie, and Vassilis L. Syrmos (2012). Optimal Control. John Wiley & Sons. link.
[lillicrap2015ddpg] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra (2016). Continuous Control with Deep Reinforcement Learning. International Conference on Learning Representations (ICLR). link.
[miki2022locomotion] Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter (2022). Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild. Science Robotics. Vol. 7, no. 62. link.
[mnih2013playing] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller (2013). Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop. link.
[mnih2015human] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, and others (2015). Human-level Control through Deep Reinforcement Learning. Nature. Vol. 518, no. 7540. pp. 529-533. link.
[moore1993prioritized] Andrew W. Moore and Christopher G. Atkeson (1993). Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time. Machine Learning. Vol. 13, no. 1. pp. 103-130. link.
[puterman1994mdp] Martin L. Puterman (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience. link.
[radosavovic2024humanoid] Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, and Koushil Sreenath (2024). Real-World Humanoid Locomotion with Reinforcement Learning. Science Robotics. Vol. 9, no. 89. link.
[raffin2021sb3] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann (2021). Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research. Vol. 22, no. 268. pp. 1-8. link.
[sastry1999nonlinear] Shankar Sastry (1999). Nonlinear Systems: Analysis, Stability, and Control. Springer. Vol. 10. link.
[schaul2016prioritized] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver (2016). Prioritized Experience Replay. International Conference on Learning Representations (ICLR). link.
[schulman2015trpo] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel (2015). Trust Region Policy Optimization. International Conference on Machine Learning (ICML). link.
[schulman2016gae] John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel (2016). High-Dimensional Continuous Control Using Generalized Advantage Estimation. International Conference on Learning Representations (ICLR). link.
[schulman2017ppo] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347. link.
[sidford2018near] Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, and Yinyu Ye (2018). Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Processes with a Generative Model. Advances in Neural Information Processing Systems (NeurIPS). link.
[silver2014dpg] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller (2014). Deterministic Policy Gradient Algorithms. International Conference on Machine Learning (ICML). link.
[slotine1991applied] Jean-Jacques E. Slotine and Weiping Li (1991). Applied Nonlinear Control. Prentice Hall. link.
[sontag1998mathematical] Eduardo D. Sontag (1998). Mathematical Control Theory: Deterministic Finite Dimensional Systems. Springer. Vol. 6. link.
[sutton2000policy] Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems (NIPS). link.
[sutton2018rl] Richard S. Sutton and Andrew G. Barto (2018). Reinforcement Learning: An Introduction. MIT Press. link.
[tang2025survey] Chen Tang, Ben Abbatematteo, and others (2025). Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes. arXiv preprint arXiv:2408.03539. link.
[tseng1990horizon] Paul Tseng (1990). Solving H-horizon, Stationary Markov Decision Problems in Time Proportional to log(H). Operations Research Letters. Vol. 9, no. 5. pp. 287-297. link.
[tsitsiklis1997analysis] John N. Tsitsiklis and Benjamin Van Roy (1997). An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control. Vol. 42, no. 5. pp. 674-690. link.
[wang2016dueling] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas (2016). Dueling Network Architectures for Deep Reinforcement Learning. International Conference on Machine Learning (ICML). link.
[watkins1992qlearning] Christopher J. C. H. Watkins and Peter Dayan (1992). Q-learning. Machine Learning. Vol. 8, no. 3–4. pp. 279-292. link.
[williams1992simple] Ronald J. Williams (1992). Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning. Vol. 8, no. 3–4. pp. 229-256. link.