Lectures: Joschka Boedecker and Moritz Diehl
Guest Lectures: Sebastien Gros (NTNU Trondheim) and Sergey Levine (UC Berkeley)
Exercises: Katrin Baumgärtner and Jasper Hoffmann
University of Freiburg, July 26 to August 4, 2021
(available online; all times are Central European Summer Time)
This block course of 8 days duration is intended for master's and PhD students from engineering, computer science, mathematics, physics, and other mathematical sciences. The aim is that participants understand the main concepts of model predictive control (MPC) and reinforcement learning (RL) as well as the similarities and differences between the two approaches. In hands-on exercises and project work, they learn to apply the methods to practical optimal control problems from science and engineering.
The course consists of lectures, exercises, and project work. The lectures in the first week will be given by Prof. Dr. Joschka Boedecker and Prof. Dr. Moritz Diehl from the University of Freiburg. In the second week of the course, invited guest lectures will be given by Prof. Dr. Sebastien Gros from NTNU Trondheim (Norway) and Prof. Dr. Sergey Levine from UC Berkeley (California, US).
Topics include
- Optimal Control Problem (OCP) formulations - constrained, infinite horizon, discrete time, stochastic, robust
- Markov Decision Processes (MDP)
- From continuous to discrete: discretization in space and time
- Dynamic Programming (DP) concepts and algorithms - value iteration and policy iteration (see the sketch after this list)
- Linear Quadratic Regulator (LQR) and Riccati equations
- Convexity considerations in DP for constrained linear systems
- Model predictive control (MPC) formulations and stability guarantees
- MPC algorithms - quadratic programming, direct multiple shooting, Gauss-Newton, real-time iterations
- Differential Dynamic Programming (DDP) for the solution of unconstrained MPC problems
- Reinforcement Learning (RL) formulations and approaches
- Model-free RL: Monte Carlo, temporal differences, model learning, direct policy search
- RL with function approximation
- Model-based RL and combinations of model-based and model-free methods
- Similarities and differences between MPC and RL
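As a small taste of the dynamic programming material, here is a minimal value iteration sketch for a tabular MDP. It is purely illustrative: the transition probabilities, rewards, and discount factor are arbitrary placeholders, not taken from the course material.

```python
import numpy as np

# Value iteration on a small random tabular MDP (illustrative placeholder data).
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'], row-stochastic
R = rng.standard_normal((n_states, n_actions))                    # R[s, a], expected reward
gamma = 0.9                                                       # discount factor

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
    Q = R + gamma * P @ V
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # converged to the fixed point
        break
    V = V_new

print("optimal values:", V_new)
print("greedy policy: ", Q.argmax(axis=1))
```

Policy iteration, LQR, and the MPC formulations covered in the course replace this exhaustive tabular backup with structure-exploiting computations.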
The course will be conducted as a mixed in-person and virtual event if the COVID-19 situation in July 2021 allows, and otherwise fully virtually. All lecture videos and exercises will be made openly available. Each course day starts at 9:00 and ends at 17:00 (5 p.m.) Freiburg time (CEST). Lectures are typically followed by computer exercises in Python. A mandatory requirement for officially passing the course is successful participation in an online microexam on Friday, July 30, 2021, at 9:00.

In the second week, on August 2-4, 2021, participants will work on application projects in which they apply at least one of the MPC and RL methods to a self-chosen problem from any area of science or engineering. The projects, which can be carried out in teams of one or (preferably) two people, will be presented in a public poster session on the last day of the course and summarized in a short report to be submitted two weeks after the course. The report determines the final grade of the course.
This course has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 953348.
The course can be followed at different levels of participation:
Participation Level | Passing Requirements | Certificate | Max. No. of Participants
---|---|---|---
Level A - Online Listening | - | - | unlimited
Level B - First Week with Exam | exercises and microexam on July 30 | "Certificate B", without grade | 120
Level C - Full Time with Project | Level B requirements plus project presentation on Aug. 4 | "Certificate C", without grade | 90
Level D - Full Time with Report | Level C requirements plus report on Aug. 18 | "Certificate D", with or without grade, 3 ECTS | 60
The maximum number of on-site participants in Freiburg is 24. For Level D participation, priority will be given to students of the University of Freiburg.
Registration for the course is closed!
Any questions on the course can be addressed to Katrin Baumgärtner (katrin.baumgaertner@imtek.uni-freiburg.de).
Certificates:
If you have participated in the course and would like to receive a certificate, please fill out the form by the 20th of August.
Microexam:
Projects: guidelines, teams, slides
Please upload your project reports here by the 18th of August.
Preparation for the exercises:
For the exercises you need:
- An editor that can handle Jupyter Notebooks.
- A way to install Python packages, e.g., pip/pip3. The required packages are listed in requirements.txt.
To achieve this, it is helpful to first familiarize yourself with how to install Python packages. Then download the requirements file linked above and install the packages as described here. This should also install Jupyter Lab, an editor that runs in your browser. To start Jupyter Lab, type jupyter lab in your terminal. Further instructions for Jupyter Lab can be found here.
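For reference, a minimal terminal session (assuming requirements.txt has been downloaded to the current directory) could look like this:

```
pip3 install -r requirements.txt   # installs the required packages, including Jupyter Lab
jupyter lab                        # starts Jupyter Lab in your browser
```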
Optional: There is an additional exercise using the open-source software acados for high-performance embedded NMPC, which is developed in the group of Prof. Diehl. If you would like to try it, please follow the installation instructions here.
Exercises:
Monday: exercise01 (solution), exercise02 (solution)
Tuesday: exercise03 (solution), exercise04 (solution)
Wednesday: exercise05 (solution), exercise06 (solution)
Thursday: exercise07 (solution), exercise08 (solution)
Lecture location:
HS 1015, Kollegiengebäude I, Platz der Universität 3, D-79098 Freiburg, Germany
At the lecture location, there will be basic catering such as coffee, water, and fruit.
All lectures and exercise sessions are broadcast via Zoom:
Join Zoom Meeting
https://uni-freiburg.zoom.us/j/65439158240?pwd=OHJrb2p2U2xDUEpVQTRBWWlvZGo0UT09
Meeting ID: 654 3915 8240
Passcode: 0uu1tq0ek
Get-to-know-each-other session:
There will be a get-to-know-each-other session on the first Monday. Participants taking part via Zoom will meet on wonder.me under the following link:
https://www.wonder.me/r?id=20f5f8da-411c-4c79-8560-05fbe4475ebf
For everyone else, we will just meet in the lecture room.
Schedule
Please note that the lecture recordings might not show video if you open them in your browser. Downloading the recordings and watching them locally should work fine.
Monday:
- Lecture 1 - Introduction - Joschka Boedecker and Moritz Diehl
- Exercise 1 - Dynamic System Simulation - Katrin Baumgärtner and Jasper Hoffmann
- Exercise 2 - Numerical Optimization - Katrin Baumgärtner

Tuesday:
- Exercise 3 - Dynamic Programming and LQR - Katrin Baumgärtner
- Lecture 6 - Monte Carlo RL, Temporal Difference and Q-Learning - Joschka Boedecker
- Exercise 4 - Q-Learning - Jasper Hoffmann

Wednesday:
- Lecture 7 - Numerical Optimal Control - Moritz Diehl
- Exercise 5 - Numerical Optimal Control - Katrin Baumgärtner
- Lecture 8 - MPC Stability Theory - Moritz Diehl (blackboard)
- Lecture 9 - MPC Algorithms - Moritz Diehl (cancelled)
- Exercise 6 - Model Predictive Control - Katrin Baumgärtner

Thursday:
- Lecture 10 - On-policy RL with Function Approximation - Joschka Boedecker
- Lecture 11 - Off-policy RL with Function Approximation - Joschka Boedecker
- Exercise 7 - RL with Function Approximators - Jasper Hoffmann
- Lecture 12 - Policy Gradient Methods - Joschka Boedecker
- Lecture 13 - Advanced Value-based Methods - Joschka Boedecker

Friday:
- Lecture 14 - Recent Algorithms for Nonlinear and Robust MPC - Moritz Diehl. Slides: PartA1, PartA2, PartB
- Lecture 15 - Planning and Learning - Joschka Boedecker
- Lecture 16 - Differences and Similarities of MPC and RL - Joschka Boedecker and Moritz Diehl
- Extra: LP formulation of DP - Moritz Diehl

Monday (second week):
- Guest Lecture - Sebastien Gros: Adaptation of MPC via RL: fundamental principles
- Guest Lecture - Sebastien Gros: RL and MPC: safety, stability, and some more recent results
- Guest Lecture - Sergey Levine: Model-Free and Model-Based Reinforcement Learning from Offline Data