forked from RussTedrake/underactuated
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtrajopt.html
1243 lines (1023 loc) · 69.5 KB
/
trajopt.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html>
<head>
<title>Underactuated Robotics: Trajectory
Optimization</title>
<meta name="Underactuated Robotics: Trajectory
Optimization" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://underactuated.mit.edu/trajopt.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2021<br/>
Last modified <span id="last_modified"></span>.</br>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="http://underactuated.csail.mit.edu/Spring2021/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2021 semester. <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=lyapunov.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=planning.html>Next Chapter</a></td>
</tr></table>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 9"><h1>Trajectory
Optimization</h1>
<p>I've argued that optimal control is a powerful framework for specifying
complex behaviors with simple objective functions, letting the dynamics and
constraints on the system shape the resulting feedback controller (and vice
versa!). But the computational tools that we've provided so far have been
limited in some important ways. The numerical approaches to dynamic
programming which involve putting a mesh over the state space do not scale
well to systems with state dimension more than four or five. Linearization
around a nominal operating point (or trajectory) allowed us to solve for
locally optimal control policies (e.g. using LQR) for even very
high-dimensional systems, but the effectiveness of the resulting controllers
is limited to the region of state space where the linearization is a good
approximation of the nonlinear dynamics. The computational tools for Lyapunov
analysis from the last chapter can provide, among other things, an effective
way to compute estimates of those regions. But we have not yet provided any
real computational tools for approximate optimal control that work for
high-dimensional systems beyond the linearization around a goal. That is
precisely the goal for this chapter.</p>
<p>The big change that is going to allow us to scale to high-dimensional
systems is that we are going to give up the goal of solving for the optimal
feedback controller for the entire state space, and instead attempt to find
an optimal control solution that is valid from only a single initial
condition. Instead of representing this as a feedback control function, we
can represent this solution as a <em>trajectory</em>, $\bx(t), \bu(t)$,
typically defined over a finite interval.</p>
<section><h1>Problem Formulation</h1>
<p>Given an initial condition, $\bx_0$, and an input trajectory $\bu(t)$
defined over a finite interval, $t\in[t_0,t_f]$, we can compute the
long-term (finite-horizon) cost of executing that trajectory using the
standard additive-cost optimal control objective, \[ J_{\bu(\cdot)}(\bx_0) =
\ell_f (\bx(t_f)) + \int_{t_0}^{t_f} \ell(\bx(t),\bu(t)) dt. \] We will
write the trajectory optimization problem as \begin{align*}
\min_{\bu(\cdot)} \quad & \ell_f (\bx(t_f)) + \int_{t_0}^{t_f}
\ell(\bx(t),\bu(t)) dt \\ \subjto \quad & \dot{\bx}(t) = f(\bx(t),\bu(t)),
\quad \forall t\in[t_0, t_f] \\ & \bx(t_0) = \bx_0. \\ \end{align*} Some
trajectory optimization problems may also include additional constraints,
such as collision avoidance (e.g., where the constraint is that the signed
distance between the robot's geometry and the obstacles stays positive) or
input limits (e.g. $\bu_{min} \le \bu \le \bu_{max}$ ), which can be defined
for all time or some subset of the trajectory.</p>
<p> As written, the optimization above is an optimization over continuous
trajectories. In order to formulate this as a numerical optimization, we
must parameterize it with a finite set of numbers. Perhaps not
surprisingly, there are many different ways to write down this
parameterization, with a variety of different properties in terms of speed,
robustness, and accuracy of the results. We will outline just a few of the
most popular below; I would recommend <elib>Betts98+Betts01</elib> for
additional details. In our graph-search dynamic programming algorithms, we
discretized the dynamics of the system on a mesh spread across the state
space; in addition to not scaling well, we also found it difficult to bound
the errors due to the discretization. Now we are in the space of
discretizing/parameterizing over time; conveniently we know relatively much
more about the numerics of integration along a trajectory.</p>
</section>
<section><h1>Convex Formulations for Linear Systems</h1>
<p>Let us first consider the case of linear systems. In fact, if we start
in discrete time, we can even defer the question of how to best discretize
the continuous-time problem. There are a few different ways that we might
"transcribe" this optimization problem into a concrete mathematical
program.</p>
<subsection><h1>Direct Transcription</h1>
<p>For instance, let us start by writing both $\bu[\cdot]$ and
$\bx[\cdot]$ as decision variables. Then we can write: \begin{align*}
\min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n_0}^{N-1}
\ell(\bx[n],\bu[n]) \\ \subjto \quad & \bx[n+1] = {\bf A}\bx[n] + {\bf
B}\bu[n], \quad \forall n\in[0, N-1] \\ & \bx[0] = \bx_0 \\ & +
\text{additional constraints}. \end{align*} We call this modeling choice
-- adding $\bx[\cdot]$ as decision variables and modeling the discrete
dynamics as explicit constraints -- the "<i>direct transcription</i>".
Importantly, for linear systems, the dynamics constraints are linear
constraints in these decision variables. As a result, if we can restrict
our additional constraints to linear inequality constraints and our
objective function to being linear/quadratic in $\bx$ and $\bu$, then the
resulting trajectory optimization is a convex optimization (specifically a
linear program or quadratic program depending on the objective). As a
result, we can reliably solve these problems to global optimality at quite
large scale, potentially even online -- consider the case where the
constraints, like staying in a lane, are not known until runtime.</p>
<example><h1>Trajectory optimization for the Double Integrator</h1>
<p>We've looked at a few optimal control problems for the double
integrator using value iteration. For one of them -- the quadratic
objective with no constraints on $\bu$ -- we know now that we could have
solved the problem "exactly" using LQR. But the minimum-time problem
and even the LQR problem with constraints do not yet have more
satisyfing solutions than discretizing state space, which causes
potentially large numerical artifacts.</p>
<p>In the trajectory formulation, we can solve these problems exactly
for the discrete-time double integrator, and with better accuracy for
the continuous-time double integrator. Take a moment to appreciate
that! The bang-bang policy and cost-to-go functions are fairly
nontrivial functions of state; it's quite satisfying that we can
evaluate them using convex optimization! The limitation, of course, is
that we are only solving them for one initial condition at a time.</p>
<jupyter>examples/trajectory_optimization.ipynb</jupyter>
</example>
<p>If you have not studied linear programming before, you might be
surprised by the modeling power of even linear objectives within this
framework. Consider, for instance, that we want to drive a system to the
origin. It turns out that taking $$\ell(\bx,\bu) = |\bx| + |\bu|,$$ the
direct transcription can still be formulated as a linear program. To do
it, add additional decision variables ${\bf s}_x[\cdot]$ and ${\bf
s}_u[\cdot]$ -- these are commonly referred to as <i>slack variables</i>
-- and write $$\min_{\bx,\bu,{\bf s}_x,{\bf s}_u} \sum_n^{N-1} {\bf
s}_x[n] + {\bf s}_u[n], \quad \text{s.t.} \quad {\bf s}_x[n] \ge x[n],
\quad {\bf s}_x[n] \ge -x[n], \quad ...$$ The field of convex
optimization is replete with tricks like this. Knowing and recognizing
them are skills of the (optimization) trade. But there are many relevant
constraints which cannot be recast into convex constraints with any amount
of skill -- going either left or right around an obstacle is one of the
most important examples. These represent a fundamentally non-convex
constraints in $\bx$; we'll discuss the implications of using non-convex
optimization for trajectory optimization below.</p>
</subsection>
<subsection id="direct_shooting"><h1>Direct Shooting</h1>
<p>The savvy reader might have noticed that adding $\bx[\cdot]$ as
decision variables was not strictly necessary. If we know $\bx[0]$ and we
know $\bu[\cdot]$, then we should be able to solve for $\bx[n]$ -- that's
just forward simulation. Indeed, by satisfying the constraints in direct
transcription, the solver is effectively solving the difference equation
(acausally). For our discrete-time linear systems, this is particularly
easy and nice: \begin{align*}\bx[1] =& {\bf A}\bx[0] + {\bf B}\bu[0] \\
\bx[2] =& {\bf A}({\bf A}\bx[0] + {\bf B}\bu[0]) + {\bf B}\bu[1] \\ \bx[n]
=& {\bf A}^n\bx[0] + \sum_{k=0}^{n-1} {\bf A}^{n-1-k}{\bf
B}u[k].\end{align*} What's more, the solution is still linear in
$\bu[\cdot]$. This is amazing... we can get rid of a bunch of decision
variables, and turn a constrained optimization problem into an
unconstrained optimization problem (assuming we don't have any other
constraints). This approach -- using $\bu[\cdot]$ but <i>not</i>
$\bx[\cdot]$ as decision variables and evaluating $\ell(\bx[n],\bu[n])$ by
effectively forward simulating to obtain $\bx[n]$ -- is called the
<i>direct shooting</i> transcription. For linear systems with
linear/quadratic objectives in $\bx$, and $\bu$, it is still a convex
optimization, and has less decision variables and constraints than the
direct transcription.</p>
</subsection>
<subsection id="computational_considerations"><h1>Computational Considerations</h1></subsection>
<p>So is direct shooting uniformly better than the direct transcription
approach? I think it is not. There are a few potential reason that one
might prefer the direct transcription: <ul><li>Numerical conditioning.
While it is fundamental to the trajectory optimization problem that
control input $\bu[0]$ has more opportunity to have a large impact on the
total cost than control input $\bu[N-1]$, the direct transcription
approach spreads this out over a large number of well-balanced
constraints. Shooting involves calculating ${\bf A}^n$ for potentially
large $n$.</li><li>Adding state constraints. Having $\bx[n]$ as explicit
decision variables makes it very easy/natural to add additional state
constraints; and the solver effectively reuses the computation of ${\bf
A}^n$ for each constraint. In shooting, one has to unroll those terms for
each new constraint.</li><li>Parallelization. For larger problems,
evaluating the constraints can be a substantial cost. In direct
transcription, one can evaluate the dynamics/constraints in parallel
(because each iteration begins with $\bx[n]$ already given), whereas
shooting is more fundamentally a serial operation.</li></ul></p>
<p>For linear convex problems, the solvers are mature enough that these
differences often don't amount to much. For nonlinear optimization
problems, the differences can be substantial. And both approaches have
their loyal camps.</p>
<p>It is also worth noting that the problems generated by the direct
transcription have an important and exploitable "banded" sparsity pattern
-- most of the constraints touch only a small number of variables -- it's
actually the same pattern that we exploit in the Riccati equations.
Thanks to the importance of these methods in real applications, numerous
specialized solvers have been written to explicitly exploit this sparsity
(e.g. <elib>Wang09a</elib>).</p>
</subsection>
<subsection><h1>Continuous Time</h1>
<p>If we wish to solve the continuous-time version of the problem, then we
can discretize time and use the formulations above. The most important
decision is the discretization / numerical integration scheme. For linear
systems, if we assume that the control inputs are held constant for each
time step (aka <a
href="https://en.wikipedia.org/wiki/Zero-order_hold">zero-order hold</a>),
then we can integrate the dynamics perfectly: $$\bx[n+1] = \bx[n] +
\int_{t_n}^{t_n + h} \left[ {\bf A} \bx(t) + {\bf B}\bu \right]dt =
e^{{\bf A}h}\bx[n] + {\bf A}^{-1}(e^{{\bf A}h} - {\bf I}){\bf B}\bu[n],$$
is the simple case (when ${\bf A}$ is invertible). But in general, we can
use any finitely-parameterized representation of $\bu(t)$ and any
numerical integration scheme to obtain $\bx[n+1]={\bf f}(\bx[n],
\bu[n])$.</p>
</subsection>
</section>
<section><h1>Nonconvex Trajectory Optimization</h1>
<p>I humbly recommend that you study the convex trajectory optimization case
for clarity and sense of purpose, in practice trajectory optimization is
often used to solve nonconvex problems. Our formulation can become
non-convex for a number of reasons -- if the dynamics are nonlinear then the
dynamic constraints are non-convex. You may also wish to have a nonconvex
objective or nonconvex additional constraint (e.g. non-collision), even if
your dynamics are linear. Typically we formulate these problems using tools
from <a href="optimization.html#nonlinear">nonlinear programming</a>.</p>
<subsection><h1>Direct Transcription and Direct Shooting</h1>
<p>The formulations that we wrote for direct transcription and direct
shooting above are still valid when the dynamics are nonlinear, it's just
that the resulting problem is nonconvex. For instance, the direct
transcription for discrete-time systems becomes the more general:
\begin{align*} \min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) +
\sum_{n_0}^{N-1} \ell(\bx[n],\bu[n]) \\ \subjto \quad & \bx[n+1] = {\bf
f}(\bx[n], \bu[n]), \quad \forall n\in[0, N-1] \\ & \bx[0] = \bx_0 \\ & +
\text{additional constraints}. \end{align*} Direct shooting still works,
too, since on each iteration of the algorithm we can compute $\bx[n]$
given $\bx[0]$ and $\bu[\cdot]$ by forward simulation. But things get a
bit more interesting when we consider continuous-time systems.</p>
<p>For nonlinear dynamics, there is no one obvious choice for how realize
the discrete dynamics $$\bx[n+1] = \bx[n] + \int_{t[n]}^{t[n+1]} f(\bx(t),
\bu(t)) dt, \quad \bx(t[n]) = \bx[n].$$ For instance, in <drake></drake>
we have an entire <a
href="https://drake.mit.edu/doxygen_cxx/group__integrators.html">suite of
numerical integrators</a> that achieve different levels of simulation
speed and/or accuracy, both of which can be highly dependent on the
details of ${\bf f}(\bx,\bu)$. Choosing a different integrator can also
have subtle effects on the resulting optimization problem -- for instance
in optimization we typically avoid variable-step integration because
gradients can be discontinuous based on the integrators decision to take a
step (or not).</p>
<todo>Finish supporting IntegratorBase in DirectTranscription and write an
example showing how to use it here.</todo>
<p>But once we have opened the door to nonlinear optimization approaches,
then a new trick becomes available. In the convex formulation, we chose
the time discretization apriori, and left it fixed for the duration of the
optimization. That is because time enters in a nonlinear way into the
constraints. But, as I've hinted with my notation above, we can now
imagine adding $t[\cdot]$ to the problem as decision variables, allowing
the optimizer to stretch or shrink the time intervals in order to solve
the problem. Adding some constraints to these time variables is essential
in order to avoid trivial solutions (like collapsing to a trajectory of
zero duration); one could potentially even add more advanced constraints
to control the integration error.</p>
</subsection> <!-- end dirtran -->
<subsection id="direct_collocation"><h1>Direct Collocation</h1>
<p>It is very satisfying to have a suite of numerical integration routines
available for our direct transcription. But numerical integrators are
designed to solve forward in time, and this represents a design constraint
that we don't actually have in our direct transcription formulation. If
our goal is to obtain an accurate solution to the differential equation
with a small number of function evaluations / decision variables /
constraints, then some new formulations are possible that take advantage
of the constrained optimization formulation. These include the the
so-called <em>collocation methods</em>.</p>
<p> In direct collocation (c.f., <elib>Hargraves87</elib>), both the
input trajectory and the state trajectory are represented explicitly as
piecewise polynomial functions. In particular, the sweet spot for this
algorithm is taking $\bu(t)$ to be a first-order polynomial and $\bx(t)$
to be a cubic polynomial.</p>
<p>It turns out that in this sweet spot, the only decision variables we
need in our optimization are the values $\bu(t)$ and $\bx(t)$ at the
break points of the spline. You might think that you would need the
coefficients of the cubic spline parameters, but you do not. For the
first order interpolation on $\bu(t)$, given $\bu(t_k)$ and
$\bu(t_{k+1})$, we can solve for every value $\bu(t)$ over the interval
$t \in [k, k+1]$. But we also have everything that we need for the cubic
spline: given $\bx(t_k)$ and $\bu(t_k)$, we can compute $\dot\bx(t_k) = f
(\bx(t_k), \bu(t_k))$; and the four values $\bx(t_k), \bx(t_{k+1}),
\dot\bx (t_k), \dot\bx(t_{k+1})$ completely define all of the parameters
of the cubic spline over the interval $t\in[t_k, t_{k+1}]$. This is very
convenient, because it is easy for us to add additional constraints to
$\bu$ and $\bx$ at the knot points (and would have been relatively harder
to have to convert every constraint into constraints on the spline
coefficients).</p>
<figure>
<img width="80%" src="figures/dircol.svg">
<figcaption>Cubic spline parameters used in the direct collocation method.</figcaption>
</figure>
<p>So far, given $\dot{\bx} = f(\bx,\bu)$, a list of break points $t_0, .
t_N$, and arbitrary values $\bx(t_k), \bu(t_k)$, we can extract the
spline representation of $\bu(t)$ and $\bx(t)$. So far we haven't added
any actual constraints to our mathematical program. Now, just like in
direct transcription, we need a constraint that will tell the optimizer
how $\bx(t_{k+1})$ is related to $\bx(t_k)$, etc. In direct collocation,
the constraint that we add is that the derivative of the spline at the
<i>collocation points</i> matches the dynamics. In particular, for our
sweet spot, if we choose the collocation points to be the midpoints of the
spline, then we have that \begin{gather*} t_{c,k} = \frac{1}{2}\left(t_k +
t_{k+1}\right), \qquad h_k = t_{k+1} - t_k, \\ \bu(t_{c,k}) =
\frac{1}{2}\left(\bu(t_k) + \bu(t_{k+1})\right), \\ \bx(t_{c,k}) =
\frac{1}{2}\left(\bx(t_k) + \bx(t_{k+1})\right) + \frac{h}{8}
\left(\dot\bx(t_k) - \dot\bx(t_{k+1})\right), \\ \dot\bx(t_{c,k}) =
-\frac{3}{2h}\left(\bx(t_k) - \bx(t_{k+1})\right) - \frac{1}{4}
\left(\dot\bx(t_k) + \dot\bx(t_{k+1})\right). \end{gather*} These
equations come directly from the equations that fit the cubic spline to
the end points/derivatives then interpolate them at the midpoint. They
give us precisely what we need to add the dynamics constraint to our
optimization at the collocation points:\begin{align*}
\min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n_0}^{N-1} h_n
\ell(\bx[n],\bu[n]) \\ \subjto \quad & \dot\bx(t_{c,n}) = f(\bx(t_{c,n}),
\bu(t_{c,n})), & \forall n \in [0,N-1] \\ & \bx[0] = \bx_0 \\ & +
\text{additional constraints}. \end{align*} I hope this notation is clear
-- I'm using $\bx[k] = \bx(t_k)$ as the decision variables, and the
collocation constraint at $t_{c,k}$ depends on the decision variables:
$\bx[k], \bx[k+1], \bu[k], \bu[k+1]$. The actual equations of motion get
evaluated at both the break points, $t_k$, and the collocation points,
$t_{c,k}$.</p>
<p>Once again, direct collocation effectively integrates the equations of
motion by satisfying the constraints of the optimization -- this time
producing an integration of the dynamics that is accurate to third-order
with effectively two evaluations of the plant dynamics per segment (since
we use $\dot\bx(t_k)$ for two intervals). <elib> Hargraves87</elib>
claims, without proof, that as the break points are brought closer
together, the trajectory will converge to a true solution of the
differential equation. Once again it is very natural to add additional
terms to the cost function or additional input/state constraints, and
very easy to calculate the gradients of the objective and constraints.
I personally find it very nice to explicitly account for the parametric
encoding of the trajectory in the solution technique.</p>
<example><h1>Direct Collocation for the Pendulum, Acrobot, and
Cart-Pole</h1>
<p>Direct collocation also easily solves the minimum-time problem for
the double integrator. The performance is similar to direct
transcription, but the convergence is visibly different. Try it for
yourself:</p>
<jupyter>examples/trajectory_optimization.ipynb</jupyter>
As always, make sure you take a look at the code!
</example>
</subsection> <!-- end dircol -->
<subsection><h1>Pseudo-spectral Methods</h1>
<p>The direct collocation method of <elib>Hargraves87</elib> was our first
example of explicitly representing the solution of the optimal control
problem as a parameterized trajectory, and adding constraints to the
derivatives at a series of collocation points. In the algorithm above,
the representation of choice was <i>piecewise-polynomials</i>, e.g. cubic
spines, and the spline coefficients were the decision variables. A
closely related approach, often called "pseudo-spectral" optimal control,
uses the same collocation idea, but represents the trajectories instead
using a linear combination of <i>global polynomial</i> basis functions.
These methods use typically much higher degree polynomials, but can
leverage clever parameterizations to write sparse collocation objectives
and to select the collocation points <elib>Garg11+Ross12a</elib>.
Interestingly, The continuously-differentiable nature of the the
representation of these methods has led to comparatively more theorems and
analysis than we have seen for other direct trajectory optimization
methods <elib>Ross12a</elib> -- but despite some of the language used in
these articles please remember they are still local optimization methods
trying to solve a nonconvex optimization problem. While the direct
collocation method above might be expected to converge to the true optimal
solution by adding more segments to the piecewise polynomial (and having
each segement represent a smaller interval of time), here we expect
convergence to happen as we increase the degree of the polynomials.</p>
<p>The pseudo-spectral methods are also sometimes knowns as "orthogonal
collocation" because the $N$ basis polynomials, $\phi_j(t)$, are chosen so
that at the $N$th collocation point $t_j$, we have $$\phi_i(t_j) =
\begin{cases} 1 & i=j, \\ 0 & \text{otherwise.}\end{cases}$$ This can be
accomplished by choosing $$\phi_j(t) = \prod_{i=0, i\ne j}^{N}
\frac{t-t_i}{t_j - t_i}.$$ Note that for both numerical reasons and for
analysis, time is traditionally rescaled from the interval $[t_0, t_f]$ to
$[-1, 1]$. Collocation points are chosen based on small variations of <a href="https://en.wikipedia.org/wiki/Gaussian_quadrature"></a>Gaussian quadrature</a>, known as the "Gauss-Lobatto" which includes collocation points at $t=-1$ and $t=1$.</p>
<p>Interestingly, a number of papers have also infinite-horizon
psuedo-spectral optimization by the nonlinear rescaling of the time
interval $t\in[0, \infty)$ to the half-open interval $\tau\in[-1, 1)$ via
$\tau = \frac{t-1}{t+1}$ <elib>Garg11+Ross12a</elib>. In this case, one
chooses the collocation times so that they include $\tau = -1$ but do not
include $\tau=1$, using the so-called "Gauss-Radau" points
<elib>Garg11</elib>.</p>
<todo>Add an example. Acrobot swingup?</todo>
<subsubsection><h1>Dynamic constraints in implicit form</h1>
<p>There is another, seemingly subtle but potentially important,
opportunity that can be exploited in a few of these transcriptions, if
our main interest is in optimizing systems with significant multibody
dynamics. In some cases, we can actually write the dynamics constraints
directly in their implicit form. We've <a
href="lyapunov.html#ex:implicit">introduced this idea already</a> in the
context of Lyapunov analysis. In many cases, it is nicer or more
efficient to obtain the equations of motion in an implicit form, ${\bf
g}(\bx,\bu,\dot\bx) = 0$, and to avoid ever having to solve for the
explicit form $\dot\bx = {\bf f}(\bx,\bu).$ This can become even more
important when we consider systems for which the explicit form doesn't
have a unique solution -- we will see examples of this when we study
trajectory optimization through contact because the Coulomb model for
friction actually results in a differential inclusion instead of a
differential equation.</p>
<p>The collocation methods, which operate on the dynamic constraints at
collocation points directly in their continuous form, can use the
implicit form directly. It is possible to write a time-stepping
(discrete-time approximation) for direct transcription using implicit
integrators -- again providing constraints in implicit form. The
implicit form is harder to exploit in the shooting methods.</p>
<todo>There should really be better versions of direct transcription and
the collocation methods that are specialized for second-order systems.
We should only have $\bq$ as a decision variable, and be able to impose
the dynamics only on the second derivatives (the first derivatives are
consistent by construction). This is one of the main talking points in
DMOC, even though the emphasis is on the discrete mechanics.</todo>
</subsubsection>
</subsection>
<subsection><h1>Solution techniques</h1>
<p>The different transcriptions presented above represent different ways
to map the (potentially continuous-time) optimal control problem into a
finite set of decision variables, objectives, and constraints. But even
once that choice is made, there are numerous approaches to solving this
optimization problem. Any general approach to <a
href="optimization.html#nonlinear">nonlinear programming</a>
can be applied here; in the python examples we've included so far, the
problems are handed directly to the sequential-quadratic programming (SQP)
solver SNOPT, or to the interior-point solver IPOPT.</p>
<p>There is also quite a bit of exploitable problem-specific stucture in
these trajectory optimization problems due to the sequential nature of the
problem. As a result, there are some ideas that are fairly specific to the
trajectory optimization formulation of optimal control, and customized
solvers can often (and sometimes dramatically) outperform general purpose
solvers.</p>
<p>This trajectory-optimization structure is easiest to discuss, and
implement, in unconstrained formulations, so we will start there. In fact,
in recent years we have seen a surge in popularity in robotics for doing
trajectory optimization using (often special-purpose) solvers for
unconstrained trajectory optimization, where the constrained problems are
transformed into unconstrained problem via <a
href="optimization.html#penalty">penalty methods</a>. I would say penalty
methods based on the augmented Lagrangian are particularly popular for
trajectory optimization these days <elib>Lin91+Toussaint14</elib>.</p>
<subsubsection><h1>Efficiently computing gradients</h1>
<p>Providing gradients of the objectives and constraints to the solver
is not strictly required -- most solvers will obtain them from finite
differences if they are not provided -- but I feel strongly that the
solvers are faster and more robust when exact gradients are provided.
Providing the gradients for the direct transcription methods is very
straight-forward -- we simply provide the gradients for each constraint
individually. But in the direct shooting approach, where we have
removed the $\bx$ decision variables from the program but still write
objectives and constraints in terms of $\bx$, it would become very
inefficient to compute the gradients of each objective/constraint
independently. We need to leverage the chain rule.</p>
<p>To be concise (and slightly more general), let us define
$\bx[n+1]=f_d(\bx[n],\bu[n])$ as the discrete-time approximation of the
continuous dynamics; for example, the forward Euler integration scheme
used above would give $f_d(\bx[n],\bu[n]) = \bx[n]+f(\bx[n],\bu[n])dt.$
Then we have \[\pd{J}{\bu_k} = \pd{\ell_f(\bx[N])}{\bu_k} +
\sum_{n=0}^{N-1} \left(\pd{\ell(\bx[n],\bu[n])}{\bx[n]}
\pd{\bx[n]}{\bu_k} + \pd{\ell(\bx[n],\bu[n])}{\bu_k} \right), \] where
the gradient of the state with respect to the inputs can be computed
during the "forward simulation", \[ \pd{\bx[n+1]}{\bu_k} =
\pd{f_d(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} +
\pd{f_d(\bx[n],\bu[n])}{\bu_k}. \] These simulation gradients can
also be used in the chain rule to provide the gradients of any
constraints. Note that there are a lot of terms to keep around here, on
the order of (state dim) $\times$ (control dim) $\times$ (number of
timesteps). Ouch. Note also that many of these terms are zero; for
instance with the Euler integration scheme above $\pd{\bu[n]}{\bu_k} =
0$ if $k\ne n$. (If this looks like I'm mixing two notations here,
recall that I'm using $\bu_k$ to represent the decision variable and
$\bu[n]$ to represent the input used in the $n$th step of the
simulation.)</p>
</subsubsection> <!-- gradients -->
<subsubsection><h1>The special case of direct shooting without state
constraints</h1>
<p>By solving for $\bx(\cdot)$ ourselves, we've removed a large number
of constraints from the optimization. If no additional state
constraints are present, and the only gradients we need to compute are
the gradients of the objective, then a surprisingly efficient algorithm
emerges. I'll give the steps here without derivation, but will derive
it in the Pontryagin section below: <ol> <li>Simulate forward:
$$\bx[n+1] = f_d(\bx[n],\bu_n),$$ from $\bx[0] = \bx_0$.</li>
<li>Calculate backwards: $$\lambda[n-1] =
\pd{\ell(\bx[n],\bu[n])}{\bx[n]}^T + \pd{f(\bx[n],\bu[n])}{\bx[n]}^T
\lambda[n],$$ from $\lambda[N-1]=\pd{\ell_f(\bx[N])}{\bx[N]}$.</li>
<li>Extract the gradients: $$\pd{J}{\bu[n]} =
\pd{\ell(\bx[n],\bu[n])}{\bu[n]} + \lambda[n]^T
\pd{f(\bx[n],\bu[n])}{\bu[n]},$$ with $\pd{J}{\bu_k} = \sum_n
\pd{J}{\bu[n]}\pd{\bu[n]}{\bu_k}$.</li> </ol> <p>Here $\lambda[n]$ is a
vector the same size as $\bx[n]$ which has an interpretation as
$\lambda[n]=\pd{J}{\bx[n+1]}^T$. The equation governing $\lambda$ is
known as the <em>adjoint equation</em>, and it represents a dramatic
efficiency improvement over calculating the huge number of simulation
gradients described above. In case you are interested, yes the adjoint
equation is exactly the <em>backpropagation algorithm</em> that is
famous in the neural networks literature, or more generally a bespoke
version of reverse-mode <a
href="https://en.wikipedia.org/wiki/Automatic_differentiation">automatic
differentiation</a>. </p>
</subsubsection>
<subsubsection><h1>Getting good solutions... in practice.</h1>
<p>As you begin to play with these algorithms on your own problems, you might feel like you're on an emotional roller-coaster. You will have moments of incredible happiness -- the solver may find very impressive solutions to highly nontrivial problems. But you will also have moments of frustration, where the solver returns an awful solution, or simply refuses to return a solution (saying "infeasible"). The frustrating thing is, you cannot distinguish between a problem that is actually infeasible, vs. the case where the solver was simply stuck in a local minima.</p>
<p>So the next phase of your journey is to start trying to "help" the solver along. There are two common approaches.</p>
<p>The first is tuning your cost function -- some people spend a lot of time adding new elements to the objective or adjusting the relative weight of the different components of the objective. This is a slippery slope, and I tend to try to avoid it (possibly to a fault; other groups tend to put out more compelling videos!).</p>
<p>The second approach is to give a better initial guess to your solver
to put it in the vicinity of the "right" local minimal. I find this
approach more satisfying, because for most problems I think there really
is a "correct" formulation for the objective and constraints, and we
should just aim to find the optimal solution. Once again, we do see a
difference here between the direct shooting algorithms and the direct
transcription / collocation algorithms. For shooting, we can only
provide the solver with an initial guess for $\bu(\cdot)$, whereas the
other methods allow us to also specify an initial guess for $\bx(\cdot)$
directly. I find that this can help substantially, even with very
simple initializations. In the direct collocation examples for the
swing-up problem of the acrobot and cart-pole, simply providing the
initial guess for $\bx(\cdot)$ as a straight line trajectory between the
start and the goal was enough to help the solver find a good solution;
in fact it was necessary.</p>
</subsubsection>
<!-- maybe: relaxing dynamic constraints (as in the formulation n Ross and Karpenko, 2012 between eq 16 and 17) -->
</subsection> <!-- end pros/cons -->
</section> <!-- end three algorithms -->
<section><h1>Local Trajectory Feedback Design</h1>
<p>Once we have obtained a locally optimal trajectory from trajectory
optimization, there is still work to do...</p>
<subsection><h1>Time-varying LQR</h1>
<p>Take $\bar\bx(t) = \bx(t) - \bx_0(t)$, and $\bar\bu(t) =
\bu(t)-\bu_0(t)$, then apply finite-horizon LQR (see the LQR chapter).</p>
</subsection>
<subsection><h1>Model-Predictive Control</h1>
<p>The maturity, robustness, and speed of solving trajectory optimization
using convex optimization leads to a beautiful idea: if we can optimize
trajectories quickly enough, then we can use our trajectory optimization
as a feedback policy. The recipe is simple: (1) measure the current
state, (2) optimize a trajectory from the current state, (3) execute the
first action from the optimized trajectory, (4) let the dynamics evolve
for one step and repeat. This recipe is known as <i>model-predictive
control</i> (MPC). </p>
<p>Despite the very computational nature of this controller (there is no
closed-form representation of this policy; it is represented only
implicitly as the solution of the optimization), there is a bounty of
theoretical and algorithmic results on MPC
<elib>Garcia89+Camacho13</elib>. And there are a few core ideas that
practitioners should really understand.</p>
<p>One core idea is the concept of <i>receding-horizon</i> MPC. Since our
trajectory optimization problems are formulated over a finite-horizon, we
can think each optimization as reasoning about the next $N$ timesteps. If
our true objective is to optimize the performance over a horizon longer
than $N$ (e.g. over the infinite horizon), then it is standard to continue
solving for an $N$ step horizon on each evaluation of the controller. In
this sense, the total horizon under consideration continues to move
forward in time (e.g. to recede).</p>
<p>Some care must be taken in receding-horizon formulations because on
each new step we are introducing new costs and constraints into the
problem (the ones that would have been associated with time $N+1$ on the
previous solve) -- it would be very bad to march forward in time solving
convex optimization problems only to suddenly encounter a situation where
the solver returns "infeasible!". One can design MPC formulations that
guarantee <i>recursive feasibility</i> -- e.g. guarantee that if a
feasible solution is found at time $n$, then the solver will also find a
feasible solution at time $n+1$.</p>
<p>Perhaps the simplest mechanism for guaranteeing recursive feasibility
in an optimization for stabilizing a fixed point, $(\bx^*, \bu^*)$, is to
add a final-value constraint to the receding horizon, $\bx[N] = \bx^*$.
This idea is simple but important. Considering the
trajectories/constraints in absolute time, then on step $k$ of the
algorithm, we are optimizing for $\bx[k], ... , \bx[k+N],$ and $\bu[k],
..., \bu[k+N-1]$; let us say that we have found a feasible solution for
this problem. The danger in receding-horizon control is that when we shift
to the next step ($k+1$) we introduce constraints on the system at
$\bx[k+N+1]$ for the first time. But if our feasible solution in step $k$
had $\bx[k+N] = \bx^*$, then we know that setting $\bx[k+N+1] = \bx^*,
\bu[k+N] = \bu^*$ is guaranteed to provide a feasible solution to the new
optimization problem in step $k+1$. With feasibility guaranteed, the
solver is free to search for a lower-cost solution (which may be available
now because we've shifted the final-value constraint further into the
future). It is also possible to formulate MPC problems that guarantee
recursive feasibility even in the presence of modeling errors and
disturbances (c.f. <elib>Bemporad99</elib>).</p>
<p>The theoretical and practical aspects of Linear MPC are so well
understood today that it is considered the de-facto generalization of LQR
for controlling a linear system subject to (linear) constraints.</p>
</subsection>
</section> <!-- end feedback design -->
<section id="perching"><h1>Case Study: A glider that can land on a perch
like a bird</h1>
<p>From 2008 til 2014, my group conducted a series of increasingly
sophisticated investigations
<elib>Cory08+Roberts09+Cory10a+Moore11a+Moore12+Moore14b</elib> which asked
the question: can a fixed-wing UAV land on a perch like a bird?</p>
<figure>
<img width="80%" src="figures/perch-sequence.jpg">
<figcaption>Basic setup for the glider landing on a perch.</figcaption>
</figure>
<p>At the outset, this was a daunting task. When birds land on a perch,
they pitch up and expose their wings to an "angle-of-attack" that far
exceeds the typical flight envelope. Airplanes traditionally work hard to
avoid this regime because it leads to aerodynamic "stall" -- a sudden loss
of lift after the airflow separates from the wing. But this loss of lift is
also accompanied by a signficant increase in drag, and birds exploit this
when they rapidly decelerate to land on a perch. Post-stall aerodynamics
are a challenge for control because (1) the aerodynamics are time-varying
(characterized by periodic vortex shedding) and nonlinear, (2) it is much
harder to build accurate models of this flight regime, at least in a wind
tunnel, and (3) stall implies a loss of attached flow on the wing and
therefore on the control surfaces, so a potentially large reduction in
control authority.</p>
<p>We picked the project initially thinking that it would be a nice example
for model-free control (like reinforcement learning -- since the models were
unknown). In the end, however, it turned out to be the project that really
taught me about the power of model-based trajectory optimization and linear
optimal control. By conducting dynamic system identification experimetns in
motion captured, we were able to fit both surprisingly simple models (based
on flat-plate theory) to the dynamics<elib>Cory08</elib>, and also more
accurate models using "neural-network-like" terms to capture the residuals
between the model and the data <elib>Moore14b</elib>. This made model-based
control viable, but the dynamics were still complex -- while trajectory
optimization should work, I was quite dubious about the potential for
regulating to those trajectories with only linear feedback.</p>
<p>I was wrong. Over and over again, I watched time-varying linear quadratic
regulators take highly nontrivial corrective actions -- for instance,
dipping down early in the trajectory to gain kinetic energy or tipping up to
dump energy out of the system -- in order to land on the perch at the final
time. Although the quality of the linear approximation of the dynamics did
degrade the farther that we got from the nominal trajectory, the validity of
the controller dropped off much less quickly (even as the vector field
changed, the direction that the controller needed to push did not). This was
also the thinking that got me initially so interested in understanding the
regions of attraction of linear control on nonlinear systems.</p>
<p>In the end, the experiments were very successful. We started searching
for the "simplest" aircraft that we could build that would capture the
essential control dynamics, reduce complexity, and still accomplish the
task. We ended up building a series of flat-plate foam gliders (no
propellor) with only a single actuator to control the elevator. We added
dihedral to the wings to help the aircraft stay in the longitudinal plane.
The simplicity of these aircraft, plus the fact that they could be
understood through the lens of quite simple models makes them one of my
favorite canonical underactuated systems.</p>
<figure>
<iframe width="560" height="315"
src="https://www.youtube.com/embed/syJF8js9aEU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<figcaption>The original perching experiments from <elib>Cory08</elib> in
a motion capture arena with a simple rope acting as the perch. The main
videos were shot with high-speed cameras; an entire perching trajectory
takes about .8 seconds.</figcaption>
</figure>
<subsection><h1>The Flat-Plate Glider Model</h1>
<figure>
<img width="50%" src="figures/glider.svg">
<figcaption>The flat-plate glider model. Note that traditionally
aircraft coordinates are choosen so that positive pitch brings the nose
up; but this means that positive z is down (to have a right-handed
system). For consistency, I've chosen to stick with the vehicle
coordinate system that we use throughout the text -- positive z is up,
but positive pitch is down. </figcaption>
<todo>better aircraft outline.</todo>
</figure>
<p>In our experiments, we found the the dynamics of our aircraft were
capture very well by the so-called "flat plate model" <elib>Cory08</elib>.
In flat plate theory lift and drag forces of a wing are summarized by a
single force at the center-of-pressure of the wing acting normal to the
wing, with magnitude: $$f_n(s, \alpha, {\bv}) = s \sin(\alpha) \rho
|\bv|^2,$$ where $s$ is the area of the wing, $\alpha$ the angle of attack
of the surface, $\rho$ is the density of the air, and $v$ is the velocity
of the center of pressure relative to the air. This corresponds to having
lift and drag coefficients $$c_{\text{lift}} = 2\sin\alpha\cos\alpha,
\quad c_{\text{drag}} = 2\sin^2\alpha.$$ In our glider model, we
summarize all of the aerodynamic forces by contributions of two lifting
surfaces, the wing (including some contribution from the horizontal
fuselage) denoted by subscript $w$ and the elevator denoted by subscript
$e$, with centers at $\bp_w = [x_w, z_w]^T$ and $\bp_e = [x_e, z_e]^T$
given by the kinematics: $$\bp_w = \bp - l_w\begin{bmatrix} c_\theta \\
-s_\theta \end{bmatrix},\quad \bp_e = \bp - l_h \begin{bmatrix} c_\theta
\\ -s_\theta \end{bmatrix} - l_e \begin{bmatrix} c_{\theta+\phi} \\
-s_{\theta+\phi} \end{bmatrix},$$ where the origin of our vehicle
coordinate system, $\bp = [x,z]^T$, is chosen to be the center of mass. We
assume still air, so the angle of attack is given by $$\alpha_w =
-\atantwo(\dot{z}_w, \dot{x}_w) - \theta, \quad \alpha_e =
-\atantwo(\dot{z}_e, \dot{x}_e) - \theta - \phi.$$ We assume that the
elevator is massless, and the actuator controls velocity directly (note
that this breaks our "control affine" structure, but is more realistic for
the tiny hobby servos we were dealing with). This gives us a total of 7
state variables $\bx = [x, z, \theta, \phi, \dot{x}, \dot{z},
\dot\theta]^T$ and one control input $\bu = \dot\phi.$ The resulting
equations of motion are: \begin{gather*} f_w = f_n(s_w, \alpha_w,
\dot\bp_w), \quad f_e = f_n(s_e, \alpha_e, \dot\bp_e), \\ \ddot{x} =
\frac{1}{m} \left(f_w s_\theta + f_e s_{\theta+\phi} \right), \\ \ddot{z}
= \frac{1}{m} \left(f_w c_\theta + f_e c_{\theta+\phi} \right) - g, \\
\ddot\theta = \frac{1}{I} \left( l_w f_w + (l_h c_\phi + l_e) f_e \right).
\end{gather*}
</p>
</subsection>
<subsection><h1>Trajectory optimization</h1></subsection>
<subsection><h1>Trajectory stabilization</h1></subsection>
<subsection><h1>Beyond a single trajectory</h1>
<p>The linear controller around a nominal trajectory was surprisingly
effective, but it's not enough. We'll return to this example again when we
talk about "feedback motion planning", in order to discuss how to find a
controller that can work for many more initial conditions -- ideally all of
the initial conditions of interest for which the aircraft is capable of
getting to the goal.</p>
</subsection>
</section>
<section><h1>Pontryagin's Minimum Principle</h1>
<p>The tools that we've been developing for numerical trajectory
optimization are closely tied to theorems from (analytical) optimal control.
Let us take one section to appreciate those connections.</p>
<p>What precisely does it mean for a trajectory, $\bx(\cdot),\bu(\cdot)$, to
be locally optimal? It means that if I were to perturb that trajectory in
any way (e.g. change $\bu_3$ by $\epsilon$), then I would either incur
higher cost in my objective function or violate a constraint. For an
unconstrained optimization, a <em>necessary condition</em> for local
optimality is that the gradient of the objective at the solution be exactly
zero. Of course the gradient can also vanish at local maxima or saddle
points, but it certainly must vanish at local minima. We can generalize this
argument to constrained optimization using <em>Lagrange multipliers</em>.
</p>
<todo>explain "direct" vs "indirect" methods somewhere in here.</todo>
<subsection><h1>Lagrange multiplier derivation of the adjoint equations</h1>
<p>Let us use Lagrange multipliers to derive the necessary conditions for
our constrained trajectory optimization problem in discrete time
\begin{align*} \min_{\bx[\cdot],\bu[\cdot]} & \ell_f(\bx[N]) +
\sum_{n=0}^{N-1} \ell(\bx[n],\bu[n]),\\ \subjto \quad & \bx[n+1] =
f_d(\bx[n],\bu[n]). \end{align*} Formulate the Lagrangian,
\[L(\bx[\cdot],\bu[\cdot],\lambda[\cdot]) = \ell_f(\bx[N]) +
\sum_{n=0}^{N-1} \ell(\bx[n],\bu[n]) + \sum_{n=0}^{N-1} \lambda^T[n]
\left(f_d(\bx[n],\bu[n]) - \bx[n+1]\right), \] and set the derivatives to
zero to obtain the adjoint equation method described for the shooting
algorithm above: \begin{gather*} \forall n\in[0,N-1], \pd{L}{\lambda[n]} =
f_d(\bx[n],\bu[n]) - \bx[n+1] = 0 \Rightarrow \bx[n+1] = f(\bx[n],\bu[n])
\\ \forall n\in[0,N-1], \pd{L}{\bx[n]} = \pd{\ell(\bx[n],\bu[n])}{\bx} +
\lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bx} - \lambda^T[n-1] = 0 \\ \quad
\Rightarrow \lambda[n-1] = \pd{\ell(\bx[n],\bu[n])}{\bx}^T +
\pd{f_d(\bx[n],\bu[n])}{\bx}^T \lambda[n]. \\ \pd{L}{\bx[N]} =
\pd{\ell_f}{\bx}^T - \lambda^T[N-1] = 0 \Rightarrow \lambda[N-1] =
\pd{\ell_f}{\bx} \\ \forall n\in[0,N-1], \pd{L}{\bu[n]} =
\pd{\ell(\bx[n],\bu[n])}{\bu} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bu}
= 0. \end{gather*} Therefore, if we are given an initial condition $\bx_0$
and an input trajectory $\bu[\cdot]$, we can verify that it satisfies the
necessary conditions for optimality by simulating the system forward in
time to solve for $\bx[\cdot]$, solving the adjoint equation backwards in
time to solve for $\lambda[\cdot]$, and verifying that $\pd{L}{\bu[n]} =
0$ for all $n$. The fact that $\pd{J}{\bu} = \pd{L}{\bu}$ when
$\pd{L}{\bx} = 0$ and $\pd{L}{\lambda} = 0$ follows from some basic
results in the calculus of variations.</p>
</subsection>
<subsection><h1>Necessary conditions for optimality in continuous time</h1>
<p>You won't be surprised to hear that these necessary conditions have an
analogue in continuous time. I'll state it here without derivation. Given
the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx =
f(\bx,\bu)$, and the instantaneous cost $\ell(\bx,\bu)$, for a trajectory
$\bx(\cdot),\bu(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal it
must satisfy the conditions that \begin{align*} \forall
t\in[t_0,t_f],\quad & \dot{\bx} = f(\bx,\bu), \quad &\bx(0)=\bx_0\\
\forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{\ell}{\bx}^T +
\pd{f}{\bx}^T \lambda, \quad &\lambda(T) = \pd{\ell_f}{\bx}^T \\ \forall
t\in[t_0,t_f],\quad & \pd{\ell}{\bu} + \lambda^T\pd{f}{\bu} = 0.&
\end{align*}</p>
<p>In fact the statement can be generalized even beyond this to the case
where $\bu$ has constraints. The result is known as Pontryagin's minimum
principle -- giving <em>necessary conditions</em> for a trajectory to be
optimal.</p>
<theorem><h1>Pontryagin's Minimum Principle</h1>
<p>Adapted from <elib>Bertsekas00a</elib>. Given the initial conditions,
$\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the
instantaneous cost $\ell(\bx,\bu)$, for a trajectory
$\bx^*(\cdot),\bu^*(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal
it must satisfy the conditions that \begin{align*} \forall
t\in[t_0,t_f],\quad & \dot{\bx}^* = f(\bx^*,\bu^*), \quad
&\bx^*(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot \lambda^* =
\pd{\ell}{\bx}^T + \pd{f}{\bx}^T \lambda^*, \quad &\lambda^*(T) =
\pd{\ell_f}{\bx}^T \\ \forall t\in[t_0,t_f],\quad & u^* =
\argmin_{\bu\in{\cal U}} \left[\ell(\bx^*,\bu) + (\lambda^*)^T
f(\bx^*,\bu) \right].& \end{align*}</p>
</theorem>
<p>Note that the terms which are minimized in the final line of the
theorem are commonly referred to as the Hamiltonian of the optimal control
problem, $$H(\bx,\bu,\lambda,t) = \ell(\bx,\bu) + \lambda^T f(\bx,\bu).$$
It is distinct from, but inspired by, the Hamiltonian of classical
mechanics. Remembering that $\lambda$ has an interpretation as
$\pd{J}{\bx}^T$, you should also recognize it from the HJB. </p>
</subsection>
</section> <!-- end pontryagin -->
<section><h1>Variations and Extensions</h1>
<todo>Multiple Shooting: Direct transcription is the limit of multiple shooting where every timestep is a different "shot".</todo>
<a href="acrobot.html#differential_flatness">Differential flatness</a>.
<subsection><h1>Iterative LQR and Differential Dynamic
Programming</h1>
<p>There is another approach to trajectory optimization (at least for
initial-value problems) that has gotten quite popular lately. Iterative
LQR (iLQR)<elib>Li04</elib> also known as Sequential Linear Quadratic
optimal control) <elib>Sideris05</elib>. The idea is simple enough:
given an initial guess at the input and state trajectory, make a linear
approximation of the dynamics and a quadratic approximation of the cost
function. Then compute and simulate the time-varying LQR controller to
find a new input and state trajectory. Repeat until convergence.</p>
<p>The motivation behind iterative LQR is quite appealing -- it
leverages the surprising structure of the Riccati equations to come up
with a second-order update to the trajectory after a single backward
pass of the Riccati equation followed by a forward simulation.
Anecdotally, the convergence is fast and robust. Numerous
applications...<elib>Farshidian17+Tassa12</elib>.</p>
<p>One key limitation of iLQR is that it only supports unconstrained
trajectory optimization -- at least via the Riccati solution. The
natural extension for constrained optimization would be to replace the
Riccati solution with an iterative linear model-predictive control (MPC)
optimization; this would result in a quadratic program and is very close
to what is happening in SQP. But a more common approach in the current
robotics literature is to add any constraints into the objective
function using <a
href="https://en.wikipedia.org/wiki/Penalty_method">penalty methods</a>,
especially using <a
href="https://en.wikipedia.org/wiki/Augmented_Lagrangian_method">Augmented
Lagrangian</a>.</p>
<!--
<p>Thinking about the MPC version of iLQR leads to a curious observation. I
think everyone using iLQR by solving Riccati equations is doing more work
than they need to. Clearly, in the discrete-time unconstrained form, the
time-varying LQR + forward simulation is solving the following optimization
problem: \begin{gather*} \min \sum_{n=0}^N \bx[n]^T \bQ \bx[n] + ...
\end{gather*}</p>
-->
</subsection>
<todo>DMOC</todo>
<subsection><h1>Mixed-integer convex optimization for non-convex
constraints</h1>
<example><h1>Mixed-integer planning around obstacles</h1>
<jupyter no_colab>examples/miqp_planning.ipynb</jupyter>
</example>
</subsection>