-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathwhitepaper.tex
1679 lines (1432 loc) · 79 KB
/
whitepaper.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
% Copyright 2020, Gernot Heiser
% SPDX-License-Identifier: CC-BY-SA-4.0
% Setting this to true turns on the ``draft'' watermark
\newif \ifDraft \Draftfalse
%\Drafttrue
\documentclass[english,a4paper,12pt]{report}
\usepackage[]{sel4}
%%% Hyperlinks embed links for references in the PDF output (both for
%%% internal as well as external links).
\newif \ifhyperlinks \hyperlinkstrue
%%% When using this you should use \autoref{label} instead of
%%% Section~\ref{label}. This will make ``Section'' part of the link, not
%%% just the number.
% Get bibliography into the ToC
\usepackage[nottoc]{tocbibind}
\usepackage{xspace} % to fix spacing in macros producing words. Eg:
\newcommand{\seLfour}{seL4\xspace} % use as normal text, examples below.
% Example of multi-letter symbols for math mode:
\newcommand{\myFunc}{\mathrm{myFunc}}
\newcommand{\myVar}{\mathit{diff}}
% Leave this in here, it is often useful to deal with latexdiff breakage.
% Put it around breakage-causing segments (in both versions to be diffed!)
\newenvironment{DIFnomarkup}{}{}
\usepackage{graphicx}
\setkeys{Gin}{keepaspectratio=true,clip=true,draft=false,width=\linewidth}
\graphicspath{{./imgs/}}
\usepackage{url,verbatim,datetime,tabularx}
\usepackage[olditem]{paralist}
\usepackage[authoryear]{natbib}
\ifx\undefined\chapter
%\typeout{no chapters}
\newcommand{\Sect}[1]{\section{#1}}
\newcommand{\SSect}[1]{\subsection{#1}}
\newcommand{\SSSect}[1]{\subsubsection*{#1}}
\newcommand{\Acks}{\section*{Acknowledgments}}
\else
%\typeout{have chapters}
\newcommand{\Sect}[1]{\chapter{#1}}
\newcommand{\SSect}[1]{\section{#1}}
\newcommand{\SSSect}[1]{\subsection*{#1}}
\newcommand{\Acks}{\chapter*{Acknowledgments}}
\fi
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\Break}{\ \nopagebreak\\}
\newlength{\chillilng}\setlength{\chillilng}{8mm}
\newlength{\chillimarg}\setlength{\chillimarg}{10mm}
\newcommand{\chilli}{\includegraphics[width=\chillilng]{chilli}}
\newcommand{\chilliItem}{\raisebox{-5mm}[1ex][0pt]{%
\makebox[\chillilng][r]{\chilli}}}
\newcommand{\chilliSect}{\raisebox{-2mm}[1ex][0pt]{\chilli\hspace{0.8em}}}
\newenvironment{Chilli}{
\begin{list}{}{
\setlength{\labelwidth}{\chillilng}
\setlength{\leftmargin}{\chillimarg}}
\item[\chilliItem]
}
{\end{list}}
\ifDraft
\usepackage{draftwatermark}
\SetWatermarkLightness{0.85}
\newcommand{\Comment}[1]{\textbf{\textsl{#1}}}
\newenvironment{LongComment}[1] % multi-paragraph comment, argument is owner
{\begingroup\par\noindent\slshape \textbf{Begin Comment[#1]}\par}
{\par\noindent\textbf{End Comment}\endgroup\par}
\newcommand{\FIXME}[1]{\textbf{\textsl{FIXME: #1}}}
\newcommand{\TODO}[1]{\textbf{\textsl{TODO: #1}}}
\else
\newcommand{\Comment}[1]{\relax}
\newenvironment{LongComment}[1]{\expandafter\comment}{\expandafter\endcomment}
\newcommand{\FIXME}[1]{\relax}
\newcommand{\TODO}[1]{\relax}
\fi
\newcommand{\gernot}[1]{\Comment{#1 \colorbox{yellow}{[gernot]}}}
\hypersetup{pdftitle={The seL4 Microkernel -- An Introduction},
pdfauthor={Gernot Heiser},
pdfsubject={seL4 Foundation Whitepaper Revision 1.2},
pdfkeywords={seL4, microkernel, performance},
pdfdisplaydoctitle=true}
\begin{document}\raggedright
%% Capitalisation of cross references
\renewcommand{\chapterautorefname}{Chapter}
\renewcommand{\sectionautorefname}{Section}
\renewcommand{\subsectionautorefname}{Section}
\renewcommand{\subsubsectionautorefname}{Section}
\renewcommand{\appendixautorefname}{Appendix}
\renewcommand{\Hfootnoteautorefname}{Footnote}
%% Commands for index
\newcommand{\Htextbf}[1]{\textbf{\hyperpage{#1}}}
\renewcommand{\topfraction}{0.9}
\renewcommand{\bottomfraction}{0.9}
\title{The seL4 Microkernel}
\subtitle{An Introduction}
\author{Gernot Heiser}
% \authortitle{Author's title}
\email{[email protected]}
% \docversion{Revision 1.2 of 2020-06-10} % fa89642b
% \docversion{Revision 1.3 of 2024-05-07} % 9fefcc9e
\docversion{Revision 1.4 of 2025-01-08}
\date{}
\thispagestyle{plain}
\maketitle
\doCopyright[2020]
\begin{abstract}
This whitepaper provides an introduction to and overview of
seL4. We explain what seL4
is (and is not) and explore its defining features. We explain what
makes seL4
uniquely qualified as the operating-system kernel of choice for
security- and safety-critical systems, and generally embedded and
cyber-physical systems. In particular, we explain
seL4's assurance story, its security- and safety-relevant
features, and its benchmark-setting performance. We also discuss
typical usage scenarios, including incremental cyber retrofit of
legacy systems.
\vspace{6ex}
\setcounter{page}{0}
\newcommand{\Sub}{{\boldmath\(\rightarrow\)}\xspace}
\textbf{CCS Concepts}
\begin{compactitem}
\item Software and its engineering \Sub Operating Systems
\item Security and privacy \Sub Systems security
\item Security and privacy \Sub Formal methods and theory of security
\item Computer systems organization \Sub Real-time systems \Sub
Real-time operating systems
\item Computer systems organization \Sub Real-time systems \Sub
Dependable and fault-tolerant systems and networks
\end{compactitem}
\vspace{2ex}
\textbf{Keywords}\\
seL4, microkernel, performance
\vspace{2ex}
\textbf{Reference Format:}\\
Gernot Heiser. The seL4 Microkernel -- An Introduction. White
paper. The seL4 Foundation, \thedocversion.
\end{abstract}
\tableofcontents
\listoffigures
% \cleardoublepage
\Sect{What Is seL4?}\label{s:intro}
\begin{description}
\item[seL4 is an operating system microkernel]\Break
An \href{https://en.wikipedia.org/wiki/Operating_system}{operating
system} (OS) is the low-level system software that controls a
computer system's resources and enforces security. Unlike
application software, the OS has
exclusive access to a more privileged execution mode of the processor (\emph{kernel mode})
that gives it direct access to hardware. Applications only ever
execute in
\emph{user mode} and can only access hardware as permitted by the
OS.
An OS
\href{https://en.wikipedia.org/wiki/Microkernel}{microkernel} is a
minimal core of an OS, reducing the code executing at higher
privilege to a minimum. seL4 is a member of the
\href{https://en.wikipedia.org/wiki/L4_microkernel_family}{L4
family of microkernels} that goes back to the mid-1990s.
(And no, \emph{seL4 has nothing to do with seLinux.})
\item[seL4 is also a hypervisor] \Break
seL4 supports virtual machines that can run a fully fledged guest OS
such as Linux. Subject to seL4's enforcement of communication
channels, guests and their applications can communicate with
each other as well as with native applications.
Learn more about what it means that seL4 is a microkernel and its use as
a hypervisor in \autoref{s:ukernel}. And learn about real-world
deployment scenarios, including approaches for retrofitting security into
legacy systems in \autoref{s:retrofit}.
\item[seL4 is proved correct]\Break
seL4 comes with a formal, mathematical, machine-checked
\emph{proof of implementation correctness}, meaning the kernel is
in a very strong sense ``bug free'' with respect to its specification.
In fact, seL4 is the world's
first OS kernel with such a proof at the code level
\citep{Klein_EHACDEEKNSTW_09}.
\item[seL4 is provably secure]\Break
Besides implementation correctness, seL4 comes with further proofs of
\emph{security enforcement} \citep{Klein_AEMSKH_14}. They say that
in a correctly configured seL4-based system, the kernel
guarantees the classical security properties of
\emph{confidentiality, integrity and availability}. More about
these proofs in \autoref{s:proofs}.
\item[seL4 improves security with fine-grained access control
through capabilities]\Break
\href{https://en.wikipedia.org/wiki/Capability-based_security}{Capabilities}
are access tokens which support very fine-grained control over which
entity can access a particular resource in a system. They support
strong security according to the
\href{https://en.wikipedia.org/wiki/Principle_of_least_privilege}{principle
of least privilege} (also called principle of least authority,
POLA). This is a core design principle of highly secure system,
and is impossible to achieve with the way access control happens
in mainstream systems such as Linux or Windows.
seL4 is still the \emph{world's only OS that is both
capability-based and formally verified}, and as such has a
defensible claim of being the world's most secure OS. More about capabilities
in \autoref{s:caps}.
\item[seL4 ensures safety of time-critical systems] \Break% ~\nopagebreak\\
seL4 is the world's only OS kernel (at least in the open
literature) that has undergone a complete and sound analysis of
its \emph{worst-case execution time} (WCET)
\citep{Blackham_SCRH_11, Sewell_KH_17}. This means,
if the kernel is configured appropriately, all kernel
operations are bounded in time, and the bound is known. This is a
prerequisite for building \emph{hard real-time systems}, where
failure to react to an event within a strictly bounded time period
is catastrophic.
\item[seL4 is the world's most advanced mixed-criticality OS] \Break
seL4 provides strong support for
\href{https://en.wikipedia.org/wiki/Mixed_criticality}{mixed
criticality real-time systems} (MCS), where the timeliness of critical
activities must be ensured even if they co-exist with less trusted
code executing on the same platform. seL4 achieves this with a
flexible model that retains good resource utilisation, unlike the
more established MCS OSes that use strict (and inflexible) time
and space partitioning \citep{Lyons_MAH_18}. More on seL4's
real-time and MCS support in \autoref{s:rtos}.
\item[seL4 is the world's fastest microkernel] \Break
Traditionally, systems are either (sort-of) secure, or they are
fast. seL4 is unique in that it is both. seL4 is designed to
support a wide range of
real-world use cases, whether they are
security- (or safety-)critical or not, and excellent performance
is a requirement. More on seL4's
performance in \autoref{s:perf}.
\item[seL4 is pronounced ``ess-e-ell-four''] \Break
The pronunciation ``sell-four'' is deprecated.
\end{description}
\SSSect{How to read this document}
This document is meant to be approachable by a wide
audience. However, for completeness, we will add some deeper
technical detail in places.
\begin{Chilli}
Such detail will be marked with a chilli, like the one on the left. If you
see this then you know you can safely skip the marked passage if
you think the technical description is too ``spicy'' for your
taste, or if you are simply not interested in this level of
detail. Only other chillied passages will assume you have read it.
\end{Chilli}
\SSSect{\chilliSect Technical section}
Where the chilli appears in a section title, such as here, this
indicates that the whole section is fairly technical and can be skipped.
\Sect{seL4 Is a Microkernel and a Hypervisor, It Is Not an
OS}\label{s:ukernel}
\SSect{Monolithic kernels vs microkernels}
To understand the difference between a mainstream OS, such as Linux,
and a microkernel, such as seL4, let's look at
\autoref{f:osstruct}.
\begin{figure}[hb]
\centering
\includegraphics[width=1.0\textwidth]{kernel}
\caption[Operating-system structure.]{Operating-system structure:
Monolithic kernel (left) vs microkernel (right).}
\label{f:osstruct}
\end{figure}
The left side presents a (fairly abstracted) view of the
architecture of a system such as Linux. The yellow part is the OS
\emph{kernel}, it offers services such as file storage and
networking to applications. All the code that implements those
services executes in the \emph{privileged mode} of the hardware,
also called \emph{kernel mode} or \emph{supervisor mode} --
the execution mode that has unfettered access and control of all
resources in the system. In contrast, applications run in
unprivileged, or \emph{user mode}, and do not have direct access to
many hardware resources, which must be accessed through the OS. The
OS is internally structured in a number of layers, where each layer
provides abstractions implemented by layers below.
The problem with privileged-mode code is that it is dangerous: If
anything goes wrong here, there's nothing to stop the damage. In
particular, if this code has a bug that can be exploited by an
attacker to run the attacker's code in privileged mode (called a
privilege-escalation or arbitrary code-execution attack) then the
attacker can do what they want with the system. Such flaws are the
root problem of the many system compromises we experience in
mainstream systems.
Of course, software bugs are mostly a fact of life, and
OSes are not different. For example, the Linux kernel comprises of
the order of 20 million lines of source code (20\,MSLOC); we can
estimate that it contains literally tens of thousands of bugs
\citep{Biggs_LH_18}. This is obviously a huge attack surface! This
idea is captured by saying that Linux has a large \emph{trusted computing
base} (TCB), which is defined as the subset of the overall system
that must be trusted to operate correctly for the system to be
secure.
The idea behind a microkernel design is to drastically reduce the
TCB and thus the
attack surface. As schematically shown at the right of
\autoref{f:osstruct}, the kernel, i.e.\ the part of the system
executing in privileged mode, is much smaller. In a well-designed
microkernel, such as seL4, it is of the order of ten thousand lines
of source code (10\,kSLOC). This is literally three orders of
magnitude smaller than the Linux kernel, and the attack surface shrinks
accordingly (maybe more, as the density of bugs probably
grows more than linearly with code size).
Obviously, it is not possible to provide the same functionality, in
terms of OS services, in such a small code base. In fact, the
microkernel provides almost no services: it is just a thin wrapper
around hardware, just enough to securely multiplex hardware
resources. What the microkernel mostly provides is isolation,
sandboxes in which programs can execute without interference from
other programs. And, critically, it provides a \emph{protected
procedure call} (PPC) mechanism, which is a form of inter-process
communication (IPC). For historical reasons the term IPC lives on,
but I recommend avoiding it as it leads to misconceptions that
result in poor designs.
\begin{Chilli}
For a deeper explanation of what seL4 IPC is and is not, I
recommend reading my blog \href{https://microkerneldude.wordpress.com/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/}{How to (and how not to) use seL4 IPC}.
\end{Chilli}
The PPC mechanism allows one program to securely call a function in a
different program, where the microkernel transports function inputs
and outputs between the programs and, importantly, enforces
interfaces: the ``remote'' (contained in a different sandbox)
function can only be called at an exported entrypoint, and only by
explicitly authorised clients (who have been given the appropriate
capability, see \autoref{s:caps}).
The microkernel system uses this approach to provide the services
the monolithic OS implements in the kernel. In the microkernel
world, these services are just programs, no different from applications,
that run in their own sandboxes, and provide a PPC interface for
applications to call. Should a server be compromised, that compromise is
confined to the server, its sandbox protects the rest of the
system. This is in stark contrast to the monolithic case, where a
compromise of an OS service compromises the complete system.
This effect can be quantified: Our recent study shows that of the
known Linux compromises classified as \emph{critical}, i.e.\ most
severe, 29\% would be fully eliminated by a microkernel design, and
another 55\% would be mitigated enough to no longer qualify as
critical \citep{Biggs_LH_18}.
\SSect{seL4 Is a microkernel, not an OS}
\begin{figure}[ht]
\centering
\includegraphics{l4family}
\caption{L4 microkernel family tree.}
\label{f:l4family}
\end{figure}
seL4 is a microkernel, and designed for generality while
minimising the TCB. It is a member of the L4
microkernel family, which goes back to the mid-'90s;
\autoref{f:l4family} shows seL4's provenance. It was developed by our group at UNSW/NICTA,
these days known as Trustworthy Systems (TS). At the time we
had 15 years of experience in developing
high-performance microkernels, and a track-record of
real-world deployments: Our \emph{OKL4 Microkernel} shipped on
billions of Qualcomm cellular modem chips, and our \emph{L4-embedded} kernel from the
mid-Noughties runs on the secure enclave of all recent
iOS devices (iPhones etc).
Being a microkernel, seL4 contains none of the usual OS
services; such services are provided by programs running in user
mode. Besides the great advantages elaborated above,
there are downsides to the microkernel design: These components
must come from somewhere. Some can be ported from open-source OSes, such as
FreeBSD or Linux, or they can be written from scratch. But in any
case, this is significant work.
To scale up we need the help of the community, and the seL4
Foundation is the key mechanism for enabling the community to
cooperate and develop or port such services for seL4-based
systems. The most important ones are device drivers, network
protocol stacks, and file
systems. \href{https://docs.sel4.systems/projects/available-user-components.html}{We
have a fair number of these}, but much more is needed.
Even compared to other microkernels, seL4's API is very low-level,
with only the minimum abstraction as required to securely manage the
hardware. As such, building systems on seL4 is particularly
difficult. A good way to look at this is to think of seL4 as the
``assembly language of operating systems'': very primitive.
No-one in their right mind would write a complex system, say a
database or web server, in assembly language, we use higher-level
programming languages for that. These simplify the task by providing
higher-level constructs and abstractions, at the expense of losing
some of the power the hardware provides. In almost all cases, that
is a winning trade-off.
Similarly, one should not try to build a complex system directly on
seL4, but should use a higher-level framework that provides more
appropriate abstractions, at the expense of introducing some policy
and taming the power of seL4. Specifically, such a framework should allows developers
to focus on the code that implements the services, ignore hardware
complexities, and automate
much of the system integration.
There are presently three main
component frameworks for seL4, all open source: The Microkit, CAmkES
and Genode.
The \href{https://trustworthy.systems/projects/microkit/}{seL4 Microkit} reduces
the complex seL4 API to literally a handful of simple abstractions,
designed around components called \emph{protection domains}. It also
provides a \emph{software development kit} (SDK) that makes it easy
to build and integrate separately compiled modules with a kernel
binary to produce a bootable image. The Microkit's simplicity is
achieved by requiring the system architecture to be static, meaning
the set of modules and their communications is defined at system
configuration time -- a model that seems to match the requirements
of most (if not all) embedded systems, including complex
cyberphysical systems such as cars and aircraft.
\href{https://trustworthy.systems/projects/OLD/camkes/}{CAmkES},
a predecessor of the Microkit, is also a component
framework for a statically-architected system. Its abstractions are
higher level yet more complex than those of the Microkit. It also
lacks an SDK, resulting in a much more painful build process that
forces use of the kernel's complex build system. CAmkES also
introduces significant overheads.
\href{https://genode.org/}{Genode} is in many ways a more powerful
and general framework, that supports multiple microkernels and
already comes with a wealth of services and device drivers,
especially for x86 platforms, and does not enforce a static system
architecture. It is arguably more convenient
to work with than CAmkES, and is certainly the way to get a
complex system up quickly. However, Genode has drawbacks:
\begin{inparaenum}
\item As it supports multiple microkernels, not all as powerful as
seL4, Genode is based on the least common denominator. In
particular, it cannot use all of seL4's security and safety
features.
\item It has no assurance story. More on this in \autoref{s:microkit}.
% \item Genode's seL4 support does not seem to be actively maintained.
\end{inparaenum}
For these reasons, the Microkit is the recommended framework for
building systems on seL4, at least as long as the static
architecture works. \autoref{s:microkit} presents it in more detail.
\SSect{seL4 is also a hypervisor}\label{s:hyp}
seL4 is a microkernel, but it is also a hypervisor: It is possible
to run virtual machines on seL4, and inside the virtual machine (VM)
a mainstream OS, such as Linux.
\begin{figure}[ht]
\centering
\includegraphics[width=0.9\textwidth]{hypervisor}
\caption[VM-provided services]{Using virtualisation to integrate
native OS services with Linux-provided services.}
\label{f:vms}
\end{figure}
This enables an alternative way of provisioning system services, by
having a Linux VM provide them. Such a setup is shown in
\autoref{f:vms}, which shows how some services are borrowed from
multiple Linux instances running as guest OSes in separate VMs.
In this example, we provide two system services: networking and
storage. Networking is provided by a native protocol stack
running directly on seL4, lwIP or PicoTCP are frequently used
stacks. Instead of porting a network driver, we borrow one from
Linux, by running a VM with a stripped-down Linux guest that has
little more than the NIC driver. The protocol stack communicates
with Linux via an seL4-provided channel, and the application
similarly obtains network services by communicating with the
protocol stack. Note that in the setup shown in the figure, the
application has no channel to the NIC-driver VM, and thus cannot
communicate with it directly, only via the NW stack; this is enabled
by seL4's capability-based protection (see \autoref{s:caps}).
A similar setup is shown for the storage service; this time the
file system is a Linux one running in a VM, while the storage
driver is native. Again, communication between the components is
limited to channels explicitly provided. In particular, the app
cannot talk to the storage driver (except through the file system),
and the two Linux systems cannot communicate with each other.
\begin{figure}[b]
\centering
\includegraphics{vmm}
\caption{seL4 virtualisation support with usermode VMMs.}
\label{f:vmm}
\end{figure}
\begin{Chilli}
When used as a hypervisor, seL4 runs in the appropriate hypervisor
mode (EL2 on Arm, Root Ring-0 on x86, HS on RISC-V), which is a
higher privilege level than the guest operating system. Just as
when running as the OS kernel, it only does the minimum work
that has to be performed in the privileged (hypervisor) mode and leaves everything
else to user mode.
Specifically this means that seL4 performs \emph{world switches},
meaning it switches virtual machine state when a VM's execution
time is up, or VMs must be switched for some other reason. It also
catches virtualisation exceptions (``VM exits'' in Intel lingo)
and forwards them to a user-level handler, called the
\emph{virtual machine monitor} (VMM). The VMM is then responsible
for performing any emulation operations needed.
Each VM has its private copy of the VMM, isolated from the guest
OS as well as from other VMs, as shown in \autoref{f:vmm}. This
means that the VMM cannot break isolation, and is therefore not
more trusted than the guest OS itself. In particular, this means
that there is no need to verify the VMM, as that would not add
real assurance as long as the guest OS, typically Linux, is not
verified.
\end{Chilli}
\SSect{seL4 is not seLinux}
Many people confuse seL4 with seLinux (probably because seL4 might be
mistaken as a shorthand for the 4\(^\textrm{th}\) version of seLinux).
Fact is that seL4 has nothing whatsoever to do with seLinux, other
than both being open source. They share no code nor abstractions. seLinux is
not a microkernel, it is a security policy framework built into Linux. While in some ways
more secure than standard Linux, seLinux suffers from the same
problem as standard Linux:
a huge TCB, and correspondingly huge attack surface. In other words,
seLinux is an add-on to a fundamentally insecure operating system
and thus remains fundamentally insecure. In contrast, seL4 provides
bullet-proof isolation from the ground up.
In short, seLinux is not suitable for truly security-critical uses,
while seL4 is designed for them.
\Sect{seL4's Verification Story}\label{s:proofs}
In 2009, seL4 became the world's first OS kernel with a machine-checked
functional correctness proof at the source-code level. This proof was 200,000
lines of proof script at the time, one of the largest ever (we think it was
the second largest then). It showed that a functionally correct OS kernel is
possible, something that until then had been considered infeasible.
Since then we have extended the scope of the verification to higher
level properties, \autoref{f:proofs} shows the chain of proofs,
which are explained below. Importantly, we maintained the proof with the
ongoing evolution of the kernel: Commits to the mainline kernel
source are only allowed if they do not break proofs, otherwise the proofs
are updated as well. This \emph{proof engineering} is also a
novelty. Our seL4 proofs constitute by
far the largest proof base that is actively maintained. The set of proofs has by
now grown to well over a million lines, most of this manually
written and then machine checked.
\begin{figure}[bh]
\centering
\includegraphics[width=0.6\textwidth]{proofs}
\caption{seL4's proof chain.}
\label{f:proofs}
\end{figure}
\SSect{Correctness and security enforcement}
\SSSect{Functional correctness}
The core of seL4's verification is the functional correctness proof, which says that the
C implementation is free of implementation defects.
More precisely, there is a formal
specification of the kernel's functionality, expressed in a
mathematical language called \emph{higher-order logic} (HOL). This
is represented by the box labelled \emph{abstract model} in the
figure. The functional correctness proof then says that the C
implementation is a \emph{refinement} of the abstract model, meaning
the possible behaviours of the C code are a subset of those allowed by the
abstract model.
\begin{Chilli}
This informal description glosses over a lot of detail. Here is some
of it in case you wonder.
C is not a formal language; in order to allow reasoning about a C
program in the theorem prover (we use Isabelle/HOL), it has to be
transformed into mathematical logic (HOL). This is done by a C
parser written in Isabelle. The parser defines the semantics of
the C program, and gives it meaning in HOL according to this semantics.
It is this formalisation
which we prove to be a refinement of the mathematical (abstract) model.
Note that C does not have an official mathematical semantics, and
parts of the C language are notoriously subtle and not necessarily
that well defined. We solve this by restricting our use of C to a
well-defined subset of the language, for which we have an
unambiguous semantics. However, this does not guarantee that our
assumed semantics for that subset is the same as the
compiler's. More on that below.
\end{Chilli}
The proof means that everything we want to
know about the kernel's behaviour (other than timing) is expressed
by the abstract spec, and the kernel cannot behave in ways that are
not allowed by the spec. Among others, this rules out the usual
attacks against operating systems, such as stack smashing,
null-pointer dereference, any code injection or control-flow highjacking etc.
\SSSect{Translation validation}
Having a bug-free C implementation of the kernel is great, but still
leaves us at the mercy of the C compiler. Those compilers (we use
GCC) are themselves large, complex programs that have
bugs. So we could have a bug-free kernel that gets compiled into a
buggy binary.
\begin{figure}[th]
\centering
\includegraphics[width=0.85\textwidth]{binary}
\caption{Translation validation proof chain.}
\label{f:binary}
\end{figure}
In the security-critical space, compiler bugs are not the only
problem. A compiler could be outright malicious, containing a Trojan
that automatically builds in a back door when compiling the OS. The
Trojan can be extended to automatically add itself when compiling
the compiler, making it almost impossible to detect, even if the
compiler is open-source! Ken Thompson explained this attack
in his Turing Award lecture~\citep{Thompson_84}.
To protect against defective or malicious compilers, we additionally
verify the executable binary that is produced by the compiler and
linker. Specifically, we prove that the binary is a correct translation of the
(proved correct) C code, and thus that the binary refines the
abstract spec.
\begin{Chilli}
Unlike the verification of the C code, this proof is not done
manually but by an automatic tool chain. It consists of several
phases, as shown in \autoref{f:binary}. A formal model of the processor's instruction set
architecture (ISA) formalises the binary in the theorem prover; we
use an L3 formalisation of the RISC-V ISA, as well as the
extensively tested L3 Arm ISA formalisation of \citet{Fox_Myreen_10}.
Then a disassembler, written in the HOL4 theorem prover,
translates this low-level representation into a higher-level
representation in a graph language that basically represents
control flow. This transformation is provably correct.
The formalised C program is translated into the same graph
language, through provably correct transformations in the Isabelle/HOL theorem
prover. We then have two programs, in the same representation,
which we need to show equivalent. This is a bit
tricky, as compilers apply a number of heuristic-driven
transformations to optimise the code. We apply a number of such
transformations through rewrite rules on the graph-language
representation of the C program (still in the theorem prover, and
thus provably correct).
In the end we then have two programs that are quite similar but
not the same, and we need to prove that they have the same
semantics. In theory this is equivalent to the halting problem
and as such unsolvable. In practice, what the compiler does is
deterministic enough to make the problem tractable. We do this by
throwing the programs, in small chunks, at multiple \href{https://en.wikipedia.org/wiki/Satisfiability_modulo_theories}{SMT solvers}. If one of these can
prove that all the corresponding pieces have the same semantics,
then we know that the two programs are equivalent.
Note also that the C program that is proved to refine the abstract
spec, and the C program that we prove to be equivalent to the
binary, are the same Isabelle/HOL formalisations. This means that
our assumptions on C semantics drop out of the assumptions made by
the proofs. Altogether, the proofs
not only show that the compiler did not introduce bugs, but also
that its semantics for the C subset we use are the same as ours.
\end{Chilli}
\SSSect{Security properties}
\autoref{f:proofs} also shows proofs between the abstract spec and
the high-level security properties \emph{confidentiality},
\emph{integrity} and \emph{availability} (these are commonly
dubbed the \emph{CIA properties}). These state that the
abstract spec is actually useful for security: They prove that in a
correctly configured system, the kernel will enforce these
properties.
Specifically, seL4 enforces
\begin{description}
\item[confidentiality:] seL4 will not allow an entity to read (or
otherwise infer) data without having been explicitly given read access
to the data;
\item[integrity:] seL4 will not allow an entity to modify data
without having been explicitly given write access to the data;
\item[availability:] seL4 will not allow an entity to prevent another
entity's authorised use of resources.
\end{description}
\begin{Chilli}
These proofs presently do not capture properties associated with
time. Our confidentiality proofs rule out \emph{covert storage
channels} but presently not \emph{covert timing channels}, which
are used by such attacks as Spectre. Preventing timing channels
is something we are working on
\citep{Heiser_KM_19}. Similarly, the integrity and availability
proofs presently do not cover timeliness, but our new MCS model
\citep{Lyons_MAH_18} is designed to cover those aspects (see \autoref{s:mcs}).
\end{Chilli}
\SSSect{Proof assumptions}
All reasoning about correctness is based on assumptions,
whether the reasoning is formal, as with seL4, or informal, when
someone thinks about why their program might be ``correct''. Every
program executes in some context, and its correct behaviour
inevitably depends on some assumptions about this context.
One of the advantages of machine-checked formal reasoning is that it
forces people to make those assumptions explicit. It is not possible
to make unstated assumptions, the proofs will just not succeed if
they depend on assumptions that are not clearly stated. In that
sense, formal reasoning protects against forgetting assumptions, or
not being clear about them; that in itself is a significant benefit
of verification.
The verification of seL4 makes three assumptions:
\begin{description}
\item[Hardware behaves as expected.] This should be obvious. The
kernel is at the mercy of the underlying hardware, and if the
hardware is buggy (or worse, has Trojans), then all bets are off,
whether you are running verified seL4 or any unverified
OS. Verifying hardware is outside the scope of seL4 (and
the competency of TS); other people are working on that.
\item[The spec matches expectations.] This is a difficult one,
because one can never be sure that a formal specification means
what we think it should mean. Of course, the same problem exists
if there is \emph{no} formal specification: if the spec is informal or
non-existent, then it is obviously impossible to precisely reason
about correct behaviour.
One can reduce this risk by proving properties about the spec, as
we have done with our security proofs, which show that seL4 is
able to enforce certain security properties. That then shifts the
problem to the specification of those properties. They are much
simpler than the kernel spec, reducing the risk of misunderstanding.
But in the end, there is always a gap between the world of mathematics and
the physical world, and no end of reasoning (formal or informal)
can remove this completely. The advantage of formal reasoning is
that you know exactly what this gap is.
\item[The theorem prover is correct.] This sounds like a serious
problem, given that theorem provers are themselves large and
complex programs. However, in reality this is the least concerning
of the three assumptions. The reason is that the Isabelle/HOL
theorem prover has a small core (of a few 10\,kSLOC) that checks
all proofs against the axioms of the logic. And this core has
checked many proofs small and large from a wide field of formal
reasoning, so the chance of it containing a correctness-critical
bug is extremely small.
\end{description}
\SSSect{Proof status and coverage}
seL4 has been or is being verified for multiple architectures: Arm,
x86 and RISC-V. Some of these are more complete than others, but the
missing bits are generally worked on or waiting for funding. Please
refer to the
\href{https://docs.sel4.systems/projects/sel4/status.html}{seL4
project status page} for details.
\SSect{The seL4 Microkit}\label{s:microkit}
The Microkit is a thin abstraction layer over seL4 that hides much
of seL4's complexities at the cost of imposing the restriction of a
static system architecture: At system configuration time, all
components and their authorised communication channels are known.
\begin{figure}[b]
\centering
\includegraphics[width=0.6\textwidth]{microkit}
\caption{Microkit abstractions.}
\label{f:microkit}
\end{figure}
\autoref{f:microkit} shows these abstractions (yes, that's all!) The
\emph{protection domain} (PD) is the process abstraction: it
represents a program in execution in its own, isolated address
space. Two PDs can be connected by a \emph{communication channel}
and may also share \emph{memory regions} that are mapped in one or
more PDs' address spaces. A VM is similar to a PD in that it is an
isolated component, but has the additional capability of running a guest
OS.
\begin{Chilli}
Microkit further simplifies programming by imposing an
event-handling model. Specifically, each PD has two or three entry
points: \code{init()}, \code{notified()} and, optionally,
\code{protected()}. These are provided by the programmer and invoked
by the framework on particular events: \code{init} is called exactly
once at system initialisation time and before any of the other
entry points are called. \code{notified()} is invoked if the PD at the
other end of the channel calls the \code{notify()} API on the
channel, and \code{protected()} is invoked if the PD at the other
end of a channel calls the \code{ppcall()} API.
PDs have scheduling parameters, which include a priority.
\code{notify()} calls are asynchronous: they never block, and the
exact time at which the receiver PD's \code{notified()} function is invoked depends
on priorities and whether the two PDs are executing on the same core
or not. These calls pass no data to the receiver other than the
identity of the notifier, and as such behave similarly to
interrupts. In contrast, a \code{ppcall()} behaves like a normal
function call, except that the code it executes happens to be in a
different PD. In particular, \code{ppcall()} is synchronous, i.e.\ it blocks
the caller until the callee returns from its \code{protected()}
handler. It can also pass arguments and return a
result value. To prevent deadlock, a \code{ppcall()} is only
allowed from lower to higher priority.
PDs providing a \code{protected()} handler are referred to as
server, and the PD invoking \code{ppcall()} is called a
client. Servers execute on the client's \emph{scheduling context}
(see \autoref{s:mcs}) and core, meaning that their usage of the CPU
resource is properly accounted to the client. (This assumes that
the server is configured as ``passive'', which is the recommended
approach.)
The Microkit programming model, if used properly, makes PDs
location-transparent: A PD does not know whether the PD at the
other end of a channel is running on the same or a different
core. This greatly simplifies building multicore systems.
\end{Chilli}
The (static) architecture of a Microkit system is specified in a
\emph{system description file} (SDF), which specifies all the PDs of
the system with their scheduling parameters (and a few more
attributes, such as whether the PD manages a VM or has the right to
handle interrupts), the communication channels, and memory regions
and how they are mapped into PDs.
The programmer writes each PD as a separate program, and links it
against \code{libmicrokit} to produce an executable (ELF) file for
each PD. The Microkit comes with an SDK, which mangles the
programmer-provided ELF file with a kernel binary as specified by
the SDF. The programmer can use the build tool of their choice
(which can be a simple \code{Makefile}) for building the system.
The Microkit promise to the system designer is that what is
specified in the SDF (and visualised as in \autoref{f:microkit}) is
a faithful representation of the possible interactions. In
particular, it promises that no interactions are possible beyond
those defined in the SDF.
\begin{figure}[hb]
\centering
\includegraphics[width=\textwidth]{capdl}
\caption[Verified architecture mapping and system generation.]
{Verified architecture mapping and system generation
(note that not all verification steps are of full strength
yet). Green boxes are generated provably correct.}
\label{f:capdl}
\end{figure}
Of course, this promise depends on enforcement by seL4, and the SDF
representation must be mapped onto low-level seL4 objects and access
rights to them. This is what the Microkit machinery achieves, and is
shown in \autoref{f:capdl}.
In the figure, the architecture (i.e.\ what is described in the SDF)
is shown at the top. This is a fairly simple system, consisting of
four native PDs and one PD that houses a virtual
machine hosting a Linux guest with a couple of networking
drivers. The Linux VM is only connected to other PDs via the
crypto PD, which ensures that it can only access encrypted
links and cannot leak data.
Even this simple system maps to hundreds if not thousands of seL4
objects, an indication of the complexity reduction provided by the
Microkit abstraction.
For the seL4-level description we have another formal language,
called \emph{CapDL} (capability distribution language). The system designer
never needs to deal with CapDL, it is a purely internal
representation. The Microkit framework contains a compiler which
automatically translates the SDF into CapDL, indicated by the box
arrow pointing left-down. The box in the left of the figure gives a
(simplified) representation of the seL4 objects described in
CapDL. (It is actually a simplified representation of a much simpler
system, basically just two of the PDs at the top of
\autoref{f:microkit} and the channel between them.)
The CapDL spec is a precise representation of access rights in the
system, and it is what seL4 enforces. Which means that once the
system gets into the state described by the CapDL spec, it is
guaranteed to behave as described by the SDF spec, and
therefore the architecture-level description is sufficient for
further reasoning about security properties.
So we need assurance that the system boots up into the state
described by the CapDL spec. We achieve this with a second
automated step: We generate from CapDL the startup code that, as
soon as seL4 itself has booted, takes control and generates all the
seL4 objects referenced by the spec, including the ones representing
PDs, and distributes the \emph{capabilities} (see
\autoref{s:caps}) that grant access to those objects according to
the spec. At the end of the execution of this init code, the system is
provably in the state described by the CapDL spec, and thus in the
state represented by the SDF spec.
\code{libmicrokit}, which maps the Microkit abstractions to the seL4 API,
was also verified for correct implementation.
\textbf{Note:} At the time of writing, the proofs about Microkit and
CapDL are not yet complete and based on an older version of the
Microkit. Completion is planned for the future.
\begin{Chilli}
Note also that none of the verification work mentioned deals with
information leakage through timing channels (yet). This is a
major unsolved research problem, but we're at the forefront of
solving it.
\end{Chilli}
\Sect{About Capabilities}\label{s:caps}
We encountered capabilities in \autoref{s:intro}, noting that they are
access tokens. We will now look at the concept in more detail.
\SSect{What are capabilities?}
\begin{figure}[ht]
\centering
\includegraphics[width=0.5\textwidth]{cap}
\caption[Capabilities are keys to objects]{A capability is a key
that conveys specific rights to a particular object.}
\label{f:cap}
\end{figure}
As shown in \autoref{f:cap}, a capability is an object reference; in
that sense it is similar to a pointer (and implementation of
capabilities are often referred as ``fat pointers''). They are
immutable pointers, in the sense that a capability will always
reference the \emph{same} object, so each capability uniquely
specifies a particular object.
In addition to pointers, a capability also encodes access rights, in
fact, the capability is an encapsulation of an object reference and
the rights it conveys to that object. In a capability-based system,
such as seL4, invoking a capability is the one and only way of
performing an operation on a system object. In fact, a system call
in seL4 is a capability invocation, with arguments specifying what
operation to perform on the object referenced by the invoked
capability. The kernel will then check whether the capability
authorises the requested operation, and immediately abort the
operation if it is not authorised.
For example, an operation may be to call a function in a
component. The object reference embedded in the capability then