Re: [PATCH 00/45] C++: Convert the kernel to C++
From: H. Peter Anvin
Date: Tue Jan 09 2024 - 14:58:15 EST
Hi all, I'm going to stir the hornet's nest and commit what has become the
ultimate sacrilege.
Andrew Pinski recently made me aware of this thread. I realize it was
posted on April 1, 2018, and was either a joke or might have been
taken as one. However, I think there is validity to it, and I'm going to
try to motivate my opinion here.
Both C and C++ have had a lot of development since 1999, and C++ has in
fact, in my personal opinion, finally "grown up" to be a better C for
the kind of embedded programming that an OS kernel epitomizes. I'm
saying that as the author of a very large number of macro and inline
assembly hacks in the kernel.
What really makes me say that is that a lot of the things we have recently
asked gcc to provide as extensions are in fact relatively easy to
implement in standard C++ and, in many cases, allow for infrastructure
improvement *without* global code changes (see below.)
C++14 is, in my opinion, the "minimum" version that has reasonable
metaprogramming support; it has most of it without the type hell of earlier
versions (C++11 had most of it, but C++14 fills in some key missing pieces).
However, C++20 is really the main game changer in my opinion; although
earlier versions could play a lot of SFINAE hacks, they also gave
absolutely useless barf as error messages. C++20 adds concepts, which
make it possible to actually get reasonable errors.
We do a lot of metaprogramming in the Linux kernel, implemented with
often truly hideous macro hacks. These are also virtually
impossible to debug. Consider the uaccess.h type hacks, some of which I
designed and wrote. In C++, the various casts and case statements can be
unwound into separate template instances, and with some cleverness the
templates can also strictly enforce the distinction between user space and
kernel space pointers, as well as between already-verified and unverified
user space pointers, not to mention easily handle 32-bit user space types
in a 64-bit kernel and make endianness conversion enforceable.
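To make the pointer-tagging idea concrete, here is a user-space sketch under stated assumptions:

* user_ptr, verified_user_ptr and check() are hypothetical names.
* The range check is a simplified stand-in for access_ok().
* memcpy() stands in for the real fault-handling accessors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Unverified pointer into user space. It deliberately has no operator*,
// so it cannot be dereferenced by accident.
template <typename T>
class user_ptr {
public:
    explicit user_ptr(T *p) : p_(p) {}
    T *raw() const { return p_; }
private:
    T *p_;
};

// A user pointer that has passed a range check. The constructor is
// private, so check() is the only way to obtain one.
template <typename T>
class verified_user_ptr {
public:
    // 'limit' stands in for the architecture's top-of-user-space bound.
    static std::optional<verified_user_ptr> check(user_ptr<T> u, std::uintptr_t limit)
    {
        auto addr = reinterpret_cast<std::uintptr_t>(u.raw());
        if (addr > limit || limit - addr < sizeof(T))
            return std::nullopt;
        return verified_user_ptr(u.raw());
    }
    T *raw() const { return p_; }
private:
    explicit verified_user_ptr(T *p) : p_(p) {}
    T *p_;
};

// get_user() as a template: only verified pointers are accepted, and each
// instance handles exactly one type, so the C macro's switch on sizeof()
// reduces to compile-time checks.
template <typename T>
int get_user(T &dst, verified_user_ptr<T> src)
{
    static_assert(std::is_trivially_copyable_v<T>, "get_user() needs a plain type");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8,
                  "unsupported get_user() size");
    // A real version would use a size-specific accessor with fault
    // handling; memcpy() stands in for it here.
    std::memcpy(&dst, src.raw(), sizeof(T));
    return 0;
}
```

With these types, get_user(out, up) on an unchecked user_ptr does not compile. The question "did we verify this pointer?" is answered by the type system rather than by review.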
Now, "why not Rust"? First of all, Rust uses a different (often, in my
opinion, gratuitously so) syntax, and not only would all the kernel
developers need to become intimately familiar with it, to the level of having
the same kind of "feel" for it as we have for C, but converting C code to Rust
isn't something that can be done piecemeal, whereas with some cleanups
the existing C code can be compiled as C++.
However, I find that I disagree with some of David's conclusions; in
fact I believe David is unnecessarily *pessimistic* at least given
modern C++.
Note that no one in their right mind would expect to use all the features
of C++. Just as we have "kernel C" (currently a subset of C11 with a
relatively large set of allowed compiler-specific extensions), we would
have "kernel C++", which I would suggest be a strictly defined subset
of C++20 combined with a similar set of compiler extensions. I realize
C++20 compiler support is still very new for obvious reasons, so at
least some of this is forward looking.
So, notes on this specific subset based on David's comments.
On 4/1/18 13:40, David Howells wrote:
> Here are a series of patches to start converting the kernel to C++. It
> requires g++ v8.
> What rocks:
> (1) Inline template functions, which makes implementation of things like
> cmpxchg() and get_user() much cleaner.
Much, much cleaner indeed. But it also allows for introducing things
like inline patching of immediates *without* having to change literally
every instance of a variable.
I wrote, in fact, such a patchset. It probably included the most awful
assembly hacks I have ever done, in order to implement the mechanics,
but what *really* made me give up on it was the fact that every site
where a patchable variable is used would have to be changed from, say:
    foo = bar + some_init_offset;
... to ...
    foo = imm_add(bar, some_init_offset);
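With operator overloading, the call sites would not need to change at all. Here is a rough sketch built on a hypothetical patchable<T> wrapper. A real implementation would emit inline asm and record patch sites, and that part is omitted here:

```cpp
#include <cassert>

// Hypothetical "patchable immediate": a value fixed during boot. In the
// real thing each use would be an instruction whose immediate operand is
// rewritten in place; here the value is simply stored in the object.
template <typename T>
class patchable {
public:
    constexpr explicit patchable(T init) : val_(init) {}

    // Boot-time update (stand-in for rewriting every patch site).
    void patch(T v) { val_ = v; }

    // Stand-in for imm_add(): plain addition instead of inline asm.
    T imm_add(T lhs) const { return lhs + val_; }

    // Hidden friends: "bar + some_init_offset" resolves to these through
    // argument-dependent lookup, so existing call sites stay unchanged.
    friend T operator+(T lhs, const patchable &imm) { return imm.imm_add(lhs); }
    friend T operator+(const patchable &imm, T rhs) { return imm.imm_add(rhs); }

private:
    T val_;
};

patchable<unsigned long> some_init_offset(0x1000);

unsigned long compute(unsigned long bar)
{
    return bar + some_init_offset;   // same source text as the C version
}
```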
> (2) Inline overloaded functions, which makes implementation of things like
> static_branch_likely() cleaner.
Basically a subset of the above (it just means that for a specific set
of very common cases it isn't necessary to go all the way to using
templates, which makes the syntax nicer.)
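Here is a sketch of what that could look like, loosely modeled on the jump-label API. The types are heavily simplified. arch_branch_nop() and arch_branch_jump() are hypothetical stand-ins for the asm goto sequences:

```cpp
#include <cassert>

// Heavily simplified stand-ins for the jump-label key types.
struct static_key { bool enabled; };
struct static_key_true  { static_key key; };  // meant to start enabled
struct static_key_false { static_key key; };  // meant to start disabled

// Hypothetical stand-ins for arch_static_branch() and
// arch_static_branch_jump(). In the kernel these emit a NOP or a JMP at a
// patchable site; here both just read the flag.
inline bool arch_branch_nop(const static_key &k)  { return k.enabled; }
inline bool arch_branch_jump(const static_key &k) { return k.enabled; }

// Overload resolution picks the code sequence from the key's type.
inline bool static_branch_likely(const static_key_true *k)
{
    return arch_branch_nop(k->key);
}

inline bool static_branch_likely(const static_key_false *k)
{
    return arch_branch_jump(k->key);
}
```

The C macro dispatches on the key's type with gcc builtins. Here that dispatch becomes ordinary overload resolution, and passing any other type is a plain "no matching function" error.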
> (3) Class inheritance. For instance, all those inode wrappers that require
> the base inode struct to be included and that has to be accessed with
> something like:
> inode->vfs_inode.i_mtime
> when you could instead do:
> inode->i_mtime
This is nice, but it is fundamentally syntactic sugar. Similar things
can be done with anonymous structures, *except* that C doesn't allow
another structure to be anonymously included; you have to have an
entirely new "struct" statement defining all the fields. Welcome to
macro hell.
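For comparison, here is a sketch of the C++ version. It uses a simplified inode and a hypothetical filesystem wrapper; myfs_inode_info and MYFS_I() are made-up names:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the VFS inode; the real one has many more
// fields and a different timestamp type.
struct inode {
    unsigned long i_ino;
    std::int64_t i_mtime;
};

// Hypothetical filesystem-private inode. In C this embeds
// "struct inode vfs_inode;"; here it inherits the fields instead.
struct myfs_inode_info : inode {
    std::uint32_t i_flags;
};

// Replaces container_of(i, struct myfs_inode_info, vfs_inode). Like
// container_of(), it is only valid if 'i' really is part of a
// myfs_inode_info.
inline myfs_inode_info *MYFS_I(inode *i)
{
    return static_cast<myfs_inode_info *>(i);
}
```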
> What I would disallow:
> (1) new and delete. There's no way to pass GFP_* flags in.
Yes, there is.
    void *operator new(size_t count, gfp_t flags);
    void operator delete(void *ptr, ...whatever kfree/vfree/etc need, or a
                         suitable flag);
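Here is a self-contained user-space sketch of the idea, under a few assumptions:

* gfp_t is modeled as a distinct struct; the kernel's is a sparse-annotated integer.
* GFP_KERNEL's value here is made up.
* stub_kmalloc() and stub_kfree() are hypothetical stand-ins for kmalloc() and kfree().
* The class-scope operators are one possible arrangement, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// User-space stand-ins (hypothetical): a distinct gfp_t type, one flag,
// and malloc()/free() in place of kmalloc()/kfree().
struct gfp_t { unsigned int bits; };
constexpr gfp_t GFP_KERNEL{0x1};

static void *stub_kmalloc(std::size_t size, gfp_t) { return std::malloc(size); }
static void stub_kfree(void *p) { std::free(p); }

// Base class giving a type GFP-aware allocation. Declaring operator new
// here hides the global one, so a plain "new T" without flags won't compile.
struct kmalloc_allocated {
    // noexcept: a NULL return makes the new-expression yield nullptr
    // (without running the constructor) instead of throwing.
    static void *operator new(std::size_t size, gfp_t flags) noexcept
    {
        return stub_kmalloc(size, flags);
    }
    // Matching placement form: only called if a constructor throws.
    static void operator delete(void *p, gfp_t) noexcept { stub_kfree(p); }
    // Used by ordinary delete-expressions.
    static void operator delete(void *p) noexcept { stub_kfree(p); }
};

// Example object.
struct foo : kmalloc_allocated {
    int x;
    explicit foo(int v) : x(v) {}
};
```

Usage would be `foo *f = new (GFP_KERNEL) foo(42); if (!f) return -ENOMEM;`. Declaring the allocator noexcept is what lets a failed allocation come back as a NULL pointer, matching the usual kmalloc() error path without exceptions.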
> (2) Constructors and destructors. Nests of implicit code makes the code less
> obvious, and the replacement of static initialisation with constructor
> calls would make the code size larger.
Yes and no. It also makes it *way* easier to convert to and from using
dedicated slabs; we already use semi-initialized slabs for some kinds of
objects, but each such use requires new code.
We already *do* use constructors and *especially* destructors for a lot
of objects; we just call them explicitly.
Note that modern C++ also has the ability to construct and destruct
objects in-place, so allocation and construction/destruction aren't
necessarily related.
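Here is a minimal sketch that separates the two, with a static buffer standing in for a slab slot. request, request_init() and request_release() are hypothetical:

```cpp
#include <cassert>
#include <memory>
#include <new>

// Counts live objects so construction/destruction can be observed.
static int live_requests;

// Hypothetical object type with real construction/destruction work.
struct request {
    int id;
    explicit request(int i) : id(i) { ++live_requests; }
    ~request() { --live_requests; }
};

// Raw, suitably aligned storage for one object (think: one slab slot).
// It exists independently of any object's lifetime.
alignas(request) unsigned char slot[sizeof(request)];

request *request_init(int id)
{
    // Placement new: construct in existing memory, no allocation.
    return ::new (static_cast<void *>(slot)) request(id);
}

void request_release(request *r)
{
    // Run the destructor only; the storage stays available for reuse.
    std::destroy_at(r);
}
```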
There is no reason you can't do static initialization where possible;
even constructors can be evaluated at compile time if they are constexpr.
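For example, a struct with an embedded union can still be a compile-time constant if its constructors are constexpr. Here is a sketch with a hypothetical resource type:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct with an embedded anonymous union.
struct resource {
    std::uint32_t flags;
    union {
        std::uint64_t start;
        const void *ptr;
    };

    // constexpr constructors: an object built from constants is
    // initialized at compile time, so no constructor code runs at boot.
    constexpr resource(std::uint32_t f, std::uint64_t s) : flags(f), start(s) {}
    constexpr resource(std::uint32_t f, const void *p) : flags(f), ptr(p) {}
};

// Checked by the compiler: this object is a constant expression.
constexpr resource early_res(0x1, std::uint64_t{0x100000});
static_assert(early_res.start == 0x100000, "initialized at compile time");
```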
Constructors (and destructors, for modules) in conjunction with gcc's
init_priority() extension are also a nice replacement for the linker-hack
tables used to invoke initializer functions.
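Here is a sketch of that pattern using gcc's init_priority attribute. Lower values run earlier, and 101 is the lowest a program may use. The initcall type and initcall_log() are hypothetical; the log exists only so the ordering can be observed:

```cpp
#include <cassert>
#include <vector>

// Records the order in which initcalls ran. A function-local static is
// safe to use during static initialization.
std::vector<int> &initcall_log()
{
    static std::vector<int> log;
    return log;
}

// Hypothetical initcall object: its constructor plays the role of an
// __init function that today is collected via linker-section tables.
struct initcall {
    explicit initcall(int id) { initcall_log().push_back(id); }
};

// Defined out of order on purpose: init_priority, not source order,
// decides when each constructor runs.
initcall late_init  __attribute__((init_priority(300))) = initcall(3);
initcall early_init __attribute__((init_priority(200))) = initcall(2);
initcall core_init  __attribute__((init_priority(101))) = initcall(1);
```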
> (3) Exceptions and RTTI. RTTI would bulk the kernel up too much and
> exception handling is limited without it, and since destructors are not
> allowed, you still have to manually clean up after an error.
Agreed here, especially since on many platforms exception handling
relies on DWARF unwind information.
> (4) Operator overloading (except in special cases).
See the example of inline patching above. But yes, overloading and
*especially* operator overloading should be used only with care; this is
pretty much true across the board.
> (5) Function overloading (except in special inline cases).
I think we might find non-inline cases where it matters, too.
> (6) STL (though some type trait bits are needed to replace __builtins that
> don't exist in g++).
Just as there are parts of the C library which are really about the
compiler rather than the library, <type_traits> is such a part for C++.
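For instance, two of the kernel's type-checking helpers map directly onto standard templates. In this sketch, same_type_v and array_size() are illustrative names, not existing kernel APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Counterpart of the kernel's __same_type(a, b), which wraps
// __builtin_types_compatible_p(). That builtin ignores top-level
// qualifiers, so strip const/volatile before comparing.
template <typename A, typename B>
constexpr bool same_type_v =
    std::is_same_v<std::remove_cv_t<A>, std::remove_cv_t<B>>;

// ARRAY_SIZE() without __must_be_array(): the parameter only binds to a
// real array, so passing a pointer is a compile-time error.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N])
{
    return N;
}
```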
> (7) 'class', 'private', 'namespace'.
> (8) 'virtual'. Don't want virtual base classes, though virtual function
> tables might make operations tables more efficient.
Operations tables *are* virtual classes. virtual base classes make sense
in a lot of cases, and we de facto use them already.
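Here is a sketch of an operations table written as an abstract class. file_ops and null_dev are hypothetical, and much smaller than any real kernel ops structure:

```cpp
#include <cassert>

// Hypothetical, trimmed-down operations table as an abstract class. The
// compiler builds the function-pointer table (the vtable) that C code
// writes out by hand as a struct of function pointers. No RTTI is needed.
struct file_ops {
    virtual long read(char *buf, unsigned long len) = 0;
    virtual long write(const char *buf, unsigned long len) = 0;

protected:
    // Objects are never destroyed through a file_ops pointer, so the
    // destructor is protected and need not be virtual.
    ~file_ops() = default;
};

// A /dev/null-like implementation.
struct null_dev final : file_ops {
    long read(char *, unsigned long) override { return 0; }  // always EOF
    long write(const char *, unsigned long len) override
    {
        return static_cast<long>(len);                        // accept and discard
    }
};
```

Instantiating a class that leaves a method unimplemented is a compile error, rather than a NULL function pointer discovered at run time.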
However, Linux also does conversion of polymorphic objects from one type
to another -- that is for example how device nodes are implemented.
Using this with C++ polymorphism without RTTI does require some
compiler-specific hacks, unfortunately.
> Issues:
> (1) Need spaces inserting between strings and symbols.
I have to admit I don't really grok this?
> (2) Direct assignment of pointers to/from void* isn't allowed by C++, though
> g++ grudgingly permits it with -fpermissive. I would imagine that a
> compiler option could easily be added to hide the error entirely.
Seriously. It should also enforce that the target is a trivial type.
Unfortunately, it doesn't look like there is a way to create user-defined
implicit conversions from one pointer type to another (via a helper class),
which otherwise would have had some other nice applications.
> (3) Need gcc v8+ to statically initialise an object of any struct that's not
> really simple (e.g. if it's got an embedded union).
Worst case: constexpr constructor.
> (4) Symbol length. Really need to extern "C" everything to reduce the size
> of the symbols stored in the kernel image. This shouldn't be a problem
> if out-of-line function overloading isn't permitted.
This would really lose what is arguably the single biggest advantage of C++:
type-safe linkage. This is the one reason why Linus actually tried to
use C++ in one single version of the kernel in the early days (0.99.14,
if I remember correctly.) At that time, g++ was nowhere near mature
enough, and it got dropped right away.
> So far, it gets as far as compiling init/main.c to a .o file.
;)