Re: OT: Open letter to the Linux World

From: Alexander Holler
Date: Sat Sep 06 2014 - 16:02:48 EST


On 05.09.2014 08:31, Alexander Holler wrote:
On 04.09.2014 21:18, Rob Landley wrote:

What's actually wrong with C++ at a language design level.

Short version:

OMG.

It's better than C. In almost every aspect. Stop. Nothing else. Of
course, if you want to write something like systemd in Python, Perl,
Pascal, Modula or Erlang, feel free to do so. And if you want more security
bugs, feel free to still use C for string handling instead of
std::string. Or still write your sorted list for every structure (or
just don't and go the slow way, because you don't find the time to do it
right in C). And ...

You don't have to understand how templates work to use e.g.
std::string. Other people do hard stuff for you. So don't panic.


I raised the criticism of using C in a critical and very security-sensitive piece of software in userland, so I've decided that a bit more explanation might make sense.

First, as you don't seem to have noticed, or don't know, or ignore the difference, let me repeat that this thread is about a piece of software which runs in userland. So, please, spare me any comments from Linus where he talks about kernelspace. I'm pretty sure he knows the difference.

Now let me bring up a very small piece of code which you can find in a similar fashion in almost every piece of software that comes into contact with strings. And not just in one place or function, but in dozens or even hundreds of places (inline, not in functions) in one project.

First in C++:

void foo_bar(const std::string& foo, const std::string& bar, std::string& foobar)
{
    foobar = foo + bar;
}

For those who don't know C++: this concatenates the two strings named foo and bar and puts the result into foobar.

Now an example of how you would have to do that in C:

char *foo_bar(const char *foo, const char *bar)
{
    char *foobar = malloc(strlen(foo) + strlen(bar));

    strcpy(foobar, foo);
    strcat(foobar, bar);

    return foobar;
}

Do you see the difference and spot all the problems?

At first I thought about not posting the answer, to see the responses, but that would just have ended with a lot of people calling me a fool and/or assuming I can't write proper C. It would also carry the risk that some inexperienced people might copy, paste and use it.

So, first of all: THE ABOVE EXAMPLE IN C IS BROKEN.

The very first problem is that foobar is allocated with the wrong size, because the allocation doesn't account for the terminating null byte. A very common mistake, already found in countless places.
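To spell out the arithmetic, here is a small sketch. The helper name concat_buffer_size() is made up for illustration and is not part of the example above: strlen("foo") + strlen("bar") is 6, but the result "foobar" needs 7 bytes once the terminating null byte is counted.

```cpp
#include <cstddef>
#include <cstring>

// The literal "foobar" occupies 7 bytes: 6 characters plus the terminating '\0'.
static_assert(sizeof("foobar") == 7, "string literals include the terminator");

// Hypothetical helper: bytes needed to hold the concatenation of two C strings.
std::size_t concat_buffer_size(const char* foo, const char* bar)
{
    // strlen() counts characters only, so the '\0' has to be added by hand.
    return std::strlen(foo) + std::strlen(bar) + 1;
}
```

Leaving out the "+ 1" gives exactly the off-by-one allocation shown in the broken example.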

But there are several more problems:

- What happens if foo or bar isn't terminated with a null byte?

- What happens if malloc fails?

- Who is the owner of foo, bar and/or foobar? Does the caller still own foo and bar afterwards? Will the caller own foobar? (That is, who is responsible for freeing foo, bar and foobar once they aren't used anymore?)

So now we extend the above C example:

char *foo_bar(const char *foo, const char *bar)
{
    char *foobar;

    if (!foo || !bar)
        return NULL;

    foobar = malloc(strlen(foo) + strlen(bar) + 1);

    if (!foobar)
        return NULL;

    strcpy(foobar, foo);
    strcat(foobar, bar);

    return foobar;
}

This still has some problems. First, the caller has to check whether foo_bar() returned NULL. Forgetting that check is a very common bug, already found in countless places too.

Next, there is still the unsolvable problem of what happens if foo or bar isn't terminated with a null byte (in other words, they aren't C strings).
So you have to check all callers up to the source of foo and bar to be sure the program doesn't crash in the possibly far, far away place called foo_bar().
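For comparison, a minimal sketch of how input that is not null-terminated could be handled in C++, assuming the caller knows the length of each buffer. The names from_buffer() and foo_bar_buffers() are hypothetical. The std::string constructor that takes a pointer and a length copies exactly that many bytes and never scans for a terminator. The caller still has to pass a correct length, of course.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from a buffer that is NOT
// null-terminated. Copies exactly len bytes and never looks for '\0'.
std::string from_buffer(const char* data, std::size_t len)
{
    return std::string(data, len);
}

// Hypothetical variant of foo_bar() for length-delimited input.
std::string foo_bar_buffers(const char* foo, std::size_t foo_len,
                            const char* bar, std::size_t bar_len)
{
    return from_buffer(foo, foo_len) + from_buffer(bar, bar_len);
}
```

From that point on, the length travels with the string object, so later code doesn't depend on a terminator being present.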

And there is still no statement about ownership. Someone who just looks at the prototype, or sees a call of foo_bar() somewhere, has no idea who owns foo, bar and the returned foobar unless a comment says so.
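As a sketch of how the ownership question looks in C++ (the names foo_bar_value() and joined_length() are made up for illustration): with a result returned by value, the prototype itself says that the arguments stay with the caller and that the caller owns the result. Nobody has to call free().

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of foo_bar(): const references mean the arguments are
// only read, the return by value means the caller owns the result.
std::string foo_bar_value(const std::string& foo, const std::string& bar)
{
    return foo + bar;
}

// Hypothetical caller: three ordinary local objects, no manual cleanup.
std::size_t joined_length()
{
    const std::string foo = "foo";
    const std::string bar = "bar";
    const std::string foobar = foo_bar_value(foo, bar);

    return foobar.size();
}   // foo, bar and foobar release their storage here, each exactly once
```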

So just this very simple piece of string handling in C already leaves several questions open and is 17 lines long, all of which have to be reviewed very carefully (e.g. to not miss the off-by-one bug).
Compare this with the 4 lines in C++, which are almost impossible to write or to use wrongly.

And, again, this thread is about a piece of software which runs with process ID 1, wants to control the whole system and has all the permissions needed to modify the system in almost every possible way. It doesn't run as some user with restricted permissions, or in a chroot or something similar. Some parts might, but certainly not all (read the above "far far away" again).

And now some stats. I've just checked out systemd:

git grep -E "strcat|strncat|strcpy|strncpy|strlen" | wc -l
570

git grep -E "strcat|strncat|strcpy|strncpy|memcpy|strlen" | wc -l
850

OK, not every one of those places might be part of pid 1. And several are trivial calls like strlen("ATTR"), but it gives an idea of how many places exist in systemd that might contain a problem which isn't trivial to spot.

And regardless of how clever and experienced the people writing this piece of software are, everyone is prone to making e.g. such an off-by-one bug.
Maybe he writes the piece of code after having worked 12 hours, maybe he got interrupted while writing the code and continued a day later, maybe it's a full moon or maybe his last meal wasn't what it should have been.

Whatever.

This means every piece of code in that piece of software has to be reviewed multiple times (reviewers aren't perfect either), and, as long as the software changes, every piece which changes has to be reviewed multiple times again.

I could continue with examples for lists, sets and similar data structures, which have to be invented again and again and again in C, whereas in C++ people can reuse code which is already in use by many, many people in many, many other projects.
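A small sketch of what that reuse could look like. The record type Unit, the comparator ByPriority and the helper names_in_order() are all hypothetical: std::set keeps its elements sorted and free of duplicates, so the only thing left to write is the ordering rule.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record type.
struct Unit {
    std::string name;
    int priority;
};

// Ordering rule: lower priority value first, name as the tie-breaker.
struct ByPriority {
    bool operator()(const Unit& a, const Unit& b) const
    {
        if (a.priority != b.priority)
            return a.priority < b.priority;
        return a.name < b.name;
    }
};

// The standard library provides the sorted container itself.
using UnitSet = std::set<Unit, ByPriority>;

// Iterating a std::set visits the elements in sorted order.
std::vector<std::string> names_in_order(const UnitSet& units)
{
    std::vector<std::string> names;
    for (const Unit& u : units)
        names.push_back(u.name);
    return names;
}
```

Insertion, lookup, removal and memory management all come from the standard library and don't have to be rewritten for each structure.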

And to come to your argument about how simple everything in C is: just look at the macro container_of(). I wouldn't say it's so simple that everyone understands what it does. And it's just part of the Linux kernel, which means limited documentation, and many people have never heard of it before, compared with stuff which can be found in standard libraries.
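For those who haven't met it: the idea behind container_of() is to get from a pointer to a member back to the structure that contains it. The following is only a simplified illustration of that idea for one fixed, made-up pair of types (list_node and job are hypothetical). It is not the kernel's macro, which is generic over the type and the member name.

```cpp
#include <cstddef>

// Hypothetical intrusive list link, embedded into other structures.
struct list_node {
    list_node* next;
};

// Hypothetical structure which embeds the link as a member.
struct job {
    int id;
    list_node link;
};

// Simplified stand-in for container_of(node, struct job, link):
// step back from the address of the member by the member's offset
// to arrive at the start of the enclosing job.
job* job_from_link(list_node* node)
{
    char* raw = reinterpret_cast<char*>(node);
    return reinterpret_cast<job*>(raw - offsetof(job, link));
}
```

Even in this reduced form it takes pointer arithmetic, two casts and offsetof() to answer the question "which job does this list node belong to?".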

And, again, this is not about kernelspace. It isn't the Linux kernel, where Linus has managed to organize an army of people who look at every line again and again (and still sometimes miss a bug).
Most software projects don't have as many resources (human or not) available as the Linux kernel. In fact, it's an absolute exception.

So you just don't want to use error-prone C (if not really necessary) in new and non-trivial projects where a failure in the code is a major problem.

Doing so just means nothing has been learned from the (of course relatively short) history of software development.

Alexander Holler

PS: Please don't try to tell me that even the above C++ example ends up as code similar to the C code. std::string is used by even more people than review the Linux kernel code. Besides that, it was designed and reviewed by clever people too. And, just to repeat it again, we are talking about userspace, not kernelspace.
