Re: [RFC] Mitigating unexpected arithmetic overflow
From: Kees Cook
Date: Fri May 17 2024 - 17:15:11 EST
On Thu, May 16, 2024 at 02:51:34PM -0600, Theodore Ts'o wrote:
> On Thu, May 16, 2024 at 12:48:47PM -0700, Justin Stitt wrote:
> >
> > It is incredibly important that the exact opposite approach is taken;
> > we need to be annotating (or adding type qualifiers to) the _expected_
> > overflow cases. The omniscience required to go and properly annotate
> > all the spots that will cause problems would suggest that we should
> > just fix the bug outright. If only it was that easy.
>
> It certainly isn't easy, yes. But the problem is when you dump a huge
> amount of work and pain onto kernel developers, when they haven't
> signed up for it, when they don't necessarily have the time to do all
> of the work themselves, and when their corporate overlords won't give
> them the headcount to handle unfunded mandates which folks who are
> looking for a bright new wonderful future --- don't be surprised if
> kernel developers push back hard.
I never claimed this to be some kind of "everyone has to stop and make
these changes" event. I even talked about how it would be gradual and
not break existing code (all "WARN and continue anyway"). This is what
we've been doing with the array bounds work. Lots of people are helping
with that, but a lot of those patches have been from Gustavo and me; we
tried to keep the burden away from other developers as much as we could.
> One of the big problems that we've seen with many of these security
> initiatives is that the teams that create these unfunded mandates get their
> performance reviews based on how many "bug reports" that they file,
> regardless of whether they are real problems or not. This has been a
> problem with syzkaller, and with clusterfuzz. Let's not make this
> problem worse with new and fancy sanitizers, please.
Are you talking about *my* security initiatives? I've been doing this
kind of work in the kernel for 10 years, and at no time has "bug report
count" been a metric. In fact, the whole goal is making it _impossible_
to have a bug (e.g. VLA removal, switch fallthrough, etc.). My drive has
been to kill entire classes of bugs.
The use of sanitizers isn't to just bolster fuzzers (though they're
helpful for finding false positives). It's to use the sanitizers _in
production_, to stop flaws from being exploitable. Android has enabled
the array bounds sanitizer in trap mode for at least 2 years now. We
want the kernel to be self-protective; pro-actively catching flaws.
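
To make "trap mode" concrete, here is a hand-written sketch of roughly
what a bounds sanitizer arranges automatically for every array index:
check the index against the known array size and stop execution instead
of touching out-of-bounds memory. The names (NSLOTS, slots, write_slot,
read_slot) are made up for illustration; the only real facility used is
the GCC/Clang __builtin_trap() builtin.

```c
#include <stddef.h>

/* Hypothetical fixed-size table, for illustration only. */
#define NSLOTS 4
static int slots[NSLOTS];

/* Store a value, trapping instead of writing out of bounds. */
static void write_slot(size_t idx, int value)
{
	if (idx >= NSLOTS)
		__builtin_trap();	/* abnormal termination, no recovery path */
	slots[idx] = value;
}

/* Load a value, trapping instead of reading out of bounds. */
static int read_slot(size_t idx)
{
	if (idx >= NSLOTS)
		__builtin_trap();
	return slots[idx];
}
```

With a sanitizer the checks are generated by the compiler, so the source
stays as a plain slots[idx] access and a missed check is not a human
error anymore.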
> Unfortunately, I don't get funding from my employer to clear these
> kinds of reports, so when I do the work, it happens on the weekends or
> late at night, on my own time, which is probably why I am so grumpy
As for the work itself, like I mentioned before, most of these get fixed
by Gustavo, me, and now Justin too. And many other folks step up to help
out, which is great. Some get complex and other maintainers get involved
too, but it's slow and steady. We're trying to reduce the frequency of
the "fires" people have to scramble to deal with.
The "not getting paid by Google to [fix syzkaller bugs]" part, I'm
surprised by. A big part of my Google job role is the upstream work I do
not only on security hardening but also on seccomp, pstore, execve/binfmt,
strings, etc. I'll reach out offline to find out more details.
> about this. Whether you call this "sharpening our focus", or "year of
> efficiency", or pick your corporate buzzwords, it really doesn't
> matter. The important thing is that the figure of merit must NOT be
> "how many security bugs that are found", but how much bullsh*t noise
> do these security features create, and how do you decrease overhead by
> upstream developers to deal with the fuzzing/ubsan/security tools
> find.
I guess I *do* worry about bug counts, but only in that I want them to
be _zero_. I know other folks aren't as adamant about eliminating bug
classes, but it's really not hyperbole that bugs in Linux kill people.
If you think I'm engaging in corporate buzzword shenanigans, then I have
a lot more work to do on explaining the rationale behind the security
hardening efforts.
All this said, yes, I hear what you (and Linus and others) have been
saying about minimizing the burden on other developers. I have tried my
best to keep it limited, but some things are more front-and-center (like
unexpected integer overflows), so that's why I wanted to get feedback on
how to roll it out -- I didn't see a way to make these changes in a way
that would be as unintrusive(?) as our prior efforts.
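
As a rough illustration of the "WARN and continue anyway" behavior for
an unexpected overflow, here is a minimal userspace sketch. It assumes
the GCC/Clang __builtin_add_overflow() builtin; add_size_warn() and
overflow_reports are hypothetical names, not kernel APIs, and the
fprintf() stands in for whatever reporting the kernel would actually do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical count of reported overflows, for illustration only. */
static unsigned int overflow_reports;

/*
 * Add two sizes. If the sum wraps around, report it and carry on with
 * the wrapped value, so existing behavior is not changed.
 */
static size_t add_size_warn(size_t a, size_t b)
{
	size_t sum;

	/* Returns true when the mathematical result does not fit in sum. */
	if (__builtin_add_overflow(a, b, &sum)) {
		overflow_reports++;
		fprintf(stderr, "unexpected overflow: %zu + %zu\n", a, b);
	}
	return sum;	/* the wrapped result when overflow happened */
}
```

Calling add_size_warn(SIZE_MAX, 2) reports once and still returns the
wrapped value 1, which is the point: existing callers keep working while
the unexpected cases become visible.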
It has felt like the biggest pain point has been helping developers
shift their perspective about C: it has been more than a fancy assembler
for several decades, and we can lean on those features (and create new
ones) that shift the burden of correctness to the compiler from the
human. This does mean we need to change some styles to be more
"observable" (and unambiguous) for the compiler, though.
I think a great example recently was Peter's work creating "cleanup.h".
This feature totally changes how people have to read a function (increase
in burden), leans heavily on compiler behaviors, and shifts the burden
of correctness away from the human. It's great! But it's also _very
different_ from traditional/old-school C.
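
For anyone who hasn't looked at that style yet, here is a minimal
userspace sketch of the underlying compiler feature, the GCC/Clang
"cleanup" variable attribute. This is not the kernel's cleanup.h API:
auto_free_buf, free_buf(), copy_len() and buffers_freed are hypothetical
names used only to show the idea.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the handler's effect can be observed. */
static int buffers_freed;

/* Cleanup handler: called with a pointer to the annotated variable. */
static void free_buf(char **p)
{
	if (*p) {
		free(*p);
		buffers_freed++;
	}
}

/* Hypothetical helper macro, for illustration only. */
#define auto_free_buf __attribute__((cleanup(free_buf)))

/* Returns the length of a private copy of s, or -1 on error. */
static int copy_len(const char *s)
{
	auto_free_buf char *copy = NULL;

	if (!s)
		return -1;	/* handler runs, sees NULL, frees nothing */

	copy = malloc(strlen(s) + 1);
	if (!copy)
		return -1;
	strcpy(copy, s);

	/* Return value is computed first, then the handler frees copy. */
	return (int)strlen(copy);
}
```

There is no free() on any return path, and none can be forgotten; the
cost is that the reader has to know the variable declaration carries
behavior that runs at scope exit.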
-Kees
--
Kees Cook