Re: [GIT PULL] usercopy whitelisting for v4.15-rc1

From: Kees Cook
Date: Mon Nov 20 2017 - 19:42:57 EST

Next message: Maciej S. Szmigiero: "Re: [alsa-devel] [PATCH 1/2] ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure"
Previous message: Rasmus Villemoes: "[PATCH 2/3] net: use dev_alloc_name_ns instead of dev_get_valid_name"
In reply to: Matthew Garrett: "Re: [GIT PULL] usercopy whitelisting for v4.15-rc1"
Next in thread: Lukas Wunner: "Re: [GIT PULL] usercopy whitelisting for v4.15-rc1"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Nov 20, 2017 at 3:29 PM, Matthew Garrett <mjg59@xxxxxxxxxxxxx> wrote:
> From a practical perspective this does feel like a completely reasonable
> request - when changing the semantics of kernel APIs in ways that aren't
> amenable to automated analysis, doing so in a way that generates
> warnings rather than triggering breakage is pretty clearly a preferable
> approach. But these features often start off seeming simple and then
> devolving into rounds of "ok just one more fix and we'll have
> everything" and by then it's easy to have lost track of the amount of
> complexity that's developed as a result. Formalising the Right Way of
> approaching these problems would possibly help avoid this kind of
> problem in future - I'll try to write something up for
> Documentation/process.

I'm always trying to balance the requests from both ends of the
security defense spectrum. One of the most common requests I get from
people who are strongly interested in the defenses is "can this please
be enabled by default?" And then I have to explain that it sometimes
takes time for code to shake out, and it sometimes takes time for
developers to trust it, etc. This is rarely a comfort to them, but
they tend to be glad they can turn some config knob to enable the
"strongest" version, etc, because for them, a false positive is no big
deal. At the other end is the requirement that new stuff should not
break the system. Both are reasonable perspectives, but if we violate
the latter, the defense will never end up in the kernel in the first
place.

The implicit process tends to be: land a defense that is optional,
distros enable it by default, kernel enables it by default. My sense
is that the span for these steps is about 3 years. There are countless
examples, but one of the more recent is x86 KASLR. Optional (Jun 2014,
v3.14), then distro default (e.g. Ubuntu 16.10, Oct 2016), then kernel
default (Jul 2017, v4.12). (And, in my opinion, this takes way too
long. But I'd rather have the defense at all.)

The trouble tends to be even in the "optional" phase, where it may be
optional but still very disruptive. If you were using KASLR but there
was a bug, it was basically going to be fatal to the system. So, my
historical perspective for other less disruptive defenses was "oh
good, it's only BUG(), that's WAY better than taking out the entire
system", but as Linus has pointed out in several threads, BUG() can
frequently lead to that too, which is not acceptable.

A variation on "new stuff should not break the system" comes from
things that LOOK like they can't break the system (i.e. disallowing
some pathological condition that would seem to be entirely reasonable
to disallow), and then discovering that stuff does, actually, use a
pathological condition in the real world (e.g. refcount underflows,
memcpy beyond the end of a short source string, 3rd party drivers
reading directly from userspace memory). And then there's just plain
bugs in defenses (e.g. missed usercopy whitelist for SCTPv6). Handling
these conditions means that in addition to being optional, a new
defense frequently has to also be WARN()-only initially.

Though, even then, there is a spectrum of defense capabilities. KASLR
couldn't WARN() initially since it's an architectural state. The
refcount_t API doesn't really need to be stronger than WARN() because
it can defend against overflow while leaving everything running fine.
And here, as already done, usercopy whitelisting needs to WARN() for
its initial implementation to shake out any hard-to-find cases before
becoming more aggressive.

With all this in mind, it can be tricky to find a way to provide the
right level of initial defense behavior that also has a knob for the
users that want to accept the risk of false positives. I'll keep
trying to get it right and keeping helping others find the right path.
We'll get there, steady progress, etc, etc. :)

-Kees

--
Kees Cook
Pixel Security

Next message: Maciej S. Szmigiero: "Re: [alsa-devel] [PATCH 1/2] ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure"
Previous message: Rasmus Villemoes: "[PATCH 2/3] net: use dev_alloc_name_ns instead of dev_get_valid_name"
In reply to: Matthew Garrett: "Re: [GIT PULL] usercopy whitelisting for v4.15-rc1"
Next in thread: Lukas Wunner: "Re: [GIT PULL] usercopy whitelisting for v4.15-rc1"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]