Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti_disable is set

From: Willy Tarreau
Date: Thu Jan 11 2018 - 10:45:18 EST


Hi Dave,

On Thu, Jan 11, 2018 at 07:29:30AM -0800, Dave Hansen wrote:
> I don't think we need a "NOW" and "NEXT" mode, at least initially. The
> "NEXT" semantics are going to be tricky and I think "NOW" is good enough

In fact I thought the NEXT one would bring us a nice benefit which is that
we start the new process knowing the flag's value so we can decide whether
or not to apply _PAGE_NX on the pgd from the start, and never touch it
anymore.

> Whatever we do, we'll need this PTI-disable flag to be able to cross
> execve() so that a wrapper a la nice(1) works.

Absolutely!

> Initially, I think the
> default should be that it survives fork(). There are just too many
> things out there that "start up" by doing a shell script that calls a
> python script, that calls a...

Not only that: plain daemons too, which is what most services are!

> Without the wrapper support, we're _basically_ stuck using this only in
> newly-compiled binaries. That's going to make it much less likely to
> get used.

I know, that's why I kept considering that option despite not really
needing it for my own use case.

> The inheritance also gives an app a way to re-enable protections for
> children, just from a _second_ wrapper. That's nice because it means we
> don't initially need a "NEXT" ABI.
>
> So, I'd do this:
> 1. Do the arch_prctl() (but ask the ARM guys what they want too)
> 2. Enabled for an entire process (not thread)
> 3. Inherited across fork/exec
> 4. Cleared on setuid() and friends

This one causes me a problem: some daemons already take care of dropping
privileges before their final fork() for the sake of security. Haproxy
typically does this at boot:

- parse config
- chroot to /var/empty
- setuid(dedicated_uid)
- fork()

This ensures the process is properly isolated and hard enough to break out
of. So I'd really like this setuid() not to annihilate all we've done.
We probably want to drop the flag on suid binaries, however, though I'm
having doubts about the benefits, because if the binary already allows
an intruder to inject its own meltdown code, you're quite screwed anyway.
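
To make the concern concrete, here is a toy userspace model of the two
rules being discussed. It is not kernel code and nothing in it comes from
the patch: the struct, the enum and every function name are made up for
illustration. It also assumes that exec of a suid binary drops the flag
under both rules, which is only my reading of "setuid() and friends".

```c
#include <stdbool.h>

/* Toy model only. All names here are hypothetical, not from the patch. */
enum clear_policy {
    CLEAR_ON_SETUID,    /* point 4 as proposed: setuid() clears the flag */
    CLEAR_ON_SUID_EXEC  /* alternative: only exec of a suid binary does */
};

struct task_model {
    bool pti_disabled;  /* the per-process "PTI disabled" flag */
    unsigned int uid;
};

/* setuid(): changes the uid, clears the flag only under CLEAR_ON_SETUID. */
static void model_setuid(struct task_model *t, unsigned int uid,
                         enum clear_policy policy)
{
    t->uid = uid;
    if (policy == CLEAR_ON_SETUID)
        t->pti_disabled = false;
}

/* fork(): the child starts as a copy, so the flag is inherited. */
static struct task_model model_fork(const struct task_model *parent)
{
    return *parent;
}

/* exec(): assumed to drop the flag for suid binaries under both rules. */
static void model_exec(struct task_model *t, bool binary_is_suid)
{
    if (binary_is_suid)
        t->pti_disabled = false;
}

/*
 * A wrapper sets the flag as root and execs a non-suid daemon, which then
 * does setuid(dedicated_uid) followed by fork(), as in the list above.
 * The chroot step is left out because it does not affect the flag.
 * Returns whether the forked worker still has the flag set.
 */
static bool worker_keeps_flag(enum clear_policy policy)
{
    struct task_model t = { .pti_disabled = true, .uid = 0 };

    model_exec(&t, false);          /* wrapper execs the daemon */
    model_setuid(&t, 1000, policy); /* 1000 stands in for dedicated_uid */
    struct task_model worker = model_fork(&t);

    return worker.pti_disabled;
}
```

In this model worker_keeps_flag(CLEAR_ON_SETUID) is false while
worker_keeps_flag(CLEAR_ON_SUID_EXEC) is true, which is exactly the
difference that matters for a daemon that drops privileges on its own.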

> 5. I'm sure the security folks have/want a way to force it on forever

Sure! That's what I implemented using the sysctl.

> Next, if we decide that we have things that both don't want PTI's
> protections and are forking things not covered by #4, we can add some
> "child opt out" in the prctl(), plus maybe marking binaries somehow.

I was really thinking about using the "NOW" for this, compared to the
"NEXT". But I don't know what it could imply for the pgd not having the
_PAGE_NX.

> Please don't forget to add ways to tell if this feature is on/off in
> /proc or whatever.

Very good idea, and it will be much more convenient than using the GET
prctl that I didn't like.

> I think we also need to be able to dump the actual
> CR3 value that we entered the kernel with before we start doing too much
> other funky stuff with the entry code.

When you say dump, do you mean saving it somewhere in a per_cpu variable?

Willy