Re: Proposal: CAP_PAYLOAD to reduce Meltdown and Spectre mitigation costs

From: Avi Kivity
Date: Sun Jan 07 2018 - 04:14:37 EST




On 01/06/2018 10:24 PM, Willy Tarreau wrote:
Hi Avi,

On Sat, Jan 06, 2018 at 09:33:28PM +0200, Avi Kivity wrote:
Meltdown and Spectre mitigations focus on protecting the kernel from a
hostile userspace. However, it's not a given that the kernel is the most
important target in the system. It is common in server workloads that a
single userspace application contains the valuable data on a system, and if
it were hostile, the game would already be over, without the need to
compromise the kernel.

In these workloads, a single application performs most system calls, and so
it pays the cost of protection, without benefiting from it directly (since
it is the target, rather than the kernel).
Definitely :-)

I propose to create a new capability, CAP_PAYLOAD, that allows the system
administrator to designate an application as the main workload in that
system. Other processes (like sshd or monitoring daemons) exist to support
it, and so it makes sense to protect the rest of the system from their being
compromised.
Initially I was thinking about letting applications disable PTI using
prctl() when running under a certain capability (I initially thought
about CAP_SYS_ADMIN though I changed my mind). One advantage of
proceeding like this is that it would have to be explicitly implemented
in the application, which limits the risk of running by default.

I later thought that we could use CAP_SYS_RAWIO for this, given that such
processes already have access to the hardware anyway. We could even
imagine not switching the page tables on such a capability without
requiring prctl(), though it would mean that processes running as root
(as is often found on a number of servers) would automatically present
a risk for the system. But maybe CAP_SYS_RAWIO + prctl() could be a good
solution.

CAP_SYS_RAWIO is like CAP_PAYLOAD in that both allow you to read stuff you shouldn't have access to on a vulnerable CPU. But CAP_PAYLOAD won't give you that access on a non-vulnerable CPU, so it's safer.

The advantage of not requiring prctl() is that it will work on unmodified applications, requiring only sysadmin intervention (and it's the sysadmin's role to designate an application as payload, not the application's).
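
To make the difference between the variants discussed above concrete, here is a minimal Python sketch of the decision each one would make. It is only a model under stated assumptions, not kernel code: CAP_PAYLOAD does not exist, so its bit number (38, the first bit unused by capabilities in current kernels) is made up, and the "asked_via_prctl" flag stands in for a prctl() option that has not been defined anywhere. The CAP_SYS_RAWIO value and the CapEff line in /proc/<pid>/status are real.

```python
from dataclasses import dataclass

CAP_SYS_RAWIO = 17  # real value from <linux/capability.h>
CAP_PAYLOAD = 38    # HYPOTHETICAL: proposed in this thread, bit number invented


def has_cap(cap_eff: int, cap: int) -> bool:
    """Return True if capability number `cap` is set in the effective mask."""
    return bool((cap_eff >> cap) & 1)


def read_cap_eff(status_path: str = "/proc/self/status") -> int:
    """Read the effective capability mask of a process from procfs (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise ValueError("no CapEff line found in " + status_path)


@dataclass
class Task:
    cap_eff: int                   # effective capability mask
    asked_via_prctl: bool = False  # models the HYPOTHETICAL prctl() opt-out


def skip_pti_rawio_prctl(t: Task) -> bool:
    # Variant 1: CAP_SYS_RAWIO plus an explicit request from the application.
    return has_cap(t.cap_eff, CAP_SYS_RAWIO) and t.asked_via_prctl


def skip_pti_rawio_only(t: Task) -> bool:
    # Variant 2: CAP_SYS_RAWIO alone; every full-capability root process qualifies.
    return has_cap(t.cap_eff, CAP_SYS_RAWIO)


def skip_pti_payload(t: Task) -> bool:
    # Variant 3: CAP_PAYLOAD alone; granted by the sysadmin, no application change.
    return has_cap(t.cap_eff, CAP_PAYLOAD)
```

With a root daemon holding capabilities 0 to 37 (mask 0x3fffffffff) that never calls prctl(), variant 2 would skip PTI while variant 1 would not, which is the risk mentioned above. An unmodified application given only the hypothetical CAP_PAYLOAD bit is covered by variant 3 alone.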


I'm interested in participating in work on such a solution, given
that haproxy is severely impacted by "pti=on" and that for now we'll
have to run with "pti=off" on the whole system until a more suitable
solution is found.

I'd rather not rush anything and let things calm down for a while to
avoid adding disturbance to the current situation. But I'm willing to
continue this discussion and even test patches.



Then you might want to test https://www.spinics.net/lists/kernel/msg2689101.html and its companion patchset https://www.spinics.net/lists/kernel/msg2689134.html, which as a side effect significantly reduce KPTI impact on C10K applications (and as their main effect improve their performance).