Hi Avi,
On Sat, Jan 06, 2018 at 09:33:28PM +0200, Avi Kivity wrote:
> Meltdown and Spectre mitigations focus on protecting the kernel from a
> hostile userspace. However, it's not a given that the kernel is the most
> important target in the system. It is common in server workloads that a
> single userspace application contains the valuable data on a system, and if
> it were hostile, the game would already be over, without the need to
> compromise the kernel.
>
> In these workloads, a single application performs most system calls, and so
> it pays the cost of protection, without benefiting from it directly (since
> it is the target, rather than the kernel).
Definitely :-)
> I propose to create a new capability, CAP_PAYLOAD, that allows the system
> administrator to designate an application as the main workload in that
> system. Other processes (like sshd or monitoring daemons) exist to support
> it, and so it makes sense to protect the rest of the system from their being
> compromised.
Initially I was thinking about letting applications disable PTI using
prctl() when running under a certain capability (I initially thought
about CAP_SYS_ADMIN, though I changed my mind). One advantage of
proceeding like this is that it would have to be explicitly implemented
in the application, which limits the risk of applications running without
PTI by default.
I later thought that we could use CAP_SYS_RAWIO for this, given that such
processes already have access to the hardware anyway. We could even
imagine not switching the page tables for processes holding that capability,
without requiring prctl(), though it would mean that processes running as
root (as is often found on a number of servers) would automatically present
a risk for the system. But maybe CAP_SYS_RAWIO + prctl() could be a good
solution.
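To make the capability + prctl() idea a bit more concrete, here is a rough
userspace sketch of what the opt-in could look like. Please take it as an
illustration only: PR_SET_PTI_SKETCH and its value are purely hypothetical,
no such prctl() option exists, so on current kernels the call is expected to
fail with EINVAL. The capability check simply parses CapEff from
/proc/self/status; capability number 17 is CAP_SYS_RAWIO in
<linux/capability.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/prctl.h>

/* CAP_SYS_RAWIO is capability number 17 in <linux/capability.h>. */
#define SKETCH_CAP_SYS_RAWIO 17

/* HYPOTHETICAL: no such prctl() option exists. The name and the value
 * are made up for this illustration only. */
#define PR_SET_PTI_SKETCH 0x50544900

/* Return 1 if capability number 'cap' is set in the bitmask 'mask'. */
int cap_is_set(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}

/* Read the effective capability mask of the current process from
 * /proc/self/status. Returns 0 on success, -1 on failure. */
int read_effective_caps(uint64_t *mask)
{
    char line[256];
    unsigned long long value;
    int ret = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        /* The line looks like "CapEff:\t0000003fffffffff". */
        if (sscanf(line, "CapEff: %llx", &value) == 1) {
            *mask = value;
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}

/* Ask the kernel to stop switching page tables for this process.
 * Returns 0 on success and -1 otherwise. errno is EPERM when the
 * capability is missing, and EINVAL on kernels that do not know the
 * (hypothetical) option. */
int try_disable_pti(void)
{
    uint64_t caps;

    if (read_effective_caps(&caps) < 0)
        return -1;
    if (!cap_is_set(caps, SKETCH_CAP_SYS_RAWIO)) {
        errno = EPERM;
        return -1;
    }
    return prctl(PR_SET_PTI_SKETCH, 0UL, 0UL, 0UL, 0UL);
}
```

An application would call try_disable_pti() once at startup and simply
continue with PTI enabled if it fails. The kernel would of course have to
perform the capability check itself; the userspace check above only avoids
a pointless call.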
I'm interested in helping to work on such a solution, given
that haproxy is severely impacted by "pti=on" and that for now we'll
have to run with "pti=off" on the whole system until a more suitable
solution is found.
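For those who want to check which mode a machine is really running in, a
small sketch follows. It assumes a kernel that exposes
/sys/devices/system/cpu/vulnerabilities/meltdown, which is not the case for
all kernels in the field, and it only recognizes the three status strings I
am aware of; anything else is reported as unknown. The enum and function
names are mine, for illustration only.

```c
#include <stdio.h>
#include <string.h>

enum pti_state { PTI_UNKNOWN, PTI_ENABLED, PTI_DISABLED, PTI_NOT_NEEDED };

/* Map the text reported by the sysfs file to a state. Only the prefix
 * is compared, so a trailing newline is harmless. */
enum pti_state classify_meltdown_status(const char *text)
{
    if (strncmp(text, "Mitigation: PTI", 15) == 0)
        return PTI_ENABLED;     /* page tables are being switched */
    if (strncmp(text, "Vulnerable", 10) == 0)
        return PTI_DISABLED;    /* affected CPU, e.g. booted with pti=off */
    if (strncmp(text, "Not affected", 12) == 0)
        return PTI_NOT_NEEDED;  /* CPU not subject to Meltdown */
    return PTI_UNKNOWN;
}

/* Read the sysfs file and classify it. Returns PTI_UNKNOWN when the
 * file is missing or unreadable (assumption: kernel without this
 * reporting interface). */
enum pti_state read_pti_state(void)
{
    char buf[128] = "";
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");

    if (!f)
        return PTI_UNKNOWN;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return classify_meltdown_status(buf);
}
```

Calling read_pti_state() at startup and logging the result makes it easy to
tell whether the performance numbers were measured with or without PTI.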
I'd rather not rush anything and let things calm down for a while to
avoid adding disturbance to the current situation. But I'm willing to
continue this discussion and even test patches.