Re: Proposal: CAP_PAYLOAD to reduce Meltdown and Spectre mitigation costs

From: Willy Tarreau
Date: Sun Jan 07 2018 - 12:44:16 EST


On Sun, Jan 07, 2018 at 11:14:21AM +0200, Avi Kivity wrote:
> CAP_SYS_RAWIO is like CAP_PAYLOAD in that both allow you to read stuff you
> shouldn't have access to on a vulnerable CPU. But CAP_PAYLOAD won't give you
> that access on a non-vulnerable CPU, so it's safer.

But it's still a wider surface for something quite similar. With
CAP_SYS_RAWIO you already have /dev/mem, iopl(), etc. I don't think
it's unreasonable to require that a prctl() call be added to
applications that need such functionality. It's not really harder to
deal with than an extra capability and managing its impacts. And
prctl() already does quite a lot of similar stuff, for example
enabling/disabling access to the TSC.
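
For reference, the TSC control mentioned above is reachable through
PR_GET_TSC and PR_SET_TSC. Below is a minimal sketch, assuming Linux on
x86 (other architectures reject these options). The helper names
tsc_mode_name(), tsc_get_mode() and tsc_disable() are made up for
illustration. A per-task opt-out of PTI could conceivably follow the
same pattern, but no such prctl() option exists at this point.

```c
#include <sys/prctl.h>

/* Hypothetical helper: printable name for a PR_GET_TSC mode. */
static const char *tsc_mode_name(int mode)
{
    switch (mode) {
    case PR_TSC_ENABLE:  return "enabled";   /* RDTSC works normally */
    case PR_TSC_SIGSEGV: return "sigsegv";   /* RDTSC raises SIGSEGV */
    default:             return "unknown";
    }
}

/* Hypothetical helper: current TSC mode of the calling task,
 * or -1 if the kernel/architecture does not support the option. */
static int tsc_get_mode(void)
{
    int mode = 0;

    if (prctl(PR_GET_TSC, &mode, 0, 0, 0) != 0)
        return -1;
    return mode;
}

/* Hypothetical helper: make RDTSC fault in the calling task.
 * Returns 0 on success, -1 on error (errno set by prctl()). */
static int tsc_disable(void)
{
    return prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0);
}
```

The point is only that the application changes its own task state with
a single call at startup and needs no extra privilege to do so.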

> The advantage of not requiring prctl() is that it will work on unmodified
> applications, requiring only sysadmin intervention (and it's the sysadmin's
> role to designate an application as payload, not the application's).

It can just as well be seen as a configuration option. And not opening
this to any random application by default sounds reasonable as well. I'm
not saying it's perfect, just trying to figure out a reasonable path here.

> > I'm interested in participating in work on such a solution, given
> > that haproxy is severely impacted by "pti=on" and that for now we'll
> > have to run with "pti=off" on the whole system until a more suitable
> > solution is found.
> >
> > I'd rather not rush anything and let things calm down for a while to
> > avoid adding disturbance to the current situation. But I'm willing to
> > continue this discussion and even test patches.
> >
> >
>
> Then you might want to test
> https://www.spinics.net/lists/kernel/msg2689101.html and its companion
> patchset https://www.spinics.net/lists/kernel/msg2689134.html, which as a
> side effect significantly reduce KPTI impact on C10K applications (and as
> their main effect improve their performance).

I saw that two days ago but didn't read further. I've now looked at it
more closely, but it seems very focused on block I/O (which makes sense
for a DB or for a server for example), so it will not help for my
specific use case. In my case I'm wasting a lot of time in accept(),
setsockopt(), fcntl(), bind(), connect(), recv(), send(), shutdown() or
close(). The poller is almost unnoticeable since I/O events are grouped.
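
To give an idea of the pattern, here is a rough sketch (not haproxy
code) that pushes one byte through a single loopback connection and
counts the calls made along the way. Each of them is a separate kernel
entry, which is where the page table switch is paid with "pti=on". The
SYS() macro, the nsys counter and one_connection_syscalls() are made up
for illustration, and the count assumes each library call maps to
exactly one system call.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical counter: bumped once per call wrapped in SYS(). */
static int nsys;
#define SYS(call) (nsys++, (call))

/* Opens one loopback connection, transfers a single byte and tears
 * everything down. Returns the number of calls issued, or -1 on error. */
static int one_connection_syscalls(void)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int one = 1, lfd, cfd, afd = -1, ret = -1;
    char c = 'x';

    nsys = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;                /* let the kernel pick a free port */

    lfd = SYS(socket(AF_INET, SOCK_STREAM, 0));   /* listener */
    cfd = SYS(socket(AF_INET, SOCK_STREAM, 0));   /* outgoing side */
    if (lfd < 0 || cfd < 0)
        goto out;
    if (SYS(setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one))) < 0 ||
        SYS(bind(lfd, (struct sockaddr *)&sa, sizeof(sa))) < 0 ||
        SYS(listen(lfd, 1)) < 0 ||
        SYS(getsockname(lfd, (struct sockaddr *)&sa, &len)) < 0 ||
        SYS(connect(cfd, (struct sockaddr *)&sa, sizeof(sa))) < 0)
        goto out;
    afd = SYS(accept(lfd, NULL, NULL));
    if (afd < 0)
        goto out;
    if (SYS(fcntl(cfd, F_SETFL, O_NONBLOCK)) < 0 ||
        SYS(send(cfd, &c, 1, 0)) != 1 ||
        SYS(recv(afd, &c, 1, 0)) != 1 ||          /* afd is still blocking */
        SYS(shutdown(cfd, SHUT_WR)) < 0)
        goto out;
    ret = 0;
out:
    if (afd >= 0) SYS(close(afd));
    if (cfd >= 0) SYS(close(cfd));
    if (lfd >= 0) SYS(close(lfd));
    return ret < 0 ? -1 : nsys;
}
```

On success this counts 15 calls for one byte of payload. The listener
setup is only paid once in a real proxy, but the accept(), fcntl(),
connect(), recv(), send(), shutdown() and close() part repeats for every
connection, so short-lived connections are dominated by kernel entries.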

Cheers,
Willy