Re: [PATCH 4/5] s390: define ISOLATE_BP to run tasks with modified branch prediction

From: Radim Krčmář
Date: Wed Jan 24 2018 - 06:50:52 EST


2018-01-24 07:36+0100, Martin Schwidefsky:
> On Tue, 23 Jan 2018 21:32:24 +0100
> Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>
> > 2018-01-23 15:21+0100, Christian Borntraeger:
> > > Paolo, Radim,
> > >
> > > this patch not only allows us to isolate a userspace process, it also allows us
> > > to add a new interface for KVM that would let us isolate a KVM guest CPU
> > > so that it can no longer inject branches into the host or other guests (while
> > > at the same time QEMU and the host kernel can run with full power).
> > > We just have to set the TIF bit TIF_ISOLATE_BP_GUEST for the thread that runs a
> > > given CPU. This would certainly be an addon patch on top of this patch at a later
> > > point in time.
> >
> > I think that the default should be secure, so userspace will be
> > breaking the isolation instead of setting it up and having just one
> > place to screw up would be better -- the prctl could decide which
> > isolation mode to pick.
>
> The prctl is one direction only. Once a task is "secured" there is no way back.

Good point, I was thinking of reversing the direction and having
TIF_NOT_ISOLATE_BP_GUEST prctl, but allowing tasks to subvert security
would be even worse.
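
To make the one-way semantics concrete, here is a minimal sketch in plain C.
It is only a user-space model, not the code from the patch: struct task_model,
model_prctl_isolate_bp() and the argument check are hypothetical names and
behavior chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state; not a kernel structure. */
struct task_model {
	bool isolate_bp;	/* models the TIF_ISOLATE_BP bit */
};

/*
 * Models a one-direction prctl: the only supported transition is
 * "not isolated" -> "isolated". Rejecting a non-zero argument is an
 * assumption of this sketch, not something stated in the thread.
 */
static int model_prctl_isolate_bp(struct task_model *t, unsigned long arg)
{
	if (arg != 0)
		return -EINVAL;
	t->isolate_bp = true;	/* setting twice is harmless */
	return 0;
}
```

There is deliberately no operation that clears the flag. A reversed (opt-out)
prctl would need exactly such an operation, and that is what would let a task
drop its own isolation.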

> If we start with a default of secure then *all* tasks will run with limited
> branch prediction.

Right, because all of them are untrusted. What is the performance
impact of BP isolation?

This design seems very fragile to me -- we're forcing userspace to care
about some arcane hardware implementation and isolation in the system is
broken if a task running malicious code doesn't do that for any reason.

> > Maybe we can change the conditions and break logical connection between
> > TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST, to make a separate KVM
> > interface useful.
>
> The thinking here is that you use TIF_ISOLATE_BP to make user space secure,
> but you need to close the loophole that you can use a KVM guest to get out of
> the secured mode. That is why you need to run the guest with isolated BP if
> TIF_ISOLATE_BP is set. But if you want to run qemu as always and only the
> KVM guest with isolated BP you need a second bit, thus TIF_ISOLATE_GUEST_BP.

I understand, I was following the misguided idea where we have reversed
logic and then use just TIF_NOT_ISOLATE_GUEST_BP for sie switches.
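
For reference, the two-bit scheme as I understand it from the description
above can be sketched like this. The bit numbers are made up and the helper
names are hypothetical; the real definitions live in the s390 headers and the
real checks sit in the entry and sie code, not in C helpers like these.

```c
#include <stdbool.h>

/* Hypothetical bit numbers, chosen only for this sketch. */
#define TIF_ISOLATE_BP		0
#define TIF_ISOLATE_BP_GUEST	1

#define _TIF_ISOLATE_BP		(1UL << TIF_ISOLATE_BP)
#define _TIF_ISOLATE_BP_GUEST	(1UL << TIF_ISOLATE_BP_GUEST)

/* The task itself (e.g. QEMU) is isolated only if TIF_ISOLATE_BP is set. */
static bool host_task_isolated(unsigned long flags)
{
	return (flags & _TIF_ISOLATE_BP) != 0;
}

/*
 * The guest is isolated if either bit is set: TIF_ISOLATE_BP closes the
 * loophole of escaping isolation through a guest, TIF_ISOLATE_BP_GUEST
 * isolates only the guest and leaves the host task at full speed.
 */
static bool guest_isolated(unsigned long flags)
{
	return (flags & (_TIF_ISOLATE_BP | _TIF_ISOLATE_BP_GUEST)) != 0;
}
```

With only TIF_ISOLATE_BP_GUEST set, host_task_isolated() is false and
guest_isolated() is true, which is the "qemu as always, guest isolated" case.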

> > > Do you think something similar would be useful for other architectures as well?
> >
> > It goes against my idea of virtualization, but there probably are users
> > that don't care about isolation and still use virtual machines ...
> > I expect most architectures to have a fairly similar resolution of
> > branch prediction leaks, so the idea should be easily abstractable on
> > all levels. (At least x86 is.)
>
> Yes.
>
> > > In that case we should try to come up with a cross-architecture interface to enable
> > > that.
> >
> > Makes me think of a generic VM control "prefer performance over
> > security", which would also take care of future problems and let arches
> > decide what is worth the code.
>
> VM as in virtual machine or VM as in virtual memory?

Virtual machine. (But could be anywhere really, especially the
kernel/user split slowed applications down for too long already. :])

> > A main drawback is that this will introduce dynamic branches to the
> > code, which are going to slow down the common case to speed up a niche.
>
> Where would you place these additional branches? I don't quite get the idea.

The BP* macros contain a branch in them -- avoidable if we only had
isolated virtual machines.

Thanks.