Re: [PATCH] arm64: Trap WFI executed in userspace

From: Dave Martin
Date: Thu Aug 09 2018 - 08:47:07 EST


On Thu, Aug 09, 2018 at 01:38:12PM +0100, Will Deacon wrote:
> On Thu, Aug 09, 2018 at 01:34:57PM +0100, Dave Martin wrote:
> > On Wed, Aug 08, 2018 at 01:34:09PM +0100, Catalin Marinas wrote:
> > > On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:
> > > > On 07/08/18 11:05, Dave Martin wrote:
> > > > > On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:
> > > > >> It recently came to light that userspace can execute WFI, and that
> > > > >> the arm64 kernel doesn trap this event. This sounds rather benign,
> > >
> > > Nitpick: "doesn't".
> > >
> > > > >> but the kernel should decide when it wants to wait for an interrupt,
> > > > >> and not userspace.
> > > > >>
> > > > >> Let's trap WFI and treat it as a way to yield the CPU to another
> > > > >> process.
> > > [...]
> > > > > I can't think of a legitimate reason for userspace to execute WFI
> > > > > however. Userspace doesn't have interrupts under Linux, so it makes
> > > > > no sense to wait for one.
> > > > >
> > > > > Have we seen anybody using WFI in userspace? It may be cleaner to
> > > > > map this to SIGILL rather than be permissive and regret it later.
> > > >
> > > > I couldn't find any user, and I'm happy to just send userspace to hell
> > > > in that case. But it could also been said that since it was never
> > > > prevented, it is a de-facto ABI.
> > >
> > > I wouldn't really go as far as SIGILL on WFI. I think the patch is fine
> > > as it is. In case Will plans to merge it:
> >
> > For practical purposes I agree, because we can't control the binary
> > blobs out there: I just wanted to bang the drum because we are creating
> > semantics here and there is not an obvious correct answer to what they
> > should be.
> >
> > I'd still like to see rationale for why this should map to schedule()
> > (which userspace currently has no direct way to trigger) as opposed to
> > sched_yield() or something like that.
>
> A better idea might just be to do pc +=4 and return. If there's work
> pending, we'll hit it on the return path (just like any other ret_to_user
> call).
>
> I initially thought about sched_yield(), but it's not clear whether that
> creates a problem if, e.g. seccomp has been used to restrict that syscall.

Indeed. I can't see why that might be restricted, but there's presumably
nothing to stop people doing that today.

Other than putting the task to sleep for 1ms or something, I don't know
what to suggest ;)

Perhaps we can patch a NOP into .text, like Marc's BX trick :P
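
For what it's worth, Will's "pc += 4 and return" idea would amount to
something like the sketch below. This is a userspace model only, not
kernel code: struct fake_regs and skip_trapped_wfi() are made-up names
standing in for the real register frame and trap handler, and it relies
on A64 instructions being a fixed 4 bytes wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved user register frame; only pc is modelled. */
struct fake_regs {
	uint64_t pc;
};

/* A64 instructions are always 4 bytes wide. */
#define A64_INSN_SIZE 4

/*
 * Model of the "do nothing" option: step over the trapped WFI and
 * return. Nothing is scheduled here; in the real kernel any pending
 * work would be picked up on the normal return-to-user path.
 */
static void skip_trapped_wfi(struct fake_regs *regs)
{
	regs->pc += A64_INSN_SIZE;
}
```

The point being that userspace would see WFI behave as a NOP, with no
new scheduling semantics to argue about.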

Cheers
---Dave