Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

From: George Dunlap
Date: Mon Mar 17 2014 - 13:14:44 EST

In reply to: Ingo Molnar: "Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception"

On 03/17/2014 05:05 PM, Jan Beulich wrote:
> On 17.03.14 at 17:55, "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
>> On 03/17/2014 05:19 AM, George Dunlap wrote:
>>> On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>>>> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a
>>>> workaround for the legacy hypervisor versions.
>>>
>>> The interface wasn't an accident. In the most common case you'll want
>>> to clear the bit (CR0.TS) anyway. In PV mode clearing it would require an
>>> extra trip up into the hypervisor. So this saves one trip up into the
>>> hypervisor on every context switch which involves an FPU, at the
>>> expense of not being able to context-switch away when handling the
>>> trap.
>>
>> The interface was a complete faceplant, because it caused failures.
>> You're not infinitely unconstrained since you want to play in the same
>> sandbox as the native architecture, and if you want to have a hope of
>> avoiding these kinds of failures you really need to avoid making random
>> "improvements", certainly not without an explicit guest opt-in (the same
>> as we do for the native CPU architecture when adding new features).
>>
>> So if this interface wasn't an accident it was active negligence and
>> incompetence.
>
> I don't think so - while it (as we now see) disallows certain things
> inside the guest, back at the time when this was designed there was
> no sign of any sort of allocation/scheduling being done inside the
> #NM handler. And furthermore, a PV specification is by its nature
> allowed to define deviations from real hardware behavior, or else it
> wouldn't be needed in the first place.

But it's certainly the case that deviating from the hardware in *this* way by default was always very likely to cause the exact kind of bug we've seen here. It is an "interface trap" that was bound to be tripped over (much like Intel's infamous sysret vulnerability).

Making it opt-in would have been a much better idea. But the people who made that decision are long gone, and we now need to deal with the situation as we have it.

-George