Re: [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments

From: Andrew Cooper
Date: Tue Apr 21 2015 - 04:27:02 EST


On 21/04/2015 01:35, Andy Lutomirski wrote:
> On 04/20/2015 10:09 AM, Andrew Cooper wrote:
>> There appears to be no formal statement of what pv_irq_ops.save_fl() is
>> supposed to return precisely. Native returns the full flags, while
>> lguest and Xen only return the Interrupt Flag, and both have comments by
>> the implementations stating that only the Interrupt Flag is looked at.
>> This may have been true when initially implemented, but no longer is.
>>
>> To make matters worse, the Xen PVOP leaves the upper bits undefined,
>> making the BUG_ON() undefined behaviour. Experimentally, this now trips
>> for 32bit PV guests on Broadwell hardware. The BUG_ON() is consistent
>> for an individual build, but not consistent for all builds. It has also
>> been a sitting timebomb since SMAP support was introduced.
>>
>> Use native_save_fl() instead, which will obtain an accurate view of the
>> AC flag.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> CC: Ingo Molnar <mingo@xxxxxxxxxx>
>> CC: H. Peter Anvin <hpa@xxxxxxxxx>
>> CC: x86@xxxxxxxxxx
>> CC: linux-kernel@xxxxxxxxxxxxxxx
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> CC: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>> CC: David Vrabel <david.vrabel@xxxxxxxxxx>
>> CC: xen-devel <xen-devel@xxxxxxxxxxxxx>
>> CC: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
>> CC: lguest@xxxxxxxxxxxxxxxx
>>
>> ---
>> This patch is RFC because I am not certain that native_save_fl() is
>> necessarily the correct solution on lguest, but it does seem that
>> setup_smap() wants to check the actual AC bit, rather than an idealised
>> value.
>>
>> A different approach, given the dual nature of the AC flag now is to
>> gate setup_smap() on a kernel rpl of 0. SMAP necessarily can't be used
>> in a paravirtual situation where the kernel runs in cpl > 0.
>>
>> Another different approach would be to formally state that
>> pv_irq_ops.save_fl() needs to return all the flags, which would make
>> local_irq_save() safe to use in this circumstance, but that makes a
>> hotpath longer for the sake of a single boot time check.
>
> ...which reminds me:
>
> Why does native_restore_fl restore anything other than IF? A branch
> and sti should be considerably faster than popf.

I was wondering about the performance aspect, given a comment in your
patch which removed sysret64, but hadn't had time to investigate yet.

Unfortunately, irq_save()/irq_enable()/irq_restore() appears to be a
pattern in use in the kernel, which requires irq_restore() to be able to
disable interrupts again.

The performance improvement might be worth explicitly moving the onus
into the caller with irq_maybe_disable()/irq_maybe_enable(), but that
does involve altering a lot of common code for an architecture-specific
gain.
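
To illustrate the problem, here is a rough userspace model (not kernel
code) of the two restore strategies. All of the model_* names are
hypothetical and exist only for this sketch; the single modelled bit is
the Interrupt Flag. It assumes the save/enable/restore sequence is
entered with interrupts already disabled, which is the case where a
"branch and sti" restore would leave interrupts on.

```c
#include <stdbool.h>

#define X86_EFLAGS_IF 0x200UL	/* Interrupt Flag, bit 9 of EFLAGS */

/* Modelled EFLAGS; a stand-in for real CPU state (hypothetical). */
static unsigned long model_flags = X86_EFLAGS_IF;

static unsigned long model_save_fl(void) { return model_flags; }
static void model_irq_disable(void) { model_flags &= ~X86_EFLAGS_IF; }
static void model_irq_enable(void) { model_flags |= X86_EFLAGS_IF; }

/* popf-like restore: writes IF unconditionally, so it can also clear it. */
static void model_restore_fl_popf(unsigned long flags)
{
	model_flags = (model_flags & ~X86_EFLAGS_IF) | (flags & X86_EFLAGS_IF);
}

/* "branch and sti" restore: can only ever set IF, never clear it. */
static void model_restore_fl_sti(unsigned long flags)
{
	if (flags & X86_EFLAGS_IF)
		model_flags |= X86_EFLAGS_IF;
}

/*
 * Run save/enable/restore from an interrupts-off context and report
 * whether interrupts are (wrongly) left enabled afterwards.
 */
static bool model_pattern_leaves_irqs_on(void (*restore)(unsigned long))
{
	unsigned long flags;

	model_irq_disable();		/* caller context: interrupts off */
	flags = model_save_fl();	/* irq_save(): saved IF is clear */
	model_irq_disable();
	model_irq_enable();		/* irq_enable() inside the section */
	restore(flags);			/* irq_restore() */

	return (model_flags & X86_EFLAGS_IF) != 0;
}
```

With the popf-like restore the caller's interrupts-off state comes back;
with the sti-only restore it does not, which is why the onus would have
to move to the caller.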

>
> Also, if we did this, could Xen use PVI and then use native_restore_fl
> and avoid lots of pvops?

Xen HVM guests already use the native pvops in this area, so would
benefit from any improvement. PV guests, on the other hand, run with
cpl > 0 and instead have a writeable mask in a piece of shared memory
with Xen, and need the pvop.
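
As a simplified sketch of what that means for a PV guest (a model only,
not the actual Xen pvop code): the "interrupt flag" is a mask byte in
memory shared with the hypervisor, and save/restore translate between
that byte and an idealised IF bit. The struct and function names below
are hypothetical, and the real restore path also has to deal with events
that became pending while masked, which is omitted here.

```c
#include <stdint.h>

#define X86_EFLAGS_IF 0x200UL	/* Interrupt Flag, bit 9 of EFLAGS */

/* Hypothetical stand-in for the per-vcpu area shared with the hypervisor. */
struct model_vcpu_info {
	uint8_t upcall_pending;	/* an event is waiting to be delivered */
	uint8_t upcall_mask;	/* nonzero: event delivery is masked */
};

/* Returns only an idealised IF bit; no other flag bits are reported. */
static unsigned long model_pv_save_fl(const struct model_vcpu_info *v)
{
	return v->upcall_mask ? 0 : X86_EFLAGS_IF;
}

/* Only IF is consulted; it is inverted into the shared mask byte. */
static void model_pv_restore_fl(struct model_vcpu_info *v, unsigned long flags)
{
	v->upcall_mask = (flags & X86_EFLAGS_IF) ? 0 : 1;
}
```

Note that a save_fl() of this shape can never report the AC flag, which
is why the SMAP check cannot rely on it.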

~Andrew