Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option
From: Andrew Cooper
Date: Wed Jul 29 2015 - 20:29:38 EST
On 30/07/2015 00:13, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>>>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>>>>>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>>>>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>>>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>>>>>>>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>>>>>>>>> FYI, I have got a repro now and am investigating.
>>>>>>>>>> Good and bad news. This bug has nothing to do with LDTs
>>>>>>>>>> themselves.
>>>>>>>>>>
>>>>>>>>>> I have worked out what is going on, but this:
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>>>>>>>>> index 5abeaac..7e1a82e 100644
>>>>>>>>>> --- a/arch/x86/xen/enlighten.c
>>>>>>>>>> +++ b/arch/x86/xen/enlighten.c
>>>>>>>>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>>>>>>>>> pte = pfn_pte(pfn, prot);
>>>>>>>>>> + (void)*(volatile int*)v;
>>>>>>>>>> if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>>>>>>>>> pr_err("set_aliased_prot va update failed w/ lazy mode %u\n", paravirt_get_lazy_mode());
>>>>>>>>>> BUG();
>>>>>>>>>>
>>>>>>>>>> Is perhaps not the fix we are looking for, and every use of
>>>>>>>>>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
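A toy user-space model of why the extra read of v helps. It assumes, as the thread suggests, that the hypercall fails when v is not yet present in the page tables currently in use, while an ordinary guest read lets the guest's own fault handler populate that entry first. struct toy_va and the toy_* functions are hypothetical stand-ins, not Xen or Linux interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one lazily-populated kernel VA.
 * All names are hypothetical; nothing here is a Xen or Linux API.
 */
struct toy_va {
    bool backed;      /* a backing page exists in the reference tables */
    bool present_now; /* entry already synced into the tables in use */
};

/* Models an ordinary read of *v: the guest's fault handler resolves a
 * miss by syncing the entry from the reference tables (assumed). */
static void toy_read(struct toy_va *va)
{
    assert(va->backed); /* only probe memory known to have a backing page */
    va->present_now = true;
}

/* Models the update_va_mapping hypercall: it only looks at the current
 * tables and never runs the guest's fault handler (assumed). */
static int toy_update_va_mapping(const struct toy_va *va)
{
    return va->present_now ? 0 : -1;
}

/* The quick fix from the diff: touch v, then issue the hypercall. */
static int toy_set_aliased_prot(struct toy_va *va, bool probe_first)
{
    if (probe_first)
        toy_read(va);
    return toy_update_va_mapping(va);
}
```

For a backed but not-yet-synced address, the toy hypercall fails without the probe and succeeds with it. The assert in toy_read() mirrors the caveat that the probe is only safe for addresses known to be kernel memory with a backing page.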
>>>>>>>>> I think in most cases we know that the page is mapped, so hopefully
>>>>>>>>> this is the only site that we need to be careful about.
>>>>>>>> Is there any chance we can get some kind of quick-and-dirty fix that
>>>>>>>> can go to x86/urgent in the next few days even if a clean fix isn't
>>>>>>>> available yet?
>>>>>>> Quick and dirty?
>>>>>>>
>>>>>>> Reading from v is the quickest and most obvious way, for areas where
>>>>>>> we are certain that v exists, is kernel memory, and is expected to
>>>>>>> have a backing page. I don't know offhand how many of the current
>>>>>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>>>>>> __get_user(tmp, (char *)v), perhaps, unless there's something better
>>>>>> in the wings. Keep in mind that we need this for -stable, and it's
>>>>>> likely to get backported quite quickly due to CVE-2015-5157.
>>>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>>>> would probably work, and certainly be minimal hassle for -stable.
>>>>>
>>>>> Altering the hypercall used is certainly not something to backport, nor
>>>>> are we sure it is a viable fix at this time.
>>>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>>>> is the correct fix to unblock this LDT series. I see no reason why this
>>>> cannot be backported.
>>> A proper fix should include batching, and that is not something I
>>> think we should target for stable.
>> Batching is absolutely not necessary to alter update_va_mapping to
>> mmu_update_normal_pt. After all, update_va_mapping isn't batched.
>>
>> However, this isn't the first issue we have had with lazy MMU faulting,
>> and I doubt it is the last. There are not many callsites of
>> update_va_mapping - I will audit them tomorrow and see if any similar
>> issues are lurking elsewhere.
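For comparison, a sketch of the request an MMU_NORMAL_PT_UPDATE call carries. struct mmu_update and MMU_NORMAL_PT_UPDATE mirror the definitions in Xen's public xen.h (reproduced locally so the sketch compiles on its own); make_normal_pt_update() is a hypothetical helper. Because the PTE is named by its own machine address rather than by the virtual address it maps, the request does not depend on v being present in the current page tables, which appears to be why it is proposed as the fix.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the request layout in Xen's public xen.h. */
#define MMU_NORMAL_PT_UPDATE 0 /* command is encoded in ptr[1:0] */

struct mmu_update {
    uint64_t ptr; /* machine address of the PTE, OR'd with the command */
    uint64_t val; /* new contents of that PTE */
};

/*
 * Hypothetical helper: build one MMU_NORMAL_PT_UPDATE request.
 * The PTE is identified by its machine address, not by the VA it maps.
 */
static struct mmu_update make_normal_pt_update(uint64_t pte_maddr,
                                               uint64_t new_pte)
{
    struct mmu_update u;

    assert((pte_maddr & 7) == 0); /* PTEs are 8-byte aligned */
    u.ptr = pte_maddr | MMU_NORMAL_PT_UPDATE;
    u.val = new_pte;
    return u;
}
```

In the kernel such a request would be submitted with HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF). The hypercall takes an array and a count, so batching remains possible later, but a single request needs none.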
> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
> access to fault. Is this something we should be worried about?
Yes. update_va_mapping() will function perfectly well taking one RW
mapping to RO even if there is a second RW mapping. In such a case, the
next LDT access will fault.
On closer inspection, Xen is rather unhelpful with the fault. Xen's
lazy #PF will be bounced back to the guest with cr2 adjusted to appear
in the range passed to set_ldt(). The error code, however, will be
unmodified (apart from being limited to not-user and not-reserved), so
the fault will appear as a non-present supervisor read or write to an
address of which the kernel has a valid read mapping.
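To make the symptom concrete, a sketch that decodes the architectural x86 page-fault error-code bits (as documented in the Intel SDM) the way a guest #PF handler would see them. The PF_* macros, struct pf_view and decode_pf() are local, hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PF_PROT  (1u << 0) /* 0: non-present page, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_USER  (1u << 2) /* 0: supervisor mode, 1: user mode */
#define PF_RSVD  (1u << 3) /* reserved bit set in a paging entry */
#define PF_INSTR (1u << 4) /* fault was an instruction fetch */

struct pf_view {
    bool non_present;
    bool write;
    bool user;
    bool reserved;
    bool instr;
};

/* Decode an error code into the fields a #PF handler inspects. */
static struct pf_view decode_pf(uint32_t error_code)
{
    struct pf_view v = {
        .non_present = !(error_code & PF_PROT),
        .write = (error_code & PF_WRITE) != 0,
        .user = (error_code & PF_USER) != 0,
        .reserved = (error_code & PF_RSVD) != 0,
        .instr = (error_code & PF_INSTR) != 0,
    };
    return v;
}
```

An error code of 0x0 or 0x2 (P and U/S clear) decodes as a non-present supervisor read or write, which is what the guest is shown even though it already holds a valid read mapping of the faulting address.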
Unlike pagetables, there is no notion of pinning a segdesc page in the
Xen ABI. Pinning to a type allows the guest to take a single extra type
ref, and as a side effect forces eager validation of the contents. It
also prevents another, unsuspecting vcpu from coming along, constructing
a writeable mapping, turning the soon-to-be-faulted-in LDT into a plain
writeable page, and thereby forcing a fault.
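A toy model of the type-reference rules described above, loosely in the spirit of Xen's per-page type counts but greatly simplified (no validation, no page-table types). enum toy_type, struct toy_frame and the toy_* functions are hypothetical. A frame can only change type while no type references are held, so one leftover writable reference is enough to block the segdesc type an LDT needs; conversely, an extra held segdesc reference (standing in for the pin the ABI lacks) refuses a later writable mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-frame typing; all names are hypothetical. */
enum toy_type { TOY_NONE, TOY_WRITABLE, TOY_SEGDESC };

struct toy_frame {
    enum toy_type type;
    unsigned int type_count; /* references currently holding 'type' */
};

/* A frame may only be retyped while nobody holds a type reference. */
static bool toy_get_type(struct toy_frame *f, enum toy_type want)
{
    if (f->type_count == 0)
        f->type = want;  /* free to retype (validation omitted) */
    else if (f->type != want)
        return false;    /* a conflicting type reference is held */
    f->type_count++;
    return true;
}

/* Drop one reference previously taken with toy_get_type(). */
static void toy_put_type(struct toy_frame *f)
{
    assert(f->type_count > 0);
    f->type_count--;
}
```

For example, take two writable references (two RW mappings) and drop only one: a segdesc request then fails, as the LDT fault-in would. Once both are gone, a held segdesc reference makes a new writable request fail.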
This frankly looks like an oversight, as pinning a segdesc page would
work fine in the existing page model; there simply isn't a hypercall to
make such an action happen.
Therefore, set_ldt() needs to be confident that there are no writeable
mappings to the frames used to make up the LDT. It could proactively
fault them in by accessing one descriptor in each page inside the limit,
but by the time a fault is received it is probably too late to locate
the other mapping that prevented the type change (or indeed to tell
whether Xen objected to one of the descriptors instead).
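A sketch of the proactive fault-in mentioned above, assuming 4 KiB pages: touch one 8-byte descriptor in each page covered by the LDT limit, so that any fault surfaces at set_ldt() time rather than on first use. toy_prefault_ldt() and its constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   4096u /* assumed page size */
#define LDT_ENTRY_SIZE  8u    /* each x86 segment descriptor is 8 bytes */
#define LDT_MAX_ENTRIES 8192u /* 16-bit byte limit: 64 KiB of descriptors */

/*
 * Read one descriptor in each page covered by the first 'entries'
 * descriptors. Returns the number of pages touched.
 */
static size_t toy_prefault_ldt(const volatile uint64_t *ldt, size_t entries)
{
    const size_t per_page = TOY_PAGE_SIZE / LDT_ENTRY_SIZE;
    size_t pages = 0;

    assert(entries <= LDT_MAX_ENTRIES);
    for (size_t i = 0; i < entries; i += per_page) {
        (void)ldt[i]; /* volatile read: the compiler must perform it */
        pages++;
    }
    return pages;
}
```

A full 8192-entry LDT spans 16 pages, so 16 reads suffice. As the paragraph notes, surfacing the fault early still does not reveal which other mapping blocked the type change.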
This is all a little bit messy.
~Andrew