Re: pte_special broken on Xen PV when NUMA balancing is enabled

From: David Vrabel
Date: Wed Nov 05 2014 - 11:59:00 EST


On 05/11/14 16:41, Wei Liu wrote:
> Hi all
>
> I'm developing virtual NUMA support for Xen. One thing I notice is that when
> NUMA balancing is enabled, the kernel crashes with the following backtrace.
>
> [ 404.281396] CPU: 0 PID: 1058 Comm: dd Tainted: G B W 3.18.0-rc3-bp+ #3
> [ 404.281403] 0000000000000000 00007fd62eca3000 ffffffff817b7cac ffff880172d298b8
> [ 404.281415] ffffffff8110383f 0720072007300732 00000007fd62eca3 0720072007200720
> [ 404.281426] ffff880172d298b8 ffff88017300bbb0 00007fd62eca3000 0000000000000000
> [ 404.281437] Call Trace:
> [ 404.281444] [<ffffffff817b7cac>] ? dump_stack+0x41/0x51
> [ 404.281452] [<ffffffff8110383f>] ? print_bad_pte+0x19f/0x1cb
> [ 404.281460] [<ffffffff81104479>] ? vm_normal_page+0x51/0x87
> [ 404.281469] [<ffffffff8110c5ef>] ? change_protection+0x4fb/0x76a
> [ 404.281477] [<ffffffff81106435>] ? handle_mm_fault+0x9e0/0xa11
> [ 404.281486] [<ffffffff8111cc22>] ? change_prot_numa+0x13/0x24
> [ 404.281495] [<ffffffff8106abf0>] ? task_numa_work+0x20c/0x2ac
> [ 404.281503] [<ffffffff810615e7>] ? finish_task_switch+0x83/0xc5
> [ 404.281512] [<ffffffff8105af10>] ? task_work_run+0x7b/0x8f
> [ 404.281521] [<ffffffff8100d732>] ? do_notify_resume+0x5a/0x6d
> [ 404.281529] [<ffffffff817bf49f>] ? retint_signal+0x48/0x89
> [ 404.281537] [<ffffffff810012eb>] ? xen_hypercall_iret+0xb/0x20
>
> Decoding page flags 0x366 we have _PAGE_SPECIAL(_PAGE_NUMA) and
> _PAGE_GLOBAL(_PAGE_PROTNONE) set, _PAGE_PRESENT not set. It's handling
> a NUMA hint page fault and crashes because the PTE in question is
> considered a special PTE by pte_special.
>
> In a Xen PV guest, _PAGE_GLOBAL is added by the hypervisor to mark the page
> as a guest user space page. The Xen PV kernel already forbids setting
> that bit during initialisation. It's a bit unfortunate that there's
> still a clash with _PAGE_PROTNONE.

_PAGE_IOMAP is no more (f955371 "x86: remove the Xen-specific
_PAGE_IOMAP PTE flag") so there's now a spare _PAGE_SOFTW2 that could be
used for NUMA hinting, instead of this confusing/complex aliasing.
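
To make the aliasing concrete, here is a rough userspace sketch of the
flag decoding for 0x366. It is a simplified model, not kernel code: the
bit positions are the x86 ones, but the model_* helpers are hypothetical
names, and the checks are only my reading of how the 3.18-era
pte_special() and pte_numa() tests behave. The last helper assumes a
NUMA hint moved to the spare SOFTW2 bit, which is only the suggestion
above and not something any tree does today.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* x86 PTE flag bit positions relevant to this report. */
enum {
    BIT_PRESENT  = 0,
    BIT_RW       = 1,
    BIT_USER     = 2,
    BIT_ACCESSED = 5,
    BIT_DIRTY    = 6,
    BIT_GLOBAL   = 8,  /* doubles as _PAGE_PROTNONE; Xen PV sets it on user pages */
    BIT_SOFTW1   = 9,  /* _PAGE_SPECIAL, shared with _PAGE_NUMA in 3.18-rc */
    BIT_SOFTW2   = 10  /* was _PAGE_IOMAP, spare since f955371 */
};

#define FLAG(bit) (UINT64_C(1) << (bit))

/* Simplified model (assumption): special bit set and present-or-protnone. */
static inline bool model_pte_special(uint64_t flags)
{
    return (flags & FLAG(BIT_SOFTW1)) != 0 &&
           (flags & (FLAG(BIT_PRESENT) | FLAG(BIT_GLOBAL))) != 0;
}

/* Simplified model (assumption): NUMA bit set, PRESENT and PROTNONE clear. */
static inline bool model_pte_numa(uint64_t flags)
{
    uint64_t mask = FLAG(BIT_SOFTW1) | FLAG(BIT_GLOBAL) | FLAG(BIT_PRESENT);
    return (flags & mask) == FLAG(BIT_SOFTW1);
}

/* Hypothetical: the special test if NUMA hinting lived in SOFTW2 instead. */
static inline bool model_pte_special_no_alias(uint64_t flags)
{
    return (flags & FLAG(BIT_SOFTW1)) != 0;
}

/* Print the names of the bits set in flags, one per line. */
static inline void print_flags(uint64_t flags)
{
    static const struct { int bit; const char *name; } names[] = {
        { BIT_PRESENT, "PRESENT" }, { BIT_RW, "RW" }, { BIT_USER, "USER" },
        { BIT_ACCESSED, "ACCESSED" }, { BIT_DIRTY, "DIRTY" },
        { BIT_GLOBAL, "GLOBAL/PROTNONE" }, { BIT_SOFTW1, "SOFTW1 (SPECIAL/NUMA)" },
        { BIT_SOFTW2, "SOFTW2" },
    };
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (flags & FLAG(names[i].bit))
            printf("bit %2d: %s\n", names[i].bit, names[i].name);
}
```

Calling print_flags(0x366) lists bits 1, 2, 5, 6, 8 and 9, with PRESENT
clear. In this model 0x366 is treated as special rather than as a NUMA
hint, because the GLOBAL bit added by Xen reads as PROTNONE. With the
hint in SOFTW2 (for example 0x566), the special test no longer fires.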

David