Re: [PATCH RESEND v2 2/2] xen: enable vnuma for PV guest

From: Konrad Rzeszutek Wilk
Date: Tue Nov 19 2013 - 10:21:20 EST


On Tue, Nov 19, 2013 at 02:56:41PM +0000, David Vrabel wrote:
> On 19/11/13 14:46, Konrad Rzeszutek Wilk wrote:
> > On Tue, Nov 19, 2013 at 02:35:59PM +0000, David Vrabel wrote:
> >> On 19/11/13 14:16, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Nov 19, 2013 at 11:54:08AM +0000, David Vrabel wrote:
> >>>> On 18/11/13 21:58, Elena Ufimtseva wrote:
> >>>>> Enables numa if vnuma topology hypercall is supported and it is domU.
> >>>> [...]
> >>>>> --- a/arch/x86/xen/setup.c
> >>>>> +++ b/arch/x86/xen/setup.c
> >>>>> @@ -20,6 +20,7 @@
> >>>>> #include <asm/numa.h>
> >>>>> #include <asm/xen/hypervisor.h>
> >>>>> #include <asm/xen/hypercall.h>
> >>>>> +#include <asm/xen/vnuma.h>
> >>>>>
> >>>>> #include <xen/xen.h>
> >>>>> #include <xen/page.h>
> >>>>> @@ -598,6 +599,9 @@ void __init xen_arch_setup(void)
> >>>>>      WARN_ON(xen_set_default_idle());
> >>>>>      fiddle_vdso();
> >>>>>  #ifdef CONFIG_NUMA
> >>>>> -    numa_off = 1;
> >>>>> +    if (!xen_initial_domain() && xen_vnuma_supported())
> >>>>> +        numa_off = 0;
> >>>>> +    else
> >>>>> +        numa_off = 1;
> >>>>>  #endif
> >>>>>  }
> >>>>
> >>>> I think this whole #ifdef CONFIG_NUMA can be removed and hence
> >>>> xen_vnuma_supported() can be removed as well.
> >>>>
> >>>> For any PV guest we can call the xen_numa_init() and it will do the
> >>>> right thing.
> >>>>
> >>>> For dom0, the hypercall will either: return something sensible (if in
> >>>> the future Xen sets something up), or it will error.
> >>>>
> >>>> If Xen does not have vnuma support, the hypercall will error.
> >>>>
> >>>> In both error cases, the dummy numa node is set up as required.
> >>>
> >>> Incorrect. It will end up calling:
> >>>
> >>> if (!numa_init(amd_numa_init))
> >>>
> >>> which will crash dom0 (see 8d54db795 "xen/boot: Disable NUMA for PV guests.")
> >>> as that amd_numa_init is called before the dummy node init.
> >>
> >> No it won't. Any error path after the check for a PV guest will add the
> >> dummy node and return success, skipping any of the hardware-specific setup.
> >
> > Duh! I totally missed 'return' at the end of the check!
> >
> > However, even with that return in place, this part
> > won't be called:
> >
> > numa_init(dummy_numa_init);
> >
> > Which means there won't be any dummy numa setup?
>
> The relevant bits in dummy_numa_init are in the error path of
> xen_numa_init().

That seems like the wrong place to do it. The top layer calls
into each of the numa implementations and then falls back to
the dummy.

Having an implementation do something that the upper level
already does eventually is not right.
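
To illustrate the layering I mean, here is a rough userspace sketch. It is
not kernel code: every model_* name is made up for illustration, and the two
"vnuma" backends only simulate a hypercall failing or succeeding. The point
is that only the top level walks the backends and only the top level falls
back to the dummy node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All model_* names are hypothetical; this is a userspace model, not kernel code. */
typedef int (*model_init_fn)(void);   /* returns 0 on success, negative on failure */

#define MODEL_USED_DUMMY (-1)

static int model_nr_nodes;            /* nodes registered by the backend that won */

/* Simulates a vnuma backend whose hypercall errored out. */
static int model_vnuma_fail(void)
{
	return -1;
}

/* Simulates a vnuma backend that got a two-node topology back. */
static int model_vnuma_two_nodes(void)
{
	model_nr_nodes = 2;
	return 0;
}

/* The single-node fallback, owned by the top layer only. */
static int model_dummy_init(void)
{
	model_nr_nodes = 1;
	return 0;
}

/*
 * Top layer: try each backend in order and fall back to the dummy node
 * when all of them fail or when NUMA is switched off. Returns the index
 * of the backend that succeeded, or MODEL_USED_DUMMY.
 */
static int model_numa_setup(const model_init_fn *backends, size_t n, bool numa_off)
{
	assert(backends != NULL || n == 0);

	if (!numa_off) {
		for (size_t i = 0; i < n; i++) {
			model_nr_nodes = 0;   /* discard state left by a failed backend */
			if (backends[i]() == 0)
				return (int)i;
		}
	}

	model_nr_nodes = 0;
	model_dummy_init();
	return MODEL_USED_DUMMY;
}
```

With only the failing backend in the list, or with numa_off set, the model
ends up with the one dummy node, and the backend itself never has to know
about the dummy setup.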
>
> I do think this approach (using the provided API to set up the single
> (dummy) node) is preferable to calling dummy_numa_init().

Doesn't it do the same thing? And what happens if the user
provides fakenuma?

>
> If I thought the hypervisor ABI was finalized, I'd be happy with this
> series as-is -- the remaining issues are superficial.

That reads to me as an Ack, but I know you like to have it stated
explicitly - so could you state the proper tag please?

>
> David