Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down

From: Boris Ostrovsky
Date: Mon Mar 14 2016 - 09:13:16 EST

On 03/12/2016 04:19 AM, Thomas Gleixner wrote:

On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
On 07/14/2015 04:15 PM, Thomas Gleixner wrote:
The issue here is that all architectures need that protection and only
Xen does irq allocations in cpu_up().

So moving that protection into architecture code is not really an
option.

Otherwise we will need to have something like arch_post_cpu_up()
after the lock is released.
I'm not sure that this will work. You probably want to do this in the
CPU prepare stage, i.e. before calling __cpu_up().
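To make the conflict concrete, here is a minimal userspace model in C. It is not kernel code. It assumes that the hotplug protection is a non-recursive lock that the descriptor allocator also takes (in the kernel, the sparse_irq_lock behind irq_lock_sparse()). Every model_* name is hypothetical. An error-checking pthread mutex makes the recursive acquisition fail with EDEADLK instead of hanging. The sketch therefore shows why an allocation inside the locked cpu_up() region fails, while the same allocation done in a prepare stage, before the lock is taken, succeeds.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the kernel's sparse_irq_lock.
 * An error-checking mutex returns EDEADLK on a recursive lock attempt
 * instead of hanging forever. */
static pthread_mutex_t sparse_irq_lock;
static int nr_irq_descs;

static void model_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    int err = pthread_mutex_init(&sparse_irq_lock, &attr);
    assert(err == 0);
    (void)err;
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical model of irq_alloc_descs(): it needs the same lock. */
static int model_alloc_irq_desc(void)
{
    int err = pthread_mutex_lock(&sparse_irq_lock);
    if (err)
        return -err;                 /* -EDEADLK if the caller holds the lock */
    int irq = nr_irq_descs++;
    pthread_mutex_unlock(&sparse_irq_lock);
    return irq;
}

/* The problematic case: allocating from inside the locked bring-up. */
static int model_cpu_up_alloc_under_lock(void)
{
    pthread_mutex_lock(&sparse_irq_lock);     /* irq_lock_sparse() */
    int ret = model_alloc_irq_desc();         /* allocation inside cpu_up */
    pthread_mutex_unlock(&sparse_irq_lock);   /* irq_unlock_sparse() */
    return ret;
}

/* The suggested fix: allocate in a prepare stage, before taking the lock. */
static int model_cpu_up_alloc_in_prepare(void)
{
    int ret = model_alloc_irq_desc();         /* prepare stage, lock not held */
    if (ret < 0)
        return ret;
    pthread_mutex_lock(&sparse_irq_lock);
    /* __cpu_up() would run here while descriptors cannot change */
    pthread_mutex_unlock(&sparse_irq_lock);
    return ret;
}
```

After model_init(), model_cpu_up_alloc_under_lock() returns -EDEADLK, whereas model_cpu_up_alloc_in_prepare() returns a descriptor number. In the kernel, the second mutex acquisition would simply deadlock rather than report an error.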
For PV guests (the ones that use xen_cpu_up()) it will work either before
or after __cpu_up(). At least my (somewhat limited) testing hasn't shown any
problems so far.

However, HVM CPUs use xen_hvm_cpu_up(), and if you read the comments there you
will see that xen_smp_intr_init() needs to be called before native_cpu_up(),
but xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
called after it.

I think I can split xen_init_lock_cpu() so that the part that needs to be
called after native_cpu_up() avoids going into the irq core code, and the
rest goes into arch_cpu_prepare().
I think we should revisit this for 4.3. For 4.2 we can do the trivial
variant and move the locking into native_cpu_up(), x86 only. x86 was
the only arch on which such wreckage has been seen in the wild, but we
should have that protection for all archs in the long run.

Patch below should fix the issue.
Thanks! Most of my tests passed. I had a couple of failures, but I will need to
see whether they are related to this patch.
Did you ever get around to addressing that irq allocation from within cpu_up()?

I really want to generalize the protection instead of carrying that x86-only
hack forever.

Sorry, I completely forgot about this. Let me see how I can move the allocations out from under the lock. I might just be able to put them in CPU notifiers: most into CPU_UP_PREPARE, but the spinlock interrupt may need to go into CPU_ONLINE.
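The proposed notifier split can be sketched as a small userspace model in C. Again, this is hypothetical and not kernel code. It assumes the ordering discussed in this thread: CPU_UP_PREPARE callbacks run before the bring-up takes the irq lock, and CPU_ONLINE callbacks run after the lock is released. The model_* names, the enum and the boolean lock flag are illustrative assumptions, and the real notifier interface is deliberately not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_cpu_action { MODEL_CPU_UP_PREPARE, MODEL_CPU_ONLINE };

static bool sparse_irq_locked;   /* true while the "__cpu_up()" step runs */
static int nr_irqs_allocated;

/* Stand-in for irq_alloc_descs(): refuses while the lock is held. */
static int model_alloc_irq(void)
{
    if (sparse_irq_locked)
        return -EDEADLK;         /* the kernel would deadlock here instead */
    return nr_irqs_allocated++;
}

/* Xen-like notifier: most allocations at UP_PREPARE, the spinlock irq at ONLINE. */
static int model_xen_cpu_notify(enum model_cpu_action action, int cpu)
{
    (void)cpu;                   /* per-cpu bookkeeping omitted */
    switch (action) {
    case MODEL_CPU_UP_PREPARE:
        return model_alloc_irq();   /* e.g. per-cpu IPI interrupts */
    case MODEL_CPU_ONLINE:
        return model_alloc_irq();   /* e.g. the spinlock kick interrupt */
    }
    return 0;
}

/* Model of cpu_up(): notifiers run outside the locked bring-up step. */
static int model_cpu_up(int cpu)
{
    assert(!sparse_irq_locked);

    int ret = model_xen_cpu_notify(MODEL_CPU_UP_PREPARE, cpu);
    if (ret < 0)
        return ret;

    sparse_irq_locked = true;    /* irq_lock_sparse() */
    /* __cpu_up(cpu): irq descriptors must not be allocated or freed here */
    sparse_irq_locked = false;   /* irq_unlock_sparse() */

    ret = model_xen_cpu_notify(MODEL_CPU_ONLINE, cpu);
    return ret < 0 ? ret : 0;
}
```

With this ordering, model_cpu_up(1) completes and both allocations succeed. Moving either call into the locked step would make model_alloc_irq() return -EDEADLK.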