Re: x86_64 Question: Are concurrent IPI requests safe?

From: Thomas Gleixner
Date: Wed May 11 2016 - 10:15:45 EST


On Wed, 11 May 2016, Tetsuo Handa wrote:
> Thomas Gleixner wrote:
> > On Mon, 9 May 2016, Tetsuo Handa wrote:
> > >
> > > It seems to me that APIC_BASE APIC_ICR APIC_ICR_BUSY are all constant
> > > regardless of calling cpu. Thus, native_apic_mem_read() and
> > > native_apic_mem_write() are using globally shared constant memory
> > > address and __xapic_wait_icr_idle() is making decision based on
> > > globally shared constant memory address. Am I right?
> >
> > No. The APIC address space is per cpu. It's the same address but it's always
> > accessing the local APIC of the cpu on which it is called.
>
> Same address but per CPU magic. I see.
>
> Now, I'm trying with CONFIG_TRACE_IRQFLAGS=y and I can observe that
> irq event stamp shows that hardirqs are disabled for two CPUs when I hit
> this bug. It seems to me that this bug is triggered when two CPUs are
> concurrently calling smp_call_function_many() with wait == true.


> [ 180.434649] hardirqs last enabled at (5324977): [<ffff88007860f990>] 0xffff88007860f990
> [ 180.434650] hardirqs last disabled at (5324978): [<ffff88007860f990>] 0xffff88007860f990

Those addresses are on the stack !?! That makes no sense whatsoever. These
stamps should point into kernel text.
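
To illustrate the check: a rough sketch, assuming the thread stack is 16 KiB
and starts at the ti: value printed in the task line of the same report
(an assumption about this kernel config, not something the report states).
The helper name and the size constant are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: 16 KiB kernel stacks (x86_64 without KASAN at that time). */
#define ASSUMED_THREAD_SIZE UINT64_C(0x4000)

static_assert(ASSUMED_THREAD_SIZE == 16 * 1024, "sketch assumes 16 KiB stacks");

/*
 * Hypothetical helper: true if addr lies inside the stack area that
 * starts at ti_base (the "ti:" value from the report).
 */
static bool addr_on_thread_stack(uint64_t addr, uint64_t ti_base)
{
	return addr >= ti_base && (addr - ti_base) < ASSUMED_THREAD_SIZE;
}
```

With ti: ffff88007860c000 the assumed stack covers ffff88007860c000 up to
ffff880078610000, so ffff88007860f990 falls inside it, just like the RSP
value. A text address such as ffffffff811105bf does not.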

> [ 180.434659] task: ffff88007a046440 ti: ffff88007860c000 task.ti: ffff88007860c000
> [ 180.434665] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
> [ 180.434666] RSP: 0000:ffff88007860f950 EFLAGS: 00000202

And on this CPU interrupts are enabled because the IF bit (9) in EFLAGS is set.
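
For reference, a minimal sketch of how to decode that by hand. The macro and
function names are made up for illustration and are not the kernel's own
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IF (interrupt enable flag) is bit 9 of EFLAGS on x86. */
#define EFLAGS_IF_BIT  9
#define EFLAGS_IF_MASK (UINT64_C(1) << EFLAGS_IF_BIT)

static_assert(EFLAGS_IF_MASK == 0x200, "IF is bit 9, i.e. mask 0x200");

/* Hypothetical helper: true if the saved EFLAGS value has IF set. */
static bool eflags_irqs_enabled(uint64_t eflags)
{
	return (eflags & EFLAGS_IF_MASK) != 0;
}
```

EFLAGS 00000202 is 0x200 | 0x002: bit 9 (IF) plus bit 1, which is always set.
So eflags_irqs_enabled(0x202) is true, while a value like 0x002 would mean
interrupts were disabled.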

> [ 180.548951] hardirqs last enabled at (601147): [<ffff880078cffa00>] 0xffff880078cffa00
> [ 180.551359] hardirqs last disabled at (601148): [<ffff880078cffa00>] 0xffff880078cffa00

Equally crap.

> [ 180.563802] task: ffff880077ad1940 ti: ffff880078cfc000 task.ti: ffff880078cfc000
> [ 180.565984] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
> [ 180.568517] RSP: 0000:ffff880078cff9c0 EFLAGS: 00000202

And again interrupts are enabled.

Thanks,

tglx