Re: [patch V9 21/39] x86/irq: Convey vector as argument and not in ptregs

From: Alexander Graf
Date: Tue Aug 25 2020 - 19:18:23 EST


Hi Thomas,

On 25.08.20 12:28, Thomas Gleixner wrote:

Alex,

On Mon, Aug 24 2020 at 19:29, Alexander Graf wrote:
I'm currently trying to understand a performance regression with
ScyllaDB on i3en.3xlarge (KVM based VM on Skylake) which we reliably
bisected down to this commit:

https://github.com/scylladb/scylla/issues/7036

What we're seeing is that syscalls such as membarrier() take forever
(0-10 µs would be normal):
...
That again seems to stem from a vastly slowed down
smp_call_function_many_cond():

Samples: 218K of event 'cpu-clock', 4000 Hz
Overhead Shared Object Symbol
94.51% [kernel] [k] smp_call_function_many_cond
0.76% [kernel] [k] __do_softirq
0.32% [kernel] [k] native_queued_spin_lock_slowpath
[...]

which is stuck in

│ csd_lock_wait():
│ smp_cond_load_acquire(&csd->flags, !(VAL &
0.00 │ mov 0x8(%rcx),%edx
0.00 │ and $0x1,%edx
│ ↓ je 2b9
│ rep_nop():
0.70 │2af: pause
│ csd_lock_wait():
92.82 │ mov 0x8(%rcx),%edx
6.48 │ and $0x1,%edx
0.00 │ ↑ jne 2af
0.00 │2b9: ↑ jmp 282


Given the patch at hand I was expecting lost IPIs, but I can't quite see
anything getting lost.

I have no idea how that patch should be related to IPI and smp function
calls. It's changing the way how regular device interrupts and their
spurious counterpart are handled and not the way how IPIs are
handled. They are handled by direct vectors and do not go through
do_IRQ() at all.

Aside of that the commit just changes the way how the interrupt vector
of a regular device interrupt is stored and conveyed. The extra read and
write on the cache hot stack is hardly related to anything IPI.

I am as puzzled as you are, but the bisect is very clear: 79b9c183021e works fast and 633260fa1 (as well as mainline) shows the weird behavior above.

It gets even better. This small (demonstrative only, mangled) patch on top of 633260fa1 also resolves the performance issue:

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index c766936..7e91e9a 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -239,6 +239,7 @@ __visible void __irq_entry do_IRQ(struct pt_regs *regs, unsigned long vector)
* lower 8 bits.
*/
vector &= 0xFF;
+ regs->orig_ax = ~vector;

/* entering_irq() tells RCU that we're not quiescent. Check it. */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");


To me that sounds like some irq exit code somewhere must still be looking at orig_ax to decide on something - and that something is wrong now that we removed the negation of the vector. It also seems to have an impact on remote function calls.

I'll have a deeper look tomorrow again if I can find any such place, but I wouldn't mind if anyone could point me into the right direction earlier :).


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879