Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10

From: Jürgen Groß
Date: Tue Dec 15 2020 - 05:21:39 EST


On 15.12.20 08:27, Jürgen Groß wrote:
On 14.12.20 22:25, Julien Grall wrote:
Hi Juergen,

When testing Linux 5.10 dom0, I could reliably hit the following warning with using event 2L ABI:

[  589.591737] Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b
[  589.593259] WARNING: CPU: 0 PID: 1111 at /home/ANT.AMAZON.COM/jgrall/works/oss/linux/drivers/xen/evtchn.c:170 evtchn_interrupt+0xeb/0x100
[  589.595514] Modules linked in:
[  589.596145] CPU: 0 PID: 1111 Comm: qemu-system-i38 Tainted: G W         5.10.0+ #180
[  589.597708] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[  589.599782] RIP: e030:evtchn_interrupt+0xeb/0x100
[  589.600698] Code: 48 8d bb d8 01 00 00 ba 01 00 00 00 be 1d 00 00 00 e8 d9 10 ca ff eb b2 8b 75 20 48 89 da 48 c7 c7 a8 31 3d 82 e8 65 29 a0 ff <0f> 0b e9 42 ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f
[  589.604087] RSP: e02b:ffffc90040003e70 EFLAGS: 00010086
[  589.605102] RAX: 0000000000000000 RBX: ffff888102091800 RCX: 0000000000000027
[  589.606445] RDX: 0000000000000000 RSI: ffff88817fe19150 RDI: ffff88817fe19158
[  589.607790] RBP: ffff88810f5ab980 R08: 0000000000000001 R09: 0000000000328980
[  589.609134] R10: 0000000000000000 R11: ffffc90040003c70 R12: ffff888107fd3c00
[  589.610484] R13: ffffc90040003ed4 R14: 0000000000000000 R15: ffff88810f5ffd80
[  589.611828] FS:  00007f960c4b8ac0(0000) GS:ffff88817fe00000(0000) knlGS:0000000000000000
[  589.613348] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  589.614525] CR2: 00007f17ee72e000 CR3: 000000010f5b6000 CR4: 0000000000050660
[  589.615874] Call Trace:
[  589.616402]  <IRQ>
[  589.616855]  __handle_irq_event_percpu+0x4e/0x2c0
[  589.617784]  handle_irq_event_percpu+0x30/0x80
[  589.618660]  handle_irq_event+0x3a/0x60
[  589.619428]  handle_edge_irq+0x9b/0x1f0
[  589.620209]  generic_handle_irq+0x4f/0x60
[  589.621008]  evtchn_2l_handle_events+0x160/0x280
[  589.621913]  __xen_evtchn_do_upcall+0x66/0xb0
[  589.622767]  __xen_pv_evtchn_do_upcall+0x11/0x20
[  589.623665]  asm_call_irq_on_stack+0x12/0x20
[  589.624511]  </IRQ>
[  589.624978]  xen_pv_evtchn_do_upcall+0x77/0xf0
[  589.625848]  exc_xen_hypervisor_callback+0x8/0x10

This can be reproduced when creating/destroying guest in a loop. Although, I have struggled to reproduce it on a vanilla Xen.

After several hours of debugging, I think I have found the root cause.

While we only expect the unmask to happen when the event channel is EOIed, there is an unmask happening as part of handle_edge_irq() because the interrupt was seen as pending by another vCPU (IRQS_PENDING is set).

It turns out that the event channel is set for multiple vCPU is in cpu_evtchn_mask. This is happening because the affinity is not cleared when freeing an event channel.

The implementation of evtchn_2l_handle_events() will look for all the active interrupts for the current vCPU and later on clear the pending bit (via the ack() callback). IOW, I believe, this is not an atomic operation.

Even if Xen will notify the event to a single vCPU, evtchn_pending_sel may still be set on the other vCPU (thanks to a different event channel). Therefore, there is a chance that two vCPUs will try to handle the same interrupt.

The IRQ handler handle_edge_irq() is able to deal with that and will mask/unmask the interrupt. This will mess us with the lateeoi logic (although, I managed to reproduce it once without XSA-332).

Thanks for the analysis!

My initial idea to fix the problem was to switch the affinity from CPU X to CPU0 when the event channel is freed.

However, I am not sure this is enough because I haven't found anything yet preventing a race between evtchn_2l_handle_events9) and evtchn_2l_bind_vcpu().

So maybe we want to introduce a refcounting (if there is nothing provided by the IRQ framework) and only unmask when the counter drop to 0.

Any opinions?

I think we don't need a refcount, but just the internal states "masked"
and "eoi_pending" and unmask only if both are false. "masked" will be
set when the event is being masked. When delivering a lateeoi irq
"eoi_pending" will be set and "masked "reset. "masked" will be reset
when a normal unmask is happening. And "eoi_pending" will be reset
when a lateeoi is signaled. Any reset of "masked" and "eoi_pending"
will check the other flag and do an unmask if both are false.

I'll write a patch.

Julien, could you please test the attached (only build tested) patch?


Juergen
From 2ce5786fd6f29ec09ad653e30e089042ea62b309 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@xxxxxxxx>
Date: Tue, 15 Dec 2020 10:37:11 +0100
Subject: [PATCH] xen/events: don't unmask an event channel when an eoi is
pending

An event channel should be kept masked when an eoi is pending for it.
When being migrated to another cpu it might be unmasked, though.

In order to avoid this keep two different flags for each event channel
to be able to distinguish "normal" masking/unmasking from eoi related
masking/unmasking. The event channel should only be able to generate
an interrupt if both flags are cleared.

Cc: stable@xxxxxxxxxxxxxxx
Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
Reported-by: Julien Grall <julien@xxxxxxx>
Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
---
drivers/xen/events/events_base.c | 64 +++++++++++++++++++++++++++-----
1 file changed, 54 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 6038c4c35db5..b024200f1677 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -96,7 +96,9 @@ struct irq_info {
struct list_head eoi_list;
short refcnt;
short spurious_cnt;
- enum xen_irq_type type; /* type */
+ short type; /* type: IRQT_* */
+ bool masked; /* Is event explicitly masked? */
+ bool eoi_pending; /* Is EOI pending? */
unsigned irq;
evtchn_port_t evtchn; /* event channel */
unsigned short cpu; /* cpu bound */
@@ -272,6 +274,8 @@ static int xen_irq_info_common_setup(struct irq_info *info,
info->irq = irq;
info->evtchn = evtchn;
info->cpu = cpu;
+ info->masked = true;
+ info->eoi_pending = false;

ret = set_evtchn_to_irq(evtchn, irq);
if (ret < 0)
@@ -545,7 +549,10 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
}

info->eoi_time = 0;
- unmask_evtchn(evtchn);
+ info->eoi_pending = false;
+
+ if (!info->masked)
+ unmask_evtchn(evtchn);
}

static void xen_irq_lateeoi_worker(struct work_struct *work)
@@ -801,7 +808,11 @@ static unsigned int __startup_pirq(unsigned int irq)
goto err;

out:
- unmask_evtchn(evtchn);
+ info->masked = false;
+
+ if (!info->eoi_pending)
+ unmask_evtchn(evtchn);
+
eoi_pirq(irq_get_irq_data(irq));

return 0;
@@ -828,6 +839,7 @@ static void shutdown_pirq(struct irq_data *data)
if (!VALID_EVTCHN(evtchn))
return;

+ info->masked = true;
mask_evtchn(evtchn);
xen_evtchn_close(evtchn);
xen_irq_info_cleanup(info);
@@ -1713,18 +1725,26 @@ EXPORT_SYMBOL_GPL(xen_set_affinity_evtchn);

static void enable_dynirq(struct irq_data *data)
{
- evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+ struct irq_info *info = info_for_irq(data->irq);
+ evtchn_port_t evtchn = info ? info->evtchn : 0;

- if (VALID_EVTCHN(evtchn))
- unmask_evtchn(evtchn);
+ if (VALID_EVTCHN(evtchn)) {
+ info->masked = false;
+
+ if (!info->eoi_pending)
+ unmask_evtchn(evtchn);
+ }
}

static void disable_dynirq(struct irq_data *data)
{
- evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+ struct irq_info *info = info_for_irq(data->irq);
+ evtchn_port_t evtchn = info ? info->evtchn : 0;

- if (VALID_EVTCHN(evtchn))
+ if (VALID_EVTCHN(evtchn)) {
+ info->masked = true;
mask_evtchn(evtchn);
+ }
}

static void ack_dynirq(struct irq_data *data)
@@ -1754,6 +1774,30 @@ static void mask_ack_dynirq(struct irq_data *data)
ack_dynirq(data);
}

+static void lateeoi_ack_dynirq(struct irq_data *data)
+{
+ struct irq_info *info = info_for_irq(data->irq);
+ evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+ if (VALID_EVTCHN(evtchn)) {
+ info->masked = false;
+ info->eoi_pending = true;
+ mask_evtchn(evtchn);
+ }
+}
+
+static void lateeoi_mask_ack_dynirq(struct irq_data *data)
+{
+ struct irq_info *info = info_for_irq(data->irq);
+ evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+ if (VALID_EVTCHN(evtchn)) {
+ info->masked = true;
+ info->eoi_pending = true;
+ mask_evtchn(evtchn);
+ }
+}
+
static int retrigger_dynirq(struct irq_data *data)
{
evtchn_port_t evtchn = evtchn_from_irq(data->irq);
@@ -1973,8 +2017,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = {
.irq_mask = disable_dynirq,
.irq_unmask = enable_dynirq,

- .irq_ack = mask_ack_dynirq,
- .irq_mask_ack = mask_ack_dynirq,
+ .irq_ack = lateeoi_ack_dynirq,
+ .irq_mask_ack = lateeoi_mask_ack_dynirq,

.irq_set_affinity = set_affinity_irq,
.irq_retrigger = retrigger_dynirq,
--
2.26.2

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys

Attachment: OpenPGP_signature
Description: OpenPGP digital signature