Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids

From: Jürgen Groß
Date: Sun Feb 07 2021 - 07:59:49 EST


On 06.02.21 19:46, Julien Grall wrote:
Hi Juergen,

On 06/02/2021 10:49, Juergen Gross wrote:
The first three patches are fixes for XSA-332. The avoid WARN splats
and a performance issue with interdomain events.

Thanks for helping to figure out the problem. Unfortunately, I still see reliably the WARN splat with the latest Linux master (1e0d27fce010) + your first 3 patches.

I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L events ABI.

After some debugging, I think I have an idea what's went wrong. The problem happens when the event is initially bound from vCPU0 to a different vCPU.

From the comment in xen_rebind_evtchn_to_cpu(), we are masking the event to prevent it being delivered on an unexpected vCPU. However, I believe the following can happen:

vCPU0                | vCPU1
                |
                | Call xen_rebind_evtchn_to_cpu()
receive event X            |
                | mask event X
                | bind to vCPU1
<vCPU descheduled>        | unmask event X
                |
                | receive event X
                |
                | handle_edge_irq(X)
handle_edge_irq(X)        |  -> handle_irq_event()
                |   -> set IRQD_IN_PROGRESS
 -> set IRQS_PENDING        |
                |   -> evtchn_interrupt()
                |   -> clear IRQD_IN_PROGRESS
                |  -> IRQS_PENDING is set
                |  -> handle_irq_event()
                |   -> evtchn_interrupt()
                |     -> WARN()
                |

All the lateeoi handlers expect a ONESHOT semantic and evtchn_interrupt() is doesn't tolerate any deviation.

I think the problem was introduced by 7f874a0447a9 ("xen/events: fix lateeoi irq acknowledgment") because the interrupt was disabled previously. Therefore we wouldn't do another iteration in handle_edge_irq().

I think you picked the wrong commit for blaming, as this is just
the last patch of the three patches you were testing.

Aside the handlers, I think it may impact the defer EOI mitigation because in theory if a 3rd vCPU is joining the party (let say vCPU A migrate the event from vCPU B to vCPU C). So info->{eoi_cpu, irq_epoch, eoi_time} could possibly get mangled?

For a fix, we may want to consider to hold evtchn_rwlock with the write permission. Although, I am not 100% sure this is going to prevent everything.

It will make things worse, as it would violate the locking hierarchy
(xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).

On a first glance I think we'll need a 3rd masking state ("temporarily
masked") in the second patch in order to avoid a race with lateeoi.

In order to avoid the race you outlined above we need an "event is being
handled" indicator checked via test_and_set() semantics in
handle_irq_for_port() and reset only when calling clear_evtchn().

Does my write-up make sense to you?

Yes. What about my reply? ;-)


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys

Attachment: OpenPGP_signature
Description: OpenPGP digital signature