Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()

From: Mauricio Faria de Oliveira

Date: Thu May 07 2026 - 11:04:35 EST


On 2026-05-07 03:58, David Woodhouse wrote:
> On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
>> kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
>> on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
>> by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
>>
>> Check for that case, and bail out early.
>>
>> Note: there is previous work and discussion on this [1] (~2 years ago),
>> which involved continuing to execute the function with changes, but it
>> was not merged. That was a different, more complex approach.
>>
>> [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@xxxxxxxxxx/
>
> ...
>
>> + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
>> + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
>> + goto out;
>
>
> The approach in Paul's earlier patch was better; we absolutely *want*
> to deliver the interrupt to the guest immediately whenever we can, and
> only fall back to the workqueue in the rare case that the shared info
> page has been invalidated.

Agreed, that approach is better. Mine was a simple workaround, but given that
the fast path is meant to deliver immediately whenever it can, it doesn't fit.
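The behaviour David describes can be modelled in a few lines of userspace C. The fast path delivers immediately whenever it can, and the caller falls back to a slow path only when the fast path refuses. This is a simplified sketch, not kernel code. `evtchn_model`, `deliver_fast()`, `deliver()` and the `deferred` counter are hypothetical names. The `cache_valid` flag stands in for the shared info mapping not having been invalidated, and `deferred` stands in for the workqueue mentioned above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cached shared_info mapping (the GPC). */
struct evtchn_model {
	bool cache_valid;   /* false once the mapping has been invalidated */
	int delivered_fast; /* events delivered immediately */
	int deferred;       /* events handed to the slow path ("workqueue") */
};

/* Fast path: best effort, never blocks, may refuse with -EWOULDBLOCK. */
static int deliver_fast(struct evtchn_model *m)
{
	if (!m->cache_valid)
		return -EWOULDBLOCK;
	m->delivered_fast++;
	return 0;
}

/* Caller: deliver immediately when possible, defer only on refusal. */
static void deliver(struct evtchn_model *m)
{
	if (deliver_fast(m) == -EWOULDBLOCK)
		m->deferred++; /* models queueing work for later */
}
```

Under the original patch, every call from hard IRQ context on PREEMPT_RT would take the deferred branch even when the mapping is valid. That is exactly the case the preferred approach keeps on the fast path.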

> We should switch to plain read_trylock(), *without* the
> local_irq_save(). And since this was the *only* case where the GPC lock
> was ever taken under IRQ¹, all the GPC locking can drop the _irq part.

Ok, I can take a look. Or do you plan to work on it yourself (as you
hit the issue with read_unlock later in this thread)?
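The shape of the suggested change can be sketched with a toy userspace reader/writer lock built on C11 atomics. `toy_rwlock`, `toy_read_trylock()` and `fast_path()` are hypothetical names used only for illustration. In the kernel, the calls would be read_trylock()/read_unlock() on the GPC's rwlock_t. Nothing here models interrupt masking or PREEMPT_RT sleeping locks. The sketch shows one property only: the reader never waits. If a writer holds the lock, the fast path returns -EWOULDBLOCK and leaves delivery to the caller's fallback.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy reader/writer lock: state >= 0 is the number of readers,
 * state == -1 means a writer holds it. A userspace stand-in for
 * the kernel's rwlock_t, not its real implementation.
 */
struct toy_rwlock {
	atomic_int state;
};

static bool toy_read_trylock(struct toy_rwlock *l)
{
	int old = atomic_load(&l->state);

	/* Retry only while no writer holds the lock; never wait for one. */
	while (old >= 0) {
		if (atomic_compare_exchange_weak(&l->state, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	atomic_fetch_sub(&l->state, 1);
}

static bool toy_write_trylock(struct toy_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	atomic_store(&l->state, 0);
}

/*
 * Hypothetical fast path: take the read side with a trylock only,
 * with no irqsave-style step. If a writer (e.g. an invalidation)
 * holds the lock, give up instead of blocking.
 */
static int fast_path(struct toy_rwlock *gpc_lock, int *delivered)
{
	if (!toy_read_trylock(gpc_lock))
		return -EWOULDBLOCK;
	(*delivered)++; /* ... deliver the event ... */
	toy_read_unlock(gpc_lock);
	return 0;
}
```

A trylock gives the reader no fairness guarantee, which is Sean's concern in this thread. That is acceptable here only because the fast path is allowed to fail and the write side is expected to be rare.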

> Sean's concern was:
>
>>> I am not comfortable applying this patch. As shown by the need for the next patch
>>> to optimize unrelated invalidations, switching to read_trylock() is more subtle
>>> than it seems at first glance. Specifically, there are no fairness guarantees.
>
> I'm OK with that in this case, because kvm_xen_set_evtchn_fast(), as
> with *everything* called from kvm_arch_set_irq_inatomic(), is
> explicitly designed to be a 'best effort' and allowed to return
> -EWOULDBLOCK when it's too hard.
>
> And the write lock being held here should be a *rare* case, as the GPC for
> the shared_info and vcpu_info pages should basically *never* get
> invalidated while the guest is running.
>
> I've taken the same read_trylock() approach in
> https://lore.kernel.org/all/1d6712ed413ea66ef376d1410811997c3b416e99.camel@xxxxxxxxxxxxx/

Thanks for the pointers.

--
Mauricio