Re: [Xen-devel] [PATCH] xen/events: Mask a moving irq

From: David Vrabel
Date: Thu Mar 17 2016 - 13:29:10 EST


On 17/03/16 16:53, Boris Ostrovsky wrote:
> On 03/17/2016 12:03 PM, David Vrabel wrote:
>> On 17/03/16 12:45, Boris Ostrovsky wrote:
>>> Moving an unmasked irq may result in irq handler being invoked on both
>>> source and target CPUs.
>>>
>>> With 2-level this can happen as follows:
>>>
>>> On source CPU:
>>> evtchn_2l_handle_events() ->
>>> generic_handle_irq() ->
>>> handle_edge_irq() ->
>>> eoi_pirq():
>>> irq_move_irq(data);
>>>
>>> /***** WE ARE HERE *****/
>>>
>>> if (VALID_EVTCHN(evtchn))
>>> clear_evtchn(evtchn);
>>>
>>> If at this moment target processor is handling an unrelated event in
>>> evtchn_2l_handle_events()'s loop it may pick up our event since target's
>>> cpu_evtchn_mask claims that this event belongs to it *and* the event is
>>> unmasked and still pending. At the same time, source CPU will continue
>>> executing its own handle_edge_irq().
>>>
>>> With FIFO interrupt the scenario is similar: irq_move_irq() may result
>>> in a EVTCHNOP_unmask hypercall which, in turn, may make the event
>>> pending on the target CPU.
>>>
>>> We can avoid this situation by moving and clearing the event while
>>> keeping event masked.
>> Can you do:
>>
>> if (unlikely(irqd_is_setaffinity_pending(data))) {
>> masked = test_and_set_mask()
>>
>> clear_evtchn()
>> irq_move_masked_irq()
>
> I did think about this but then I wasn't sure whether this might open
> some other window for things to sneak in. It shouldn't but these things
> are rather subtle so I'd rather leave the order of how operations are
> done unchanged.

This is the order your patch has though. I'm confused.

> But I should indeed use irq_move_masked_irq() instead of irq_move_irq().

David