Re: [Xen-devel] [PATCH 1/4] xen/events: Clear cpu_evtchn_mask before resuming
From: David Vrabel
Date: Wed Apr 29 2015 - 12:33:17 EST
On 28/04/15 19:29, Boris Ostrovsky wrote:
> On 04/28/2015 12:28 PM, David Vrabel wrote:
>> On 28/04/15 16:52, Boris Ostrovsky wrote:
>>> When a guest is resumed, the hypervisor may change event channel
>>> assignments. If this happens and the guest uses 2-level events it
>>> is possible for the interrupt to be claimed by wrong VCPU since
>>> cpu_evtchn_mask bits may be stale. This can happen even though
>>> evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that
>>> is passed in is not necessarily the original one (from pre-migration
>>> times) but instead is freshly allocated during resume and so any
>>> information about which CPU the channel was bound to is lost.
>>>
>>> Thus we should clear the mask during resume.
>>>
>>> We also need to make sure that bits for xenstore and console channels
>>> are set when these two subsystems are resumed. While rebind_evtchn_irq()
>>> (which is invoked for both of them on a resume) calls
>>> irq_set_affinity(),
>>> the latter will in fact postpone setting affinity until handling the
>>> interrupt. But because cpu_evtchn_mask will have bits for these two
>>> cleared
>>> we won't be able to take the interrupt.
>>>
>>> Setting IRQ_MOVE_PCNTXT flag for the two irqs avoids this problem by
>>> allowing to set affinity immediately, which is safe for
>>> event-channel-based
>>> interrupts.
>> [...]
>>> --- a/drivers/tty/hvc/hvc_xen.c
>>> +++ b/drivers/tty/hvc/hvc_xen.c
>>> @@ -533,6 +533,7 @@ static int __init xen_hvc_init(void)
>>> info = vtermno_to_xencons(HVC_COOKIE);
>>> info->irq = bind_evtchn_to_irq(info->evtchn);
>>> + irq_set_status_flags(info->irq, IRQ_MOVE_PCNTXT);
>>> }
>>> if (info->irq < 0)
>>> info->irq = 0; /* NO_IRQ */
>>> diff --git a/drivers/xen/events/events_2l.c
>>> b/drivers/xen/events/events_2l.c
>>> index 5db43fc..7dd4631 100644
>>> --- a/drivers/xen/events/events_2l.c
>>> +++ b/drivers/xen/events/events_2l.c
>>> @@ -345,6 +345,15 @@ irqreturn_t xen_debug_interrupt(int irq, void
>>> *dev_id)
>>> return IRQ_HANDLED;
>>> }
>>> +static void evtchn_2l_resume(void)
>>> +{
>>> + int i;
>>> +
>>> + for_each_online_cpu(i)
>>> + memset(per_cpu(cpu_evtchn_mask, i), 0, sizeof(xen_ulong_t) *
>>> + EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
>>> +}
>>> +
>>> static const struct evtchn_ops evtchn_ops_2l = {
>>> .max_channels = evtchn_2l_max_channels,
>>> .nr_channels = evtchn_2l_max_channels,
>>> @@ -356,6 +365,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
>>> .mask = evtchn_2l_mask,
>>> .unmask = evtchn_2l_unmask,
>>> .handle_events = evtchn_2l_handle_events,
>>> + .resume = evtchn_2l_resume,
>>> };
>>> void __init xen_evtchn_2l_init(void)
>>> diff --git a/drivers/xen/xenbus/xenbus_comms.c
>>> b/drivers/xen/xenbus/xenbus_comms.c
>>> index fdb0f33..30203d1 100644
>>> --- a/drivers/xen/xenbus/xenbus_comms.c
>>> +++ b/drivers/xen/xenbus/xenbus_comms.c
>>> @@ -231,6 +231,7 @@ int xb_init_comms(void)
>>> }
>>> xenbus_irq = err;
>>> + irq_set_status_flags(xenbus_irq, IRQ_MOVE_PCNTXT);
>> IRQ_MOVE_PCNTXT means "Interrupt can be migrated from process context"
>> which doesn't really sound relevant to me here?
>>
>> Thomas Glexnier is really not happy with mis-use of IRQ APIs.
>
> Yes, this is somewhat expanding definition of IRQ_MOVE_PCNTXT but then
> there are other devices that do the same (HPET, for one).
Ok. But why set it on only these two event channels and not every one?
>> From the commit log the evtchn_2l_resume() fucntion that's added sounds
>> like it fixes the problem on its own?
>
> It in fact makes this problem worse since now that cpu_evtchn_mask is
> cleared during resume we cannot process the interrupt anymore in
> evtchn_2l_handle_events(): irqs have to be bound to a cpu in order for
> an interrupt to be processed.
Perhaps evtchn_2l_resume() should set the local cpu mask for any bound
event channels? And then you wouldn't need IRQ_MOVE_PCNTX.
> We already have this issue even without clearing the mask if, say,
> xenstore channel changes or if it gets bound to cpu other than 0 before
> suspend. It's just that we rarely (if ever) see this happen.
>
> We can explicitly call events_base.c:set_affinity_irq() from
> rebind_evtchn_irq() after irq_set_affinity() if we want to avoid using
> IRQ_MOVE_PCNTXT flag but that will also looks kind of strange.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/