Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts
From: Rafael J. Wysocki
Date: Thu Jul 31 2014 - 16:45:42 EST
On Thursday, July 31, 2014 04:12:55 PM Alan Stern wrote:
> On Thu, 31 Jul 2014, Rafael J. Wysocki wrote:
>
> > And before we enter the wakeup handling slippery slope, let me make a note
> > that this problem is bothering me quite a bit at the moment. In my opinion
> > we need to address it somehow regardless of the wakeup issues and I'm not sure
> > if failing __setup_irq() when there's a mismatch (that is, there are existing
> > actions for the given irq_desc and their IRQF_NO_SUSPEND settings are not
> > consistent with the new one) is the right way to do that, because it may make
> > things behave a bit randomly (it will always fail the second guy, but that need
> > not be the one who's requested IRQF_NO_SUSPEND and it depends on the ordering
> > between them).
>
> Pardon me for sticking my nose into the middle of the conversation, but
> here's what it looks like to me:
>
> The entire no_irq phase of suspend/resume is starting to seem like a
> mistake. We should never have done it.
In hindsight, I totally agree. Question is what we can do about it now.
> Alternatively, it might be
> okay to disable _all_ interrupts during the no_irq phase provided they
> are then _all_ enabled again before entering the sysdev and
> platform-specific parts of suspend (or the final freeze).
>
> As I understand it, the idea behind the no_irq phase was to make life
> easier for drivers. They wouldn't have to worry about strange things
> cropping up while they're in the middle of powering down their devices
> or afterward.
Actually, it was done to prevent bugs in PCI drivers from crashing boxes
during suspend and resume IIRC.
When we were debugging PME with Peter a couple of days ago I asked him to
comment out suspend/resume_device_irqs() experimentally and see what
happens and it turned out that the system didn't resume any more. It looks
like it still prevents problems from happening, then.
> Well, guess what? It turns out that they do have to worry about it
> after all. Timers can still fire during suspend transitions, and if an
> IRQ line is shared with a wakeup source then it won't be disabled.
OK, that's where it starts to get really messy. If people were not using
IRQF_NO_SUSPEND excessively, the situation would be that almost all IRQ
lines, including the ones of wakeup sources, would be disabled by
suspend_device_irqs() (the exceptions being really low-level stuff that
needs to run during suspend/resume and not sharing IRQ lines).
The wakeup would then be handled with the help of the lazy disable mechanism,
that is, enable_irq_wake() from the driver and then check_wakeup_irqs() in
syscore_suspend().
However, what we have now is a mixture of different mechanisms often used
for things they had not been intended for.
But I agree that we'd probably have avoided it if suspend_device_irqs() hadn't
existed.
> The fact is, drivers should not rely on disabled interrupts to prevent
> untimely interrupt requests or wakeups. They should configure their
> devices not to generate any interrupt requests at all, apart from
> wakeup requests. And their interrupt handlers shouldn't mind being
> invoked for a shared IRQ, even after their devices are turned off.
>
> Any driver that leaves its device capable of generating non-wakeup
> interrupt requests during suspend, and relies on interrupts being
> globally disabled to avoid problems, is most likely broken. Yes, it
> may be acceptable in cases where the IRQ line isn't shared, but it's
> still a bad design.
Totally agreed.
So how can we eliminate the noirq phase in a workable way?
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/