Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts

From: Rafael J. Wysocki
Date: Wed Jul 30 2014 - 21:56:13 EST

Next message: Rafael J. Wysocki: "Re: [PATCH] cpufreq, store_scaling_governor requires policy->rwsem to be held for duration of changing governors [v2]"
Previous message: Joseph Salisbury: "Re: [REVERT][v3.16-rc7][STABLE] usb: xhci: Prefer endpoint context dequeue pointer over stopped_trb"
In reply to: Thomas Gleixner: "Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts"
Next in thread: Thomas Gleixner: "Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thursday, July 31, 2014 02:12:11 AM Thomas Gleixner wrote:
> On Thu, 31 Jul 2014, Thomas Gleixner wrote:
> > On Wed, 30 Jul 2014, Rafael J. Wysocki wrote:
> > Before this code changes in any way I want to see:
> >
> > 1) a description of the existing semantics and their background

On that one, the meaning of IRQF_NO_SUSPEND is quite simple to me.

If it is set (for the first irqaction in a given irq_desc)
suspend_device_irqs() will leave that IRQ alone (that is, will not
disable it and will not mark it as IRQS_SUSPENDED).

As a result, if an interrupt for that irq_desc happens after
suspend_device_irqs(), the interrupt handlers from all of its irqactions
will be invoked.

So this basically means "ignore that IRQ" to suspend_device_irqs() and
that's its *only* meaning.

It's primary users are timers, per-CPU IRQs, ACPI SCI, via-pmu, Xen.
There also is a bunch of drivers that use it for wakeup stuff, but they
shouldn't in my opinion.

The background I'm aware of was primarily timers (we wanted to be able
to msleep() during PCI PM transitions in the noirq phases of system suspend
and resume among other things), and we want per-CPU stuff to work all the
time too. ACPI uses it for signaling various types of events (including
battery and thermal) that need to be handled all the time. I'm not sure
why Xen needs it exactly, but that seems to be IPI-related.

The current handling of IRQF_NO_SUSPEND has a problem vs shared interrupts
which I've tried to address by https://patchwork.kernel.org/patch/4643871/
and which is described in the changelog of that patch. Unfortunately, some
existing users pass IRQF_SHARED along with IRQF_NO_SUSPEND which is the main
motivation for that. Many of them use it for wakeup purposes, but for one
example (that doesn't use it for wakeup only) the ACPI SCI is shareable by
design.

To be continued.

> > 2) a description of the short comings of the existing semantics w/o
> > considering the new fangled (hardware) use cases
> >
> > 3) a description how to mitigate the short comings described in #2
> > w/o breaking the world and some more and introducing hard to
> > decode regressions
> >
> > 4) a description why the improvements introduced by #3 are not
> > sufficient for the new fangled (hardware) use cases
> >
> > 5) a description how to mitigate the short comings described in #4
> > w/o breaking the world and some more and introducing hard to
> > decode regressions
>
> Aside of that I want to see a coherent explanation why a shared MSI
> interrupt makes any sense at all.
>
> 25: 1 <....> 0 PCI-MSI-edge aerdrv, PCIe PME
>
> AFAICT, that's the primary reason why you insist to support wakeup
> sources on shared irq lines. And to be honest, it's utter bullshit.

No, this isn't the primary reason.

Here's /proc/interrupts from my MSI Wind system and, as you can see,
PCIe PME are (a) not MSI and (b) shared with some interesting things
(USB, WiFi and the GPU).

CPU0 CPU1
0: 26321 0 IO-APIC-edge timer
1: 379 0 IO-APIC-edge i8042
7: 6 0 IO-APIC-edge
9: 59 0 IO-APIC-fasteoi acpi
12: 2824 0 IO-APIC-edge i8042
14: 9074 0 IO-APIC-edge ata_piix
15: 0 0 IO-APIC-edge ata_piix
16: 5217 0 IO-APIC-fasteoi PCIe PME, uhci_hcd:usb4, i915
17: 13964 0 IO-APIC-fasteoi PCIe PME, rtl818x_pci
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
19: 0 0 IO-APIC-fasteoi uhci_hcd:usb2
23: 9402 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb5
40: 61 0 PCI-MSI-edge eth0
41: 102 0 PCI-MSI-edge snd_hda_intel
NMI: 0 0 Non-maskable interrupts
LOC: 18538 30831 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
IWI: 6 0 IRQ work interrupts
RTR: 0 0 APIC ICR read retries
RES: 5277 6121 Rescheduling interrupts
CAL: 92 106 Function call interrupts
TLB: 317 302 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 5 5 Machine check polls
ERR: 6
MIS: 0

> The whole purpose of MSI is to avoid interrupt sharing, but of course
> if that particular interrupt source has two potential causes, i.e. the
> AER and the PME one and the stupid hardware does not support different
> vectors or the drivers are not able to do so, it's conveniant to make
> them shared instead of thinking about them what they really are:
>
> Separate interrupts on a secondary interrupt controller connected to
> the primary (MSI) one.
>
> That's what is in fact. Simply because you can enable/disable them at
> the same IP block quite contrary to stuff which is physically shared
> by hard wired electrical connections.

I agree with the above.

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rafael J. Wysocki: "Re: [PATCH] cpufreq, store_scaling_governor requires policy->rwsem to be held for duration of changing governors [v2]"
Previous message: Joseph Salisbury: "Re: [REVERT][v3.16-rc7][STABLE] usb: xhci: Prefer endpoint context dequeue pointer over stopped_trb"
In reply to: Thomas Gleixner: "Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts"
Next in thread: Thomas Gleixner: "Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]