Re: gpiochip_lock_as_irq on pins without FLAG_REQUESTED: bug or feature ?

From: Vincent Pelletier
Date: Thu Jul 01 2021 - 09:36:46 EST


Hello,

On Tue, Jun 29, 2021 at 7:19 PM Andy Shevchenko
<andy.shevchenko@xxxxxxxxx> wrote:
> > - why the plural in "set all handlers to handle_bad_irq()" ? Isn't
> > there only a single handler in struct gpio_irq_chip ?
>
> Each GPIO line may have its own handler (usually level or edge). I
> guess it's written from the GPIO point of view.
[...]
> > - "Then set the handler to [...] in the irqchip .set_type() callback"
> > Isn't set_type per-pin, and isn't the interrupt handler chip-level ?
>
> The idea behind that initially the chip-level IRQ handler is set to
> BAD. It means any (spurious) IRQ will be served by it. Now, when one
> requests IRQ the framework will call ->irq_set_type() of corresponding
> IRQ chip and change the handler for the certain pin (pin-level). So,
> the main handler is basically for spurious interrupts only.

I think I found what I was missing: I was only seeing
(struct gpio_irq_chip *)->handler = handle_bad_irq;
and was completely missing
irq_set_handler_locked((struct irq_data *), handle_..._irq);
hence my confusion about these points.
Thanks for this extra push which led me to these.
Maybe the doc should mention this function ?

> > - I do not find a function named gpiochip_irqchip_add(), only
> > gpiochip_irqchip_add_domain()
>
> Missed during update I suppose.
> https://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git/commit/?h=gpio/for-next&id=f1f37abbe6fc2b1242f78157db76e48dbf9518ee
> Feel free to submit a patch!

Makes sense. Before trying to fix the documentation I intended
to get to the end of my IRQ issues though, as I feel I am still
lacking a lot of understanding which will be needed to produce a
decent documentation patch (or three).

But I feel I have gone as far as I can, as I cannot even tell what is normal
and what is not, for lack of what I would identify as a reference setup
on other machines. I have no idea what to try next, and I am ashamed to
say I have been working on this issue for over a week (outside of work hours,
but still).

So here is everything I spotted, in case anyone reading can tell the good
from the bad:

The high-level issue: I added a devicetree entry to enable the power-button
function of a DA9063 PMIC. This is AFAICT the first subsystem of this chip
on this board (hifive unmatched) which produces IRQs in normal use (meaning
outside of catastrophic failures on the board, like the PMIC complaining
about an overcurrent condition).

I initially thought it worked fine: short push of the button, system shuts
down cleanly.

Then I disabled power button handling in systemd-logind, and now only the
first press causes a key press event; all further presses do nothing.

I hacked together a script to peek at the IRQ registers behind the kernel's
back, and here is what I see:

vincent@riscv:~$ sudo ./unmatched_gpio_irq_debug.py 1
GPIO 1: dir=in in=1 out=0 irq_en=low irq_pending=high
PLIC 24: irq=False hart=4
vincent@riscv:~$ sudo ./unmatched_gpio_irq_debug.py 1
GPIO 1: dir=in in=1 out=0 irq_en=low irq_pending=rise,high,low
PLIC 24: irq=False hart=4
vincent@riscv:~$ sudo ./unmatched_gpio_irq_debug.py 1
GPIO 1: dir=in in=0 out=0 irq_en=low irq_pending=rise,fall,high,low
PLIC 24: irq=False hart=4
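
For anyone wanting to produce a similar readout, here is a minimal sketch of
the decoding step. It is not the actual unmatched_gpio_irq_debug.py: the
function names are hypothetical, the register offsets are assumed from
drivers/gpio/gpio-sifive.c, and the raw register access (for example via
/dev/mem) is left to a caller-supplied read32() callable.

```python
# Hypothetical sketch, NOT the script used above.
# Offsets are an assumption taken from drivers/gpio/gpio-sifive.c.
REGS = {
    "input_val": 0x00, "input_en": 0x04, "output_en": 0x08, "output_val": 0x0C,
    "rise_ie": 0x18, "rise_ip": 0x1C, "fall_ie": 0x20, "fall_ip": 0x24,
    "high_ie": 0x28, "high_ip": 0x2C, "low_ie": 0x30, "low_ip": 0x34,
}
TRIGGERS = ("rise", "fall", "high", "low")


def read_regs(read32):
    """Read every register once; read32(offset) must return a 32-bit int."""
    return {name: read32(offset) for name, offset in REGS.items()}


def bit(value, pin):
    """Extract the bit belonging to one GPIO pin."""
    return (value >> pin) & 1


def describe_pin(regs, pin):
    """Format one pin's state in the same style as the output above."""
    if bit(regs["output_en"], pin):
        direction = "out"
    elif bit(regs["input_en"], pin):
        direction = "in"
    else:
        direction = "off"
    enabled = [t for t in TRIGGERS if bit(regs[t + "_ie"], pin)]
    pending = [t for t in TRIGGERS if bit(regs[t + "_ip"], pin)]
    return "GPIO {}: dir={} in={} out={} irq_en={} irq_pending={}".format(
        pin, direction,
        bit(regs["input_val"], pin), bit(regs["output_val"], pin),
        ",".join(enabled) or "none", ",".join(pending) or "none")
```

Keeping the register access behind read32() means the decoding can be
checked against hand-written register values without touching hardware.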

Note:
- GPIO pin 1 is active low, consistent with the da9063 code and the gpio
irq_en line.
- after the first press, the gpio pin reads as "high", so the irq is ack'ed
at the da9063 level
- after the first press, the pending irq events are all but "falling", which
to me indicates the GPIO-level IRQ was ack'ed while the pin was still low,
so it immediately became pending again. Knowing the GPIO driver clears
*all* pending interrupts, I understand "rise,high,low" as meaning the GPIO
controller saw the pin go from low to high after it was cleared, which also
hints that the da9063's IRQ was cleared after the GPIO's, which seems wrong
for a level interrupt - but I am not 100% sure.
- ...but the PLIC 24 pin (corresponding to GPIO 1 pin's IRQ) does not have a
pending irq, which suggests it missed the GPIO IRQ re-triggering
- and after a second key press, the only change is that now the GPIO chip did
see a falling edge, and it now has all its pending bits active for this pin.
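
The ordering argument in the notes above can be checked with a toy model of
a single pin's pending bits. This is only a sketch under assumptions: it
supposes that edge bits latch on transitions, that level bits latch whenever
the pin sits at that level, and that a clear which happens while the level
is still present re-latches at once. The class and method names are
hypothetical and nothing here is taken from the driver.

```python
class PinModel:
    """Toy model of one GPIO pin's latched pending bits (assumed behaviour)."""

    ORDER = ("rise", "fall", "high", "low")

    def __init__(self, level):
        self.level = level
        self.pending = set()
        self._latch_level()

    def _latch_level(self):
        # Level bits latch for as long as the pin sits at that level.
        self.pending.add("high" if self.level else "low")

    def set_level(self, level):
        # An actual transition latches the matching edge bit.
        if level != self.level:
            self.pending.add("rise" if level else "fall")
            self.level = level
        self._latch_level()

    def clear_all(self):
        # Model the GPIO driver clearing *all* pending bits for the pin;
        # the current level re-latches immediately.
        self.pending.clear()
        self._latch_level()

    def show(self):
        return ",".join(t for t in self.ORDER if t in self.pending)


pin = PinModel(level=1)
pin.clear_all()
idle = pin.show()            # pin idle high, before any press

pin.set_level(0)             # PMIC pulls the line low (button pressed)
pin.clear_all()              # GPIO-level clear while the line is still low
pin.set_level(1)             # PMIC-level ack releases the line afterwards
after_first = pin.show()

pin.set_level(0)             # second press, nothing services it
after_second = pin.show()

print(idle, after_first, after_second, sep="\n")
```

Under these assumptions the three states come out as "high",
"rise,high,low" and "rise,fall,high,low", the same sequence as the three
readouts, which is at least consistent with the GPIO-level clear happening
before the PMIC released the line.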

On the /proc/interrupts front, here is what I see:

        CPU0   CPU1   CPU2   CPU3
   1:    637      0      0      0  SiFive PLIC      39  10010000.serial
   2:      0      0      0      0  SiFive PLIC      40  10011000.serial
   3:   1829      0      0      0  SiFive PLIC      52  10030000.i2c
   4:      0      0      0      0  SiFive PLIC      41  10040000.spi
   5:   7647  10727   9670  11094  RISC-V INTC       5  riscv-timer
   6:     96      0      0      0  SiFive PLIC      43  10050000.spi
   7:      0      0      0      0  SiFive PLIC      55  eth0
  17:      0      0      0      0  SiFive PLIC      19  l2_ecc
  18:      1      0      0      0  SiFive PLIC      21  l2_ecc
  19:      0      0      0      0  SiFive PLIC      22  l2_ecc
  20:      0      0      0      0  SiFive PLIC      20  l2_ecc
  46:      0      0      0      0  PCI-MSI           0  PCIe PME, aerdrv
  53:     22      0      0      0  PCI-MSI     3145728  nvme0q0
  54:      1      0      0      0  sifive-gpio       1  da9063-irq
  55:      1      0      0      0  da9063-irq        0  ONKEY
  56:      0      0      0      0  da9063-irq        1  ALARM
  63:      0      0      0      0  da9063-irq        8  LDO_LIM
  84:    694      0      0      0  PCI-MSI     3145729  nvme0q1
  85:   1780      0      0      0  PCI-MSI     3145730  nvme0q2
  86:   1092      0      0      0  PCI-MSI     3145731  nvme0q3
  87:   1303      0      0      0  PCI-MSI     3145732  nvme0q4
  88:   6523      0      0      0  PCI-MSI     2097152  xhci_hcd
  89:      0      0      0      0  PCI-MSI     2097153  xhci_hcd
  90:      0      0      0      0  PCI-MSI     2097154  xhci_hcd
  91:      0      0      0      0  PCI-MSI     2097155  xhci_hcd
  92:      0      0      0      0  PCI-MSI     2097156  xhci_hcd
  93:      0      0      0      0  sifive-gpio       6  lm90
  94:   1029      0      0      0  PCI-MSI     2621440  iwlwifi: default queue
  95:    282      0      0      0  PCI-MSI     2621441  iwlwifi: queue 1
  96:      5      0      0      0  PCI-MSI     2621442  iwlwifi: queue 2
  97:     31      0      0      0  PCI-MSI     2621443  iwlwifi: queue 3
  98:      8      0      0      0  PCI-MSI     2621444  iwlwifi: queue 4
  99:      5      0      0      0  PCI-MSI     2621445  iwlwifi: exception
IPI0:    109    125    110    104  Rescheduling interrupts
IPI1:   5072  10959   6008  12207  Function call interrupts
IPI2:      0      0      0      0  CPU stop interrupts
IPI3:      0      0      0      0  IRQ work interrupts
IPI4:      0      0      0      0  Timer broadcast interrupts

I note the absence of a "SiFive PLIC 24 sifive-gpio" line, but
I do not know if this is to be expected.
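
Checking for such a line by eye is error-prone in a listing this long, so
here is a small sketch that searches /proc/interrupts-style text for a given
chip name and hardware irq number. The function name is hypothetical, and
the parsing assumes the layout shown above (numeric label, one count per
CPU, chip name, hwirq, then the action names).

```python
import re


def find_irq_lines(proc_interrupts, chip, hwirq):
    """Return (linux_irq, total_count, action) for rows served by chip/hwirq."""
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    pattern = re.compile(
        re.escape(chip) + r"\s+" + str(hwirq) + r"(?:\s+(.*))?$")
    hits = []
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split(None, ncpus)      # counts, then the remainder
        # Skip rows without a numeric label (IPI0, ...) or without a chip.
        if not label.strip().isdigit() or len(fields) <= ncpus:
            continue
        total = sum(int(count) for count in fields[:ncpus])
        match = pattern.match(fields[ncpus])
        if match:
            # On kernels that print a trigger type column, it ends up at
            # the start of the action string.
            hits.append((int(label), total, (match.group(1) or "").strip()))
    return hits
```

For example, find_irq_lines(open("/proc/interrupts").read(), "SiFive PLIC", 24)
returns an empty list when no row is served directly by PLIC line 24.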

In debugfs, under /sys/kernel/debug/irq, I further see:

# cat irqs/55
handler: handle_bad_irq
device: (null)
status: 0x00008508
_IRQ_NOPROBE
_IRQ_NESTED_THREAD
istate: 0x00000020
IRQS_ONESHOT
ddepth: 0
wdepth: 0
dstate: 0x00402208
IRQ_TYPE_LEVEL_LOW
IRQD_LEVEL
IRQD_ACTIVATED
IRQD_IRQ_STARTED
node: 0
affinity: 0-3
domain: :soc:i2c@10030000:pmic@58
hwirq: 0x0
chip: da9063-irq
flags: 0x0

# cat irqs/54
handler: handle_level_irq
device: (null)
status: 0x00000508
_IRQ_NOPROBE
istate: 0x00000020
IRQS_ONESHOT
ddepth: 0
wdepth: 0
dstate: 0x02403208
IRQ_TYPE_LEVEL_LOW
IRQD_LEVEL
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_AFFINITY_SET
IRQD_DEFAULT_TRIGGER_SET
node: 0
affinity: 0-3
domain: :soc:gpio@10060000
hwirq: 0x1
chip: sifive-gpio
flags: 0x0
parent:
domain: :soc:interrupt-controller@c000000
hwirq: 0x18
chip: SiFive PLIC
flags: 0x0

# cat irqs/22
handler: handle_fasteoi_irq
device: (null)
status: 0x00000400
_IRQ_NOPROBE
istate: 0x00000000
ddepth: 1
wdepth: 0
dstate: 0x02031000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_AFFINITY_SET
IRQD_DEFAULT_TRIGGER_SET
node: 0
affinity: 0-3
domain: :soc:interrupt-controller@c000000
hwirq: 0x18
chip: SiFive PLIC
flags: 0x0

So:
- the GPIO driver seems to have properly told the kernel that
it reports to interrupt-controller@c000000's domain,
on line 24.
- ...but that line (Linux irq 22 on this specific boot) is
disabled and masked.
- the da9063 does not seem to have expressed its reliance on
the gpio's irq domain, but still the gpio irq is enabled
What is abnormal here ?
What could be a probable cause ? PLIC driver ? GPIO driver ?
da9063 driver ? devicetree ?

Side note: the above traces were taken with a few of my hacks applied, so
they may not fully represent a clean kernel. I did confirm that
the high-level symptom still exists without them.

Regards,
--
Vincent Pelletier