Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts

From: Nam Cao
Date: Sun Mar 17 2024 - 17:27:39 EST


Cc: HID folks

On 14/Mar/2024 Eva Kurchatova wrote:
> If an I2C-HID controller level-triggered IRQ line is routed directly as
> a PLIC IRQ, and we spam input early enough in kernel boot process
> (Somewhere between initializing NET, ALSA subsystems and before
> i2c-hid driver init), then there is a chance of kernel locking up
> completely and not going any further.
>
> There are no kernel messages printed with all the IRQ, task hang
> debugging enabled - other than (sometimes) it reports sched RT
> throttling after a few seconds. Basic timer interrupt handling is
> intact - fbdev tty cursor is still blinking.
>
> It appears that in such a case the I2C-HID IRQ line is raised; PLIC
> notifies the (single) boot system hart, kernel claims the IRQ and
> immediately completes it by writing to CLAIM/COMPLETE register.
> No access to the I2C controller (OpenCores) or I2C-HID registers
> is made, so the HID report is never consumed and IRQ line stays
> raised forever. The kernel endlessly claims & completes IRQs
> without doing any work with the device. It doesn't always end up this
> way; sometimes boot process completes and there are no signs of
> interrupt storm or stuck IRQ processing afterwards.

It seems I2C HID's interrupt handler (i2c_hid_irq) returns immediately if
I2C_HID_READ_PENDING is set. This flag is supposed to be cleared in
i2c_hid_xfer(), but since the (threaded) interrupt handler runs at higher
priority, the flag is never cleared. So we have a lock-up: the interrupt
handler won't do anything until the flag is cleared, but the flag is
cleared by a lower-priority task which never gets scheduled while the
interrupt handler is active.
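
To make the lock-up concrete, here is a toy userspace model of it. This is
not kernel code: every toy_* name is made up for illustration, and the
"scheduler" is just a loop that assumes strict priority, i.e. the
lower-priority task only runs while the level-triggered line is low.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct toy_hid {
	bool irq_line;     /* level-triggered: high until the report is read */
	bool read_pending; /* stands in for I2C_HID_READ_PENDING */
	int reports_read;  /* reports consumed by the handler */
};

/* Models i2c_hid_irq(): with the flag check it returns without I/O. */
static void toy_irq_handler(struct toy_hid *h, bool check_flag)
{
	if (check_flag && h->read_pending)
		return;            /* "handled", but the line stays high */

	h->reports_read++;         /* reading the report ... */
	h->irq_line = false;       /* ... deasserts the line */
}

/* Models the lower-priority task finishing its transfer. */
static void toy_low_prio_step(struct toy_hid *h)
{
	h->read_pending = false;
}

/*
 * Start with the line high and the flag set (the low-priority task was
 * preempted mid-transfer), then run max_steps scheduling decisions.
 */
static struct toy_hid toy_run(bool check_flag, int max_steps)
{
	struct toy_hid h = {
		.irq_line = true,
		.read_pending = true,
		.reports_read = 0,
	};

	assert(max_steps > 0);
	for (int step = 0; step < max_steps; step++) {
		if (h.irq_line)
			toy_irq_handler(&h, check_flag);
		else
			toy_low_prio_step(&h);
	}
	return h;
}
```

With the flag check, toy_run(true, n) ends with irq_line still high and
reports_read == 0 no matter how large n is. Without it, toy_run(false, n)
reads the report on the first step, the line drops, and the low-priority
task gets to run.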

RT throttling exists to prevent RT tasks from locking up the system like
this. I don't know much about scheduling, so I am not really sure why it
does not help here. My guess: RT throttling triggers when RT tasks take
too much CPU time, but in this case the hard interrupt handlers take a lot
of CPU time too (~50% according to my measurement), so RT throttling
doesn't trigger often enough (here it triggers once and never again).
Again, I don't know much about the scheduler, so I may be talking nonsense.

The flag I2C_HID_READ_PENDING seems to exist to make sure that only one
I2C operation can happen at a time. But this seems pointless, because the
I2C subsystem already takes care of this. So I think we can just remove it.
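
As a rough illustration of why an extra flag adds nothing once transfers
are serialized by a lock: the sketch below is a userspace model using
pthreads, under the assumption (as far as I understand the I2C core) that
i2c_transfer() holds the adapter's bus lock for the whole transfer. All
toy_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct toy_bus {
	pthread_mutex_t lock; /* stands in for the adapter's bus lock */
	int in_flight;        /* transfers currently on the wire */
	int max_in_flight;    /* highest value in_flight ever reached */
	long completed;       /* total transfers finished */
};

struct toy_worker {
	struct toy_bus *bus;
	int count;            /* transfers each caller issues */
};

/* Models one transfer: the whole transaction runs under the bus lock. */
static void toy_transfer(struct toy_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->in_flight++;
	if (bus->in_flight > bus->max_in_flight)
		bus->max_in_flight = bus->in_flight;
	bus->completed++;
	bus->in_flight--;
	pthread_mutex_unlock(&bus->lock);
}

static void *toy_worker_fn(void *arg)
{
	struct toy_worker *w = arg;

	for (int i = 0; i < w->count; i++)
		toy_transfer(w->bus);
	return NULL;
}

/* Runs nthreads (at most 16) concurrent callers; returns peak overlap. */
static int toy_peak_overlap(int nthreads, int count)
{
	struct toy_bus bus = { .in_flight = 0, .max_in_flight = 0, .completed = 0 };
	struct toy_worker w = { &bus, count };
	pthread_t tid[16];

	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&bus.lock, NULL);

	for (int i = 0; i < nthreads; i++) {
		int err = pthread_create(&tid[i], NULL, toy_worker_fn, &w);
		assert(err == 0);
		(void)err;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&bus.lock);
	assert(bus.completed == (long)nthreads * count);
	return bus.max_in_flight;
}
```

However many callers there are, toy_peak_overlap() returns 1: the lock
alone guarantees one transfer at a time, so a separate "read pending"
flag guards nothing extra.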

Can you give the below patch a try?

diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
index 2735cd585af0..799ad0ef9c4a 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -64,7 +64,6 @@
 /* flags */
 #define I2C_HID_STARTED		0
 #define I2C_HID_RESET_PENDING	1
-#define I2C_HID_READ_PENDING	2
 
 #define I2C_HID_PWR_ON		0x00
 #define I2C_HID_PWR_SLEEP	0x01
@@ -190,15 +189,10 @@ static int i2c_hid_xfer(struct i2c_hid *ihid,
 		msgs[n].len = recv_len;
 		msgs[n].buf = recv_buf;
 		n++;
-
-		set_bit(I2C_HID_READ_PENDING, &ihid->flags);
 	}
 
 	ret = i2c_transfer(client->adapter, msgs, n);
 
-	if (recv_len)
-		clear_bit(I2C_HID_READ_PENDING, &ihid->flags);
-
 	if (ret != n)
 		return ret < 0 ? ret : -EIO;
 
@@ -566,9 +560,6 @@ static irqreturn_t i2c_hid_irq(int irq, void *dev_id)
 {
 	struct i2c_hid *ihid = dev_id;
 
-	if (test_bit(I2C_HID_READ_PENDING, &ihid->flags))
-		return IRQ_HANDLED;
-
 	i2c_hid_get_input(ihid);
 
 	return IRQ_HANDLED;