Kernel hang caused by commit "can: m_can: Start/Cancel polling timer together with interrupts"

From: Matthias Schiffer
Date: Tue Jun 18 2024 - 12:17:17 EST


Hi Markus,

We've found that recent kernels hang on the TI AM62x SoC (where no m_can interrupt is available and
thus the polling timer is used), consistently a few seconds after the CAN interfaces are set up.

I have bisected the issue to commit a163c5761019b ("can: m_can: Start/Cancel polling timer together
with interrupts"). Both master and 6.6 stable (which received a backport of the commit) are
affected. On 6.6 the commit is easy to revert, but on master a lot has happened on top of that
change.

As far as I can tell, the reason is that hrtimer_cancel() cancels the timer synchronously: it waits
until any running callback has finished. When it is called from the hrtimer callback itself
(hrtimer_callback -> m_can_isr -> m_can_disable_all_interrupts -> hrtimer_cancel), it ends up
waiting for itself and deadlocks.
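To illustrate the self-wait: in the kernel, hrtimer_cancel() keeps retrying hrtimer_try_to_cancel(), which returns -1 for as long as the timer's callback is executing. The following is only a userspace model of that behaviour, not driver code. All names (sim_timer, sim_dev, sim_run, ...) are hypothetical, the bounded retry loop stands in for the real endless wait, and the "defer" variant (the callback returns NORESTART instead of cancelling itself) is just one conceivable way to avoid the self-cancel, not a proposed patch.

```c
#include <stdbool.h>

/* Hypothetical userspace model; types only mimic struct hrtimer and
 * the m_can polling path, they are not the kernel's definitions. */

enum sim_restart { SIM_NORESTART, SIM_RESTART };

struct sim_timer {
	bool armed;            /* queued for a future expiry */
	bool callback_running; /* callback is executing right now */
};

/* Same return convention as hrtimer_try_to_cancel(): -1 while the
 * callback runs, 1 if an armed timer was dequeued, 0 if not armed. */
static int sim_try_to_cancel(struct sim_timer *t)
{
	if (t->callback_running)
		return -1;
	if (t->armed) {
		t->armed = false;
		return 1;
	}
	return 0;
}

/* hrtimer_cancel() retries until try_to_cancel() stops returning -1.
 * Inside the callback that never happens, so the real loop spins
 * forever; this model gives up after a bounded number of retries and
 * returns false ("would hang") instead. */
static bool sim_cancel(struct sim_timer *t)
{
	for (int tries = 0; tries < 1000; tries++)
		if (sim_try_to_cancel(t) >= 0)
			return true;
	return false;
}

struct sim_dev {
	struct sim_timer timer;
	bool irqs_enabled;
	bool defer_cancel;   /* false: cancel synchronously (hangs) */
	bool stop_requested; /* set instead of cancelling when deferring */
	bool hung;
};

/* Stand-in for m_can_disable_all_interrupts(). */
static void sim_disable_all_interrupts(struct sim_dev *dev)
{
	dev->irqs_enabled = false;
	if (dev->defer_cancel)
		dev->stop_requested = true;   /* callback will not re-arm */
	else if (!sim_cancel(&dev->timer))
		dev->hung = true;             /* waited on our own callback */
}

/* Stand-in for hrtimer_callback -> m_can_isr; assumes the ISR decides
 * to disable interrupts on this pass. */
static enum sim_restart sim_timer_fn(struct sim_dev *dev)
{
	sim_disable_all_interrupts(dev);
	return dev->stop_requested ? SIM_NORESTART : SIM_RESTART;
}

/* One simulated expiry: dequeue, run the callback with
 * callback_running set (as the hrtimer core does), re-arm on RESTART. */
static struct sim_dev sim_run(bool defer_cancel)
{
	struct sim_dev dev = {
		.timer = { .armed = true },
		.irqs_enabled = true,
		.defer_cancel = defer_cancel,
	};

	dev.timer.armed = false;
	dev.timer.callback_running = true;
	enum sim_restart r = sim_timer_fn(&dev);
	dev.timer.callback_running = false;
	if (r == SIM_RESTART)
		dev.timer.armed = true;
	return dev;
}
```

In this model, sim_run(false) reports hung because the cancel waits on the callback it is running from, while sim_run(true) leaves the timer disarmed without ever waiting.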

I can try to come up with a fix, but I think you are much more familiar with the driver code. Please
let me know if you need any more information.

Best regards,
Matthias


--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/