Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"

From: Guenter Roeck
Date: Tue Aug 03 2021 - 11:27:21 EST


On 8/3/21 8:01 AM, Jan Kiszka wrote:
On 03.08.21 16:59, Jan Kiszka wrote:
On 03.08.21 16:51, Jean Delvare wrote:
Hi all,

Commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on
second timeout") causes a regression on several systems. The symptom:
the system reboots automatically after a short period of time if the
watchdog is enabled (by systemd, for example). This has been reported
in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213809

Unfortunately this commit was backported to all stable kernel branches
(4.14, 4.19, 5.4, 5.10, 5.12 and 5.13). I'm not sure why that is the
case, BTW, as there is no Fixes tag and no Cc to stable@vger either.
And the fix is not trivial, has apparently not seen enough testing,
and addresses a problem that has a known and simple workaround. IMHO it
should never have been accepted as a stable patch in the first place.
Especially when the previous attempt to fix this issue already ended
with a regression and a revert.

Anyway... After a glance at the patch, I see what looks like a nice
thinko:

+	if (p->smi_res &&
+	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))

The author most certainly meant inl(SMI_EN(p)) (the register's value)
and not SMI_EN(p) (the register's address).
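
To make the address-versus-value mix-up concrete, here is a minimal
user-space sketch. It is not the driver code: struct fake_wdt, fake_inl()
and the port number used with them are hypothetical stand-ins, and the bit
positions for TCO_EN and GBL_SMI_EN are assumed for illustration only.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define TCO_EN      (1u << 13)
#define GBL_SMI_EN  (1u << 0)

/* Hypothetical stand-in for the driver's private data. */
struct fake_wdt {
	uint32_t smi_base;     /* I/O port address of the SMI_EN register */
	uint32_t smi_en_value; /* what the hardware would return on a read */
};

/* Like the macro in the quoted hunk, this yields an address, not a value. */
#define SMI_EN(p) ((p)->smi_base)

/* Hypothetical replacement for inl(): reads the simulated register. */
static uint32_t fake_inl(const struct fake_wdt *p, uint32_t port)
{
	return port == p->smi_base ? p->smi_en_value : 0xffffffffu;
}

/* The test as quoted: masks the port address, so the result depends only
 * on where the register lives, never on what it contains. */
static int smi_not_enabled_buggy(const struct fake_wdt *p)
{
	return (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN);
}

/* The presumably intended test: masks the value read from the register. */
static int smi_not_enabled_fixed(const struct fake_wdt *p)
{
	uint32_t val = fake_inl(p, SMI_EN(p));

	return (val & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN);
}
```

With an arbitrary example port such as 0x1830, the buggy variant reports
"not enabled" whether or not both bits are set in the register, while the
fixed variant follows the register contents.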


Yes, shame on me that I didn't see that.


https://lkml.org/lkml/2021/7/26/349


That link is for the fix (in line with your analysis).

I was also wondering if backporting that quickly was needed. Didn't
propose it, though.


I'd suggest discussing that with Greg and Sasha. Backporting is pretty
aggressive nowadays.

Guenter