Re: [RFC 1/2] devlink: add simple fw crash helpers

From: Ben Greear
Date: Mon May 25 2020 - 13:09:16 EST

On 05/25/2020 02:07 AM, Andy Shevchenko wrote:
> On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
>> On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>>
>> I had to go RTFM re: kernel taints because it has been a very long
>> time since I looked at them. It had always seemed to me that most were
>> caused by "kernel-unfriendly" user actions. The most famous, of course,
>> is loading proprietary modules, out-of-tree modules, forced module
>> loads, etc. Honestly, I had forgotten the large variety of uses of
>> the taint flags. For anyone who hasn't looked at taints recently, I
>> recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
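
For anyone who wants to check a running system against that document: the
current taint state is exposed as a decimal bitmask in /proc/sys/kernel/tainted.
Here is a minimal Python sketch that decodes it. It assumes the bit-to-letter
table in the admin guide as of this thread (bits 0 through 17); newer kernels
may define additional bits, which the sketch simply reports as unknown. The
helper names (decode_taint, unknown_bits, read_taint) are illustrative, not an
existing tool.

```python
"""Decode the kernel taint bitmask from /proc/sys/kernel/tainted.

Bit assignments follow Documentation/admin-guide/tainted-kernels.rst as of
mid-2020 (bits 0-17); later kernels may add more.
"""
from pathlib import Path
from typing import List

# (bit, flag letter, description) as listed in the admin guide table.
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (2, "S", "kernel running on an out of specification system"),
    (3, "R", "module was force unloaded"),
    (4, "M", "processor reported a Machine Check Exception"),
    (5, "B", "bad page referenced or some unexpected page flags"),
    (6, "U", "taint requested by userspace application"),
    (7, "D", "kernel died recently, i.e. there was an OOPS or BUG"),
    (8, "A", "ACPI table overridden by user"),
    (9, "W", "kernel issued warning"),
    (10, "C", "staging driver was loaded"),
    (11, "I", "workaround for bug in platform firmware applied"),
    (12, "O", "externally-built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
    (14, "L", "soft lockup occurred"),
    (15, "K", "kernel has been live patched"),
    (16, "X", "auxiliary taint, defined for and used by distros"),
    (17, "T", "kernel was built with the struct randomization plugin"),
]

# Bits are unique, so summing the powers of two is the same as OR-ing them.
KNOWN_MASK = sum(1 << bit for bit, _, _ in TAINT_FLAGS)


def decode_taint(mask: int) -> List[str]:
    """Return the letters of all known taint flags set in mask, lowest bit first."""
    return [letter for bit, letter, _ in TAINT_FLAGS if mask & (1 << bit)]


def unknown_bits(mask: int) -> int:
    """Return any set bits that this table does not describe."""
    return mask & ~KNOWN_MASK


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint mask (0 means the kernel is not tainted)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        mask = read_taint()
    except OSError:
        print("cannot read /proc/sys/kernel/tainted (not Linux?)")
    else:
        descriptions = {letter: text for _, letter, text in TAINT_FLAGS}
        for letter in decode_taint(mask):
            print(letter, descriptions[letter])
        if unknown_bits(mask):
            print("unknown bits: %#x" % unknown_bits(mask))
        if mask == 0:
            print("not tainted")
```

For example, a mask of 4097 (bits 0 and 12) decodes to ["P", "O"]: a
proprietary module and an out-of-tree module were loaded. The kernel tree also
ships tools/debugging/kernel-chktaint, which does the same job as a shell
script.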
>>
>> In light of this I don't object to setting a taint on this anymore.
>> I'm a little uneasy, but I've softened on it, and now I feel it
>> depends on the implementation.
>>
>> Specifically, I don't think we should set a taint flag when a driver
>> easily handles a routine firmware crash and is confident that things
>> have come up just fine again. In other words, triggering the taint in
>> every driver module that logs a message saying it had a firmware
>> crash and had to recover seems too much. Sure, firmware shouldn't
>> crash, sure it should be open source so we can fix it, whatever...
>
> While it may sound idealistic, the firmware, for the end user and even for
> a mere kernel developer like me, is a complete black box which has more
> access than the root user in the kernel. We have tons of firmwares, and each
> of them is a potentially dangerous beast. As a user I really care about my
> data and privacy (a hacker can oops a firmware in order to set up a specific
> attack vector). So tainting the kernel is the _least_ we can do there; the
> strict rule would be to reboot immediately.
>
>> those sorts of wishful comments simply ignore reality and
>> our ability to bring about effective change.
>
> We can encourage users not to buy cheap crap, for starters.

There is no stable wifi firmware at any price.

There is also no obvious feedback from the vendors of even name-brand NICs
such as ath10k or AX200 when you report a crash.

That said, at least in my experience with ath10k-ct, the OS normally recovers
fine from firmware crashes. ath10k already delivers full crash reports via
udev, so it is easy for user space to notice them and report bugs upstream if
it cares to. Other NICs probably do the same, and if not, they certainly could.
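
To make the udev point concrete, here is a minimal Python sketch of a
user-space listener. It assumes the driver hands its dump to the kernel's
devcoredump facility, as upstream ath10k does when built with
CONFIG_DEV_COREDUMP (whether ath10k-ct uses the same path is an assumption
here). devcoredump creates a devcdN device under /sys/class/devcoredump and
emits an "add" uevent, so the sketch just watches the raw kernel uevent netlink
socket for that subsystem. The function names are illustrative only; a real
tool would more likely use a udev rule or libudev.

```python
"""Watch kernel uevents for new devcoredump devices (e.g. firmware crash dumps).

Assumes the driver submits dumps through the kernel's devcoredump facility.
"""
import socket
import sys
from typing import Dict, Tuple

# NETLINK_KOBJECT_UEVENT from <linux/netlink.h>; Python's socket module
# does not export a named constant for it.
NETLINK_KOBJECT_UEVENT = 15
KERNEL_UEVENT_GROUP = 1  # multicast group carrying raw kernel uevents


def parse_uevent(msg: bytes) -> Tuple[str, Dict[str, str]]:
    """Split a raw kernel uevent 'action@devpath\\0KEY=VALUE\\0...'."""
    parts = msg.rstrip(b"\0").split(b"\0")
    header = parts[0].decode(errors="replace")
    env: Dict[str, str] = {}
    for field in parts[1:]:
        key, sep, value = field.decode(errors="replace").partition("=")
        if sep:  # skip malformed fields without '='
            env[key] = value
    return header, env


def is_coredump_add(env: Dict[str, str]) -> bool:
    """True for the uevent announcing a newly created devcoredump device."""
    return env.get("ACTION") == "add" and env.get("SUBSYSTEM") == "devcoredump"


def dump_path(env: Dict[str, str]) -> str:
    """DEVPATH is relative to /sys; the dump itself is the 'data' attribute."""
    return "/sys" + env.get("DEVPATH", "") + "/data"


def listen() -> None:
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)
    # Port id 0 lets the kernel pick one; the second field is the group bitmask.
    sock.bind((0, KERNEL_UEVENT_GROUP))
    while True:
        _, env = parse_uevent(sock.recv(65536))
        if is_coredump_add(env):
            print("device coredump available:", dump_path(env), flush=True)


if __name__ == "__main__" and sys.argv[1:] == ["--listen"]:
    listen()
```

Run it with --listen and each new dump is announced with the path of its data
file. Reading that file may require root, and devcoredump only keeps a dump
around for a limited time (five minutes by default, or until something writes
to the data file), so a reporting tool should copy it promptly.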

Thanks,
Ben


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com