Re: [RFC 1/2] devlink: add simple fw crash helpers

From: Andy Shevchenko
Date: Mon May 25 2020 - 05:07:58 EST


On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
> On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:

> I had to go RTFM re: kernel taints because it has been a very long
> time since I looked at them. It had always seemed to me that most were
> caused by "kernel-unfriendly" user actions. The most famous of course
> is loading proprietary modules, out-of-tree modules, forced module
> loads, etc... Honestly, I had forgotten the large variety of uses of
> the taint flags. For anyone who hasn't looked at taints recently, I
> recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
>
> In light of this I don't object to setting a taint on this anymore.
> I'm a little uneasy, but I've softened on it now, and now I feel it
> depends on implementation.
>
> Specifically, I don't think we should set a taint flag when a driver
> easily handles a routine firmware crash and is confident that things
> have come up just fine again. In other words, triggering the taint in
> every driver module where it spits out a log comment that it had a
> firmware crash and had to recover seems too much. Sure, firmware
> shouldn't crash, sure it should be open source so we can fix it,
> whatever...

While it may sound idealistic the firmware for the end-user, and even for mere
kernel developer like me, is a complete blackbox which has more access than
root user in the kernel. We have tons of firmwares and each of them potentially
dangerous beast. As a user I really care about my data and privacy (hacker can
oops a firmware in order to set a specific vector attack). So, tainting kernel
is _a least_ we can do there, the strict rules would be to reboot immediately.

> those sort of wishful comments simply ignore reality and
> our ability to affect effective change.

We can encourage users not to buy cheap crap for the starter.

> A lot of WiFi firmware crashes
> and for well-known cases the drivers handle them well. And in some
> cases, not so well and that should be a place the driver should detect
> and thus raise a red flag. If a WiFi firmware crash can bring down
> the kernel, there's either a major driver bug or some very funky
> hardware crap going on. That sort of thing we should be able to
> detect, mark with a taint (or something), and fix if within our sphere
> of influence. I guess what it comes down to me is how aggressive we
> are about setting the flag.

--
With Best Regards,
Andy Shevchenko