Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()
From: Luis Chamberlain
Date: Mon May 18 2020 - 13:09:38 EST
On Mon, May 18, 2020 at 09:58:53AM -0700, Ben Greear wrote:
>
>
> On 05/18/2020 09:51 AM, Luis Chamberlain wrote:
> > On Sat, May 16, 2020 at 03:24:01PM +0200, Johannes Berg wrote:
> > > On Fri, 2020-05-15 at 21:28 +0000, Luis Chamberlain wrote:
> > > > module_firmware_crashed
> > >
> > > You didn't CC me or the wireless list on the rest of the patches, so I'm
> > > replying to a random one, but ...
> > >
> > > What is the point here?
> > >
> > > This should in no way affect the integrity of the system/kernel, for
> > > most devices anyway.
> >
> > Keyword you used here is "most devices". And in the worst case, *who*
> > knows what other odd things may happen afterwards.
> >
> > > So what if ath10k's firmware crashes? If there's a driver bug it will
> > > not handle it right (and probably crash, WARN_ON, or something else),
> > > but if the driver is working right then that will not affect the kernel
> > > at all.
> >
> > Sometimes the device can go into a state which requires driver removal
> > and addition to get things back up.
>
> It would be lovely to be able to detect this case in the driver/system
> somehow! I haven't seen any such cases recently,
I assure you that I have run into it. Once it happens again I'll report
the crash, but the problem with some of this is that unless you scrape
the log you won't know. Eventually, a uevent would indeed inform
me.
> but in case there is
> some common case you see, maybe we can think of a way to detect it?
ath10k is just one case; this patch series adds a simple way to
annotate this tree-wide.
> > > So maybe I can understand that maybe you want an easy way to discover -
> > > per device - that the firmware crashed, but that still doesn't warrant a
> > > complete kernel taint.
> >
> > That is one reason; another is that a taint helps support cases *fast*:
> > one can easily detect whether the issue was a firmware crash, instead of
> > scraping logs for driver-specific ways to say the firmware has crashed.
>
> You can listen for udev events (I think that is the right term),
> and find crashes that way. You get the actual crash info as well.
My follow-up to this was to add a uevent to add_taint() as well, so that
these could be processed generically by userspace.
Luis