Re: [PATCH v3 0/8] kernel: taint when the driver firmware crashes
From: Luis Chamberlain
Date: Tue May 26 2020 - 23:19:23 EST
On Tue, May 26, 2020 at 04:30:31PM -0700, Jakub Kicinski wrote:
> On Tue, 26 May 2020 23:07:48 +0000 Luis Chamberlain wrote:
> > On Tue, May 26, 2020 at 03:46:06PM -0700, Jakub Kicinski wrote:
> > > On Tue, 26 May 2020 14:58:07 +0000 Luis Chamberlain wrote:
> > > > To those new on CC -- this is intended to be a simple generic interface
> > > > to the kernel to annotate when the firmware has crashed leaving the
> > > > driver or system in a questionable state, in the worst case requiring
> > > > full system reboot. This series is first addressing only a few
> > > > networking patches, however, I already have an idea of where such
> > > > firmware crashes happen across the tree. The goal with this series then
> > > > is to first introduce the simple framework, and only if that moves
> > > > forward will I continue to chug on with the rest of the drivers /
> > > > subsystems.
> > > >
> > > > This is *not* a networking specific problem only.
> > > >
> > > > This v3 augments the last series by introducing the uevent for panic
> > > > events, one of which is during tainting. The uevent mechanism is
> > > > independent from any of this firmware taint mechanism. I've also
> > > > addressed Jessica Yu's feedback. Given I've extended the patches a bit
> > > > with other minor cleanup which checkpatch.pl complains over, and since
> > > > this infrastructure is still being discussed, I've trimmed the patch
> > > > series size to only cover drivers for which I've received an Acked-by
> > > > from the respective driver maintainer, or where we have bug reports to
> > > > support such dire situations on the driver such as ath10k.
> > > >
> > > > During the last v2 it was discussed that we should instead use devlink
> > > > for this work, however the initial RFC patches produced by Jakub
> > > > Kicinski [0] show how devlink is networking specific, and the intent
> > > > behind this series is to produce simple helpers which can be used by *any*
> > > > device driver, for any subsystem, not just networking. Subsystem
> > > > specific infrastructure to help address firmware crashes may still make
> > > > sense, however that does not mean we *don't* need something even more
> > > > generic regardless of the subsystem the issue happens on. Since uevents
> > > > for taints are exposed, we now expose these through uapi as well, and
> > > > that was something which eventually had to happen given that the current
> > > > scheme of relying on sensible character representations for each taint
> > > > will not scale beyond the alphabet.
> > >
> > > Nacked-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> >
> > Care to elaborate?
>
> I elaborated in the previous thread
No you didn't.
> and told you I will nack this,
That's all you said.
> but sure let's go over this again.
>
> For the third time, saying that devlink is networking specific is not
> true. It was created as a netlink configuration channel for devices
> when there is no networking reference that could be used. It can be
> compiled in or out much like sysfs.
Perhaps I didn't get your email, but this clarification was in no way,
shape, or form present in your reply on that thread.
> And as I've shown you devlink already has the uAPI for what you're
> trying to achieve.
I read your patch, and granted, I will accept I was under the incorrect
assumption that this can only be used by networking devices. However,
the devlink approach gives userspace the ability, with the iproute2
devlink utility, to query a device's health, onto which we can peg
firmware health. But *this* patch series is not about health status and
letting users query it; it's about a *critical* situation which has come up
with firmware requiring me to reboot my system, and the lack of *any*
infrastructure in the kernel today to inform userspace about it.
So say we use netlink to report a critical health situation, how are we
informing userspace with your patch series about requiring a reboot?
Luis