Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

From: Jakub Kicinski
Date: Thu Nov 05 2020 - 15:42:09 EST


On Thu, 05 Nov 2020 11:23:54 -0800 Saeed Mahameed wrote:
> If you report an error without recovering, devlink health will report a
> bad device state
>
> $ ./devlink health
> pci/0002:01:00.0:
> reporter npa
> state error error 1 recover 0

Actually, the counter in the driver is unnecessary, right? Devlink
counts errors.
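To make the counting concrete, here is a simplified user-space model in C of how a reporter's state and counters move when a driver reports an error. It is only a sketch under stated assumptions, not the kernel implementation. `struct model_reporter`, `model_report()`, `empty_recover()` and `model_show()` are hypothetical names. The grace period, dumps, and the kernel's "abort if the previous error was not recovered" check are left out. It also assumes auto-recovery is enabled whenever a recover op exists, which is the default when a reporter is created with one.

```c
#include <stdio.h>

/* Hypothetical user-space model of a devlink health reporter.
 * Not kernel code: grace period, dumps and the abort-if-unrecovered
 * check are intentionally omitted. */
enum health_state { STATE_HEALTHY, STATE_ERROR };

struct model_reporter {
	const char *name;
	/* Driver-supplied recover callback; NULL models "no recover op". */
	int (*recover)(void *priv);
	enum health_state state;
	unsigned long error_count;    /* "error N" in the health output */
	unsigned long recovery_count; /* "recover N" in the health output */
};

/* An "empty" recover op: does no work and reports success. */
static int empty_recover(void *priv)
{
	(void)priv;
	return 0;
}

/* Report one error. The core, not the driver, keeps the count. */
static int model_report(struct model_reporter *r, void *priv)
{
	r->error_count++;
	r->state = STATE_ERROR;

	if (!r->recover)
		return 0; /* nothing to call: state stays "error" */

	int err = r->recover(priv);
	if (err)
		return err; /* a failed recovery also leaves "error" */

	r->recovery_count++;
	r->state = STATE_HEALTHY;
	return 0;
}

/* Print the reporter in roughly the layout quoted above. */
static void model_show(const struct model_reporter *r)
{
	printf("reporter %s\n  state %s error %lu recover %lu\n", r->name,
	       r->state == STATE_HEALTHY ? "healthy" : "error",
	       r->error_count, r->recovery_count);
}
```

With `recover == NULL`, a single `model_report()` call leaves the model at `state error error 1 recover 0`, the same shape as the output quoted above. With `empty_recover` it ends at `state healthy error 1 recover 1`. Either way the error count already lives in the reporter, so a separate driver-side counter adds nothing.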

> So you will need to implement an empty recover op.
> So if these events are informational only and they don't indicate
> device health issues, why would you report them via devlink health?

I see devlink health reporters as a way of collecting error reports,
which for the most part are just shared with the vendor. IOW, firmware
(or hardware) bugs.

Obviously, as you say, without a recover op and additional context in
the report the value is quite diminished. But _if_ these are indeed
"report me to the vendor" kind of events, then at least they should use
our current mechanism for such reports, which is dl-health.

Without knowing what these events are, it's quite hard to tell whether
devlink health is overkill or a counter is sufficient.

Either way, printing these to the logs is definitely the worst choice :)