Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

From: Alex Williamson
Date: Tue Apr 23 2019 - 14:38:33 EST


On Tue, 23 Apr 2019 12:53:07 -0500
Alex G <mr.nuke.me@xxxxxxxxx> wrote:

> On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> > On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:
> >>> There is nothing wrong happening here that needs to fill logs. I
> >>> thought maybe if I enabled notification of autonomous bandwidth
> >>> changes that it might categorize these as something we could
> >>> ignore, but it doesn't. How can we identify only cases where this
> >>> is an erroneous/noteworthy situation? Thanks,
> >>
> >> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> >> consistent with every other subsystem that deals with multi-speed links.
> >
> > Can you point me to the logging in these other subsystems so I can
> > learn more about how they deal with this?
>
> I don't have any in-depth articles about the logging in these systems,
> but I can extract some logs from my machines.
>
> Ethernet:
>
> [Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full
> Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full
> Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full
> Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full
> Duplex, Flow Control: Rx/Tx
>
> I used to have one of these "green" ethernet switches that went down to
> 100mbps automatically. You can imagine how "clogged" the logs were with
> link up messages. Thank goodness that switch was killed in a thunderstorm.
>
> USB will log every device insertion and removal, very verbosely (see
> appendix A).

I have a hard time putting USB insertion and removal into the same
class; the equivalent is PCI hotplug, which is logged separately. Do
we ever log beyond USB device discovery if a device is running at a
lower speed than is possible? The most directly related example is the
green ethernet switch, which you admit was a nuisance due to exactly
this sort of logging. It was probably confusing to see this logging;
perhaps you wondered if the cable was bad or the switch was defective.

> > I agree that emitting log messages for normal and expected events will
> > lead to user confusion and we need to do something.
> >
> > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > notification") was merged in v5.1-rc1, so we still have (a little)
> > time to figure this out before v5.1.
>
> I always viewed the system log as a system log, instead of a database of
> system errors. I may have extremist views, but going back to Alex's
> example, I prefer to see that the power saving mechanism is doing
> something to save power on my laptop (I'll just ignore it on a desktop).

There's a disconnect with the above, where similar behavior on
ethernet "clogged" the log files, but here we just want to ignore it.
Excessive logging can also be considered a denial of service vector
when the device generating that excessive logging is attached to a
userspace driver.

> If you think increasing code complexity because people don't want things
> logged into the system log, then I'm certain we can work out some sane
> solution. It's the same problem we see with GCC, where people want
> warning messages here, but don't want the same messages there.

v5.1 is approaching quickly. Can we downgrade these to pci_dbg() while
we work on some sort of driver participation in this logging?
Thanks,

Alex