Re: [RFC] [PATCH net-next v6 3/3] r8169: Implement dynamic ASPM mechanism

From: Bjorn Helgaas
Date: Fri Oct 08 2021 - 09:58:27 EST


On Fri, Oct 08, 2021 at 02:18:55PM +0800, Kai-Heng Feng wrote:
> On Fri, Oct 8, 2021 at 3:11 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Fri, Oct 08, 2021 at 12:15:52AM +0800, Kai-Heng Feng wrote:
> > > r8169 NICs on some platforms have abysmal speed when ASPM is enabled.
> > > The same issue can be observed with older vendor drivers.
> > >
> > > The issue is however solved by the latest vendor driver. There's a new
> > > mechanism, which disables r8169's internal ASPM when the NIC traffic has
> > > more than 10 packets per second, and vice versa. The likely reason is
> > > that the buffer on the chip is too small for its ASPM exit latency.
> > > ...

> > I suppose that on the Intel system, if we enable ASPM, the link goes
> > to L1.2, and the NIC immediately receives 1000 packets in that second
> > before we can disable ASPM again, we probably drop a few packets?
> >
> > Whereas on the AMD system, we probably *never* drop any packets even
> > with L1.2 enabled all the time?
>
> Yes and yes.

The fact that we drop some packets with dynamic ASPM on the Intel
system means we must be giving up some performance.

And I guess that on the AMD system, we should get full performance but
we must be using a little more power (probably unmeasurable) because
ASPM *could* be always enabled but dynamic ASPM disables it some of
the time.
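
For reference, the threshold scheme described in the patch summary (disable
the NIC's internal ASPM above 10 packets per second, re-enable it otherwise)
might look roughly like the sketch below. This is only an illustration, not
code from the patch or the vendor driver: the struct, the function name, and
the one-second sampling interval are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the patch description: more than 10 packets per second. */
#define ASPM_PKT_THRESHOLD 10

/* Hypothetical per-device state; not the r8169 driver's actual structure. */
struct dyn_aspm_state {
	uint64_t pkts_at_last_check;	/* packet counter at the previous sample */
	bool aspm_enabled;		/* current state of the NIC's internal ASPM */
};

/*
 * Assumed to be called once per sampling interval (one second here) with
 * the running total of packets seen so far.  Returns true if the ASPM
 * state changed on this sample.
 */
static bool dyn_aspm_update(struct dyn_aspm_state *st, uint64_t total_pkts)
{
	uint64_t delta = total_pkts - st->pkts_at_last_check;
	bool want_aspm = delta <= ASPM_PKT_THRESHOLD;
	bool changed = want_aspm != st->aspm_enabled;

	st->pkts_at_last_check = total_pkts;
	st->aspm_enabled = want_aspm;
	return changed;
}
```

Note that in a scheme like this the decision always lags the traffic by up to
one sampling interval, which is where the burst-after-idle case above would
hurt: ASPM is still enabled while the first burst of packets arrives.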

> > And if we actually knew the root cause and could set the correct LTR
> > values or whatever is wrong on the Intel system, we probably wouldn't
> > need this dynamic scheme?
>
> Because Realtek already implemented the dynamic ASPM workaround in
> their Windows and Linux drivers, they never bothered to find the root
> cause.
> So we'll never know what really happens here.

Looks like it. Somebody with a PCIe analyzer could probably make
progress, but I agree, that doesn't seem likely.

Realtek no doubt has the equipment to do this, but apparently they
don't think it's worthwhile. In their defense, the Linux ASPM code is
pretty impenetrable and there could be a problem there that causes or
contributes to this.

Bjorn