Re: CAKE and r8169 cause panic on upload in v4.19

From: Dave Taht
Date: Fri Oct 26 2018 - 16:25:48 EST


On Fri, Oct 26, 2018 at 1:21 PM Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
>
> On 26.10.2018 21:26, Oleksandr Natalenko wrote:
> > Hello.
> >
> > I was excited regarding the fact that v4.19 introduced CAKE, so I've deployed it on my home router.
> >
> > I used this script of mine [1]:
> >
> > # bufferbloat enp3s0.100 20 20
> >
> > to do its job on the VLAN interface, where 20/20 ISP link is switched from the home switch. Basically, it just follows [2] with simple bandwidth restriction and egress mirroring using ifb.
> >
> > Then I thought it would be nice to run speedtest-cli on one of the computer in the home LAN, connected to this router. Download stage went fine, but immediately after upload started I've got a panic on the router: [3] (sorry, it is a photo, netconsole didn't work because, I assume, the panic happened in the networking code). I rebooted the router and tried once more, and got the same result, again during upload stage. Then I rebooted again, replaced CAKE script with my former HTB script, and after running speedtest-cli a couple of times there's no panic.
> >
> > Before running speedtest-cli I was using CAKE for a couple of days without generating much traffic just fine. It seems it crashes only if lots of traffic is generated with tools like this.
> >
> > My sysctl: [4] and ethtool -k: [5]
> >
> > So far, I've found something similar only here: [6] [7]. The common thing is r8169 driver in use, so, maybe, it is a driver issue, and CAKE is just happy to reveal it.
> >
> > If it is something known, please point me to a possible fix. If it is something new, I'm open to provide more info on your request, try patches etc (as usual).
> >
> It seems to be the same problem as described here: https://bugzilla.kernel.org/show_bug.cgi?id=201063
> As I commented in bugzilla, the GPF in dev_hard_start_xmit and the values of R12/R15 make me think
> that a poisoned list pointer is accessed. It's so deep in the network stack that I can not really
> imagine the network driver is to blame. One screenshot attached to the bug report shows that the
> GPF also happened with the igb driver. Most likely we find out only once somebody spends effort
> on bisecting the issue.
> d4546c2509b1 ("net: Convert GRO SKB handling to list_head.") and some subsequent changes deal with
> skb list processing, maybe the issue is related to one of these changes.

Can you repeat your test, disabling gro splitting in cake?

the option is "no-split-gso"

>
> > Thanks.
> >



--

Dave TÃht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740