Re: [BUG] v4.20 - bridge not getting DHCP responses? (works in 4.19.13)

From: Ian Kumlien
Date: Wed Jan 09 2019 - 19:16:37 EST


On Wed, Jan 9, 2019 at 12:17 AM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
> On Wed, Jan 9, 2019, 00:09 Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:

[--8<---]

>> > when looking at "git log v4.19...v4.20
>> > drivers/net/ethernet/intel/ixgbe/" nothing else really stands out...
>> > The machine is also running NAT for my home network and all of that
>> > works just fine...
>> >
>> > I started with tcpdump, proving that packets reached all the way
>> > outside but replies never made it; rebooting
>> > into 4.19.13 made the replies show up in tcpdump.
>> >
>> > I don't quite know where to look, or what I can do to test. I tried
>> > disabling all offloading (due to the UDP
>> > offloading changes), but nothing helped...
>> >
>> > Ideas? Patches? ;)
>>
>> Running a bisection would certainly help find the offending commit if
>> that is something that you can do?
>
> I was hoping for a likely suspect, but this was on my "Todo" list for Friday night anyway... (and I had already started testing with some patches reverted)
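
git bisect is a binary search over the commits between a known-good and a known-bad release. A minimal model follows. It assumes a linear history and a test that reliably separates good from bad; a real kernel tree with merges and a flaky reproducer meets neither condition exactly. `first_bad` and the toy history are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    a commit is bad every later one is bad too (the property bisection needs).
    Returns (first bad commit, number of is_bad() tests performed).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # in a real kernel bisect: build, install, boot, reproduce
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


# Hypothetical history: 1025 commits, regression introduced at index 700.
history = [f"c{i}" for i in range(1025)]
print(first_bad(history, lambda c: int(c[1:]) >= 700))  # ('c700', 10)
```

With N commits in the range, the loop runs about log2(N) times, so doubling the range costs only one more test. What hurts is the cost per step. If the good/bad assumption does not hold, the search still converges, but possibly on an innocent commit.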

So, after lengthy git bisect sessions, both from the latest stable I
was using (not the best of ideas)
and from 4.19:

The bisect from the latest stable yielded 72b0094f918294e6cb8cf5c3b4520d928fbb1a57,
which is incorrect...

However, the proper bisect (starting from 4.19) gave me this:
fb420d5d91c1274d5966917725e71f27ed092a85 is the first bad commit
commit fb420d5d91c1274d5966917725e71f27ed092a85
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Fri Sep 28 10:28:44 2018 -0700

tcp/fq: move back to CLOCK_MONOTONIC

In the recent TCP/EDT patch series, I switched TCP and sch_fq
clocks from MONOTONIC to TAI, in order to meet the choice done
earlier for sch_etf packet scheduler.

But sure enough, this broke some setups where the TAI clock
jumps forward (by almost 50 years...), as reported
by Leonard Crestez.

If we want to converge later, we'll probably need to add
an skb field to differentiate the clock bases, or a socket option.

In the meantime, a UDP application will need to use CLOCK_MONOTONIC
base for its SCM_TXTIME timestamps if using fq packet scheduler.

Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Reported-by: Leonard Crestez <leonard.crestez@xxxxxxx>
Tested-by: Leonard Crestez <leonard.crestez@xxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

:040000 040000 06615f5ed4486fd0af77a8fb59775a9f2346aebc 7f883c7753cb3d5d881e0edbef2989f4e6db6a1f M include
:040000 040000 767c5e93fe5cfd609f90834d93978511c284ea01 cc47bd361516622c0b21602e188181fdfc6b2995 M net
----
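
The commit message above turns on the difference between clock bases. On Linux, CLOCK_TAI is CLOCK_REALTIME plus the kernel's TAI offset, so it is stepped whenever the wall clock is set. CLOCK_MONOTONIC is not affected by such discontinuous jumps. The following is a minimal sketch (Python 3.9+ on Linux; `clock_offsets_ns` is a hypothetical helper) that shows how far apart the two bases are on a running system.

```python
import time

NS_PER_YEAR = 365.25 * 24 * 3600 * 10**9


def clock_offsets_ns():
    """Sample CLOCK_MONOTONIC, CLOCK_TAI and CLOCK_REALTIME back to back.

    Returns (TAI - MONOTONIC, REALTIME - MONOTONIC) in nanoseconds. Both offsets
    move when the wall clock is set; CLOCK_MONOTONIC itself is never stepped.
    """
    mono = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    tai = time.clock_gettime_ns(time.CLOCK_TAI)
    real = time.clock_gettime_ns(time.CLOCK_REALTIME)
    return tai - mono, real - mono


if __name__ == "__main__":
    tai_off, real_off = clock_offsets_ns()
    # A timestamp taken on one base and compared on the other is off by roughly this much.
    print(f"TAI - MONOTONIC: {tai_off / NS_PER_YEAR:.1f} years")
    # Kernel TAI offset: 0 unless something (e.g. an NTP daemon) has set it.
    print(f"TAI - REALTIME:  {round((tai_off - real_off) / 10**9)} s")
```

Consider, as an illustrative scenario and not necessarily the reported setup, a board that boots with its wall clock at the 1970 epoch and only later learns the real date. On such a board the first number jumps by decades at the moment the date is set, while CLOCK_MONOTONIC carries on unchanged.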

This could actually be the culprit: I'm having problems *with* UDP
traffic (DHCP), and I am using fq.
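
The last paragraph of the commit message gives the rule for UDP senders that attach SCM_TXTIME timestamps: with fq as the qdisc, those timestamps must come from CLOCK_MONOTONIC. This thread does not establish whether the DHCP client involved here sets SCM_TXTIME at all. The sketch only shows what "CLOCK_MONOTONIC base" means for an application that does. It is a minimal Python sketch that assumes Linux, Python 3.7+, and the asm-generic value 61 for SO_TXTIME, which x86_64 and arm64 use but some architectures do not. The helpers `sock_txtime_opt`, `txtime_cmsg` and `send_paced` are hypothetical names.

```python
import socket
import struct
import time
from typing import Optional, Tuple

# Assumption: asm-generic value (x86_64, arm64); SCM_TXTIME shares the SO_TXTIME number.
SO_TXTIME = getattr(socket, "SO_TXTIME", 61)
SCM_TXTIME = SO_TXTIME


def sock_txtime_opt(clockid: int = time.CLOCK_MONOTONIC, flags: int = 0) -> bytes:
    """Pack struct sock_txtime { __kernel_clockid_t clockid; __u32 flags; } natively.

    flags=0 requests no optional modes.
    """
    return struct.pack("=iI", clockid, flags)


def txtime_cmsg(delay_ns: int, now_ns: Optional[int] = None) -> Tuple[int, int, bytes]:
    """Ancillary item for sendmsg(): departure time = CLOCK_MONOTONIC now + delay_ns.

    The 64-bit value must be on the same clock base the socket was configured
    with (CLOCK_MONOTONIC here), which is what fq expects after fb420d5d91c1.
    """
    if now_ns is None:
        now_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return (socket.SOL_SOCKET, SCM_TXTIME, struct.pack("=Q", now_ns + delay_ns))


def send_paced(sock: socket.socket, payload: bytes, addr, delay_ns: int) -> int:
    """Send one UDP datagram that should not leave before now + delay_ns."""
    return sock.sendmsg([payload], [txtime_cmsg(delay_ns)], 0, addr)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # May fail with EPERM where the kernel requires CAP_NET_ADMIN for SO_TXTIME.
        s.setsockopt(socket.SOL_SOCKET, SO_TXTIME, sock_txtime_opt())
        send_paced(s, b"hello", ("127.0.0.1", 9), delay_ns=1_000_000)
    except OSError as exc:
        print("SO_TXTIME not usable here:", exc)
    finally:
        s.close()
```

The clock id is fixed once, at setsockopt() time, and each datagram's timestamp must be read from that same clock. A sender that still computed its timestamps from CLOCK_TAI, as the intermediate TCP/EDT commits had done, would hand fq values offset by the gap between the two bases.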

Let's hope so, since this was kinda boring:
ls /lib/modules |grep 4.19.0 |wc -l
27

Testing 4.20.1 and then 4.20.1 with the suspected patch reverted, will
report shortly!