Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt

From: Florian Westphal
Date: Fri Feb 08 2019 - 02:07:18 EST


Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
> L.S.,
>
> While trying out a 5.0-rc5 kernel I seem to have stumbled over a regression with NAT
> (using an nftables firewall with NAT and connection tracking).
>
> Unfortunately it isn't too obvious since no errors are logged, but on clients it
> causes symptoms like firefox intermittently not being able to load pages with:
> Network Protocol Error
> An error occurred during a connection to www.example.com
> The page you are trying to view cannot be shown because an error in the network protocol was detected.
> Please contact the website owners to inform them of this problem.
>
> But it's only intermittent, so I can still visit some webpages with clients.
> Could it be that packet size and/or fragments are at play?
>
> So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with
> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
> but to no avail.
>
> After that I tried to git bisect and ended up with:
>
> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
> commit faec18dbb0405c7d4dda025054511dc3a6696918
> Author: Florian Westphal <fw@xxxxxxxxx>
> Date: Thu Dec 13 16:01:33 2018 +0100
>
> netfilter: nat: remove l4proto->manip_pkt

Thanks, this is immensely helpful.

I think I see the bug: we can't use target->dst.protonum in
nf_nat_l4proto_manip_pkt(), because it will be TCP when we're dealing
with a related ICMP packet. The tuple describes the original
connection, not the ICMP error that is being mangled.

I will send a patch in a few hours when I get back.