Re: BUG: Bisected Gianfar in bridge not forwarding packets (was:3.0-rc1 Bridge not forwarding unicast packages)

From: Michael Guntsche
Date: Wed Aug 10 2011 - 07:52:00 EST


On Wed, Aug 10, 2011 at 1:19 PM, Jiri Pirko <jpirko@xxxxxxxxxx> wrote:
> Wed, Aug 10, 2011 at 12:30:16PM CEST, mguntsche@xxxxxxxxx wrote:
>>On Wed, Aug 10, 2011 at 11:59 AM, Michael Guntsche <mguntsche@xxxxxxxxx> wrote:
>><snip>
>>>>>Offload parameters for lan_wire:
>>>>>rx-checksumming: off
>>>>>tx-checksumming: off
>>>>>scatter-gather: off
>>>>>tcp-segmentation-offload: off
>>>>>udp-fragmentation-offload: off
>>>>>generic-segmentation-offload: off
>>>>>generic-receive-offload: on
>>>>>large-receive-offload: off
>>>>>rx-vlan-offload: off
>>>>>tx-vlan-offload: off
>>>>>ntuple-filters: off
>>>>>receive-hashing: off
>>>>>
>>>>>The Bridge device on the other hand....
>>>>>
>>>>>Offload parameters for lan:
>>>>>rx-checksumming: on
>>>>>tx-checksumming: on
>>>>>scatter-gather: off
>>>>>tcp-segmentation-offload: off
>>>>>udp-fragmentation-offload: off
>>>>>generic-segmentation-offload: off
>>>>>generic-receive-offload: on
>>>>>large-receive-offload: off
>>>>>rx-vlan-offload: off
>>>>>tx-vlan-offload: on
>>>>  ^^^^^^^^^^^^^^^^^^^ is this gfar?
>>> No the "Lan" nic is the bridge itself. The gfar in question is lan_wire.
>>>
>>> /Michael
>>>
>>
>>Ok I would have saved hours of bisecting if I had just used the -e
>>switch with tcpdump from the beginning.
>>Jiri first of all the patch makes the connection work again. I can
>>ping the client on the wlan from the server and vice-versa. Taking a
>>look at the tcpdump (WITH -e) makes it obvious why it fails with the
>>non patched version...
>>
>>This is a capture from the gfar lan port on the bridge with no patch applied
>>12:13:24.011492 00:13:d4:4f:a2:dc (oui Unknown) > b4:07:f9:70:b7:c1
>>(oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 19, p 0,
>>ethertype IPv4, gibson.comsick.at > 192.168.42.55: ICMP echo request,
>>id 23567, seq 74, length 64
>>
>>As you can see we get a VLAN package????
>
> Ugh, this is what I expected. Patch to fix:
>
> Subject: [patch net-2.6] gianfar: prevent buggy hw rx vlan tagging
>
> On some buggy chips, "vlan tag present" flag is set which causes packet
> loss. Fix this by checking if rx vlan accel is enabled in features.
>
> Reported-by: Michael Guntsche <mguntsche@xxxxxxxxx>
> Signed-off-by: Jiri Pirko <jpirko@xxxxxxxxxx>
> ---
>  drivers/net/gianfar.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 2659daa..31d5c57 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2710,8 +2710,13 @@ static int gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
>        /* Tell the skb what kind of packet this is */
>        skb->protocol = eth_type_trans(skb, dev);
>
> -       /* Set vlan tag */
> -       if (fcb->flags & RXFCB_VLN)
> +       /*
> +        * There's need to check for NETIF_F_HW_VLAN_RX here.
> +        * Even if vlan rx accel is disabled, on some chips
> +        * RXFCB_VLN is pseudo randomly set.
> +        */
> +       if (dev->features & NETIF_F_HW_VLAN_RX &&
> +           fcb->flags & RXFCB_VLN)
>                __vlan_hwaccel_put_tag(skb, fcb->vlctl);
>
>        /* Send the packet up the stack */
> --
> 1.7.6
>
>

Jiri, there seems to be another bug lingering in the bridge code,
which might cause this problem in the first place but I am not really
sure. I looked at the ethtool output some more and I noticed that some
features were enabled on the Bridge which should be off.

With 3.1-rc1
Offload parameters for lan (This is the bridge interface itself):
rx-checksumming: on <---- ON
tx-checksumming: on <---- ON
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: on <---- ON

I booted an older kernel on the same hardware (2.6.39) and the output differs.

Offload parameters for lan:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

As you can see all the values are OFF which is correct, since no
attached interface supports these features.
Is it possible that this is tripping up the now changed gianfar code somehow?
What I am trying to say. Could it be that your patch is hiding a
problem that is now surfacing with a change in the bridge code?

I am thinking about this commit:
c4d27ef95: bridge: convert br_features_recompute() to ndo_fix_features

Of course this could also be a totally separate bug as well.

Any thoughts, Dave, Stephen?

Kind regards,
Michael Guntsche
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/