Re: [2.6.35-rc3 regression] TCP connections on 'lo' interface randomly stall

From: Randy Dunlap
Date: Mon Jun 14 2010 - 14:24:19 EST


On Mon, 14 Jun 2010 20:10:27 +0200 Thomas Bächler wrote:

> With 2.6.35-rc3, I cannot use TCP over the 'lo' interface any more. This
> is reproduced easily by running 'ssh localhost' and executing a few
> commands inside the ssh session (if you are able to log in, 'ls -lhFR /'
> does a good job) - the connection will stall completely after a very
> short time. As far as I can see, all applications are affected.
>
> It also seems that once a service is "stalled", I cannot open a new
> connection to the same TCP port anymore. However, I can open a
> connection to a different port until that one is stalled, too.
>
> Running wireshark, I can see that TCP retransmissions are sent, but
> never acknowledged.
>
> Bisection (starting with 7908a9e as good and v2.6.35-rc3 as bad) leads
> to the following commit. Please CC me on replies to this issue. Thanks
> for your help.

Does this patch fix it for you?
http://lkml.org/lkml/2010/6/13/155
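
If 'ssh localhost' is inconvenient for before/after testing, something like
the following might do as a quick check. It is only a sketch and is not part
of the original report: the function name loopback_transfer, the 1 MiB
default and the 5 second timeout are arbitrary assumptions. It pushes data
through a TCP connection on 127.0.0.1 and reports how much the reader
received, so a short count hints at a stalled connection.

```python
import socket
import threading


def loopback_transfer(total_bytes=1 << 20, chunk=4096, timeout=5.0):
    """Push total_bytes through a TCP connection on 127.0.0.1.

    Returns the number of bytes the receiving side actually read.
    A result smaller than total_bytes suggests the connection stalled.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    server.settimeout(timeout)
    port = server.getsockname()[1]
    received = [0]

    def sink():
        # Accept one connection and count bytes until EOF or timeout.
        try:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(timeout)
                while True:
                    data = conn.recv(65536)
                    if not data:
                        break
                    received[0] += len(data)
        except OSError:
            pass  # timeout or reset: report whatever arrived so far

    reader = threading.Thread(target=sink, daemon=True)
    reader.start()

    payload = b"x" * chunk
    sent = 0
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            while sent < total_bytes:
                n = min(chunk, total_bytes - sent)
                client.sendall(payload[:n])
                sent += n
    except OSError:
        pass  # a send timeout also counts as a stall

    reader.join(timeout + 1.0)
    server.close()
    return received[0]


if __name__ == "__main__":
    want = 1 << 20
    got = loopback_transfer(want)
    print("received %d of %d bytes%s" % (got, want, "" if got == want else " (stalled?)"))
```

On a kernel without the regression the two numbers should match; run it a
few times, since the stall is reported as random.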


> commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
> Author: John Fastabend <john.r.fastabend@xxxxxxxxx>
> Date: Thu Jun 3 09:30:11 2010 +0000
>
> net: deliver skbs on inactive slaves to exact matches
>
> Currently, the accelerated receive path for VLANs will
> drop packets if the real device is an inactive slave and
> is not one of the special pkts tested for in
> skb_bond_should_drop(). This behavior is different than
> the non-accelerated path and for pkts over a bonded vlan.
>
> For example,
>
> vlanx -> bond0 -> ethx
>
> will be dropped in the vlan path and not delivered to any
> packet handlers at all. However,
>
> bond0 -> vlanx -> ethx
>
> and
>
> bond0 -> ethx
>
> will be delivered to handlers that match the exact dev,
> because the VLAN path checks the real_dev which is not a
> slave and netif_recv_skb() doesn't drop frames but only
> delivers them to exact matches.
>
> This patch adds a sk_buff flag which is used for tagging
> skbs that would previously have been dropped and allows the
> skb to continue to skb_netif_recv(). Here we add
> logic to check for the deliver_no_wcard flag and if it
> is set only deliver to handlers that match exactly. This
> makes both paths above consistent and gives pkt handlers
> a way to identify skbs that come from inactive slaves.
> Without this patch, in some configurations skbs will be
> delivered to handlers with exact matches and in others
> be dropped outright in the vlan path.
>
> I have tested the following 4 configurations in failover modes
> and load balancing modes.
>
> # bond0 -> ethx
>
> # vlanx -> bond0 -> ethx
>
> # bond0 -> vlanx -> ethx
>
> # bond0 -> ethx
>              |
>   vlanx -> --
>
> Signed-off-by: John Fastabend <john.r.fastabend@xxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
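
For anyone trying to follow what the commit message describes, here is a toy
model of the delivery rule in Python. It is an illustration only and not the
kernel code: Handler, Skb and matching_handlers are made-up names, and only
the deliver_no_wcard flag comes from the commit text. The assumption is that
a handler registered without a device acts as a wildcard, and that a tagged
skb is handed only to handlers bound to the exact receiving device.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Handler:
    name: str
    dev: Optional[str]  # None models a wildcard handler (matches any device)


@dataclass
class Skb:
    dev: str                        # device the packet arrived on
    deliver_no_wcard: bool = False  # set for packets from inactive slaves


def matching_handlers(skb: Skb, handlers: List[Handler]) -> List[str]:
    """Return the names of handlers that would see this skb."""
    out = []
    for h in handlers:
        if h.dev is None:
            # Wildcard handlers are skipped for tagged skbs.
            if not skb.deliver_no_wcard:
                out.append(h.name)
        elif h.dev == skb.dev:
            # Exact device matches always get the packet.
            out.append(h.name)
    return out


if __name__ == "__main__":
    handlers = [Handler("wildcard", None), Handler("exact-eth0", "eth0")]
    print(matching_handlers(Skb("eth0"), handlers))
    print(matching_handlers(Skb("eth0", deliver_no_wcard=True), handlers))
```

In this model a wildcard handler never sees a tagged skb, which is the
behaviour to keep in mind when looking at why traffic on 'lo' stops being
delivered.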


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***