gateway icmp redirect handling problem (3.0.36-3.0.23)

From: Simon Roscic
Date: Fri Jul 20 2012 - 18:52:13 EST


Hello,

I'm experiencing the following problem with kernel versions 3.0.36 (down to 3.0.23):

on our network we all have one default gateway, 10.1.1.254, but there are some networks for which we have another gateway, and for these networks the default gateway sends an icmp redirect.

let's assume my test machine has ip 10.1.20.79, netmask 255.255.0.0, and default gateway 10.1.1.254. i now ping 10.109.98.11, and my default gateway (10.1.1.254) sends me an icmp redirect to another gateway (10.1.1.1). at first everything works as expected and i get the replies from 10.109.98.11, but not for long: after approx. 60 seconds i only get "ping: sendmsg: Network is down".
(exact same problem with all other tcp/udp protocols, but i used ping for the tests because it also prints the redirect messages to the console)
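The addressing in this scenario can be checked with a short sketch. It is only an illustration using Python's standard `ipaddress` module; the helper name `is_on_link` is made up for this example, and the addresses are the ones given in the description above.

```python
import ipaddress

# Addresses taken from the description above.
iface = ipaddress.ip_interface("10.1.20.79/255.255.0.0")
default_gw = ipaddress.ip_address("10.1.1.254")
redirect_gw = ipaddress.ip_address("10.1.1.1")
target = ipaddress.ip_address("10.109.98.11")


def is_on_link(addr, interface=iface):
    """Hypothetical helper: True if addr lies inside the interface's own subnet."""
    return addr in interface.network


for name, addr in [
    ("default gw", default_gw),
    ("redirect gw", redirect_gw),
    ("target", target),
]:
    print(f"{name:12} {addr}  on-link: {is_on_link(addr)}")
```

Both gateways sit in the test machine's own 10.1.0.0/16 subnet, while the ping target does not, so the target can only be reached through one of the gateways.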

so let's have a closer look:

not ok - kernel versions 3.0.36 down to 3.0.23:
-----------------------------------------------

test-simon:~ # ping 10.109.98.11
...
64 bytes from 10.109.98.11: icmp_seq=62 ttl=60 time=12.1 ms
64 bytes from 10.109.98.11: icmp_seq=63 ttl=60 time=11.6 ms
ping: sendmsg: Network is down
ping: sendmsg: Network is down

when looking at "ip neigh", the "ping: sendmsg: Network is down" message appears at the exact moment when the arp entry for the default gateway (10.1.1.254) gets removed from the arp cache:

ping "OK"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

ping "dead"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

so it seems that when the default gateway is removed from the arp cache, something goes wrong in the kernel route handling. i don't know the internals of the linux route handling, so i need your help: any ideas what's going wrong?
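The comparison of the two "ip neigh" snapshots above can be sketched as follows. This is a minimal illustration, not a general parser: it assumes the neighbour address is the first field and the state is the last field of each line, as in the output shown here, and the function names are made up for the example.

```python
def parse_ip_neigh(text):
    """Map each neighbour IP to its state, e.g. {'10.1.1.1': 'REACHABLE'}.

    Assumes the address is the first field and the state the last one,
    which holds for the snapshots quoted in this mail.
    """
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        entries[fields[0]] = fields[-1]
    return entries


def gateway_dropped(before, after, gateway="10.1.1.254"):
    """True if the gateway had an entry in 'before' but has none in 'after'."""
    return gateway in parse_ip_neigh(before) and gateway not in parse_ip_neigh(after)


# The two snapshots from above.
ping_ok = """\
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE
"""
ping_dead = """\
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
"""

print(gateway_dropped(ping_ok, ping_dead))  # prints True
```

Running such a check in a loop next to the ping would make it possible to line up the moment the entry disappears with the first "Network is down" error.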

i did a lot of tests. the problem first appears with kernel version 3.0.23, and in the changelog of 3.0.23 i found the following two commits:
(http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.0.23)

commit 42ab5316ddcaa0de23e88e8a3d363c767b9ab0b3
Author: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Fri Nov 18 15:24:32 2011 -0500
ipv4: fix redirect handling

commit bebee22bcbf0026f92141990972bd5863ef9b69c
Author: Flavio Leitner <fbl@xxxxxxxxxx>
Date: Mon Oct 24 02:56:38 2011 -0400
route: fix ICMP redirect validation

i then took the net/ipv4/route.c file from kernel 3.0.22 and replaced the version in 3.0.23 with it. this reverts the two patches mentioned above (if i haven't overlooked something), and after that the problem disappears.
so those two patches surely fixed some problem, but for kernel versions 3.0.23-3.0.36 they broke the gateway icmp redirect handling as described here.

i did some further tests with different kernel versions:
3.5-rc6: OK
3.4.4: OK
3.2.22: OK
3.0.1 - 3.0.22: OK
3.0.23 - 3.0.36: not OK
2.6.35.13: OK
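The test results above can be turned into a simple version check. This is only a sketch of the ranges i tested: the function names are made up for the example, and versions inside the 3.0.23-3.0.36 range that are not listed individually are assumed to behave like the tested ones.

```python
def parse_version(version):
    """Turn '3.0.23' or '3.5-rc6' into a comparable tuple like (3, 0, 23)."""
    base = version.split("-")[0]  # drop suffixes such as '-rc6'
    return tuple(int(part) for part in base.split("."))


# Range boundaries from the test results above.
FIRST_BAD = parse_version("3.0.23")
LAST_BAD_TESTED = parse_version("3.0.36")


def is_affected(version):
    """True if the version falls inside the range reported as 'not OK'."""
    return FIRST_BAD <= parse_version(version) <= LAST_BAD_TESTED


for v in ["3.5-rc6", "3.4.4", "3.2.22", "3.0.22", "3.0.23", "3.0.36", "2.6.35.13"]:
    print(v, "not OK" if is_affected(v) else "OK")
```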

now let's have a closer look at a kernel version which works:
-------------------------------------------------------------

this is from 3.5-rc6, but 3.4.4, 3.2.22 and 2.6.35.13 also behave exactly this way. 3.0.1-3.0.22 behave slightly differently, see note below.

test-simon:~ # ping 10.109.98.11
PING 10.109.98.10 (10.109.98.11) 56(84) bytes of data.
64 bytes from 10.109.98.11: icmp_seq=1 ttl=60 time=15.2 ms
From 10.1.1.254: icmp_seq=2 Redirect Host(New nexthop: 10.1.1.1)
...

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

and after approx 60 or so seconds:

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

and ping (and everything else) is as expected still working.

note:
-----

on 3.0.1-3.0.22:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) to my test machine, while running tcpdump on the default gateway (10.1.1.254) i see every ping packet also arriving there and also some icmp redirect messages going out to my test machine.
but everything works so i think my test machine is correctly talking to the destination using the other gateway (10.1.1.1).
i also sniffed a windows 7 client pc, it looks the same there, so possibly no problem, but i mention this because kernel versions 3.5-rc6, 3.4.4, 3.2.22 and 2.6.35.13 act differently (see below).

on 3.0.23-3.0.36:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) to my test machine. while running tcpdump on the default gateway (10.1.1.254) i see up to 20 ping packets arriving there and up to 17 icmp redirect messages going out to my test machine. after the 20th ping packet i don't see further ping packets arriving at the default gateway, so i think my test machine is then only talking to the other gateway (10.1.1.1).
...
17:48:41.643952 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 to host 10.1.1.1, length 92
...
17:48:44.649008 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 30733, seq 20, length 64
17:48:44.649018 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 30733, seq 20, length 64

on 3.5-rc6, 3.4.4, 3.2.22 and 2.6.35.13:

here it looks different, and for me this is the expected behavior, or at least the behavior i have seen from lots of linux machines on my network. i see 1-2 icmp redirects sent from the default gateway (10.1.1.254) to my test machine, while running tcpdump on the default gateway (10.1.1.254) i only see up to 2 ping packets arriving then nothing, so then my test machine seems to only talk to the other gateway (10.1.1.1).

17:50:58.995894 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 1, length 64
17:50:58.995914 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 1, length 64
17:50:59.997260 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 2, length 64
17:50:59.997277 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 to host 10.1.1.1, length 92
17:50:59.997287 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 2, length 64

...
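The counting done by eye on these tcpdump lines can be sketched like this. It is a minimal example that only understands the two line shapes shown above (ICMP echo request and ICMP redirect); the function name `summarize` is made up for the illustration. Each echo request shows up twice in the capture on the gateway, so sequence numbers are collected in a set and counted once.

```python
import re

# The capture lines from the working kernels, as shown above.
capture = """\
17:50:58.995894 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 1, length 64
17:50:58.995914 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 1, length 64
17:50:59.997260 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 2, length 64
17:50:59.997277 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 to host 10.1.1.1, length 92
17:50:59.997287 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 10766, seq 2, length 64
"""

ECHO = re.compile(r"ICMP echo request, id \d+, seq (\d+)")
REDIRECT = re.compile(r"ICMP redirect (\S+) to host (\S+),")


def summarize(text):
    """Return (sorted distinct echo sequence numbers, number of redirects)."""
    seqs = set()
    redirects = 0
    for line in text.splitlines():
        echo = ECHO.search(line)
        if echo:
            seqs.add(int(echo.group(1)))
        elif REDIRECT.search(line):
            redirects += 1
    return sorted(seqs), redirects


print(summarize(capture))  # prints ([1, 2], 1)
```

Fed with a full capture from an affected kernel, the same function would give the number of distinct pings that still reached the default gateway and the number of redirects it sent back.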

(before someone asks why i "must" use kernel 3.0.x ... because these are SLES 11 SP2 VMs and they currently ship kernel 3.0.34)

i hope i described the problem in a way that the kernel network stack maintainers can understand. please contact me if you have further questions, and please CC me as i am not subscribed to linux-kernel. this message is already on linux-netdev; if you wish, you can also CC your answer there.

kind regards,
Simon Roscic.
