Unexpected timestamps in tcpdump with veth + tc qdisc netem delay

From: Henrique de Moraes Holschuh
Date: Mon Apr 26 2021 - 10:36:48 EST


(please CC me in any replies, thank you!)

Hello,

While trying to simulate large delay links using veth and netns, I came across what looks like unexpected / incorrect behavior.

I have reproduced it on Debian 4.19 and 5.10 kernels. From a quick look at mainline, and within my limited understanding of this area of the kernel, I see no relevant difference between the Debian kernels and mainline.

I have attached a simple script to reproduce the scenario. If my explanation below is not clear, please just look at the script to see what it does: it should be trivial to understand. It needs tcpdump, and CAP_NET_ADMIN (or root, etc).

Topology

root netns:
veth vec0 (192.168.233.1) paired to ves0 (192.168.233.2)
tc qdisc add dev vec0 root netem delay 250ms

lab500ms netns:
veth ves0 (192.168.233.2), paired to vec0 (192.168.233.1)
tc qdisc add dev ves0 root netem delay 250ms

So:
[root netns -- veth (tc qdisc netem delay 250ms) ] <> [ veth (tc qdisc netem delay 250ms) -- lab500ms netns ]
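
For readers without the attachment, here is a minimal sketch of how the topology above might be built. It is reconstructed from the description only and is not the attached netns.sh. The /30 prefix length is an assumption taken from the tcpdump filter used further down, and the run and setup_lab helpers are hypothetical names. By default it only prints the commands; set RUN=1 to actually execute them (which needs CAP_NET_ADMIN).

```shell
#!/bin/sh
# Sketch only: rebuilt from the topology description, not the attached script.
# Prints each command unless RUN=1 is set in the environment.
NS=lab500ms

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_lab() {
    # Namespace and veth pair; ves0 is moved into the namespace.
    run ip netns add "$NS"
    run ip link add vec0 type veth peer name ves0
    run ip link set ves0 netns "$NS"

    # Addresses (the /30 prefix length is an assumption).
    run ip addr add 192.168.233.1/30 dev vec0
    run ip link set vec0 up
    run ip netns exec "$NS" ip addr add 192.168.233.2/30 dev ves0
    run ip netns exec "$NS" ip link set ves0 up

    # 250ms netem delay on the transmit side of each veth end.
    run tc qdisc add dev vec0 root netem delay 250ms
    run ip netns exec "$NS" tc qdisc add dev ves0 root netem delay 250ms
}

setup_lab
```

Running it without RUN=1 is a cheap way to review the command list before touching the host's network configuration.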

Expected RTT from a packet roundtrip (root netns -> lab500ms netns -> root netns) is 500ms.


The problem:

[root netns]: ping 192.168.233.2
PING 192.168.233.2 (192.168.233.2) 56(84) bytes of data.
64 bytes from 192.168.233.2: icmp_seq=1 ttl=64 time=500 ms
64 bytes from 192.168.233.2: icmp_seq=2 ttl=64 time=500 ms

(the RTT reported by ping is 500ms as expected: there is a 250ms transmit delay attached to each member of the veth pair)

However:

[root netns]: tcpdump -i vec0 -s0 -n -p net 192.168.233.0/30
listening on vec0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:09:09.740681 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
17:09:09.990891 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
17:09:10.741903 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
17:09:10.992031 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
17:09:11.742813 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
17:09:11.993009 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64

[lab500ms netns]: ip netns exec lab500ms tcpdump -i ves0 -s0 -n -p net 192.168.233.0/30
listening on ves0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:09:09.740724 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
17:09:09.990867 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
17:09:10.741942 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
17:09:10.992012 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
17:09:11.742851 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
17:09:11.992985 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64

One can see that the timestamps shown by tcpdump (also reproduced using wireshark) are *not* what one would expect: the 250ms delays are missing on incoming packets (i.e. 250ms is missing from the timestamps of the "echo reply" packets on vec0, and of the "echo request" packets on ves0).
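
To put numbers on this, the deltas for seq 1 can be computed directly from the two captures above. The sketch below uses a hypothetical helper, ts_delta_ms, which is not part of the attached script. It assumes both timestamps fall on the same day and are in tcpdump's default HH:MM:SS.ffffff format.

```shell
#!/bin/sh
# ts_delta_ms START END
# Prints END - START in milliseconds for two HH:MM:SS.ffffff timestamps.
# Hypothetical helper for illustration; assumes both are on the same day.
ts_delta_ms() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ":"); split(b, y, ":")
        sa = x[1] * 3600 + x[2] * 60 + x[3]
        sb = y[1] * 3600 + y[2] * 60 + y[3]
        printf "%.3f\n", (sb - sa) * 1000
    }'
}

# seq 1 echo request: vec0 capture vs. ves0 capture
ts_delta_ms 17:09:09.740681 17:09:09.740724    # prints 0.043

# seq 1 echo reply: ves0 capture vs. vec0 capture
ts_delta_ms 17:09:09.990867 17:09:09.990891    # prints 0.024

# seq 1 echo request vs. echo reply, both in the vec0 capture
ts_delta_ms 17:09:09.740681 17:09:09.990891    # prints 250.210
```

So each crossing of the veth pair shows up as a few tens of microseconds in the captures, and the request-to-reply gap on vec0 is about 250ms, against the 500ms RTT reported by ping.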

The 250ms vec0 -> ves0 delay AND the 250ms ves0 -> vec0 delay *are* there, as shown by "ping", but you would not know it from looking at the tcpdump output. The timing shown in tcpdump looks more like the packet injection time at the first interface than the time the packet was "seen" at the other end (the capture interface).

Adding more namespaces and VETH pairs + routing "in a row" so that the packet "exits" one veth tunnel and enters another one (after trivial routing) doesn't fix the tcpdump timestamps in the capture at the other end of the veth-veth->routing->veth-veth->routing->... chain.

It looks like some sort of bug to me, but maybe I am missing something, in which case I would greatly appreciate an explanation of where I went wrong...

Thanks in advance,
Henrique de Moraes Holschuh <hmh@xxxxxxxxxx>

Attachment: netns.sh
Description: Bourne shell script