Re: Routing loops & TTL tracking with tunnel devices

From: Hannes Frederic Sowa
Date: Mon Nov 16 2015 - 17:25:13 EST

Hi Jason,

On Mon, Nov 16, 2015, at 21:14, Jason A. Donenfeld wrote:
> A few tunnel devices, such as geneve and vxlan, use
> udp_tunnel_xmit_skb or related functions to transmit packets, and
> do the usual FIB lookup to get the dst entry. I see a lot of code
> like this:
> if (rt-> == dev) {
> 	netdev_dbg(dev, "circular route to %pI4\n",
> 		   &dst->sin.sin_addr.s_addr);
> 	dev->stats.collisions++;
> 	goto rt_tx_error;
> }
> This one is from vxlan, but there are other similar blocks elsewhere.
> The basic idea is "am I about to send this packet to my own device?"
> This is a bit crude. For starters, two interfaces could be pointed at
> each other, bouncing the packet back and forth indefinitely, causing
> the feared routing loop. Hopefully as more headers got tacked on,
> allocations would eventually fail, and the queen would be saved.
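
A small userspace model makes the gap concrete. The types are hypothetical and this is not kernel code. Each device's encapsulated output is routed to `route_to`, and the crude check rejects only a device that routes to itself. On each pass the packet grows by an assumed `ENCAP_OVERHEAD` until it exceeds an assumed `MAX_SKB_LEN`, which stands in for the eventual allocation failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a tunnel device: where the FIB lookup sends the
 * encapsulated packet. NULL means it leaves via a physical interface. */
struct tdev {
	struct tdev *route_to;
};

#define ENCAP_OVERHEAD 50	/* assumed bytes added per encapsulation */
#define MAX_SKB_LEN 65535	/* assumed size at which allocation fails */

/* Follows encapsulations starting at dev. Returns -1 if the crude
 * self-route check fires, otherwise the number of encapsulations done
 * before the packet leaves the box or grows past MAX_SKB_LEN. */
static int simulate_xmit(struct tdev *dev, size_t len)
{
	int hops = 0;

	assert(dev != NULL);
	while (dev != NULL) {
		if (dev->route_to == dev)
			return -1;	/* "circular route": dev routes to itself */
		len += ENCAP_OVERHEAD;
		if (len > MAX_SKB_LEN)
			break;		/* stand-in for the allocation failure */
		hops++;
		dev = dev->route_to;
	}
	return hops;
}
```

Take two devices pointing at each other and a 1000-byte packet. `simulate_xmit` reports 1290 encapsulations before the size cap stops it, and the self-route check never fires. A device routed to itself is rejected immediately.
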
> But what about in devices for which self-routing might actually be
> useful? For example, let's say that if an incoming skb is headed for
> dst X, it gets encapsulated and sent to dst A, and for dst Y it gets
> encapsulated and sent to dst B, and for dst Z it gets encapsulated and
> sent to dst C. I can imagine situations in which setting A==Y and B==Z
> might be useful to do multiple levels of encapsulation on one device,
> so that skbs headed for dst X get sent to dst C, but with intermediate
> transformations of dst A and dst B.
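
The X, A, B, C case can be sketched as a walk over a per-device encapsulation table. This simplification rests on stated assumptions. Destinations are single characters, and `struct encap_rule` and `encap_chain` are hypothetical. Each loop iteration stands in for a full traversal of the stack. With A==Y and B==Z, a packet for X picks up three layers and ends up addressed to C. The `max_layers` budget is the kind of limit a tunnel TTL would provide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device encapsulation table: packets for inner_dst
 * are wrapped in a new header addressed to outer_dst. */
struct encap_rule {
	char inner_dst;
	char outer_dst;
};

static const struct encap_rule *find_rule(const struct encap_rule *tbl,
					  size_t n, char dst)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].inner_dst == dst)
			return &tbl[i];
	return NULL;
}

/* Re-encapsulates on the same device until no rule matches. Returns the
 * number of layers added and stores the final destination, or returns -1
 * (a probable loop) once max_layers would be exceeded. */
static int encap_chain(const struct encap_rule *tbl, size_t n, char dst,
		       int max_layers, char *final_dst)
{
	int layers = 0;
	const struct encap_rule *r;

	assert(final_dst != NULL);
	while ((r = find_rule(tbl, n, dst)) != NULL) {
		if (layers == max_layers)
			return -1;
		dst = r->outer_dst;
		layers++;
	}
	*final_dst = dst;
	return layers;
}
```

For the table {X->Y, Y->Z, Z->C}, `encap_chain(tbl, 3, 'X', 8, &final)` returns 3 and sets `final` to 'C'. If P and Q map to each other, it returns -1 once the budget runs out.
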
> This isn't merely theoretical. I'm working on a driver right now that
> could benefit from this.
> So, in implementing this, the question of avoiding routing loops comes
> into play. The most straightforward way to do this is to use a TTL
> value that is decremented on each encapsulation. But we have a
> problem. A packet sent to dst X
> that is encapsulated and sent to dst A will have a ttl calculated for
> its journey to dst A. How do we preserve TTLs across multiple
> traversals of the networking stack? We can't simply keep the TTL the
> packet had when it came in, because its tunnel destination might
> require a different TTL. The ideal place would be a "tunnel TTL"
> value in skb->cb, except that cb gets overwritten when traversing
> the networking stack. The next best option would be some other
> member of sk_buff, but I don't see any that look suitable.
> So perhaps it would be worthwhile to add this to struct sk_buff? David,
> would you be interested if I submitted a patch?
> Or, alternatively, does a fast solution for this already exist that I
> overlooked?

Have a look at __dev_queue_xmit and the per-CPU recursion limit
implemented there:

if (__this_cpu_read(xmit_recursion) >
	goto recursion_alert;
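
The recursion-limit approach can be modeled in userspace, with a thread-local counter standing in for the per-CPU `xmit_recursion`. `sketch_xmit`, `struct vdev`, and `SKETCH_RECURSION_LIMIT` are hypothetical. The limit in the quoted condition is cut off here, so the sketch implies no particular value. The counter tracks nesting depth rather than device identity. In this model it therefore also stops the two-device ping-pong that the per-device check misses, as long as the nested transmits run synchronously on the same CPU (here, the same thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit; not the kernel's value. */
#define SKETCH_RECURSION_LIMIT 4

/* One counter per thread, standing in for a per-CPU variable. */
static _Thread_local int xmit_depth;

struct vdev {
	struct vdev *next;	/* device this one transmits into, or NULL */
};

/* Transmits through dev and, recursively, whatever it routes into.
 * Returns false ("recursion alert") if the nesting gets too deep. */
static bool sketch_xmit(struct vdev *dev)
{
	bool ok = true;

	assert(dev != NULL);
	if (xmit_depth > SKETCH_RECURSION_LIMIT)
		return false;	/* analogous to goto recursion_alert */

	xmit_depth++;
	if (dev->next != NULL)
		ok = sketch_xmit(dev->next);
	xmit_depth--;
	return ok;
}
```

If two `vdev`s point at each other, `sketch_xmit` returns false once the depth passes the limit, and the counter is back to zero afterwards. A short acyclic chain transmits normally.
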
