Re: Layer 2 over IPv6 GRE and path MTU discovery

From: Erik Auerswald
Date: Sat Oct 15 2016 - 11:21:11 EST


Hi Mike,

On Fri, Oct 14, 2016 at 01:59:49PM -0700, Mike Walker wrote:
> When using a layer 2 GREv6 tunnel (ip6gretap), I am using a Linux
> bridge to push Ethernet frames from an Ethernet port to the GREv6
> device.
>
> Here is an example of the topology:
>
> PC -> eth0 -> grebridge -> gre6dev -> (internet) -> GRE endpoint -> Remote host
>
> In this case, the PC connected to the Ethernet port is using IPv6 to
> communicate with the remote host, so the source and destination IP of
> the traffic being sent by the PC are both IPv6 addresses. So we have
> an IPv6 header, Ethernet header, then GRE header once the
> encapsulation is done.
>
> Sometimes these packets are too large for the GRE tunnel's MTU. When
> this happens, the router's kernel wants to send an ICMP "packet too
> big" error message back to the PC.

The proper way to handle this is to adjust the MTU of both the "PC"
and the "Remote host" to reflect the properties of the GRE tunnel.
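For a rough idea of the numbers involved, here is a sketch in Python of the overhead arithmetic. It assumes a 1500-byte underlay MTU, a 4-byte base GRE header, an untagged inner Ethernet frame, and no outer IPv6 extension headers unless flagged. The function name and flags are illustrative only. Whether an 8-byte destination options header carrying the RFC 2473 encapsulation-limit option is present depends on the tunnel configuration, so it is a parameter here.

```python
# Header sizes in bytes for an ip6gretap tunnel. Assumed layout: outer IPv6,
# optional destination options header, GRE, inner Ethernet frame.
OUTER_IPV6_HEADER = 40
ENCAP_LIMIT_OPTION = 8      # destination options header carrying the RFC 2473 option
GRE_BASE_HEADER = 4
GRE_OPTIONAL_FIELD = 4      # size of each of the GRE key and sequence number fields
INNER_ETHERNET_HEADER = 14
VLAN_TAG = 4
IPV6_MIN_MTU = 1280         # minimum link MTU required by IPv6


def inner_ip_mtu(underlay_mtu: int = 1500, *, gre_key: bool = False,
                 gre_seq: bool = False, encap_limit_option: bool = False,
                 vlan_tag: bool = False) -> int:
    """Largest IP packet the end hosts can send over the bridged GREv6 link (hypothetical helper)."""
    overhead = OUTER_IPV6_HEADER + GRE_BASE_HEADER + INNER_ETHERNET_HEADER
    if encap_limit_option:
        overhead += ENCAP_LIMIT_OPTION
    if gre_key:
        overhead += GRE_OPTIONAL_FIELD
    if gre_seq:
        overhead += GRE_OPTIONAL_FIELD
    if vlan_tag:
        overhead += VLAN_TAG
    mtu = underlay_mtu - overhead
    if mtu < IPV6_MIN_MTU:
        raise ValueError(f"{mtu} bytes is below the IPv6 minimum link MTU of {IPV6_MIN_MTU}")
    return mtu
```

With the defaults this gives 1442 bytes. That is the value you would configure on both end hosts (on Linux, for example, `ip link set dev <iface> mtu 1442`). The check against 1280 reflects the IPv6 minimum link MTU: if the underlay cannot leave that much room, adjusting the end hosts alone is not enough.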

> However, the router has no routing information for the PC. The path
> from the PC to the remote host is all supposed to be layer 2. The
> router is not configured to route traffic to the PC or the remote
> host, only to bridge the layer 2 frames.

Therefore the end points of this virtual Ethernet link need to know the
MTU for this link.

Alternatively, you can try to fragment packets inside the tunnel to fake
the commonly assumed 1500-byte MTU. This is supported by commercial
networking gear; I have not yet looked for a possible GNU/Linux
implementation.
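One way to read "fragment inside the tunnel" is that the local endpoint fragments the outer IPv6 packet and the remote endpoint reassembles it before decapsulating. Under that reading, and assuming the outer packet carries no extension headers besides the Fragment header, the fragment sizing could be sketched as follows. The function name is hypothetical, and offsets are given in bytes rather than in the 8-octet units of the Fragment header's offset field.

```python
IPV6_HEADER = 40
FRAGMENT_HEADER = 8
IPV6_MIN_MTU = 1280


def outer_fragments(encap_payload_len: int, path_mtu: int = 1500) -> list[tuple[int, int]]:
    """Return (byte offset, length) pairs for fragmenting the outer packet's payload.

    encap_payload_len is everything after the outer IPv6 header (GRE header plus
    the inner Ethernet frame). An empty list means the packet fits unfragmented.
    """
    if path_mtu < IPV6_MIN_MTU:
        raise ValueError("IPv6 links must support at least 1280 bytes")
    if IPV6_HEADER + encap_payload_len <= path_mtu:
        return []
    # Every fragment but the last must carry a multiple of 8 bytes, and each
    # fragment needs room for the IPv6 header plus the Fragment header.
    max_chunk = (path_mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < encap_payload_len:
        length = min(max_chunk, encap_payload_len - offset)
        fragments.append((offset, length))
        offset += length
    return fragments
```

A full 1500-byte inner IPv6 packet becomes a 1518-byte encapsulated payload (14 bytes Ethernet, 4 bytes GRE). With a 1500-byte path MTU it splits into 1448 and 70 bytes, and each resulting outer packet stays within the path MTU.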

> What happens then is Linux tries to send an ICMP error, it can't find
> the route, or else it sends it to its default route, none of which do
> any good.
>
> If the PC doesn't get this ICMP error, it will not know why the
> packets were dropped, or it won't even know they were dropped. It's
> an ICMP blackhole scenario right?
>
> So, one solution I tried was hacking the kernel so that if it's trying
> to send this ICMP "packet too big" error to a host, and we know it's a
> layer 2 GRE tunnel, instead of the normal logic, force the ICMP error
> message to be sent back out via the network interface the offending
> packet was received on.
>
> This mostly worked: the PC receives the ICMP error and adjusts its
> path MTU, so in the future it will know to fragment the packet if it's
> too big.
>
> Problem is, I don't know what source IP and mac address I should be
> using when I send back this ICMP error to the PC. Normally this
> network path doesn't have any layer 3 address, and even the mac
> address normally is transparent / unknown to the PC. For my prototype
> I simply set the source IP of the ICMP error to whatever was the
> destination IP of the packet that was too big. I let the kernel use
> the mac address of either the bridge or eth0.
>
> I couldn't seem to find any RFC that says how this should be handled.
> Any ideas?

If you do want to use this hack, I'd suggest using a MAC address and an
IPv6 address owned by the tunnel endpoint. You could use a link-local
address for the latter, just to ensure you do not create frames that
clash with legitimate IPv6/MAC combinations.
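Here is a minimal sketch of what such a reply could contain. It assumes the source address is the modified EUI-64 link-local address derived from the tunnel endpoint's own MAC (RFC 4291). It builds only the ICMPv6 Packet Too Big message (RFC 4443, type 2, code 0) with its pseudo-header checksum. The MAC address, the documentation-prefix destination and all function names are hypothetical, and wrapping the result in IPv6 and Ethernet headers is not shown.

```python
import ipaddress
import struct

ICMPV6_PACKET_TOO_BIG = 2   # RFC 4443 message type
ICMPV6_NEXT_HEADER = 58     # IPv6 Next Header value for ICMPv6
IPV6_HEADER_LEN = 40
IPV6_MIN_MTU = 1280


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive an fe80::/64 address from a MAC using modified EUI-64."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Flip the universal/local bit and insert ff:fe between the two halves.
    interface_id = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet_too_big(src: ipaddress.IPv6Address, dst: ipaddress.IPv6Address,
                         mtu: int, invoking_packet: bytes) -> bytes:
    """Return an ICMPv6 Packet Too Big message, without the IPv6 header."""
    # Quote as much of the invoking packet as fits in a 1280-byte IPv6 packet.
    max_quoted = IPV6_MIN_MTU - IPV6_HEADER_LEN - 8
    message = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, mtu)
    message += invoking_packet[:max_quoted]
    # Pseudo-header: source, destination, upper-layer length, 3 zero bytes, next header.
    pseudo_header = src.packed + dst.packed + struct.pack("!I3xB", len(message), ICMPV6_NEXT_HEADER)
    checksum = internet_checksum(pseudo_header + message)
    return message[:2] + struct.pack("!H", checksum) + message[4:]


if __name__ == "__main__":
    endpoint = mac_to_link_local("52:54:00:12:34:56")  # hypothetical tunnel-endpoint MAC
    pc = ipaddress.IPv6Address("2001:db8::10")         # documentation-prefix address for the PC
    oversized = bytes(1500)                            # placeholder for the dropped packet
    msg = build_packet_too_big(endpoint, pc, 1442, oversized)
    print(endpoint, len(msg))
```

The destination is the source address of the offending packet, and the MTU field would carry the inner link MTU the PC should use. Deriving the source from the endpoint's own MAC keeps the IPv6/MAC pair consistent with a host that actually exists on the link.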

Best regards,
Erik
--
A distributed system is one in which the failure of a computer you didn't
even know existed can render your own computer unusable.
-- Leslie Lamport