Re: [Bridge] [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls

From: Fredrik Markström
Date: Fri May 12 2017 - 08:49:00 EST


On Fri, May 12, 2017 at 10:05 AM, Teco Boot <teco@xxxxxxxxxx> wrote:
> IP MTU and L2 MTU are different animals.
>
> IMHO, the IP MTU governs fragmentation at the sender of a link. There is no need to drop IP packets at the receiver because their size exceeds the configured IP MTU. IP packets larger than the receiver's L2 MTU will be dropped below the IP layer.
>
First, thanks for putting the different MTUs (L2 MTU vs. IP MTU) into words.

I agree. I don't understand why we drop packets because of the
receiver's IP MTU at all, and I would not mind removing that test
altogether, at least for veth.
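
A rough userspace model of that test may help make the discussion
concrete. This is only a sketch, not the kernel code: struct rx_dev,
frame_fits() and VLAN_TAG_LEN are hypothetical names, and the allowance
for the L2 header plus one VLAN tag is an assumption based on the
"MTU + VLAN" remark quoted below.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a receiving network device. */
struct rx_dev {
    bool up;                      /* interface is administratively up */
    unsigned int mtu;             /* configured IP MTU in bytes */
    unsigned int hard_header_len; /* L2 header length, 14 for Ethernet */
};

#define VLAN_TAG_LEN 4u /* assumed allowance for one 802.1Q tag */

/*
 * Returns true if a frame of frame_len bytes (L2 header included)
 * passes the receiver-side size test modeled here.
 */
static bool frame_fits(const struct rx_dev *dev, unsigned int frame_len)
{
    if (!dev->up)
        return false;
    return frame_len <= dev->mtu + dev->hard_header_len + VLAN_TAG_LEN;
}
```

With the numbers from the veth example (receiver MTU 300, a 428-byte IP
packet plus a 14-byte Ethernet header), the limit under these
assumptions is 318 bytes, so the 442-byte frame fails the test.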

/Fredrik


> For this patch: if veth has some notion of an L2 MTU (e.g. buffer size limits), there have to be checks for it. I don't see how configuring an MRU helps: more config, more mistakes. If there is no need to drop the packet, don't.
>
> Teco
>
>
>> On 11 May 2017, at 21:10, Fredrik Markström <fredrik.markstrom@xxxxxxxxx> wrote:
>>
>> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
>> <stephen@xxxxxxxxxxxxxxxxxx> wrote:
>>> On Thu, 11 May 2017 15:46:27 +0200
>>> Fredrik Markstrom <fredrik.markstrom@xxxxxxxxx> wrote:
>>>
>>>> From: Fredrik Markström <fredrik.markstrom@xxxxxxxxx>
>>>>
>>>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>>>> the receiving interface. This is not consistent with most hardware
>>>> Ethernet drivers, which happily receive packets larger than the MTU.
>>>
>>> Wrong.
>>
>> What is "Wrong"? I was initially skeptical about implementing this
>> patch, since it feels odd to have different MTUs set on the two sides
>> of a link. After consulting some IP people and the RFCs I changed my
>> mind and thought I'd give it a shot. In the RFCs I couldn't find
>> anything that defines when a received packet should or should not be
>> dropped.
>>
>>>
>>> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
>>> The actual limit is a function of the hardware. Some hardware can only limit by
>>> power of 2; some can only limit frames larger than 1500; some have no limiting at all.
>>
>> Agreed. The purpose of these patches is to be able to configure a
>> veth interface to mimic these different behaviors. None of the Ethernet
>> interfaces I have access to drop packets for being larger than the
>> configured MTU, as veth does.
>>
>> Being able to mimic real Ethernet hardware is useful when
>> consolidating hardware using containers/namespaces.
>>
>> In a reply to a comment from David Miller in my previous version of
>> the patch I attached the example below to demonstrate the case in
>> detail.
>>
>> This works with all ethernet hardware setups I have access to:
>>
>> ---- 8< ------
>> # Host A eth2 and Host B eth0 are on the same network.
>>
>> # On HOST A
>> % ip address add 1.2.3.4/24 dev eth2
>> % ip link set eth2 mtu 300 up
>>
>> % # HOST B
>> % ip address add 1.2.3.5/24 dev eth0
>> % ip link set eth0 mtu 1000 up
>> % ping -c 1 -W 1 -s 400 1.2.3.4
>> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
>> 408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms
>>
>> --- 1.2.3.4 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
>> ---- 8< ------
>>
>>
>> But it doesn't work with veth:
>>
>> ---- 8< ------
>> # veth0 and veth1 are a veth pair, and veth1 has been moved to a
>> # separate network namespace.
>> % # NS A
>> % ip address add 1.2.3.4/24 dev veth0
>> % ip link set veth0 mtu 300 up
>>
>> % # NS B
>> % ip address add 1.2.3.5/24 dev veth1
>> % ip link set veth1 mtu 1000 up
>> % ping -c 1 -W 1 -s 400 1.2.3.4
>> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
>>
>> --- 1.2.3.4 ping statistics ---
>> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>> ---- 8< ------
>>
>> --
>> /Fredrik
>
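
The point that the IP MTU matters for fragmentation at the sender can be
checked against the numbers in the quoted examples. "ping -s 400" yields
408 bytes of ICMP (8-byte header plus 400 bytes of data), or 428 bytes
with the IPv4 header, which matches the "400(428)" in the ping output.
The sketch below counts the fragments a sender would emit for a given IP
MTU. It assumes a 20-byte IPv4 header without options, and
ipv4_fragment_count() is a hypothetical helper, not a kernel function.

```c
#define IPV4_HEADER_LEN 20u /* assumed: IPv4 header without options */

/*
 * Number of IPv4 fragments a sender emits for ip_payload_len bytes of
 * payload on a link with the given IP MTU. Returns 0 if the MTU is too
 * small to carry a header plus one 8-byte fragment unit.
 */
static unsigned int ipv4_fragment_count(unsigned int ip_payload_len,
                                        unsigned int mtu)
{
    unsigned int per_frag;

    if (mtu < IPV4_HEADER_LEN + 8u)
        return 0;
    if (ip_payload_len + IPV4_HEADER_LEN <= mtu)
        return 1;
    /* Every fragment except the last carries a multiple of 8 bytes. */
    per_frag = ((mtu - IPV4_HEADER_LEN) / 8u) * 8u;
    return (ip_payload_len + per_frag - 1u) / per_frag;
}
```

Under these assumptions the side with MTU 1000 sends the 408-byte
payload as a single packet, while the side with MTU 300 would split a
payload of the same size into two fragments (280 + 128 bytes).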



--
/Fredrik