Re: [PATCH 2/2] usb: gadget: ncm: Add support to update wMaxSegmentSize via configfs
From: Maciej Żenczykowski
Date: Sun Oct 15 2023 - 21:19:43 EST
On Sat, Oct 14, 2023 at 1:24 AM Krishna Kurapati PSSNV
<quic_kriskura@xxxxxxxxxxx> wrote:
>
>
>
> On 10/14/2023 12:32 PM, Krishna Kurapati PSSNV wrote:
> >
> >
> > On 10/14/2023 4:05 AM, Maciej Żenczykowski wrote:
> >>>>> The intent of posting the diff was twofold:
> >>>>>
> >>>>> 1. The question Greg asked regarding why the max segment size was
> >>>>> limited to 15014 was valid. When I thought about it, I actually wanted
> >>>>> to limit the max MTU to 15000, so the max segment size automatically
> >>>>> needs to be limited to 15014.
> >>>>
> >>>> Note that this is a *very* abstract value.
> >>>> I get you want L3 MTU of 10 * 1500, but this value is not actually
> >>>> meaningful.
> >>>>
> >>>> IPv4/IPv6 fragmentation and IPv4/IPv6 TCP segmentation
> >>>> do not result in a trivial multiplication of the standard 1500 byte
> >>>> ethernet L3 MTU.
> >>>> Indeed, aggregating 2 1500-byte L3 mtu frames results in *different* sized
> >>>> frames depending on which type of aggregation you do.
> >>>> (and for tcp it even depends on the number and size of tcp options,
> >>>> though it is often assumed that those take up 12 bytes, since that's
> >>>> the norm for Linux-to-Linux tcp connections)
> >>>>
> >>>> For example if you aggregate N standard Linux ipv6/tcp L3 1500 mtu
> >>>> frames,
> >>>> this means you have
> >>>> N frames: ethernet (14) + ipv6 (40) + tcp (20) + tcp options (12) +
> >>>> payload (1500-12-20-40=1500-72=1428)
> >>>> post aggregation:
> >>>> 1 frame: ethernet (14) + ipv6 (40) + tcp (20) + tcp options (12) +
> >>>> payload (N*1428)
> >>>>
> >>>> so N * 1500 == N * (72 + 1428) --> 1 * (72 + N * 1428)
> >>>>
> >>>> That value of 72 is instead 52 for 'standard Linux ipv4/tcp',
> >>>> it's 40/60 if there are no tcp options (which I think happens when
> >>>> talking to windows),
> >>>> it's different still with ipv4 fragmentation... and again different
> >>>> with ipv6 fragmentation...
> >>>> etc.
> >>>>
> >>>> ie. 15000 L3 mtu is exactly as meaningless as 14000 L3 mtu.
> >>>> Either way you don't get full frames.
> >>>>
> >>>> As such I'd recommend going with whatever is the largest mtu that can
> >>>> be meaningfully made to fit in 16K with all the NCM header overhead.
> >>>> That's likely closer to 15500-16000 (though I have *not* checked).
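To put rough numbers on the above (back-of-the-envelope only, assuming
ipv6 + tcp with 12 bytes of options, i.e. 72 bytes of L3 overhead per
aggregated chain; ipv4 or different option sizes shift things as noted):

/* Back-of-the-envelope only: 72 bytes of ipv6+tcp+options overhead,
 * 1428 bytes of payload per original 1500-byte L3 mtu frame.
 */
#include <stdio.h>

int main(void)
{
        const int overhead = 40 + 20 + 12;      /* ipv6 + tcp + tcp options */
        const int payload = 1500 - overhead;    /* 1428 */
        int mtu;

        for (mtu = 14000; mtu <= 16000; mtu += 500) {
                int n = (mtu - overhead) / payload;   /* full payloads that fit */
                int tight = overhead + n * payload;   /* smallest mtu carrying n */

                printf("L3 mtu %5d: %2d full payloads, %4d bytes of slack\n",
                       mtu, n, mtu - tight);
        }
        return 0;
}

Neither 14000 nor 15000 lands on an aggregation boundary; the tight fits
are at 72 + N * 1428, which is why round L3 mtu values are fairly
meaningless here.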
> >>>>
> >>>>> But my commit text didn't mention this
> >>>>> properly, which was a mistake on my part. But when I looked at the
> >>>>> code, limiting the max segment size to 15014 would force the practical
> >>>>> max_mtu to not cross 15000, although the theoretical max_mtu was set to
> >>>>> GETHER_MAX_MTU_SIZE (15412) during registration of the net device.
> >>>>>
> >>>>> So my assumption of limiting it to 15000 was wrong. It must be limited
> >>>>> to 15412 as mentioned in u_ether.c. This in turn means we must limit
> >>>>> max_segment_size to:
> >>>>> GETHER_MAX_ETH_FRAME_LEN (GETHER_MAX_MTU_SIZE + ETH_HLEN)
> >>>>> as mentioned in u_ether.c.
> >>>>>
> >>>>> I wanted to confirm that setting MAX_DATAGRAM_SIZE to
> >>>>> GETHER_MAX_ETH_FRAME_LEN was correct.
> >>>>>
> >>>>> 2. I am not actually able to test with an MTU beyond 15000. When my host
> >>>>> is a Linux machine, cdc_ncm.c limits max_segment_size to:
> >>>>> CDC_NCM_MAX_DATAGRAM_SIZE 8192 /* bytes */
> >>>>
> >>>> In practice you get 50% of the benefits of infinitely large mtu by
> >>>> going from 1500 to ~2980.
> >>>> you get 75% of the benefits by going to ~6K
> >>>> you get 87.5% of the benefits by going to ~12K
> >>>> the benefits of going even higher are smaller and smaller...
> >>>>
> >>>> If the host side is limited to 8192, maybe we should match that
> >>>> here too?
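FWIW those percentages just come from the per-packet cost scaling roughly
as 1500/mtu; a toy calculation (not from any driver code):

/* Rough illustration only: if the cost that matters is per-packet,
 * the fraction of it eliminated relative to an infinite mtu is
 * about 1 - 1500/mtu.
 */
#include <stdio.h>

int main(void)
{
        const int mtus[] = { 2980, 6000, 12000 };
        unsigned i;

        for (i = 0; i < sizeof(mtus) / sizeof(mtus[0]); i++)
                printf("mtu %5d -> ~%.1f%% of the per-packet savings\n",
                       mtus[i], 100.0 * (1.0 - 1500.0 / mtus[i]));
        return 0;
}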
> >>>
> >>> Hi Maciej,
> >>>
> >>> Thanks for the detailed explanation. I agree with you on setting the
> >>> device side also to 8192, instead of the max_mtu present in u_ether
> >>> or the practical maximum segment size possible.
> >>>
> >>>>
> >>>> But the host side limitation of 8192 doesn't seem particularly sane
> >>>> either...
> >>>> Maybe we should relax that instead?
> >>>>
> >>> I really didn't understand why it was set to 8192 in the first place.
> >>>
> >>>> (especially since for things like tcp zero copy you want an mtu which
> >>>> is slightly more than N * 4096,
> >>>> ie. around 4.5KB, 8.5KB, 12.5KB or something like that)
> >>>>
> >>>
> >>> I am not completely sure about host mode. If we want to increase it though,
> >>> would just increasing MAX_DATAGRAM_SIZE to some bigger value help? (I
> >>> don't know the entire cdc_ncm code, so I might be wrong).
> >>>
> >>> Regards,
> >>> Krishna,
> >>
> >> Hmm, I'm not sure. I know I've experimented with high mtu ncm in the
> >> past
> >> (around 2.5 years ago). I got it working between my Linux desktop (host)
> >> and a Pixel 6 (device/gadget) with absolutely no problems.
> >>
> >> I'm pretty sure I didn't change my desktop kernel, so I was probably
> >> limited to 8192 there
> >> (and I do more or less remember that).
> >> From what I vaguely remember, it wasn't difficult (at all) to hit
> >> upwards of 7gbps for iperf tests.
> >> I don't remember how close to the theoretical USB 10gbps maximum of
> >> 9.7gbps I could get...
> >> [this was never the real bottleneck / issue, so I didn't ever dig
> >> particularly deep]
> >>
> >> I'm pretty sure my gadget side changes were non-configurable...
> >> Probably just bumped one or two constants...
> >>
> > Could you share what parameters you changed to get such a high iperf
> > throughput?
Eh, I really don't remember, but it wasn't anything earth-shattering.
From what I recall it was just a matter of bumping the mtu and tweaking
irq pinning to stronger cores.
Indeed I'm not even certain the mtu bump was required to get over 5gbps.
Though I may be confusing some things, as at least some of the testing was done
with the kernel's built-in packet generator.
> >
> >> I do *very* *vaguely* recall there being some funkiness though, where
> >> 8192 was
> >> *less* efficient than some slightly smaller value.
> >>
> >> If I recall correctly the issue is that 8192 + ethernet overhead + NCM
> >> overhead only fits *once* into 16384, which leaves a lot of space
> >> wasted, while ~7.5 kb + overhead fits twice and is thus a fair bit
> >> better.
> > Right, same goes for using a 5K vs 5.5K MTU. If the MTU is 5K, 3 packets can
> > conveniently fit into an NTB, but if it's 5.5K, at most two (5.5k)
> > packets can fit in (essentially filling ~11k of the 16384 bytes and
> > wasting the rest).
> >
> > And whether it's IPv4/IPv6 like you mentioned in [1], the MTU is what the NCM
> > layer receives, and we append the Ethernet header, add the NCM headers, and
> > send it out after aggregation. Why can't we set MAX_DATAGRAM_SIZE to
> > ~8050 or ~8100? The reason I suggest these values is that setting it to
> > 8192 would obviously not use the NTB buffer efficiently. We need to fill as
> > much space in the buffer as possible, and assuming that each packet received
> > at the NCM layer is of the MTU size set (not less than that), even if only 2
> > packets are aggregated (the minimum aggregation possible),
> > (2 * (8050 + ETH_HLEN)) + (room for NCM headers) would come close to the
> > 16384-byte NTB size. I already checked with an 8050 MTU and it works. We can
> > add a comment in the code detailing the above explanation and why we chose
> > 8050 or 8100 as MAX_DATAGRAM_SIZE.
> >
> > Hope my reasoning for why we can choose 8.1K or 8.05K makes sense. Let me
> > know your thoughts on this.
Maybe just use an L3 mtu of 8000 then? That's a nice round number...
But I'm also fine with 8050 or 8100.. though 8100 seems 'rounder'.
I'm not sure what the actual overhead is... I guess we control the
overhead in one direction, but not in the other, and there could be
some slop, so we need to be a little generous?
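As a rough sanity check of the packing argument (assuming a 16384-byte NTB,
NTH16 + NDP16 headers of 12 + 8 bytes plus a 4-byte entry per datagram and a
4-byte terminating entry, and ignoring any alignment padding, so treat the
exact byte counts as approximate), a toy program like the one below reproduces
the fits-once / fits-twice / fits-three-times behaviour discussed above:

/* Approximate NTB packing check; real numbers depend on the negotiated
 * NCM parameters and alignment.
 */
#include <stdio.h>

#define ETH_HLEN        14
#define NTB_SIZE        16384

static int datagrams_per_ntb(int mtu)
{
        int used = 12 + 8 + 4;          /* NTH16 + NDP16 + terminating entry */
        int n = 0;

        /* each datagram costs its ethernet frame plus one 4-byte NDP entry */
        while (used + (mtu + ETH_HLEN) + 4 <= NTB_SIZE) {
                used += (mtu + ETH_HLEN) + 4;
                n++;
        }
        return n;
}

int main(void)
{
        const int mtus[] = { 5000, 5500, 7500, 8000, 8050, 8100, 8192 };
        unsigned i;

        for (i = 0; i < sizeof(mtus) / sizeof(mtus[0]); i++)
                printf("mtu %5d -> %d datagrams per 16K NTB\n",
                       mtus[i], datagrams_per_ntb(mtus[i]));
        return 0;
}

By that math anything in the ~7.5k to ~8.1k range packs two datagrams per NTB,
so the choice between 8000 / 8050 / 8100 mostly comes down to how much slack
we want to leave for the header overhead (and any slop in the other direction).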
> >
>
> [1]:
> https://lore.kernel.org/all/CANP3RGd4G4dkMOyg6wSX29NYP2mp=LhMhmZpoG=rgoCz=bh1=w@xxxxxxxxxxxxxx/
>
> > Regards,
> > Krishna,
--
Maciej Żenczykowski, Kernel Networking Developer @ Google