RE: skbuff truesize incorrect.

From: David Laight
Date: Tue May 27 2014 - 11:24:22 EST


From: David Miller
> From: David Laight <David.Laight@xxxxxxxxxx>
> Date: Fri, 23 May 2014 08:52:13 +0000
>
> > The hardware will put multiple ethernet frames into a single USB bulk data
> > message. To handle this the driver generates a URB that is (hopefully) long
> > enough for the longest USB message (typically 32k is assumed to be enough).
> > The URBs that usb_net generates have the data in a linear skb - which then
> > has a large 'truesize'.
> >
> > Since USB bulk data are terminated by a short fragment there is actually
> > no need for the URB to be long enough for the full message. Provided the
> > URBs are multiples of the USB packet size (1k for USB 3) the message
> > can be received into multiple URBs - the driver just has to be willing
> > to merge URB buffers (as well as split them) when generating the ethernet
> > frames.
>
> I think we could take a less invasive approach.
>
> Use whatever order page is needed for that 32K chunk for the URB,
> but split up the compound page so that the individual pages can be
> accounted for separately.

Using 4k 'pages' would be fine (or even 1k).
All the USB interfaces support scatter-gather (SG) provided the
fragments are all aligned.
Recycling the unused 4k pages would probably be better than using
a separate URB for each fragment.

Note that the xhci driver has to split buffers that cross 64k
boundaries into separate ring entries - and these must not cross
the end of the ring (thank you hardware engineers...).
So not receiving into the 'linear' part of an skb helps.
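
To illustrate the 64k rule, here is a minimal user-space sketch (not
xhci driver code). It counts how many pieces a buffer has to be cut
into so that no piece crosses a 64k address boundary. The function name
and the idea of passing a plain bus address are assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOUNDARY_64K UINT64_C(0x10000)

/*
 * Hypothetical helper: number of pieces needed for the buffer
 * [addr, addr + len) when no piece may cross a 64k boundary.
 */
static unsigned int pieces_for_buffer(uint64_t addr, size_t len)
{
	unsigned int pieces = 0;

	assert(len <= UINT64_MAX - addr);	/* no address wrap-around */

	while (len > 0) {
		/* Bytes left before the next 64k boundary. */
		size_t room = (size_t)(BOUNDARY_64K - (addr & (BOUNDARY_64K - 1)));
		size_t take = len < room ? len : room;

		addr += take;
		len -= take;
		pieces++;
	}
	return pieces;
}
```

A naturally aligned 4k buffer always needs exactly one piece, whereas a
32k linear buffer needs two whenever it happens to straddle a boundary.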

Although I suspect that 32k is excessive - provided the driver can
merge URBs (as well as pages) to generate ethernet frames.
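
The merging itself could look roughly like the sketch below. It is a
user-space illustration, not usbnet code: the function name and
parameters are hypothetical, and it assumes every URB has the same
size, that this size is a multiple of the USB packet size, and that a
message whose length is an exact multiple of the URB size is followed
by a zero-length packet.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: add up the lengths of completed URBs to
 * recover bulk message lengths. A URB that completes with fewer than
 * urb_size bytes ended on a short packet, which terminates the
 * message. Returns the number of complete messages found; msg_len[]
 * receives the total length of each one.
 */
static size_t merge_urbs(const size_t *actual, size_t n_urbs,
			 size_t urb_size, size_t *msg_len, size_t max_msgs)
{
	size_t n_msgs = 0, total = 0, i;

	assert(urb_size > 0);

	for (i = 0; i < n_urbs && n_msgs < max_msgs; i++) {
		assert(actual[i] <= urb_size);
		total += actual[i];
		if (actual[i] < urb_size) {	/* short: message complete */
			msg_len[n_msgs++] = total;
			total = 0;
		}
	}
	return n_msgs;
}
```

With 4k URBs, completions of 4096, 4096 and 1000 bytes form a single
9192-byte message, so nothing has to be sized for the worst case up
front.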

> Hook up the pages into the SKB frag array, and only account PAGE_SIZE
> into the SKB truesize for the individual pages actually used from
> the superpage.
>
> This will largely retain the URB allocation and processing scheme,
> yet at the same time dramatically decrease the truesize overage
> factor.

True, but you still need to sort out how to handle multiple ethernet
frames in the same 4k page, and arbitrary page boundaries within a frame.
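
To make the page-boundary problem concrete, the sketch below works out
which 4k chunks a frame touches and what per-page accounting would then
charge for it. It is plain arithmetic in user space, not kernel code:
the struct and function names are hypothetical, and the truesize figure
ignores the sk_buff's own overhead.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 4096u

/* Hypothetical description of where one frame sits in the buffer. */
struct frame_span {
	size_t first_chunk;	/* index of the first chunk touched */
	size_t n_chunks;	/* number of chunks touched */
	size_t truesize;	/* n_chunks * CHUNK_SIZE */
};

/* Work out the chunks covered by the frame at [offset, offset + len). */
static struct frame_span span_for_frame(size_t offset, size_t len)
{
	struct frame_span s;

	assert(len > 0);

	s.first_chunk = offset / CHUNK_SIZE;
	s.n_chunks = (offset + len - 1) / CHUNK_SIZE - s.first_chunk + 1;
	s.truesize = s.n_chunks * CHUNK_SIZE;
	return s;
}
```

A 1514-byte frame at offset 0 is charged 4096 bytes instead of the
32768 of a linear 32k buffer. A second frame at offset 1514 lands in
the same chunk, and one at offset 3000 straddles two chunks and is
charged 8192.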

I suspect that using 1k or 2k pages and using 'copybreak' to never pass
up shared pages would give the best overall performance.
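
One possible reading of that rule is sketched below: copy a frame if it
is short, or if any chunk it touches also holds bytes of a neighbouring
frame. This is only an illustration in user space. The function name,
the parameters and the exact policy are assumptions, and any per-frame
hardware headers between frames are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical policy check for the frame at [offset, offset + len).
 * prev_end is the end (exclusive) of the previous frame, 0 if none.
 * next_start is the start of the next frame, SIZE_MAX if none.
 * Returns true if the frame should be copied rather than passed up
 * in its receive chunks.
 */
static bool frame_needs_copy(size_t prev_end, size_t offset, size_t len,
			     size_t next_start, size_t chunk, size_t copybreak)
{
	assert(chunk > 0 && len > 0);
	assert(prev_end <= offset && offset + len <= next_start);

	/* Short frames are cheaper to copy than to account a chunk for. */
	if (len <= copybreak)
		return true;

	/* The previous frame's last byte lies in our first chunk. */
	if (prev_end > 0 && (prev_end - 1) / chunk == offset / chunk)
		return true;

	/* The next frame's first byte lies in our last chunk. */
	if (next_start / chunk == (offset + len - 1) / chunk)
		return true;

	return false;
}
```

With 2k chunks, a 1514-byte frame that has a chunk to itself is passed
up as-is, while frames packed back to back in the same chunk are
copied, so a chunk handed to the stack is never referenced by two skbs.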

David


