Re: BUG in xennet_make_frags with paged skb data
From: David Vrabel
Date: Mon Nov 10 2014 - 09:41:53 EST
On 10/11/14 14:35, Seth Forshee wrote:
> On Fri, Nov 07, 2014 at 10:44:15AM +0000, David Vrabel wrote:
>> On 06/11/14 21:49, Seth Forshee wrote:
>>> We've had several reports of hitting the following BUG_ON in
>>> xennet_make_frags with 3.2 and 3.13 kernels (I'm currently awaiting
>>> results of testing with 3.17):
>>>
>>> /* Grant backend access to each skb fragment page. */
>>> for (i = 0; i < frags; i++) {
>>> skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>>> struct page *page = skb_frag_page(frag);
>>>
>>> len = skb_frag_size(frag);
>>> offset = frag->page_offset;
>>>
>>> /* Data must not cross a page boundary. */
>>> BUG_ON(len + offset > PAGE_SIZE<<compound_order(page));
>>>
>>> When this happens the page in question is a "middle" page in a compound
>>> page (i.e. it's a tail page but not the last tail page), and the data is
>>> fully contained within the compound page. The data does however cross
>>> the hardware page boundary, and since compound_order evaluates to 0 for
>>> tail pages the check fails.
>>>
>>> In going over this I've been unable to determine whether the BUG_ON in
>>> xennet_make_frags is incorrect or the paged skb data is wrong. I can't
>>> find that it's documented anywhere, and the networking code itself is a
>>> bit ambiguous when it comes to compound pages. On the one hand
>>> __skb_fill_page_desc specifically handles adding tail pages as paged
>>> data, but on the other hand skb_copy_bits kmaps frag->page.p which could
>>> fail with data that extends into another page.
>>
>> netfront will safely handle this case so you can remove this BUG_ON()
>> (and the one later on). But it would be better to find out were these
>> funny-looking skbs are coming from and (if necessary) fixing the bug there.
>
> There still seems to be disagreement about whether the "funny" skb is
> valid though - you imply it isn't, but Eric says it is. I've been trying
> to track down where these skbs originate, and so far I've determined
> that they come from a socket spliced to a pipe spliced to a socket. It
> looks like the particular page/offset/len tuple originates at least as
> far back as the first socket, as the tuple is simply copied from an skb
> into the pipe and from the pipe into the final skb.
Apologies for the lack of clarity. I meant either: a) fix the producer
if these skbs are invalid; or b) remove the BUG_ON()s. Since Eric says
these are actually valid skbs, please do option (b).
i.e., remove both BUG_ON()s.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/