Re: [Xen-devel] Regression in xen-netfront on v3.6 (git commit c48a11c7ad2623b99bbd6859b0b4234e7f11176f, netvm: propagate page->pfmemalloc to skb)

From: Mel Gorman
Date: Sat Aug 04 2012 - 09:31:51 EST


On Sat, Aug 04, 2012 at 07:03:55AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 08:04:14AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 01, 2012 at 03:02:27PM -0400, Konrad Rzeszutek Wilk wrote:
> > > So I hadn't done a git bisection yet. But if I choose git commit:
> > > 4b24ff71108164e047cf2c95990b77651163e315
> > > Merge tag 'for-v3.6' of git://git.infradead.org/battery-2.6
> > >
> > > Pull battery updates from Anton Vorontsov:
> > >
> > >
> > > everything works nicely. Anything past that, so these merges:
> > >
> > > konrad@phenom:~/ssd/linux$ git log --oneline --merges 4b24ff71108164e047cf2c95990b77651163e315..linus/master
> > > 2d53492 Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
> > ===> ac694db Merge branch 'akpm' (Andrew's patch-bomb)
> >
> > Somewhere in there is the culprit. Hadn't done yet the full bisection
> > (was just checking out in each merge to see when it stopped working)
>
> Mel, your:
> commit c48a11c7ad2623b99bbd6859b0b4234e7f11176f
> Author: Mel Gorman <mgorman@xxxxxxx>
> Date: Tue Jul 31 16:44:23 2012 -0700
>
> netvm: propagate page->pfmemalloc to skb
>
> is the culprit per git bisect. Any ideas - do the drivers need to do
> some extra processing? Here is the git bisect log
>

The problem appears to be at drivers/net/xen-netfront.c#973, where it
calls __skb_fill_page_desc(skb, 0, NULL, 0, 0). The driver does not
have to do any extra processing as such, but I did not expect NULL to be
passed in like this. Can you check whether this fixes the bug, please?
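
For anyone following along without the tree handy, the guard can be
modelled in user space roughly as below. This is only a sketch under
stated assumptions: the mock_* types and mock_fill_page_desc() are
made-up stand-ins for illustration, not the real definitions from
include/linux/skbuff.h, and only the NULL check mirrors the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page, skb_frag_t and struct sk_buff.
 * They carry only the fields the check looks at; NOT the kernel layout. */
struct mock_page {
	void *mapping;
	bool pfmemalloc;
};

struct mock_frag {
	struct mock_page *page;
	unsigned int page_offset;
	unsigned int size;
};

struct mock_skb {
	bool pfmemalloc;
	struct mock_frag frags[4];
};

/* Simplified model of the fill helper. A caller may pass page == NULL,
 * so test the pointer before reading any of its fields. */
static void mock_fill_page_desc(struct mock_skb *skb, int i,
				struct mock_page *page,
				unsigned int off, unsigned int size)
{
	struct mock_frag *frag = &skb->frags[i];

	/* Propagate the flag only when there is a page to inspect. */
	if (page && page->pfmemalloc && !page->mapping)
		skb->pfmemalloc = true;
	frag->page = page;
	frag->page_offset = off;
	frag->size = size;
}
```

With the pointer test in place, a call such as
mock_fill_page_desc(&skb, 0, NULL, 0, 0) just records a NULL page in
the fragment and leaves the skb's flag alone.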

---8<---
netvm: check for page == NULL when propagating the skb->pfmemalloc flag

Commit [c48a11c7: netvm: propagate page->pfmemalloc to skb] is responsible
for the following bug triggered by the Xen network driver:

[ 1.908592] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 1.908643] IP: [<ffffffffa0037750>] xennet_poll+0x980/0xec0 [xen_netfront]
[ 1.908703] PGD ea1df067 PUD e8ada067 PMD 0
[ 1.908774] Oops: 0000 [#1] SMP
[ 1.908797] Modules linked in: fbcon tileblit font radeon bitblit softcursor ttm drm_kms_helper crc32c_intel xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs xen_privcmd
[ 1.908938] CPU 0
[ 1.908950] Pid: 2165, comm: ip Not tainted 3.5.0upstream-08854-g444fa66 #1
[ 1.908983] RIP: e030:[<ffffffffa0037750>] [<ffffffffa0037750>] xennet_poll+0x980/0xec0 [xen_netfront]
[ 1.909029] RSP: e02b:ffff8800ffc03db8 EFLAGS: 00010282
[ 1.909055] RAX: ffff8800ea010140 RBX: ffff8800f00e86c0 RCX: 000000000000009a
[ 1.909055] RDX: 0000000000000040 RSI: 000000000000005a RDI: ffff8800fa7dee80
[ 1.909055] RBP: ffff8800ffc03ee8 R08: ffff8800f00e86d8 R09: ffff8800ea010000
[ 1.909055] R10: dead000000200200 R11: dead000000100100 R12: ffff8800fa7dee80
[ 1.909055] R13: 000000000000005a R14: ffff8800fa7dee80 R15: 0000000000000200
[ 1.909055] FS: 00007fbafc188700(0000) GS:ffff8800ffc00000(0000) knlGS:0000000000000000
[ 1.909055] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1.909055] CR2: 0000000000000010 CR3: 00000000ea108000 CR4: 0000000000002660
[ 1.909055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.909055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1.909055] Process ip (pid: 2165, threadinfo ffff8800ea0f2000, task ffff8800fa783040)
[ 1.909055] Stack:
[ 1.909055] ffff8800e27e5040 ffff8800ffc03e88 ffff8800ffc03e68 ffff8800ffc03e48
[ 1.909055] 7fffffffffffffff ffff8800ffc03e00 ffff8800e27e5040 ffff8800f00e86d8
[ 1.909055] ffff8800ffc03eb0 00000040ffffffff ffff8800f00e8000 00000000ffc03e30
[ 1.909055] Call Trace:
[ 1.909055] <IRQ>
[ 1.909055] [<ffffffff81066028>] ? pvclock_clocksource_read+0x58/0xd0
[ 1.909055] [<ffffffff81486352>] net_rx_action+0x112/0x240
[ 1.909055] [<ffffffff8107f319>] __do_softirq+0xb9/0x190
[ 1.909055] [<ffffffff815d8d7c>] call_softirq+0x1c/0x30

The problem is that the xen-netfront driver passes a NULL page to
__skb_fill_page_desc(), which was unexpected. This patch checks that
there is a page before dereferencing it.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
 include/linux/skbuff.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7632c87..8857669 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1256,7 +1256,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
 	 * do not lose pfmemalloc information as the pages would not be
 	 * allocated using __GFP_MEMALLOC.
 	 */
-	if (page->pfmemalloc && !page->mapping)
+	if (page && page->pfmemalloc && !page->mapping)
 		skb->pfmemalloc = true;
 	frag->page.p = page;
 	frag->page_offset = off;