Re: [GIT PULL v2] Early SLAB fixes for 2.6.31
From: Pekka Enberg
Date: Mon Jun 15 2009 - 04:33:16 EST

Hi Nick,

On Mon, 2009-06-15 at 10:26 +0200, Nick Piggin wrote:
> On Mon, Jun 15, 2009 at 10:18:31AM +0200, Heiko Carstens wrote:
> > On Fri, Jun 12, 2009 at 07:16:30PM +0300, Pekka J Enberg wrote:
> > > Hi Linus,
> > >
> > > I dropped the GFP_WAIT conversion patch and added the gfp masking patch
> > > you liked. I tested this on x86-64 with both SLAB and SLUB.
> >
> > Hi Pekka,
> >
> > I tried to convert some of the early allocations on s390. Some callsites
> > however need to have the GFP_DMA flag, since we need to allocate memory below
> > 2GB. Passing GFP_DMA causes this crash:
> >
> > <1>Unable to handle kernel pointer dereference at virtual kernel address fffffffffffff000
> > <4>Oops: 0038 [#1] PREEMPT SMP
> > <4>Modules linked in:
> > <4>CPU: 0 Not tainted 2.6.30-03984-g45e3e19-dirty #233
> > <4>Process swapper (pid: 0, task: 00000000006a2ef0, ksp: 0000000000718000)
> > <4>Krnl PSW : 0700100180000000 00000000000808ee (queue_work_on+0x8e/0xe0)
> > <4> R:0 T:1 IO:1 EX:1 Key:0 M:0 W:0 P:0 AS:0 CC:1 PM:0 EA:3
> > <4>Krnl GPRS: 0000000000000000 ffffffffffffffff 0000000000000000 00000000006b8b88
> > <4> 00000000006b8b88 0000000000000001 0000000000000008 0000000000000200
> > <4> 000000003fe28000 0000000000008001 00000000011da730 0000000000717ca0
> > <4> 00000000006b8b88 0000000000488650 0000000000717cd0 0000000000717ca0
> > <4>Krnl Code: 00000000000808de: e310d0000082 xg %r1,0(%r13)
> > <4> 00000000000808e4: eb220003000d sllg %r2,%r2,3
> > <4> 00000000000808ea: b9040034 lgr %r3,%r4
> > <4> >00000000000808ee: e32210000004 lg %r2,0(%r2,%r1)
> > <4> 00000000000808f4: c0e5ffffff28 brasl %r14,80744
> > <4> 00000000000808fa: a7280001 lhi %r2,1
> > <4> 00000000000808fe: e340b0b80004 lg %r4,184(%r11)
> > <4> 0000000000080904: b9140022 lgfr %r2,%r2
> > <4>Call Trace:
> > <4>([<000000003fe28000>] 0x3fe28000)
> > <4> [<0000000000080e96>] queue_work+0x62/0xa4
> > <4> [<0000000000080f26>] schedule_work+0x4e/0x60
> > <4> [<0000000000132f7e>] dma_kmalloc_cache+0x1ca/0x1d0
> > <4> [<00000000001330ae>] get_slab+0x12a/0x130
> > <4> [<00000000001337b6>] __kmalloc+0x5e/0x364
> > <4> [<0000000000739132>] con3215_init+0x1c2/0x2e4
> > <4> [<00000000007333ea>] console_init+0x42/0x5c
> > <4> [<0000000000718e50>] start_kernel+0x53c/0x6b8
> > <4> [<0000000000012020>] _ehead+0x20/0x80
> >
> > I didn't look any deeper into this, but looks to me like doing something like
> > schedule_work() this early isn't ok.
> >
> > This is the conversion that leads to the crash:
> >
> > - alloc_bootmem_low(sizeof(struct raw3215_info));
> > + kzalloc(sizeof(struct raw3215_info), GFP_NOWAIT | GFP_DMA);
> >
> > Might be that I missed something. Maybe some special flag?
>
> No, just a bug in the conversion.
>
> If you predicate the schedule_work call on slab_state == SYSFS, then
> it should work (when sysfs comes up later in init, previously added
> slabs will be registered with sysfs).
>
> Oh, and you'd need to also not pass __SYSFS_ADD_DEFERRED into
> kmem_cache_create in that case too.

I am not sure I follow you here. We are setting up slab so early that we
absolutely _must_ defer sysfs setup. But we're also setting up slab much
earlier than workqueues, so we shouldn't really do schedule_work() at
that point. Furthermore, early boot cache sysfs setup is explicitly
handled in slab_sysfs_init() so I think we need something like the patch
below?

Pekka

diff --git a/mm/slub.c b/mm/slub.c
index 30354bf..4c12138 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2642,7 +2642,13 @@ static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
 	list_add(&s->list, &slab_caches);
 	kmalloc_caches_dma[index] = s;
 
-	schedule_work(&sysfs_add_work);
+	/*
+	 * The slab allocator is set up much earlier than workqueues. As early
+	 * boot caches are handled by slab_sysfs_init(), avoid calling
+	 * schedule_work() until keventd is up.
+	 */
+	if (keventd_up())
+		schedule_work(&sysfs_add_work);
 
 unlock_out:
 	up_write(&slub_lock);
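
For anyone who wants to see the ordering argument in isolation, here is a
minimal userspace sketch of the same pattern. It is only a model, not kernel
code: every name in it (model_cache, model_add_cache(), model_workqueue_init(),
model_sysfs_init(), model_run_scheduled_work()) is hypothetical, and
workqueue_up merely stands in for what keventd_up() reports. The assumption
it illustrates is that a cache created before the workqueue exists must not
queue work, because a later init pass picks it up anyway.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CACHES 8

struct model_cache {
	const char *name;
	bool in_sysfs;		/* has this cache been registered yet? */
};

static struct model_cache *caches[MAX_CACHES];
static size_t nr_caches;
static bool workqueue_up;	/* stands in for keventd_up() */
static int work_scheduled;	/* pending deferred registrations */

/* Register every cache that is not registered yet. */
static void model_register_all(void)
{
	for (size_t i = 0; i < nr_caches; i++)
		caches[i]->in_sysfs = true;
}

/* Models cache creation: queue deferred work only if a worker exists. */
static int model_add_cache(struct model_cache *c)
{
	if (nr_caches == MAX_CACHES)
		return -1;
	caches[nr_caches++] = c;
	if (workqueue_up)
		work_scheduled++;	/* safe: something will run the work */
	return 0;
}

/* Models the point in boot where the workqueue becomes usable. */
static void model_workqueue_init(void)
{
	workqueue_up = true;
}

/* Models the late init pass that sweeps up caches created early. */
static void model_sysfs_init(void)
{
	model_register_all();
}

/* Models the worker running the deferred registration. */
static void model_run_scheduled_work(void)
{
	if (work_scheduled > 0) {
		model_register_all();
		work_scheduled = 0;
	}
}
```

A cache added before model_workqueue_init() queues nothing and is picked up
by model_sysfs_init(); a cache added afterwards goes through the deferred
path as before.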