Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?

From: Linus Torvalds
Date: Tue Apr 26 2011 - 13:13:43 EST


On Tue, Apr 26, 2011 at 9:38 AM, Bruno Prémont
<bonbons@xxxxxxxxxxxxxxxxx> wrote:
>
> Here it comes:
>
> rcu_kthread (when build processes are STOPped):
> [  836.050003] rcu_kthread     R running   7324     6      2 0x00000000
> [  836.050003]  dd473f28 00000046 5a000240 dd65207c dd407360 dd651d40 0000035c dd473ed8
> [  836.050003]  c10bf8a2 c14d63d8 dd65207c dd473f28 dd445040 dd445040 dd473eec c10be848
> [  836.050003]  dd651d40 dd407360 ddfdca00 dd473f14 c10bfde2 00000000 00000001 000007b6
> [  836.050003] Call Trace:
> [  836.050003]  [<c10bf8a2>] ? check_object+0x92/0x210
> [  836.050003]  [<c10be848>] ? init_object+0x38/0x70
> [  836.050003]  [<c10bfde2>] ? free_debug_processing+0x112/0x1f0
> [  836.050003]  [<c103d9fd>] ? lock_timer_base+0x2d/0x70
> [  836.050003]  [<c13c8ec7>] schedule_timeout+0x137/0x280

Hmm.

I'm adding Ingo and Peter to the cc, because this whole "rcu_kthread
is running, but never actually running" is starting to smell like a
scheduler issue.

Peter/Ingo: RCUTINY seems to be broken for Bruno. During any kind of
heavy workload, at some point it looks like rcu_kthread simply stops
making any progress. It's constantly in runnable state, but it doesn't
actually use any CPU time, and it's not processing the RCU callbacks,
so the RCU memory freeing isn't happening, and slabs just build up
until the machine dies.

And it really is RCUTINY, because the thing doesn't happen with the
regular tree-RCU.

This is without CONFIG_RCU_BOOST_PRIO, so we basically have

struct sched_param sp;

rcu_kthread_task = kthread_run(rcu_kthread, NULL, "rcu_kthread");
sp.sched_priority = RCU_BOOST_PRIO;
sched_setscheduler_nocheck(rcu_kthread_task, SCHED_FIFO, &sp);

where RCU_BOOST_PRIO is 1 for the non-boost case.

Is that so low that even the idle thread will take priority? It's a UP
config with PREEMPT_VOLUNTARY. So pretty much _all_ the stars are
aligned for odd scheduling behavior.

Other users of SCHED_FIFO tend to set the priority really high (e.g.
"MAX_RT_PRIO-1" is clearly the default one - softirqs, watchdog), but
"1" is not unheard of either (touchscreen/ucb1400_ts and
mmc/core/sdio_irq), and there are some other random choices out there.

Any ideas?

Linus