Re: Linux 3.19-rc5

From: Davidlohr Bueso
Date: Wed Jan 21 2015 - 17:12:24 EST


On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
> On Wed, 21 January 2015 Bruno Prémont wrote:
> > On Tue, 20 January 2015 Linus Torvalds wrote:
> > > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> > > >
> > > > No idea yet which rc is the offender (nor exact patch), but on my not
> > > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > > > converting my laptop into a heater.
> > > >
> > > > lspci for affected nodes:
> > > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > > >
> > > > Very basics I have, before I attempt any bisection:
> > >
> > > Hmm. I'm not seeing anything recent changing anything in this area, so
> > > I suspect that unless somebody else steps up and says "Ahh, that
> > > sounds like xyz", your bisection is the best option.
>
> Bisecting to the end did point me at (the warning traces produced in great
> quantities might not be the very same issue as the abusive CPU usage, but
> certainly look very related):
> [CCing people on CC for the patch]
>
> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Wed Sep 24 10:18:55 2014 +0200
>
> sched: Debug nested sleeps
>
> Validate we call might_sleep() with TASK_RUNNING, which catches places
> where we nest blocking primitives, eg. mutex usage in a wait loop.
>
> Since all blocking is arranged through task_struct::state, nesting
> this will cause the inner primitive to set TASK_RUNNING and the outer
> will thus not block.
>
> Another observed problem is calling a blocking function from
> schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
> then destroy the task state for the actual __schedule() call that
> comes after it.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Cc: tglx@xxxxxxxxxxxxx
> Cc: ilya.dryomov@xxxxxxxxxxx
> Cc: umgwanakikbuti@xxxxxxxxx
> Cc: oleg@xxxxxxxxxx
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Link: http://lkml.kernel.org/r/20140924082242.591637616@xxxxxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>
> Which does produce the following trace (hand-copied most important parts of it):
> Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
> do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1436390>] pccardd+0xa0/0x3e0
> ...
> Call trace:
> ...
> __might_sleep+0x143/0x170
> ? pccardd+0xa0/0x3e0
> ? pccardd+0xa0/0x3e0
> mutex_lock+0x17/0x2a
> pccardd+0xe9/0x3e0
> ? pcmcia_socket_uevent+0x30/0x30
>
> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
> Peter's patch wants to warn about.

Yeah, setting current to interruptible so early in the game is bogus. It
should be set after unlocking the skt_mutex.
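
To make the ordering problem concrete, here is a minimal sketch. It is a toy
userspace model, not the code in drivers/pcmcia/cs.c: every model_* name is
hypothetical, and the check is only an approximation of what the "Debug nested
sleeps" patch does (warn, then put the task back to TASK_RUNNING). It ignores
wakeup races entirely and only shows why the early state change both triggers
the warning and keeps the thread from ever blocking.

```c
#include <stdbool.h>

/* Toy userspace model of task state; nothing here is real kernel code. */
enum model_state { MODEL_TASK_RUNNING = 0, MODEL_TASK_INTERRUPTIBLE = 1 };

struct model_task {
    enum model_state state;
    int nested_sleep_warnings;  /* how often the debug check fired */
};

/* Stand-in for the debug check: entering a blocking primitive with
 * state != TASK_RUNNING warns and resets the state to TASK_RUNNING. */
void model_might_sleep(struct model_task *t)
{
    if (t->state != MODEL_TASK_RUNNING) {
        t->nested_sleep_warnings++;
        t->state = MODEL_TASK_RUNNING;
    }
}

/* Stand-in for mutex_lock(): it may sleep, so it runs the check first. */
void model_mutex_lock(struct model_task *t)
{
    model_might_sleep(t);
}

/* Stand-in for mutex_unlock(): no effect on task state in this model. */
void model_mutex_unlock(struct model_task *t)
{
    (void)t;
}

/* True if a schedule() call at this point would really put the task to sleep. */
bool model_schedule_would_block(const struct model_task *t)
{
    return t->state == MODEL_TASK_INTERRUPTIBLE;
}

/* Ordering the trace complains about: state is set before taking the mutex. */
bool model_buggy_iteration(struct model_task *t)
{
    t->state = MODEL_TASK_INTERRUPTIBLE;
    model_mutex_lock(t);        /* check fires, state is TASK_RUNNING again */
    /* ... pending events would be read under the lock here ... */
    model_mutex_unlock(t);
    bool blocks = model_schedule_would_block(t);
    t->state = MODEL_TASK_RUNNING;  /* state once schedule() has returned */
    return blocks;
}

/* Suggested ordering: state is set only after the mutex has been dropped. */
bool model_fixed_iteration(struct model_task *t)
{
    model_mutex_lock(t);        /* entered with TASK_RUNNING, no warning */
    /* ... pending events would be read under the lock here ... */
    model_mutex_unlock(t);
    t->state = MODEL_TASK_INTERRUPTIBLE;
    bool blocks = model_schedule_would_block(t);
    t->state = MODEL_TASK_RUNNING;  /* state once schedule() has returned */
    return blocks;
}
```

In this model the buggy iteration records one warning and reports that
schedule() would not block, which would match both the trace and the busy
pccardd threads. The fixed iteration records no warning and blocks.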
