Re: v2.6.31-rc6: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

From: Frederic Weisbecker
Date: Tue Aug 25 2009 - 11:05:24 EST


On Mon, Aug 24, 2009 at 09:30:16PM -0700, Linus Torvalds wrote:
>
>
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
> >
> > Anyway, I'll happily be shown wrong. I think the (second) patch I sent out
> > is an acceptable hack in the presence of the current locking, but as I
> > said, I'm not exactly happy about it, because I do think the locking is
> > broken.
>
> Btw, another solution to all this would be to just not have that
> ldisc_mutex deadlock due to do_tty_hangup -> tty_ldisc_hangup at all.
>
> The actual _flushing_ doesn't need the mutex - it's just that both
> flushing and hangup are done with workqueues.



Yeah. Doing the flushing in a dedicated workqueue would remove the need
to relax the lock, because we would only wait for the pending flush works,
not the hangup works.

But it's sad to create a thread only for that.


> If we can avoid the deadlock by not having the (artificial) workqueue
> dependency, it would allow everybody to just hold on to the mutex over the
> whole sequence - and would obviate the need for that hacky
> TTY_LDISC_CHANGING bit thing in tty_set_ldisc.
>
> In other words, the whole problem really comes in from the fact that
> do_tty_hangup() is called from "hangup_work", and the workqueues can get
> hung to the point where you can't then do the (totally _unrelated_) queue
> flushing.
>
> Because flush_to_ldisc() itself - which is what we want to do - doesn't
> need that mutex or the workqueue at all. It could run from any context,
> afaik.
>
> So if we were to turn it into just a timer (rather than a "delayed work"),
> then we'd not need to do that "flush_scheduled_work()" thing at all, and
> we wouldn't have that interaction with do_tty_hangup(). At which point we
> could again hold on to locks, because we wouldn't need to worry about the
> workqueues getting stuck on the mutex (that isn't even needed for the
> actual flushing part that we want to do!).



Yeah, a simple timer would be better than a dedicated workqueue in that
we don't need a whole thread for such a small job.
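
To make the dependency concrete, here is a toy userspace model of it, not
kernel code. It assumes a single worker draining the shared queue in FIFO
order, a hangup work that needs ldisc_mutex, and a flush that needs nothing.
All type and function names below are hypothetical and exist only for this
sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum work_kind { WORK_HANGUP, WORK_FLUSH };

struct shared_queue {
	enum work_kind items[8];
	size_t head;	/* next item the single worker would run */
	size_t tail;	/* one past the last queued item */
};

/* Append a work item to the shared FIFO (silently dropped when full). */
static void queue_work_model(struct shared_queue *q, enum work_kind kind)
{
	if (q->tail < sizeof(q->items) / sizeof(q->items[0]))
		q->items[q->tail++] = kind;
}

/*
 * Assumption of the model: the hangup work must take ldisc_mutex, so it
 * cannot make progress while the waiter holds it. The flush never needs it.
 */
static bool work_can_run(enum work_kind kind, bool mutex_held_by_waiter)
{
	return kind == WORK_FLUSH || !mutex_held_by_waiter;
}

/*
 * Models waiting for the whole shared queue to drain: every item ahead of
 * the flush has to finish first. Returns false when the worker would block
 * forever on an item, i.e. the deadlock.
 */
static bool flush_shared_queue_model(struct shared_queue *q,
				     bool mutex_held_by_waiter)
{
	while (q->head < q->tail) {
		if (!work_can_run(q->items[q->head], mutex_held_by_waiter))
			return false;
		q->head++;
	}
	return true;
}

/*
 * Models the timer (or dedicated queue) variant: the flush no longer sits
 * behind the hangup work, so only the flush itself has to be runnable.
 */
static bool flush_separate_model(bool mutex_held_by_waiter)
{
	return work_can_run(WORK_FLUSH, mutex_held_by_waiter);
}
```

With a hangup work queued ahead of a flush, flush_shared_queue_model() gets
stuck as long as the waiter holds the mutex and only completes once the mutex
is dropped, while flush_separate_model() completes either way. That is the
whole argument for decoupling the flush from the shared queue.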


>
> So don't get me wrong - there are _multiple_ ways to solve this. But they
> are all pretty major surgery, changing "big" semantics. We could fix the
> locking, we could change how we flush, we could do all of those things.
> And I'd love to. But I think the almost-oneliner is the safest approach
> right now. It's certainly not perfect, but it's fairly minimal impact.
>
> Linus


Yep.
I hope the incremental work Jens Axboe is doing on workqueues will remove
their serialized nature, which leads to these recurring deadlocks.
