Re: [git pull] x86 fixes

From: Andrew Morton
Date: Mon Jan 26 2009 - 15:15:36 EST


On Mon, 26 Jan 2009 20:59:10 +0100
Ingo Molnar <mingo@xxxxxxx> wrote:

> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Mon, 26 Jan 2009 20:20:16 +0100
> > Ingo Molnar <mingo@xxxxxxx> wrote:
> >
> > >
> > > * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > On Mon, 26 Jan 2009 18:17:23 +0100
> > > > Ingo Molnar <mingo@xxxxxxx> wrote:
> > > >
> > > > > Rusty Russell (2):
> > > > > ...
> > > > > work_on_cpu: Use our own workqueue.
> > > >
> > > > wtf?
> > >
> > > an x86 fix depends on it:
> > >
> > > 7285908: cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> > >
> >
> > Right. And these are currently under active (albeit rather slow)
> > discussion.
> >
> > The changelogs suck, nobody can be assed actually telling us what the
> > bug is and the patches just casually toss yet another gaggle of kernel
> > threads into there.
>
> One problem is that for example do_dbs_timer() [used both in the
> cpufreq_ondemand and cpufreq_conservative cpufreq drivers] gets queued
> into the generic kevent workqueue via schedule_work(). But work_on_cpu()
> needs to serialize on the worklet - i.e. it needs to do a flush_work() -
> and does this with the cpufreq lock held.

We've discovered many times that doing a workqueue flush with a lock
held is a bad idea. If any callback takes that lock, or takes some
other lock which someone else takes while holding the original lock,
etc, we deadlock.

> So we have a 'worklet inversion' bug here - and this got reported as a
> hard to debug boot hang on some systems.
> The root cause is that kevents is not that do_dbs_timer() uses
> schedule_work() - the root cause is that kevents workqueue is a too
> generic workqueue that is the union of all casual workqueue users in the
> kernel. That is fine (and it is its purpose) but it should not be used for
> core kernel facilities such as work_on_cpu() - precisely because doing so
> would limit that facility's genericity.

Yes, that's a point.

> This bug was always there but dormant until work_on_cpu() was used from a
> deep enough codepath.
>
> So the solution here is to isolate work_on_cpu()s mechanisms from the
> 'misc' workqueue that schedule_work() deals with - this is what Rusty's
> patch does.

But it's still deadlockable, isn't it? If you do a work_on_cpu() while
holding a lock, and the callback function takes that lock, deadlock?
If so, things didn't get much better?

> Your observation about there being too many workqueue threads is correct
> but this commit is IMO a valid use of the workqueue facility. This
> workqueue it only gets created on CONFIG_SMP so there cost is about ~10K
> RAM per CPU.

mm.. This is why I'd like to expend a little more effort trying to
avoid having to make this change.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/