Re: [git pull] x86 fixes

From: Ingo Molnar
Date: Mon Jan 26 2009 - 14:59:32 EST


* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, 26 Jan 2009 20:20:16 +0100
> Ingo Molnar <mingo@xxxxxxx> wrote:
>
> >
> > * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > On Mon, 26 Jan 2009 18:17:23 +0100
> > > Ingo Molnar <mingo@xxxxxxx> wrote:
> > >
> > > > Rusty Russell (2):
> > > > ...
> > > > work_on_cpu: Use our own workqueue.
> > >
> > > wtf?
> >
> > an x86 fix depends on it:
> >
> > 7285908: cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> >
>
> Right. And these are currently under active (albeit rather slow)
> discussion.
>
> The changelogs suck, nobody can be assed actually telling us what the
> bug is and the patches just casually toss yet another gaggle of kernel
> threads into there.

One problem is that for example do_dbs_timer() [used both in the
cpufreq_ondemand and cpufreq_conservative cpufreq drivers] gets queued
into the generic kevent workqueue via schedule_work(). But work_on_cpu()
needs to serialize on the worklet - i.e. it needs to do a flush_work() -
and does this with the cpufreq lock held.

So we have a 'worklet inversion' bug here - and this got reported as a
hard to debug boot hang on some systems.
The root cause is that kevents is not that do_dbs_timer() uses
schedule_work() - the root cause is that kevents workqueue is a too
generic workqueue that is the union of all casual workqueue users in the
kernel. That is fine (and it is its purpose) but it should not be used for
core kernel facilities such as work_on_cpu() - precisely because doing so
would limit that facility's genericity.

This bug was always there but dormant until work_on_cpu() was used from a
deep enough codepath.

So the solution here is to isolate work_on_cpu()s mechanisms from the
'misc' workqueue that schedule_work() deals with - this is what Rusty's
patch does.

Your observation about there being too many workqueue threads is correct
but this commit is IMO a valid use of the workqueue facility. This
workqueue it only gets created on CONFIG_SMP so there cost is about ~10K
RAM per CPU.

I also agree that the commit should have included this description.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/