Re: GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume

From: Tejun Heo
Date: Thu Nov 12 2009 - 14:13:53 EST


Hello, Oleg.

On 11/13/2009 03:35 AM, Oleg Nesterov wrote:
>> That looks like an excellent idea and I don't think it will add
>> noticeable overhead even done by default and it will magically make
>> the "how to implement single-threaded wq semantics in conccurrency
>> managed wq" problem go away. I'll work on it.
>
> I am still not sure all work_structs should be single-threaded by default.
>
> To clarify, I am not arguing; I just don't know. I mean, this change
> can break existing code, and the breakage is not easy to notice.

Hmm... I can't think of anything which will break if a single thread
(or execution instance) is enforced. Are you worried about ignoring
the cpu designation? I'm still working on it, but it seems possible
to honor the cpu parameter and still enforce a single thread.

>> making flush_work() behave as
>> flush_work_sync() by default should be doable without too much
>> overhead. I'll give it a shot.
>
> Well, I disagree. Imho it is better to have both flush_work() and
> flush_work_sync(). flush_work() is "special" and should be used with
> care. But this is minor, and if the work_struct is single-threaded then
> flush_work() == flush_work_sync().
>
> (Perhaps this is what you meant)

Yeap, that was what I meant. :-)
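
To make the hazard with plain flush_work() concrete, here is a
hypothetical driver fragment (illustration only; the names are made up
and it is not taken from any in-tree user):

#include <linux/workqueue.h>
#include <linux/slab.h>

/* hypothetical device state, for illustration only */
struct foo_dev {
        struct work_struct irq_work;
        bool shutting_down;
};

static void foo_irq_work_fn(struct work_struct *work)
{
        struct foo_dev *foo = container_of(work, struct foo_dev, irq_work);

        if (foo->shutting_down)
                return;
        /* ... process pending events ... */
}

/*
 * The interrupt path may queue foo->irq_work from any CPU, so on a
 * multithread workqueue two instances of foo_irq_work_fn() can run
 * concurrently on different CPUs.
 */
static void foo_teardown(struct foo_dev *foo)
{
        foo->shutting_down = true;
        /*
         * flush_work() only waits for the instance on the CPU the work
         * was last queued on; it is the "sync" flush (or the enforced
         * single execution instance discussed above) which actually
         * makes the kfree() below safe.
         */
        flush_work(&foo->irq_work);
        kfree(foo);
}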

> My only point was that it is not that workqueues are buggy; they were
> designed (and even documented) to work this way. I can't judge whether
> that was right or not, but personally I think everything is "logical".

Yes, it's very focused on lowering cross-cpu overhead wherever
possible. I think the one design decision which added the most
complexity and subtlety was allowing work functions to free or reuse
their work structs, leaving the workqueue code only the pointer value
to work with for synchronization. Maybe it's worth it, I don't know.
At any rate, changing it would be too costly at this point.
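
The self-freeing pattern I mean looks roughly like this (a hypothetical
one-shot request, for illustration only, not any specific in-tree user):

#include <linux/workqueue.h>
#include <linux/slab.h>

/* hypothetical one-shot request */
struct foo_request {
        struct work_struct work;
        /* ... request data ... */
};

static void foo_request_fn(struct work_struct *work)
{
        struct foo_request *req = container_of(work, struct foo_request, work);

        /* ... handle the request ... */

        /*
         * Freeing the containing object (and with it the work_struct)
         * here is allowed, so once this function has been entered the
         * workqueue core may not dereference 'work' again; all it can
         * keep for flush/cancel bookkeeping is the pointer value.
         */
        kfree(req);
}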

> That said, I agree that we have too many buggy users, perhaps we
> should simplify the rules.
>
> I just noticed that schedule_on_each_cpu() was recently changed by
>
> HWPOISON: Allow schedule_on_each_cpu() from keventd
> commit: 65a64464349883891e21e74af16c05d6e1eeb4e9
>
> Surprisingly, even this simple change is not exactly right.
>
> 	/*
> 	 * when running in keventd don't schedule a work item on itself.
> 	 * Can just call directly because the work queue is already bound.
> 	 * This also is faster.
> 	 * Make this a generic parameter for other workqueues?
> 	 */
> 	if (current_is_keventd()) {
> 		orig = raw_smp_processor_id();
> 		INIT_WORK(per_cpu_ptr(works, orig), func);
> 		func(per_cpu_ptr(works, orig));
> 	}
>
> OK, but this code should be moved down, under get_online_cpus().

Yeap, I already have the patch in my queue. It will be going out in a
few days.
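
For reference, the reordering I have queued looks roughly like this
(just a sketch here, the actual patch may differ in details):

int schedule_on_each_cpu(work_func_t func)
{
        int cpu;
        int orig = -1;
        struct work_struct *works;

        works = alloc_percpu(struct work_struct);
        if (!works)
                return -ENOMEM;

        get_online_cpus();

        /*
         * When running in keventd don't schedule a work item on
         * itself.  Can just call directly because the work queue is
         * already bound.  This also is faster.
         */
        if (current_is_keventd())
                orig = raw_smp_processor_id();

        for_each_online_cpu(cpu) {
                struct work_struct *work = per_cpu_ptr(works, cpu);

                INIT_WORK(work, func);
                if (cpu != orig)
                        schedule_work_on(cpu, work);
        }

        /* run the local work directly now that the cpu can't go away */
        if (orig >= 0)
                func(per_cpu_ptr(works, orig));

        for_each_online_cpu(cpu)
                flush_work(per_cpu_ptr(works, cpu));

        put_online_cpus();
        free_percpu(works);
        return 0;
}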

> Perhaps you and Linus are right, and we should simplify the rules
> unconditionally. But note that the problem above has nothing to do with
> single-threaded behaviour, and I do not think it is possible to guarantee
> work->func() can't be moved to another CPU.

I don't know about Linus, but I definitely would like default
single-threaded behavior. Currently, a singlethread workqueue is used
for two things - to limit the number of workers and to avoid works
executing simultaneously on different CPUs - but I don't think all
users are careful enough about the second point.
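
Users depending on the second point tend to look like this hypothetical
fragment (illustration only):

#include <linux/workqueue.h>

/* created with create_singlethread_workqueue("foo") */
static struct workqueue_struct *foo_wq;
static unsigned int foo_stat;   /* no lock of its own */

static void foo_work_a(struct work_struct *work)
{
        foo_stat++;     /* safe only while foo_wq stays singlethread */
}

static void foo_work_b(struct work_struct *work)
{
        foo_stat--;     /* ditto */
}

Nothing in such code says the two works must not run concurrently; the
exclusion is implicit in the singlethread workqueue, which is exactly
why it is easy to break once the works end up on a multithread or
per-cpu workqueue.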

Thanks.

--
tejun