Re: [RFC v2][PATCH] create workqueue threads only when needed

From: Oleg Nesterov
Date: Tue Jan 27 2009 - 22:05:24 EST


On 01/28, Frederic Weisbecker wrote:
>
> +static void workqueue_unshadow(struct cpu_workqueue_struct *cwq)
> +{
> +        struct workqueue_shadow *ws;
> +
> +        /* Prevent from concurrent unshadowing */
> +        if (unlikely(atomic_inc_return(&cwq->unshadowed) != 1))
> +                goto already_unshadowed;
> +
> +        /*
> +         * The work can be inserted whatever is the context.
> +         * But such atomic allocation will be rare and freed soon.
> +         */
> +        ws = kmalloc(sizeof(*ws), GFP_ATOMIC);
> +        if (!ws) {
> +                WARN_ON_ONCE(1);
> +                goto already_unshadowed;
> +        }
> +        INIT_DELAYED_WORK(&ws->work, workqueue_unshadow_work);
> +        ws->cwq = cwq;
> +        schedule_delayed_work(&ws->work, 0);
> +
> +        return;
> +
> +already_unshadowed:
> +        atomic_dec(&cwq->unshadowed);
> +}

Can't understand why you use a delayed work here...
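
If the idea is just to run the creation from keventd, a plain work_struct
queued with schedule_work() would say that directly, e.g. (only a sketch,
assuming ws->work becomes a struct work_struct):

        INIT_WORK(&ws->work, workqueue_unshadow_work);
        ws->cwq = cwq;
        schedule_work(&ws->work);       /* no delay involved */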

I must admit, I don't like this patch. Perhaps I am wrong; mostly I
dislike the complications it adds.

Anybody else please vote for this change?

Hmm. We never reset cwq->unshadowed, so cwq->thread becomes "non-lazy"
after cpu_down() + cpu_up().
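
I'd expect something like this in the cleanup path (only a sketch), so
that the thread becomes lazy again after cpu_up():

        /* in cleanup_workqueue_thread(), once cwq->thread has been stopped */
        cwq->thread = NULL;
        atomic_set(&cwq->unshadowed, 0);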

And of course it is not good that queue_work() can silently fail just
because a GFP_ATOMIC allocation fails. This is not acceptable, imho. But it is fixable.
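
For example (a sketch only; the unshadow_work member is my invention), the
unshadow work could be embedded in cpu_workqueue_struct itself, so there is
no allocation that can fail:

        /* INIT_WORK(&cwq->unshadow_work, workqueue_unshadow_work) at init time */
        static void workqueue_unshadow(struct cpu_workqueue_struct *cwq)
        {
                if (atomic_inc_return(&cwq->unshadowed) != 1) {
                        atomic_dec(&cwq->unshadowed);
                        return;
                }
                /* no kmalloc(), nothing to fail or to free */
                schedule_work(&cwq->unshadow_work);
        }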

What is not fixable is that this patch adds a subtle lock-ordering
problem. With this patch any flush_work() or flush_workqueue() or
destroy_workqueue() depends on keventd, and can deadlock if the caller
shares a lock with any work_struct running on keventd.
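
For example (hypothetical my_mutex/my_wq):

        static void kevent_work_fn(struct work_struct *work)
        {
                mutex_lock(&my_mutex);          /* blocked: the flusher holds it */
                mutex_unlock(&my_mutex);
        }

        mutex_lock(&my_mutex);
        /* my_wq is still shadowed, so the flush needs keventd to
         * create the thread first -> deadlock */
        flush_workqueue(my_wq);
        mutex_unlock(&my_mutex);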

Or, let's suppose keventd has a sleeping work_struct which waits
for an event. Now we queue the work which should "implement"
this event on a !unshadowed wq: deadlock.
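
Roughly (again, hypothetical names):

        static void waiter_fn(struct work_struct *work)
        {
                wait_for_completion(&my_event); /* sleeps on keventd */
        }

        static void completer_fn(struct work_struct *work)
        {
                complete(&my_event);
        }
        static DECLARE_WORK(completer_work, completer_fn);

        /* needs keventd to create the thread, but keventd sleeps in waiter_fn() */
        queue_work(shadowed_wq, &completer_work);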

Another problem. workqueue_unshadow_work() populates cwq->thread and
binds it to smp_processor_id(). This is not safe: the CPU can go away
after smp_processor_id() but before wake_up_process().
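
At the very least the creation path would need something like this (just
a sketch, and it doesn't solve the other problems):

        get_online_cpus();              /* the CPU can't go away under us now */
        cpu = smp_processor_id();
        if (!create_workqueue_thread(cwq, cpu))
                start_workqueue_thread(cwq, cpu);
        put_online_cpus();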

Oh, and schedule_delayed_work() is not right, think about queue_work_on().
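
If the cwq's CPU is known to the caller, the unshadow work at least has to
run there, something like (a sketch, assuming a plain work as above and
that "cpu" is plumbed down from the caller):

        ws->cwq = cwq;
        /* cpu is the cwq's CPU, not necessarily smp_processor_id() */
        schedule_work_on(cpu, &ws->work);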

>  static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
>  {
> +        DEFINE_WAIT(wait);
> +        long timeout = 0;
> +        int unshadowed = atomic_read(&cwq->unshadowed);
> +
> +        /* Shadowed => no thread has been created */
> +        if (!unshadowed)
> +                return;

This is not right: if the previous workqueue_unshadow() failed, we
can return with pending works still queued.

> +
>          /*
> -         * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
> -         * cpu_add_remove_lock protects cwq->thread.
> +         * If it's unshadowed, we want to ensure the thread creation
> +         * has been completed.
>           */
> -        if (cwq->thread == NULL)
> -                return;
> +        prepare_to_wait(&cwq->thread_creation, &wait, TASK_UNINTERRUPTIBLE);
> +        if (!cwq->thread)
> +                timeout = schedule_timeout_interruptible(HZ * 3);
> +        finish_wait(&cwq->thread_creation, &wait);
> +
> +        /* We waited for 3 seconds, this is likely a soft lockup */
> +        WARN_ON(timeout);

Can't understand this... schedule_timeout_interruptible() returns the
remaining jiffies, so if timeout != 0 we were woken early by
workqueue_unshadow_work(), not stuck for 3 seconds?
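
If the intent is "wait for the thread to appear, complain after 3 seconds",
something like this would say it directly (a sketch):

        /* wait_event_timeout() returns 0 only if the timeout elapsed */
        if (!wait_event_timeout(cwq->thread_creation,
                                cwq->thread != NULL, 3 * HZ))
                WARN_ON(1);     /* still no thread after 3 seconds */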

Anyway, we should not proceed if we failed to create cwq->thread.
The kernel can crash. And of course this is not good either. Yes,
you modified flush_cpu_workqueue() to call workqueue_unshadow(),
but this can fail too. And if another thread cancels the pending
works, flush_cpu_workqueue() just returns, and we crash. Or we
can hang forever.

Also. Please note that cleanup_workqueue_thread() can also be
called by CPU_UP_CANCELED when cwq->thread == NULL because it
was never created. We should do nothing in this case, but we
will hang if cwq->unshadowed != 0.

>                  switch (action) {
>                  case CPU_UP_PREPARE:
> +                        /* Will be created during the first work insertion */
> +                        if (!atomic_read(&cwq->unshadowed))
> +                                break;
>                          if (!create_workqueue_thread(cwq, cpu))
>                                  break;
>                          printk(KERN_ERR "workqueue [%s] for %i failed\n",
> @@ -964,6 +1086,8 @@ undo:
>                          goto undo;
>
>                  case CPU_ONLINE:
> +                        if (!atomic_read(&cwq->unshadowed))
> +                                break;
>                          start_workqueue_thread(cwq, cpu);
>                          break;

Suppose that we have some strange cpu_callback(action, cpu)
which does:

        case CPU_UP_PREPARE:
                queue_work_on(cpu, my_wq, percpu_work);
                break;
        case CPU_UP_CANCELED:
                cancel_work_sync(percpu_work);

Currently this works. But with this patch, the queue_work_on() above
can leak a workqueue_shadow.

Oleg.
