Re: [RFC v2 07/18] kthread: Allow to cancel kthread work

From: Tejun Heo
Date: Mon Sep 28 2015 - 13:03:59 EST


Hello, Petr.

On Fri, Sep 25, 2015 at 01:26:17PM +0200, Petr Mladek wrote:
> 1) PENDING state plus -EAGAIN/busy loop cycle
> ---------------------------------------------
>
> IMHO, we want to use the timer because it is an elegant solution.
> Then we must release the lock when the timer is running. The lock
> must be taken by the timer->function(). And there is a small window
> when the timer is not longer pending but timer->function is not running:
>
> CPU0 CPU1
>
> run_timer_softirq()
> __run_timers()
> detach_expired_timer()
> detach_timer()
> #clear_pending
>
> try_to_grab_pending_kthread_work()
> del_timer()
> # fails because not pending
>
> test_and_set_bit(KTHREAD_WORK_PENDING_BIT)
> # fails because already set
>
> if (!list_empty(&work->node))
> # fails because still not queued
>
> !!! problematic window !!!
>
> call_timer_fn()
> queue_kthraed_work()

Let's say each work item has a state variable which is protected by a
lock and the state can be one of IDLE, PENDING, CANCELING. Let's also
assume that all cancelers synchronize with each other via mutex, so we
only have to worry about a single canceler. Wouldn't something like
the following work while being a lot simpler?

Delayed queueing and execution.

1. Lock and check whether state is IDLE. If not, nothing to do.

2. Set state to PENDING and schedule the timer and unlock.

3. On expiration, timer_fn grabs the lock and see whether state is
still PENDING. If so, schedule the work item for execution;
otherwise, nothing to do.

4. After dequeueing from execution queue with lock held, the worker is
marked as executing the work item and state is reset to IDLE.

Canceling

1. Lock, dequeue and set the state to CANCELING.

2. Unlock and perform del_timer_sync().

3. Flush the work item.

4. Lock and reset the state to IDLE and unlock.


> 2) CANCEL state plus custom waitqueue
> -------------------------------------
>
> cancel_kthread_work_sync() has to wait for the running work. It might take
> quite some time. Therefore we could not block others by a spinlock.
> Also others could not wait for the spin lock in a busy wait.

Hmmm? Cancelers can synchronize amongst them using a mutex and the
actual work item wait can use flushing.

> IMHO, the proposed and rather complex solutions are needed in both cases.
>
> Or did I miss a possible trick, please?

I probably have missed something in the above and it is not completley
correct but I do think it can be way simpler than how workqueue does
it.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/