Re: [PATCH] make-cancel_rearming_delayed_work-reliable-fix

From: Oleg Nesterov
Date: Tue May 08 2007 - 08:03:30 EST


On 05/08, Jarek Poplawski wrote:
>
> On Mon, May 07, 2007 at 02:34:20PM +0400, Oleg Nesterov wrote:
> > On 05/07, Jarek Poplawski wrote:
> ...
> > I am not happy with the complication this patch adds, mostly
> > I hate this smb_wmb() in insert_work(). I have an idea how to
> > remove it later, but this needs another patch not related to
> > workqueue.c.
>
> Probably I don't feel those barriers enough, but IMHO,
> there is something wrong

As I said, I hate this barrier (and new complications) too.

We can make the things a bit better:

- introduce smp_mb__before_spin_lock() // noop on x86

- then, after some minimal code changes, we have

set_wq_data();
smp_mb__before_spin_lock();
spin_lock_irqsave(&cwq->lock);
list_add();
spin_unlock_irqrestore();

> if they are needed here, with all this made per cpu.

CPU-hotplug can change CPU.

> >
> > > BTW, I'm still not convinced all additions are needed:
> > > the "old" cancel_rearming_ doesn't care about checking
> > > or waiting on anything after del_timer positive.
> >
> > It would be very strange to do wait_on_work() only in case
> > when del_timer() failed. This way we still need to do
> > cancel_work_sync() after cancel_rearming_delayed_work(),
> > but only when del_timer() failed, ugly. Note also that
> > wait_on_work() does not sleep if work->func() is not running.
> >
> > Also, consider this callback:
> >
> > void work_handler(struct work_struct *w)
> > {
> > struct delayed_work dw = container_of(...);
> >
> > queue_delayed_work(dw, delay);
> >
> > // <------------- cancel_rearming_delayed_work()
> >
> > cancel_delayed_work(dw);
> > queue_delayed_work(dw, another_delay);
> > }
> >
> > Yes, this is strange and ugly. But correct! The current version
> > (before this patch) can't cancel this delayed_work. The new
> > implementation works correctly. So I think it is far better to
> > do wait_on_work() unconditionally.
>
> I think there are possible many curious, but correct things yet,
> e.g. handler, which fires timer or another work, which rearms
> this work... But maybe it would be resonable to try to separate
> new things which are really necessary, so IMHO, (practical)
> elimination of endless or excessive looping. If there is needed
> some more functionality, maybe there should be second version
> added e.g. cancel_rearming_delayed_work_sync(). Without this
> you risk, people would really think the old version was better,
> if you knew what you were doing. And the worst thing possible:
> they'll start to use their own, lighter solutions.

My goal is just rename cancel_rearming_delayed_work() to
cancel_delayed_work_sync(), because now it doesn't require that @dwork
re-arms itself, and it doesn't require to do cancel_work_sync() after.
cancel_rearming_delayed_work() will be obsolete, just like
cancel_rearming_delayed_workqueue().

I am strongly against adding many different variants of cancel_xxx().
With this patch the API is simple

- cancel_delayed_work() stops the timer

- cancel_rearming_delayed_work() stops the timer and work,
doesn't need cancel_work_sync/flush_workqueue

> I think, there is no reason, such function should need more
> than one loop of one cpu workqueue to cancel any "normal" work
> (or maybe two - if locking is spared), and it's really hard
> to understand, why anybody should wait all those "not mine",
> so probably slow and buggy works in a queue, do their part of
> needless locking/sleeping/looping and let "my" excellent
> driver to be reloaded... And I think you are doing this now
> with some reserves (kind of "with one hand") - because the
> _PENDING flag is overloaded here and one flag still free.
> I really cannot see how sparing one, two or even three flag
> checks during a loop could be worth even one run_workqueue
> loop more without them.

When work->func() re-arms itself, both queue_work() and queue_delayed_work()
may change CPU if CPU_DOWN_xxx happens. Note that this is possible
even if we use freezer for cpu-hotplug, because the timer migrates to
another CPU anyway.

> Yesterday I've written something probably against my intention:
> so, if there is any CPU burning place, which could be fixed,
> it's definitely worth fixing.

Perhaps yes, but see below.

> I'm not sure what exactly place
> did you mean - if spinlocking in wait_on_work - maybe it's
> a sign this place isn't optimal too: it seems to me, this all
> inserting/waiting for completion could be done under existing
> locks e.g. in run_workqueue (maybe with some flag?).

Sorry, can't understand this. Inserting uses existing lock, namely
cwq->lock.

Just in case, I think wait_on_cpu_work() can check ->current_work without
cwq->lock, but I am not sure yet. We can remove unneeded "new |= " from
set_wq_data(), we can do a couple of other optimizations. However, there
are already _a lot_ of workqueue changes in -mm tree. We can do this later.
Right now my only concern is correctness.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/