Re: [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility

From: Tejun Heo
Date: Wed Sep 06 2017 - 10:51:52 EST


Hello, David.

On Tue, Sep 05, 2017 at 03:50:16PM +0100, David Howells wrote:
> With one of my latest patches to AFS, there's a set of cell records, where
> each cell has a manager work item that maintains that cell, including
> refreshing DNS records and excising expired records from the list. Performing
> the excision in the manager work item makes handling the fscache index cookie
> easier (you can't have two cookies attached to the same object), amongst other
> things.
>
> There's also an overseer work item that maintains a single expiry timer for
> all the cells and queues the per-cell work items to do DNS updates and cell
> removal.
>
> The reason that the overseer exists is that it makes it easier to do a put on
> a cell. The cell decrements the cell refcount and then wants to schedule the
> cell for destruction - but it's no longer permitted to touch the cell. I
> could use atomic_dec_and_lock(), but that's messy. It's cleaner just to set
> the timer on the overseer and leave it to that.
>
> However, if someone does rmmod, I have to be able to clean everything up. The
> overseer timer may be queued or running; the overseer may be queued *and*
> running and may get queued again by the timer; and each cell's work item may
> be queued *and* running and may get queued again by the manager.

Thanks for the detailed explanation.

> > Why can't it be done via the usual "flush from exit"?
>
> Well, it can, but you need a flush for each separate level of dependencies,
> where one dependency will kick off another level of dependency during the
> cleanup.
>
> So what I think I would have to do is set a flag to say that no one is allowed
> to set the timer now (this shouldn't happen outside of server or volume cache
> clearance), delete the timer synchronously, flush the work queue four times
> and then do an RCU barrier.
>
> However, since I have volumes with dependencies on servers and cells, possibly
> with their own managers, I think I may need up to 12 flushes, possibly with
> interspersed RCU barriers.

Would it be possible to isolate work items for the cell in its own
workqueue and use drain_workqueue()? Separating out flush domains is
one of the main use cases for dedicated workqueues after all.
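
To illustrate why that helps with chained dependencies, here is a toy,
single-threaded userspace model. It is not kernel code, and every
toy_* name is made up for this sketch. It assumes that a flush only
waits for the items pending when it is called, while a drain keeps
flushing until nothing is left, which is roughly how I'd describe
flush_workqueue() vs. drain_workqueue().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QCAP	16	/* ring buffer capacity (arbitrary) */
#define TOY_MAX_LEVELS	12	/* deepest chain this model supports */

/* Hypothetical stand-in for a work item that may queue one more item. */
struct toy_work {
	struct toy_work *chained;	/* queued when this item runs, or NULL */
	int ran;
};

/* Hypothetical stand-in for a dedicated workqueue. */
struct toy_wq {
	struct toy_work *pending[TOY_QCAP];
	size_t head, tail;
};

static size_t toy_pending(const struct toy_wq *wq)
{
	return wq->tail - wq->head;
}

static void toy_queue(struct toy_wq *wq, struct toy_work *w)
{
	assert(toy_pending(wq) < TOY_QCAP);
	wq->pending[wq->tail++ % TOY_QCAP] = w;
}

/* Run only the items that were pending when the flush started. */
static void toy_flush(struct toy_wq *wq)
{
	size_t n = toy_pending(wq);

	while (n--) {
		struct toy_work *w = wq->pending[wq->head++ % TOY_QCAP];

		w->ran = 1;
		if (w->chained)
			toy_queue(wq, w->chained);	/* chain queueing */
	}
}

/* Flush repeatedly until the queue is empty; return the pass count. */
static int toy_drain(struct toy_wq *wq)
{
	int passes = 0;

	while (toy_pending(wq)) {
		toy_flush(wq);
		passes++;
	}
	return passes;
}

/* Build a chain where item i queues item i + 1, then queue item 0. */
static void toy_setup(struct toy_wq *wq, struct toy_work *items, int levels)
{
	assert(levels >= 1 && levels <= TOY_MAX_LEVELS);
	wq->head = wq->tail = 0;
	for (int i = 0; i < levels; i++) {
		items[i].ran = 0;
		items[i].chained = (i + 1 < levels) ? &items[i + 1] : NULL;
	}
	toy_queue(wq, &items[0]);
}

/* How many items of a `levels`-deep chain have run after one flush? */
int toy_ran_after_one_flush(int levels)
{
	struct toy_wq wq;
	struct toy_work items[TOY_MAX_LEVELS];
	int ran = 0;

	toy_setup(&wq, items, levels);
	toy_flush(&wq);
	for (int i = 0; i < levels; i++)
		ran += items[i].ran;
	return ran;
}

/* How many flush passes does one drain make for the same chain? */
int toy_drain_passes(int levels)
{
	struct toy_wq wq;
	struct toy_work items[TOY_MAX_LEVELS];

	toy_setup(&wq, items, levels);
	return toy_drain(&wq);
}
```

In this model a single flush of a four-level chain runs only the first
item, whereas one drain call works through all four levels without the
caller having to know how deep the chain goes.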

> It's much simpler to count out the objects than to try and get the flushing
> right.

I still feel very reluctant to add a generic counting & trigger
mechanism to work items for this. I think it's too generic a solution
for a very specific problem.

Thanks.

--
tejun