Re: [PATCH v4] slow-work: add (module*)work->ops->owner to fixraces with module clients
From: steve
Date: Tue Jun 30 2009 - 06:12:50 EST
Hi,
On Tue, Jun 30, 2009 at 12:07:15PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 30, 2009 at 09:43:03AM +0100, Steven Whitehouse wrote:
> > Hi,
> >
> > I'm happy to ACK this, but the race doesn't exist in GFS2's case because
> > we wait for all work related to each GFS2 fs at umount time and the
> > module unload cannot happen until all GFS2 fs are umounted,
> >
> > Steve.
>
> I wonder whether the following holds:
>
> static void gfs2_recover_put_ref(struct slow_work *work)
> {
> struct gfs2_jdesc *jd = container_of(work, struct gfs2_jdesc, jd_work);
> clear_bit(JDF_RECOVERY, &jd->jd_flags);
> smp_mb__after_clear_bit();
> wake_up_bit(&jd->jd_flags, JDF_RECOVERY);
>
> <- umount can complete here?
>
> }
>
>
> If yes, .text of the module could go away between the point marked by <-
> and return from gfs2_recover_put_ref.
>
>
Well in theory, yes. In reality I don't think it could ever happen
as the remainder of the umount code is quite lengthy and it would
require the gap between the clearing of the recovery bit and the
return from that function to take time in the order of seconds which
is (I hope) relatively unlikely. That depends upon the number of
cached glocks, etc. so it can be longer than that for very large fs.
The module unload for GFS2 also has a number of items to stop and/or
free, including the slow_work instance itself, so that also adds to
the safety margin.
Steve.
> > On Mon, 2009-06-29 at 15:24 -0400, Gregory Haskins wrote:
> > > (Applies to Linus' linux-2.6.git/master:626f380d)
> > >
> > > [ Changelog:
> > >
> > > v4:
> > > *) added ".owner = THIS_MODULE" fields to all current slow-work
> > > clients (fscache, gfs2).
> > >
> > > v3:
> > > *) moved (module*)owner to slow_work_ops
> > > *) removed useless barrier()
> > > *) updated documentation/comments
> > >
> > > v2:
> > > *) cache "owner" value to prevent invalid access after put_ref
> > >
> > > v1:
> > > *) initial release
> > > ]
> > >
> > > I've retained Michael's "Reviewed-by:" tag from v3 since v4 is identical
> > > in every way except the new hunks added to gfs2/fscache that he asked for.
> > >
> > > Michael, if you want to recind your tag, please speak up.
> > >
> > > Otherwise, please consider for inclusion.
> > >
> > > Regards,
> > > -Greg
> > >
> > > ---------------------------------
> > >
> > > slow-work: add (module*)work->ops->owner to fix races with module clients
> > >
> > > The slow_work facility was designed to use reference counting instead of
> > > barriers for synchronization. The reference counting mechanism is
> > > implemented as a vtable op (->get_ref, ->put_ref) callback. This is
> > > problematic for module use of the slow_work facility because it is
> > > impossible to synchronize against the .text installed in the callbacks:
> > > There is no way to ensure that the slow-work threads have completely
> > > exited the .text in question and rmmod may yank it out from under the
> > > slow_work thread.
> > >
> > > This patch attempts to address this issue by mapping "struct module* owner"
> > > to the slow_work_ops item, and maintaining a module reference
> > > count coincident with the more externally visible reference count. Since
> > > the slow_work facility is resident in kernel, it should be a race-free
> > > location to issue a module_put() call. This will ensure that modules
> > > can properly cleanup before exiting.
> > >
> > > A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
> > > dequeue technically adds the overhead of the atomic operations for every
> > > work item scheduled. However, slow_work is designed for deferring
> > > relatively long-running and/or sleepy tasks to begin with, so this
> > > overhead will hopefully be negligible.
> > >
> > > Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
> > > Reviewed-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > CC: David Howells <dhowells@xxxxxxxxxx>
> > > CC: Steven Whitehouse <swhiteho@xxxxxxxxxx>
> > > ---
> > >
> > > Documentation/slow-work.txt | 6 +++++-
> > > fs/fscache/object.c | 1 +
> > > fs/fscache/operation.c | 1 +
> > > fs/gfs2/recovery.c | 1 +
> > > include/linux/slow-work.h | 3 +++
> > > kernel/slow-work.c | 20 +++++++++++++++++++-
> > > 6 files changed, 30 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
> > > index ebc50f8..2a38878 100644
> > > --- a/Documentation/slow-work.txt
> > > +++ b/Documentation/slow-work.txt
> > > @@ -80,6 +80,7 @@ Slow work items may then be set up by:
> > > (2) Declaring the operations to be used for this item:
> > >
> > > struct slow_work_ops myitem_ops = {
> > > + .owner = THIS_MODULE,
> > > .get_ref = myitem_get_ref,
> > > .put_ref = myitem_put_ref,
> > > .execute = myitem_execute,
> > > @@ -102,7 +103,10 @@ A suitably set up work item can then be enqueued for processing:
> > > int ret = slow_work_enqueue(&myitem);
> > >
> > > This will return a -ve error if the thread pool is unable to gain a reference
> > > -on the item, 0 otherwise.
> > > +on the item, 0 otherwise. Loadable modules may only enqueue work if at least
> > > +one reference to the module is known to be held. The slow-work infrastructure
> > > +will acquire a reference to the module and hold it until after the item's
> > > +reference is dropped, assuring the stability of the callback.
> > >
> > >
> > > The items are reference counted, so there ought to be no need for a flush
> > > diff --git a/fs/fscache/object.c b/fs/fscache/object.c
> > > index 392a41b..d236eb1 100644
> > > --- a/fs/fscache/object.c
> > > +++ b/fs/fscache/object.c
> > > @@ -45,6 +45,7 @@ static void fscache_enqueue_dependents(struct fscache_object *);
> > > static void fscache_dequeue_object(struct fscache_object *);
> > >
> > > const struct slow_work_ops fscache_object_slow_work_ops = {
> > > + .owner = THIS_MODULE,
> > > .get_ref = fscache_object_slow_work_get_ref,
> > > .put_ref = fscache_object_slow_work_put_ref,
> > > .execute = fscache_object_slow_work_execute,
> > > diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
> > > index e7f8d53..f1a2857 100644
> > > --- a/fs/fscache/operation.c
> > > +++ b/fs/fscache/operation.c
> > > @@ -453,6 +453,7 @@ static void fscache_op_execute(struct slow_work *work)
> > > }
> > >
> > > const struct slow_work_ops fscache_op_slow_work_ops = {
> > > + .owner = THIS_MODULE,
> > > .get_ref = fscache_op_get_ref,
> > > .put_ref = fscache_op_put_ref,
> > > .execute = fscache_op_execute,
> > > diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> > > index 59d2695..0c2a6aa 100644
> > > --- a/fs/gfs2/recovery.c
> > > +++ b/fs/gfs2/recovery.c
> > > @@ -593,6 +593,7 @@ fail:
> > > }
> > >
> > > struct slow_work_ops gfs2_recover_ops = {
> > > + .owner = THIS_MODULE,
> > > .get_ref = gfs2_recover_get_ref,
> > > .put_ref = gfs2_recover_put_ref,
> > > .execute = gfs2_recover_work,
> > > diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
> > > index b65c888..1382918 100644
> > > --- a/include/linux/slow-work.h
> > > +++ b/include/linux/slow-work.h
> > > @@ -17,6 +17,7 @@
> > > #ifdef CONFIG_SLOW_WORK
> > >
> > > #include <linux/sysctl.h>
> > > +#include <linux/module.h>
> > >
> > > struct slow_work;
> > >
> > > @@ -24,6 +25,8 @@ struct slow_work;
> > > * The operations used to support slow work items
> > > */
> > > struct slow_work_ops {
> > > + struct module *owner;
> > > +
> > > /* get a ref on a work item
> > > * - return 0 if successful, -ve if not
> > > */
> > > diff --git a/kernel/slow-work.c b/kernel/slow-work.c
> > > index 09d7519..18dee34 100644
> > > --- a/kernel/slow-work.c
> > > +++ b/kernel/slow-work.c
> > > @@ -145,6 +145,15 @@ static unsigned slow_work_calc_vsmax(void)
> > > return min(vsmax, slow_work_max_threads - 1);
> > > }
> > >
> > > +static void slow_work_put(struct slow_work *work)
> > > +{
> > > + /* cache values that are needed during/after pointer invalidation */
> > > + struct module *owner = work->ops->owner;
> > > +
> > > + work->ops->put_ref(work);
> > > + module_put(owner);
> > > +}
> > > +
> > > /*
> > > * Attempt to execute stuff queued on a slow thread. Return true if we managed
> > > * it, false if there was nothing to do.
> > > @@ -219,7 +228,7 @@ static bool slow_work_execute(void)
> > > spin_unlock_irq(&slow_work_queue_lock);
> > > }
> > >
> > > - work->ops->put_ref(work);
> > > + slow_work_put(work);
> > > return true;
> > >
> > > auto_requeue:
> > > @@ -299,6 +308,14 @@ int slow_work_enqueue(struct slow_work *work)
> > > if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
> > > set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
> > > } else {
> > > + /*
> > > + * Callers must ensure that their module has at least
> > > + * one reference held while the work is enqueued. We
> > > + * will acquire another reference here and drop it
> > > + * once we do the last ops->put_ref()
> > > + */
> > > + __module_get(work->ops->owner);
> > > +
> > > if (work->ops->get_ref(work) < 0)
> > > goto cant_get_ref;
> > > if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
> > > @@ -313,6 +330,7 @@ int slow_work_enqueue(struct slow_work *work)
> > > return 0;
> > >
> > > cant_get_ref:
> > > + module_put(work->ops->owner);
> > > spin_unlock_irqrestore(&slow_work_queue_lock, flags);
> > > return -EAGAIN;
> > > }
> > >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/