Re: [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
From: Gregory Haskins
Date: Wed Jun 24 2009 - 15:42:58 EST
Gregory Haskins wrote:
> (Applies to Linus' git master:626f380d)
>
> Hi All,
> I found this while working on KVM. I actually posted this patch with a KVM
> series yesterday and standalone earlier today, but neither seems to have
> made it to the lists. I suspect there is an issue with git-mail/postfix
> on my system.
>
> I digress. This is a repost with the patch by itself, rebased onto
> Linus' tree instead of kvm.git. Apologies if the system finally
> corrects itself and the others show up.
>
> Thoughts?
>
> Regards,
> -Greg
>
> -----------------------------
>
> slow-work: add (module*)work->owner to fix races with module clients
>
> The slow_work facility was designed to use reference counting instead of
> barriers for synchronization. The reference counting mechanism is
> implemented as a pair of vtable callbacks (->get_ref, ->put_ref). This is
> problematic for module users of the slow_work facility because it is
> impossible to synchronize against the .text installed in the callbacks:
> there is no way to ensure that the slow-work threads have completely exited
> the .text in question, so rmmod may yank it out from under a slow_work thread.
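A minimal userspace sketch of the callback-based reference counting described above, under stated assumptions: the ops table is simplified (only get_ref and put_ref appear in the quoted diff; the execute member, struct client, to_client(), client_frees and model_enqueue_and_run() are hypothetical). It illustrates that every step, including the final put_ref, runs client code; in the kernel that code would be module .text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct slow_work;

/* Simplified mirror of the ops table: the facility sees only these
 * pointers, not which module's .text they point into. */
struct slow_work_ops {
	int (*get_ref)(struct slow_work *work);   /* pin the containing object */
	void (*put_ref)(struct slow_work *work);  /* unpin; may free the object */
	void (*execute)(struct slow_work *work);  /* the deferred job itself */
};

struct slow_work {
	const struct slow_work_ops *ops;
};

/* Hypothetical client object embedding its work item, as a module would. */
struct client {
	int refcount;
	int runs;
	struct slow_work work;
};

static int client_frees; /* objects released by a final put_ref */

static struct client *to_client(struct slow_work *work)
{
	/* Open-coded container_of(): recover the object from its member. */
	return (struct client *)((char *)work - offsetof(struct client, work));
}

static int client_get_ref(struct slow_work *work)
{
	to_client(work)->refcount++;
	return 0;
}

static void client_put_ref(struct slow_work *work)
{
	struct client *c = to_client(work);

	if (--c->refcount == 0) {
		free(c);
		client_frees++;
	}
	/* In a module, the worker is still executing this function's code
	 * here; once the last reference is gone, nothing in this scheme
	 * stops module unload from removing that code. */
}

static void client_execute(struct slow_work *work)
{
	to_client(work)->runs++;
}

static const struct slow_work_ops client_ops = {
	.get_ref = client_get_ref,
	.put_ref = client_put_ref,
	.execute = client_execute,
};

/* Core-side model of one enqueue followed by one execution. */
static bool model_enqueue_and_run(struct slow_work *work)
{
	if (work->ops->get_ref(work) < 0)
		return false;
	work->ops->execute(work);
	work->ops->put_ref(work);
	return true;
}
```

Userspace cannot reproduce the unload itself; the sketch only marks where the worker is still inside client code after the object's last reference may already be gone.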
>
> This patch attempts to address this issue by transparently associating a
> "struct module *owner" with each slow_work item and maintaining a module
> reference count alongside the more externally visible object reference
> count. Since the slow_work facility is resident in the core kernel, it
> should be a race-free location from which to issue the module_put() call.
> This ensures that a module cannot be unloaded while any of its work items
> are still queued or executing, so modules can properly clean up before
> exiting.
>
> A try_module_get()/module_put() pair around slow_work_enqueue() and the
> subsequent execution technically adds the overhead of the atomic operations
> to every work item scheduled. However, slow_work is designed for deferring
> relatively long-running and/or sleepy tasks to begin with, so this overhead
> is expected to be negligible.
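A single-threaded userspace sketch of the pairing this patch describes: pin the owner module first, then the object, and unwind in reverse order on failure; at the end, release the object first and the module last. struct fake_module, fake_try_module_get(), fake_module_put(), fake_try_unload(), struct work_item, model_enqueue() and model_finish() are hypothetical stand-ins, not the kernel's module API. In particular, the check-then-increment in fake_try_module_get() is not safe against a concurrent unload; the real try_module_get() closes that window by its own means, which this model does not attempt to reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct module: a refcount plus a flag that
 * is set once unloading has begun. */
struct fake_module {
	atomic_int refs;
	atomic_bool going;
};

static bool fake_try_module_get(struct fake_module *mod)
{
	if (atomic_load(&mod->going))
		return false;               /* unload in progress: refuse new pins */
	atomic_fetch_add(&mod->refs, 1);    /* per-item atomic op on enqueue */
	return true;
}

static void fake_module_put(struct fake_module *mod)
{
	atomic_fetch_sub(&mod->refs, 1);    /* per-item atomic op on completion */
}

/* Unload succeeds only when no work item still pins the module. */
static bool fake_try_unload(struct fake_module *mod)
{
	atomic_store(&mod->going, true);
	if (atomic_load(&mod->refs) != 0) {
		atomic_store(&mod->going, false);
		return false;
	}
	return true;
}

struct work_item {
	struct fake_module *owner;  /* mirrors the new work->owner field */
	int refs;                   /* the object's own get_ref/put_ref count */
	bool fail_get_ref;          /* test hook: make "get_ref" fail */
};

/* Module pin first, object pin second, unwind in reverse on failure. */
static int model_enqueue(struct work_item *w)
{
	if (!fake_try_module_get(w->owner))
		goto cant_get_mod;
	if (w->fail_get_ref)
		goto cant_get_ref;
	w->refs++;                  /* stands in for ops->get_ref() */
	return 0;

cant_get_ref:
	fake_module_put(w->owner);
cant_get_mod:
	return -EAGAIN;
}

/* Object reference dropped first, module reference last, from core code. */
static void model_finish(struct work_item *w)
{
	struct fake_module *owner = w->owner; /* read before the object may go */

	w->refs--;                            /* stands in for ops->put_ref() */
	fake_module_put(owner);
}
```

While an item is queued, fake_try_unload() refuses; once model_finish() has run, unload succeeds and later enqueues fail with -EAGAIN, much as a module that is going away would.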
>
> Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
> CC: David Howells <dhowells@xxxxxxxxxx>
> ---
>
> include/linux/slow-work.h | 4 ++++
> kernel/slow-work.c | 6 ++++++
> 2 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
> index b65c888..9f48dab 100644
> --- a/include/linux/slow-work.h
> +++ b/include/linux/slow-work.h
> @@ -17,6 +17,7 @@
> #ifdef CONFIG_SLOW_WORK
>
> #include <linux/sysctl.h>
> +#include <linux/module.h>
>
> struct slow_work;
>
> @@ -42,6 +43,7 @@ struct slow_work_ops {
> * queued
> */
> struct slow_work {
> + struct module *owner;
> unsigned long flags;
> #define SLOW_WORK_PENDING 0 /* item pending (further) execution */
> #define SLOW_WORK_EXECUTING 1 /* item currently executing */
> @@ -61,6 +63,7 @@ struct slow_work {
> static inline void slow_work_init(struct slow_work *work,
> const struct slow_work_ops *ops)
> {
> + work->owner = THIS_MODULE;
> work->flags = 0;
> work->ops = ops;
> INIT_LIST_HEAD(&work->link);
> @@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work,
> static inline void vslow_work_init(struct slow_work *work,
> const struct slow_work_ops *ops)
> {
> + work->owner = THIS_MODULE;
> work->flags = 1 << SLOW_WORK_VERY_SLOW;
> work->ops = ops;
> INIT_LIST_HEAD(&work->link);
> diff --git a/kernel/slow-work.c b/kernel/slow-work.c
> index 09d7519..1dc3486 100644
> --- a/kernel/slow-work.c
> +++ b/kernel/slow-work.c
> @@ -220,6 +220,8 @@ static bool slow_work_execute(void)
> }
>
> work->ops->put_ref(work);
>
On this front: I also wonder whether this put_ref call is racy, since we
cannot guarantee that the pointers remain valid if the object is kfree'd as
a result of dropping the last reference. I do not know enough about
compilers to say whether invalidating work or work->ops would cause
problems with the call/return, but it seems dangerous at best. An
alternative might be to copy the put_ref pointer before the call,
something like:

	slowwork_putref_t put_ref = work->ops->put_ref;
	....
	put_ref(work);

That might be better. However, I am not sure whether it really matters, so
I have not addressed this issue yet.
-Greg
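A sketch of the snapshot idea above, extended to the owner pointer. A hedged observation: in C the address of put_ref has to be loaded before the indirect call is made, so the call itself should not depend on *work staying valid; what does depend on it is any later read through work, such as the module_put(work->owner) that follows in the hunk below. The fake_module type, freeing_put_ref() and release_item() are hypothetical; slowwork_putref_t is the name used in the mail and is defined here for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct slow_work;
typedef void (*slowwork_putref_t)(struct slow_work *work); /* name from the mail */

struct fake_module { int refs; };  /* hypothetical stand-in for struct module */

struct slow_work_ops {
	slowwork_putref_t put_ref;
};

struct slow_work {
	struct fake_module *owner;
	const struct slow_work_ops *ops;
	int refs;
};

static void fake_module_put(struct fake_module *mod) { mod->refs--; }

/* Hypothetical client put_ref that frees the item on the last reference. */
static void freeing_put_ref(struct slow_work *work)
{
	if (--work->refs == 0)
		free(work);
}

static const struct slow_work_ops freeing_ops = { .put_ref = freeing_put_ref };

/* Worker tail after execution: copy everything still needed *before*
 * dropping the reference, then touch only the copies. */
static void release_item(struct slow_work *work)
{
	slowwork_putref_t put_ref = work->ops->put_ref;
	struct fake_module *owner = work->owner;

	put_ref(work);          /* may free *work; do not dereference it below */
	fake_module_put(owner); /* uses the local copy, not work->owner */
}
```

The copy of owner is the one that matters for correctness here; the copy of put_ref mainly makes it explicit that nothing after the call reads through work.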
> + barrier(); /* ensure that put_ref is not re-ordered with module_put */
> + module_put(work->owner);
> return true;
>
> auto_requeue:
> @@ -299,6 +301,8 @@ int slow_work_enqueue(struct slow_work *work)
> if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
> set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
> } else {
> + if (!try_module_get(work->owner))
> + goto cant_get_mod;
> if (work->ops->get_ref(work) < 0)
> goto cant_get_ref;
> if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
> @@ -313,6 +317,8 @@ int slow_work_enqueue(struct slow_work *work)
> return 0;
>
> cant_get_ref:
> + module_put(work->owner);
> +cant_get_mod:
> spin_unlock_irqrestore(&slow_work_queue_lock, flags);
> return -EAGAIN;
> }