Re: [PATCH 1/7] static_key: flush rate limit timer on rmmod

From: Radim Krčmář
Date: Fri Oct 18 2013 - 03:27:07 EST


2013-10-17 12:35+0200, Paolo Bonzini:
> On 17/10/2013 12:10, Radim Krčmář wrote:
> > Fix a bug when we free module memory while timer is pending by marking
> > deferred static keys and flushing the timer on module unload.
> >
> > Also make static_key_rate_limit() useable more than once.
> >
> > Reproducer: (host crasher)
> > modprobe kvm_intel
> > (sleep 1; echo quit) \
> > | qemu-kvm -kernel /dev/null -monitor stdio &
> > sleep 0.5
> > until modprobe -rv kvm_intel 2>/dev/null; do true; done
> > modprobe -v kvm_intel
> >
> > Signed-off-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> > ---
> > Very hacky; I've already queued generalizing ratelimit and applying it
> > here, but there is still a lot to do on static keys ...
> >
> > include/linux/jump_label.h | 1 +
> > kernel/jump_label.c | 17 ++++++++++++++++-
> > 2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> > index a507907..848bd15 100644
> > --- a/include/linux/jump_label.h
> > +++ b/include/linux/jump_label.h
> > @@ -58,6 +58,7 @@ struct static_key {
> > #ifdef CONFIG_MODULES
> > struct static_key_mod *next;
> > #endif
> > + atomic_t deferred;
> > };
> >
> > # include <asm/jump_label.h>
> > diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> > index 297a924..7018042 100644
> > --- a/kernel/jump_label.c
> > +++ b/kernel/jump_label.c
> > @@ -116,8 +116,9 @@ EXPORT_SYMBOL_GPL(static_key_slow_dec_deferred);
> > void jump_label_rate_limit(struct static_key_deferred *key,
> > unsigned long rl)
> > {
> > + if (!atomic_xchg(&key->key.deferred, 1))
> > + INIT_DELAYED_WORK(&key->work, jump_label_update_timeout);
>
> Can it actually happen that jump_label_rate_limit is called multiple
> times? If so, this hunk alone would be a separate bugfix. I don't
> think all the concurrency that you're protecting against can actually
> happen, but in any case I'd just take the jump_label_lock() instead of
> using atomics.

It can't happen in the current code, and it is highly unlikely to happen
in the future either.

There was no reason to take the lock, so I didn't, but with the lock we
could use a plain bool in the struct instead ... I'll do it: even though
it takes more lines of code, it is probably easier to understand.
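
For illustration only, the bool variant could look roughly like this
userspace model. A pthread mutex stands in for jump_label_lock(), and the
struct, field and function names are made up for the sketch; this is not
the kernel code and not the final patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for jump_label_lock()/jump_label_unlock(). */
static pthread_mutex_t model_jump_label_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical model of a deferred key; not the kernel's struct. */
struct deferred_key_model {
	bool deferred;          /* protected by model_jump_label_mutex */
	int work_inits;         /* how many times the "work" was initialized */
	unsigned long timeout;  /* last rate limit that was set */
};

/* Models jump_label_rate_limit(): safe to call more than once. */
static void rate_limit_model(struct deferred_key_model *key, unsigned long rl)
{
	pthread_mutex_lock(&model_jump_label_mutex);
	if (!key->deferred) {
		key->deferred = true;
		key->work_inits++;  /* stands in for INIT_DELAYED_WORK() */
	}
	key->timeout = rl;
	pthread_mutex_unlock(&model_jump_label_mutex);
}

/* Usage: a second call updates the timeout but does not re-init the work. */
static void rate_limit_model_demo(void)
{
	struct deferred_key_model key = { false, 0, 0 };

	rate_limit_model(&key, 100);
	rate_limit_model(&key, 250);
	assert(key.work_inits == 1);
	assert(key.timeout == 250);
}
```

The flag needs no atomic operation here because every access happens
under the same mutex.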

> It's also not necessary to use a new field, since you can just check
> key->timeout.

The flush is done automatically, and we don't know whether the jump_entry
belongs to a deferred key, so we shouldn't just try blindly.
(Another bit in the jump_entry flags would supply enough information, but
we haven't decided whether we want to optimize them into pointers, there
isn't much space in them, and they were only introduced in patch [5/7].)

> All this gives something like this for static_key_rate_limit_flush:
>
> if (key->timeout) {
> jump_label_lock();
> if (key->enabled) {
> jump_label_unlock();
> flush_delayed_work(&dkey->work);
> } else
> jump_label_unlock();
> }

Ugh, I see a problem in the original patch now: shortly before posting, I
changed it from a cancel_delayed_work() in the module that owns this key,
because that could still bug, and I forgot that it isn't good to take
jump_label_lock() a second time, which is what the flush would do.

This needs to be solved by checking whether we are the last module that
uses this key and issuing a cancel() then, and I'm not sure even that
would not still bug: the work could already be running, just waiting for
jump_label_lock(), and we would then somehow manage to free the memory
first.
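
A userspace sketch of the locking problem, assuming the flush is issued
from a path that already holds jump_label_lock() and that the delayed
work needs the same lock to make progress. A pthread mutex stands in for
the lock and trylock stands in for "would block"; all names are made up
for the model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for jump_label_lock()/jump_label_unlock(). */
static pthread_mutex_t flush_model_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the delayed work: it must take the lock before it can update
 * the key. Returns false where the real work would block on the lock.
 */
static bool work_model_try_run(void)
{
	if (pthread_mutex_trylock(&flush_model_mutex) != 0)
		return false;
	/* ... the key would be updated here ... */
	pthread_mutex_unlock(&flush_model_mutex);
	return true;
}

/* Flush requested while the caller holds the lock: the work cannot run. */
static bool flush_with_lock_held_can_finish(void)
{
	bool ran;

	pthread_mutex_lock(&flush_model_mutex);
	ran = work_model_try_run();
	pthread_mutex_unlock(&flush_model_mutex);
	return ran;
}

/* Flush requested without the lock held: the work runs to completion. */
static bool flush_without_lock_can_finish(void)
{
	return work_model_try_run();
}

/* Usage: only the unlocked flush can complete. */
static void flush_model_demo(void)
{
	assert(!flush_with_lock_held_can_finish());
	assert(flush_without_lock_can_finish());
}
```

In the model the blocked work simply reports failure; a real flush would
wait for it forever. The model does not cover the other case above, where
a cancel() returns while the work is already waiting for the lock.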

(leaving it to programmer starts to look sane ...)

> Paolo