Re: [PATCH 2/2] livepatch: fix patched module loading race

From: Josh Poimboeuf
Date: Wed Mar 04 2015 - 10:34:57 EST

Next message: Евгений Поляков: "Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm"
Previous message: Jiri Kosina: "Re: [PATCH] MAINTAINERS: fix UTF-8 encoding"
In reply to: Petr Mladek: "Re: [PATCH 2/2] livepatch: fix patched module loading race"
Next in thread: Jiri Kosina: "Re: [PATCH 2/2] livepatch: fix patched module loading race"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Mar 04, 2015 at 02:17:52PM +0100, Petr Mladek wrote:
> On Tue 2015-03-03 17:02:22, Josh Poimboeuf wrote:
> > It's possible for klp_register_patch() to see a module before the COMING
> > notifier is called, or after the GOING notifier is called.
> >
> > That can cause all kinds of ugly races. As Pter Mladek reported:
> >
> > "The problem is that we do not keep the klp_mutex lock all the time when
> > the module is being added or removed.
> >
> > First, the module is visible even before ftrace is ready. If we enable a patch
> > in this time frame, adding ftrace ops will fail and the patch will get rejected
> > just because bad timing.
>
> Ah, this is not true after all. I did not properly check when
> MODULE_STATE_COMING was set. I though that it was before ftrace was
> initialized but it was not true.

Ah, right. I didn't re-read your description after realizing that
MODULE_STATE_COMING means ftrace is already initialized.

> > Second, if we are "lucky" and enable the patch for the coming module when the
> > ftrace is ready but before the module notifier has been called. The notifier
> > will try to enable the patch as well. It will detect that it is already patched,
> > return error, and the module will get rejected just because bad
> > timing. The more serious problem is that it will not call the notifier for
> > going module, so that the mess will stay there and we wont be able to load
> > the module later.
>
> Ah, the race is there but the effect is not that serious in the
> end. It seems that errors from module notifiers are ignored. In fact,
> we do not propagate the error from klp_module_notify_coming(). It means
> that WARN() from klp_enable_object() will be printed but the module
> will be loaded and patched.
>
> I am sorry, I was confused by kGraft where kgr_module_init() was
> called directly from module_load(). The errors were propagated. It
> means that kGraft rejects module when the patch cannot be applied.

True, but we should still eliminate the race. Initializing the object
twice could cause some sneaky bugs, if not now then in the future.

> Note that the current solution is perfectly fine for the simple
> consistency model.
>
>
> > Third, similar problems are there for going module. If a patch is enabled after
> > the notifier finishes but before the module is removed from the list of modules,
> > the new patch will be applied to the module. The module might disappear at
> > anytime when the patch enabling is in progress, so there might be an access out
> > of memory. Or the whole patch might be applied and some mess will be left,
> > so it will not be possible to load/patch the module again."
>
> This is true.
>
>
> > Fix these races by letting the first one who sees the module do the
> > needed work.
> >
> > Reported-by: Petr Mladek <pmladek@xxxxxxx>
> > Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > ---
> > kernel/livepatch/core.c | 53 +++++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 49 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index a74e4e8..19a758c 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -89,16 +89,42 @@ static bool klp_is_object_loaded(struct klp_object *obj)
> > /* sets obj->mod if object is not vmlinux and module is found */
> > static void klp_find_object_module(struct klp_object *obj)
> > {
> > + struct module *mod;
> > +
> > if (!klp_is_module(obj))
> > return;
> >
> > mutex_lock(&module_mutex);
> > +
> > /*
> > * We don't need to take a reference on the module here because we have
> > * the klp_mutex, which is also taken by the module notifier. This
> > * prevents any module from unloading until we release the klp_mutex.
> > */
> > - obj->mod = find_module(obj->name);
> > + mod = find_module(obj->name);
> > +
> > + /*
> > + * Be careful here to prevent races with the notifier:
> > + *
> > + * - MODULE_STATE_COMING: This may be before or after the notifier gets
> > + * called. If after, the notifier didn't see the object anyway
> > + * because it hadn't been added to the patch list yet. Either way,
> > + * ftrace is already initialized, so it's safe to just go ahead and
> > + * initialize the object here.
>
> Well, we need to be careful. This will handle only the newly
> registered patch. If there are other patches against the module
> on the stack, it might produce wrong order at func_stack, see below.

Sorry, but I don't follow. Can you give an example?

> > + *
> > + * - MODULE_STATE_GOING: Similar to MODULE_STATE_COMING, this may be
> > + * before or after the notifier gets called. If after, the notifer
> > + * didn't see the object anyway. Either way it's safe to just
> > + * pretend that it's already gone and leave obj->mod at NULL.
>
> This is true for the current simple consistency model.
>
> Note that there will be a small race window if we allow dependency
> between patched functions and introduce more a complex consistency model
> with lazy patching. The problem is the following sequence:
>
>
> CPU0 CPU1
>
> delete_module() #SYSCALL
>
> try_stop_module()
> mod->state = MODULE_STATE_GOING;
>
> mutex_unlock(&module_mutex);
>
> klp_register_patch()
> klp_enable_patch()
>
> #save place to switch universe
>
> b() # from module that is going
> a() # from core (patched)
>
>
> mod->exit();
>
>
> Note that the function b() can be called until we call mod->exit().
>
> If we do not apply patch against b() because it is in
> MODULE_STATE_GOING. It will call patched a() with modified semantic
> and things might get wrong.
>
>
> Well, the above described race is rather theoretical. It will happen
> only when a patched module is being removed and a module with a patch
> is added at the same time. Also it means that some other CPU will
> manage to register, enable the patch, switch the universe, and
> call function from the patched module during the small race window.
>
> I am not sure if we need to handle such a type of race if the solution
> adds too big complexity.

But b() can't be called after the module is in MODULE_STATE_GOING,
right? try_stop_module() makes sure it's not being used before changing
its state.

> > + * MODULE_STATE_LIVE: The common case. The module already finished
> > + * loading. Just like with MODULE_STATE_COMING, we know the notifier
> > + * didn't see the object, so we init it here.
> > + */
> > + if (mod && (mod->state == MODULE_STATE_COMING ||
> > + mod->state == MODULE_STATE_LIVE))
> > + obj->mod = mod;
> > +
> > mutex_unlock(&module_mutex);
> > }
> >
> > @@ -695,8 +721,6 @@ static void klp_free_object_loaded(struct klp_object *obj)
> > {
> > struct klp_func *func;
> >
> > - obj->mod = NULL;
> > -
> >
> > for (func = obj->funcs; func->old_name; func++)
> > func->old_addr = 0;
> > }
> > @@ -765,6 +789,7 @@ static int klp_init_object(struct klp_patch *patch, struct klp_object *obj)
> > return -EINVAL;
> >
> > obj->state = KLP_DISABLED;
> > + obj->mod = NULL;
> >
> > klp_find_object_module(obj);
> >
> > @@ -965,10 +990,30 @@ static int klp_module_notify(struct notifier_block *nb, unsigned long action,
> > continue;
> >
> > if (action == MODULE_STATE_COMING) {
> > +
> > + /*
> > + * Check for a small window where the register
> > + * path already initialized the object.
> > + */
> s/path/patch/

Not a typo, I was referring to the register code path. Is there a
clearer way to say that?

>
>
>
> > + if (obj->mod)
> > + continue;
>
> This might break stacking. The recently registered patch might become
> the last on the stack and thus unused.

Example please :-)

> > obj->mod = mod;
> > klp_module_notify_coming(patch, obj);
> > - } else /* MODULE_STATE_GOING */
> > + } else {
> > + /* MODULE_STATE_GOING */
> > +
> > + /*
> > + * Check for a small window where the register
> > + * path already saw the GOING state and thus
> > + * didn't set obj->mod.
>
> Same typo here.

Same not actually a typo ;-)

> Also I would write that it did not initialize the object. It is not
> only about setting obj->mod.

Ok.

--
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Евгений Поляков: "Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm"
Previous message: Jiri Kosina: "Re: [PATCH] MAINTAINERS: fix UTF-8 encoding"
In reply to: Petr Mladek: "Re: [PATCH 2/2] livepatch: fix patched module loading race"
Next in thread: Jiri Kosina: "Re: [PATCH 2/2] livepatch: fix patched module loading race"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]