Re: Re: Re: [PATCH 2/2] livepatch: fix patched module loading race
From: Petr Mladek
Date: Fri Mar 06 2015 - 08:05:25 EST
On Fri 2015-03-06 20:37:26, Masami Hiramatsu wrote:
> (2015/03/06 19:51), Petr Mladek wrote:
> > On Fri 2015-03-06 10:24:27, Masami Hiramatsu wrote:
> >> (2015/03/05 23:18), Josh Poimboeuf wrote:
> >>> On Thu, Mar 05, 2015 at 09:52:41AM +0900, Masami Hiramatsu wrote:
> >>>> (2015/03/04 22:17), Petr Mladek wrote:
> >>>>> On Tue 2015-03-03 17:02:22, Josh Poimboeuf wrote:
> >>>>>> It's possible for klp_register_patch() to see a module before the COMING
> >>>>>> notifier is called, or after the GOING notifier is called.
> >>>>>>
> >>>>>> That can cause all kinds of ugly races. As Pter Mladek reported:
> >>>>>>
> >>>>>> "The problem is that we do not keep the klp_mutex lock all the time when
> >>>>>> the module is being added or removed.
> >>>>>>
> >>>>>> First, the module is visible even before ftrace is ready. If we enable a patch
> >>>>>> in this time frame, adding ftrace ops will fail and the patch will get rejected
> >>>>>> just because bad timing.
> >>>>>
> >>>>> Ah, this is not true after all. I did not properly check when
> >>>>> MODULE_STATE_COMING was set. I though that it was before ftrace was
> >>>>> initialized but it was not true.
> >>>>>
> >>>>>
> >>>>>> Second, if we are "lucky" and enable the patch for the coming module when the
> >>>>>> ftrace is ready but before the module notifier has been called. The notifier
> >>>>>> will try to enable the patch as well. It will detect that it is already patched,
> >>>>>> return error, and the module will get rejected just because bad
> >>>>>> timing. The more serious problem is that it will not call the notifier for
> >>>>>> going module, so that the mess will stay there and we wont be able to load
> >>>>>> the module later.
> >>>>>
> >>>>> Ah, the race is there but the effect is not that serious in the
> >>>>> end. It seems that errors from module notifiers are ignored. In fact,
> >>>>> we do not propagate the error from klp_module_notify_coming(). It means
> >>>>> that WARN() from klp_enable_object() will be printed but the module
> >>>>> will be loaded and patched.
> >>>>>
> >>>>> I am sorry, I was confused by kGraft where kgr_module_init() was
> >>>>> called directly from module_load(). The errors were propagated. It
> >>>>> means that kGraft rejects module when the patch cannot be applied.
> >>>>>
> >>>>> Note that the current solution is perfectly fine for the simple
> >>>>> consistency model.
> >>>>>
> >>>>>
> >>>>>> Third, similar problems are there for going module. If a patch is enabled after
> >>>>>> the notifier finishes but before the module is removed from the list of modules,
> >>>>>> the new patch will be applied to the module. The module might disappear at
> >>>>>> anytime when the patch enabling is in progress, so there might be an access out
> >>>>>> of memory. Or the whole patch might be applied and some mess will be left,
> >>>>>> so it will not be possible to load/patch the module again."
> >>>>>
> >>>>> This is true.
> >>>>
> >>>> No, that's not true if you try_get_module() before patching. After the
> >>>> module state goes GOING (more correctly say, after try_release_module_ref()
> >>>> succeeded), all try_get_module() must fail :)
> >>>> So, please make sure to get module when applying patches.
> >>>
> >>> Hi Masami,
> >>>
> >>> As Jikos pointed out elsewhere, try_get_module() won't solve all the
> >>> GOING races.
> >>>
> >>> The module can be in GOING before mod->exit() is called. If we apply a
> >>> patch between GOING getting set and mod->exit(), try_module_get() will
> >>> fail and the module won't be patched. But module code can still run
> >>> before or during mod->exit(), so the unpatched module code might
> >>> interact badly with new patched code elsewhere.
> >>
> >> Hmm, in that case, we'd better have new GONE state for the module.
> >> At least kprobe needs it.
> >
> > What is the exact problem with kprobes, please?
>
> Ah, sorry, I miss understood the issue.
>
>
> > Note that the notifiers for MODULE_STATE_GOING are called
> > after mod->exit(). Therefore it is safe to kill kprobes
> > the fast way when using the notifier.
>
> Right.
>
> /* Stop the machine so refcounts can't move and disable module. */
> ret = try_stop_module(mod, flags, &forced);
> if (ret != 0)
> goto out;
>
> mutex_unlock(&module_mutex);
> /* Final destruction now no one is using it. */
> if (mod->exit != NULL)
> mod->exit();
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> async_synchronize_full();
>
> And, kprobes also can try to put a probe between mutex_unlock()
> and mod->exit() on mod->exit() function.
> Since kprobe tries to get module when registering, it will
> fail but mod->exit() is called after that fail.
>
> So, there is also a race.
I think that it is not a problem if the Kprobe can not be registered
at this stage. If anyone wants to register Kprobe on the mod->exit(),
she should do it earlier.
> However, I'm not sure that is actual serious issue.
> Actually, we can suppose this module unloading context is
> not changing universe. thus it is expected behavior, isn't it?
Live Patching is in different situation. The patch modifies many
functions at once. If we allow semantic changes in functions, we need
to have the patches applied consistently on all code that might get
called, including going module. The potential race is described at
https://lkml.org/lkml/2015/3/4/776. See the middle of the message.
Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/