Re: Re: Re: [PATCH 2/2] livepatch: fix patched module loading race

From: Masami Hiramatsu
Date: Fri Mar 06 2015 - 06:37:43 EST


(2015/03/06 19:51), Petr Mladek wrote:
> On Fri 2015-03-06 10:24:27, Masami Hiramatsu wrote:
>> (2015/03/05 23:18), Josh Poimboeuf wrote:
>>> On Thu, Mar 05, 2015 at 09:52:41AM +0900, Masami Hiramatsu wrote:
>>>> (2015/03/04 22:17), Petr Mladek wrote:
>>>>> On Tue 2015-03-03 17:02:22, Josh Poimboeuf wrote:
>>>>>> It's possible for klp_register_patch() to see a module before the COMING
>>>>>> notifier is called, or after the GOING notifier is called.
>>>>>>
>>>>>> That can cause all kinds of ugly races. As Pter Mladek reported:
>>>>>>
>>>>>> "The problem is that we do not keep the klp_mutex lock all the time when
>>>>>> the module is being added or removed.
>>>>>>
>>>>>> First, the module is visible even before ftrace is ready. If we enable a patch
>>>>>> in this time frame, adding ftrace ops will fail and the patch will get rejected
>>>>>> just because bad timing.
>>>>>
>>>>> Ah, this is not true after all. I did not properly check when
>>>>> MODULE_STATE_COMING was set. I though that it was before ftrace was
>>>>> initialized but it was not true.
>>>>>
>>>>>
>>>>>> Second, if we are "lucky" and enable the patch for the coming module when the
>>>>>> ftrace is ready but before the module notifier has been called. The notifier
>>>>>> will try to enable the patch as well. It will detect that it is already patched,
>>>>>> return error, and the module will get rejected just because bad
>>>>>> timing. The more serious problem is that it will not call the notifier for
>>>>>> going module, so that the mess will stay there and we wont be able to load
>>>>>> the module later.
>>>>>
>>>>> Ah, the race is there but the effect is not that serious in the
>>>>> end. It seems that errors from module notifiers are ignored. In fact,
>>>>> we do not propagate the error from klp_module_notify_coming(). It means
>>>>> that WARN() from klp_enable_object() will be printed but the module
>>>>> will be loaded and patched.
>>>>>
>>>>> I am sorry, I was confused by kGraft where kgr_module_init() was
>>>>> called directly from module_load(). The errors were propagated. It
>>>>> means that kGraft rejects module when the patch cannot be applied.
>>>>>
>>>>> Note that the current solution is perfectly fine for the simple
>>>>> consistency model.
>>>>>
>>>>>
>>>>>> Third, similar problems are there for going module. If a patch is enabled after
>>>>>> the notifier finishes but before the module is removed from the list of modules,
>>>>>> the new patch will be applied to the module. The module might disappear at
>>>>>> anytime when the patch enabling is in progress, so there might be an access out
>>>>>> of memory. Or the whole patch might be applied and some mess will be left,
>>>>>> so it will not be possible to load/patch the module again."
>>>>>
>>>>> This is true.
>>>>
>>>> No, that's not true if you try_get_module() before patching. After the
>>>> module state goes GOING (more correctly say, after try_release_module_ref()
>>>> succeeded), all try_get_module() must fail :)
>>>> So, please make sure to get module when applying patches.
>>>
>>> Hi Masami,
>>>
>>> As Jikos pointed out elsewhere, try_get_module() won't solve all the
>>> GOING races.
>>>
>>> The module can be in GOING before mod->exit() is called. If we apply a
>>> patch between GOING getting set and mod->exit(), try_module_get() will
>>> fail and the module won't be patched. But module code can still run
>>> before or during mod->exit(), so the unpatched module code might
>>> interact badly with new patched code elsewhere.
>>
>> Hmm, in that case, we'd better have new GONE state for the module.
>> At least kprobe needs it.
>
> What is the exact problem with kprobes, please?

Ah, sorry, I miss understood the issue.


> Note that the notifiers for MODULE_STATE_GOING are called
> after mod->exit(). Therefore it is safe to kill kprobes
> the fast way when using the notifier.

Right.

/* Stop the machine so refcounts can't move and disable module. */
ret = try_stop_module(mod, flags, &forced);
if (ret != 0)
goto out;

mutex_unlock(&module_mutex);
/* Final destruction now no one is using it. */
if (mod->exit != NULL)
mod->exit();
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
async_synchronize_full();

And, kprobes also can try to put a probe between mutex_unlock()
and mod->exit() on mod->exit() function.
Since kprobe tries to get module when registering, it will
fail but mod->exit() is called after that fail.

So, there is also a race.

However, I'm not sure that is actual serious issue.
Actually, we can suppose this module unloading context is
not changing universe. thus it is expected behavior, isn't it?

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/