Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()

From: Miroslav Benes
Date: Tue Dec 15 2015 - 03:14:59 EST



Hi,

[ sry for late responses. Two weeks of holiday and trying to go
through all the emails... ]

On Mon, 30 Nov 2015, Li Bin wrote:

> There is a potential race as following:
>
> CPU0 | CPU1
> -----------------------------|-----------------------------------
> enabled_store() | klp_unregister_patch()
> | |-mutex_lock(&klp_mutex);
> |-mutex_lock(&klp_mutex); | |-klp_free_patch();
> | |-mutex_unlock(&klp_mutex);
> |-[process the patch's state]|
> |-mutex_unlock(&klp_mutex) |
>
> Fix this race condition by adding klp_is_patch_registered() check in
> enabled_store() after get the lock klp_mutex.
>
> Signed-off-by: Li Bin <huawei.libin@xxxxxxxxxx>

Well, I think all the relevant feedback was mentioned in other replies.
I'll add my two cents here than to reply to the respective ones.

1. as Josh pointed out this is a potential deadlock situation between
sysfs infrastructure and our unregister path. It has been already
discussed [1] and some solutions were proposed [2].

2. if I am not missing something it is purely theoretical now.
klp_unregister_patch is supposed to be called from module_exit function
to clean up after the patch module. There is no way today how this
function can be called. We take module reference (try_module_get() in
klp_register_patch()) and we do not put it anywhere. Because we can't
as Petr mentioned. Without a consistency model we cannot know if there
is a task in a module code or not.

So I think we can postpone the solution till there is a consistency model.

Regards,
Miroslav

[1] https://lkml.kernel.org/r/alpine.LNX.2.00.1502171728520.29490@xxxxxxxxxxxxx

[2]
1. rework klp_unregister_patch and move kobject_put out of mutex
protected parts. This is what Jiri Slaby also proposed.

2. as Petr Mladek said we could use mutex_trylock instead of mutex_lock
in enabled_store.

I've even had the solution prepared in one of my git branches since
April so here it is just for the sake of completeness. It is based on
Jiri Slaby's kgraft consistency model for klp and it is maybe a
superset, but you get the idea.

-->8---