Re: Bug with paravirt ops and livepatches

From: Josh Poimboeuf
Date: Mon Apr 04 2016 - 12:14:36 EST


On Fri, Apr 01, 2016 at 09:35:34PM +0200, Jiri Kosina wrote:
> On Fri, 1 Apr 2016, Chris J Arges wrote:
>
> > Loading, please wait...
> > starting version 229
> > [ 1.182869] random: udevadm urandom read with 2 bits of entropy available
> > [ 1.241404] BUG: unable to handle kernel paging request at ffffffffc000f35f
>
> Gah, we surely can't change pages with RO PTE. Thanks for such prompt
> testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?
>
> The patch below should fix that by marking the module RO (and relevant
> parts NX) only when it's guaranteed that .text is not going to be modified
> any more (and includes the error handling fix Miroslav spotted as well).
>
> Thanks.
>
>
>
> diff --git a/kernel/module.c b/kernel/module.c
> index 5f71aa6..430606d 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
> return 0;
> }
>
> -static int post_relocation(struct module *mod, const struct load_info *info)
> +static void post_relocation(struct module *mod, const struct load_info *info)
> {
> /* Sort exception table now relocations are done. */
> sort_extable(mod->extable, mod->extable + mod->num_exentries);
> @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const struct load_info *info)
>
> /* Setup kallsyms-specific fields. */
> add_kallsyms(mod, info);
> -
> - /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs, mod);
> }
>
> /* Is this module of this name done loading? No locks held. */
> @@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, struct load_info *info)
> /* This relies on module_mutex for list integrity. */
> module_bug_finalize(info->hdr, info->sechdrs, mod);
>
> - /* Set RO and NX regions */
> - module_enable_ro(mod);
> - module_enable_nx(mod);
> -
> /* Mark state as coming so strong_try_module_get() ignores us,
> * but kallsyms etc. can see us. */
> mod->state = MODULE_STATE_COMING;
> @@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> if (err < 0)
> goto free_modinfo;
>
> - err = post_relocation(mod, info);
> - if (err < 0)
> - goto free_modinfo;
> + post_relocation(mod, info);
>
> flush_module_icache(mod);
>
> @@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const char __user *uargs,
> if (err)
> goto bug_cleanup;
>
> + /* Arch-specific module finalizing. */
> + err = module_finalize(info->hdr, info->sechdrs, mod);
> + if (err)
> + goto coming_cleanup;
> +
> + /* Set RO and NX regions */
> + module_enable_ro(mod);
> + module_enable_nx(mod);
> +
> /* Module is ready to execute: parsing args may do that. */
> after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> -32768, 32767, mod,

So I think this doesn't fix the problem. Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module". This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example. Imagine you have
a patch module P which applies a patch to module M. P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M. P's new function F' has an
instruction which is patched by apply_paravirt(), even though the
patch hasn't been applied yet.

2) Module M is loaded. Before applying the patch, livepatch tries to
apply a klp_reloc to the instruction in F' which was already patched
by apply_paravirt() in step 1. This results in undefined behavior
because it tries to patch the original instruction but instead
patches the new paravirt instruction.
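Here is a small user-space sketch in C that makes the ordering problem concrete. It is only a model of the two steps above. The 6-byte call site, its byte encodings, and the helpers fake_apply_paravirt(), fake_apply_klp_reloc(), simulate() and site_is_native() are hypothetical stand-ins made up for illustration. They are not kernel APIs, and the bytes are not real x86 paravirt encodings. The one property it keeps is that the relocation writes its value at a fixed offset and assumes the original instruction is still there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 6-byte call site in F'. The encodings are invented for
 * illustration; they are NOT real x86 paravirt encodings. */
enum { SITE_LEN = 6 };

static const uint8_t original_site[SITE_LEN] = {
    0xE8, 0x00, 0x00, 0x00, 0x00, 0x90  /* "call <disp32>" + 1 pad byte */
};
static const uint8_t native_site[SITE_LEN] = {
    0x9C, 0x58, 0x90, 0x90, 0x90, 0x90  /* inlined native code + NOPs */
};

/* Stand-in for apply_paravirt(): overwrite the whole call site with
 * the native replacement sequence. */
static void fake_apply_paravirt(uint8_t site[SITE_LEN])
{
    memcpy(site, native_site, SITE_LEN);
}

/* Stand-in for writing a klp_reloc: store the resolved 32-bit value at
 * offset 1, blindly assuming the original call instruction is present. */
static void fake_apply_klp_reloc(uint8_t site[SITE_LEN], uint32_t value)
{
    memcpy(site + 1, &value, sizeof(value));
}

/* reloc_first == 0 models steps 1) and 2): paravirt patching when P is
 * loaded, relocation later when M is loaded.
 * reloc_first != 0 models the assumed safe order: relocation written
 * before the paravirt pass touches the site. */
void simulate(int reloc_first, uint32_t value, uint8_t out[SITE_LEN])
{
    assert(out != NULL);
    memcpy(out, original_site, SITE_LEN);
    if (reloc_first) {
        fake_apply_klp_reloc(out, value);
        fake_apply_paravirt(out);
    } else {
        fake_apply_paravirt(out);
        fake_apply_klp_reloc(out, value);
    }
}

/* Returns 1 if the site holds exactly the intended native sequence. */
int site_is_native(const uint8_t site[SITE_LEN])
{
    return memcmp(site, native_site, SITE_LEN) == 0;
}
```

Calling simulate(0, 0x12345678u, buf) from a main() leaves the inlined sequence with bytes 1 through 4 overwritten by the relocation value. The "pop" byte is gone, so the result is garbage. Calling simulate(1, ...) ends with the intact native sequence, because the paravirt pass runs last and sees the instruction it expects.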

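So the above patch makes no difference. It only reorders
module_finalize() (where apply_paravirt() runs) and the RO/NX setup
within a single module's load, and that ordering doesn't really matter
here: P's paravirt patching still happens when P is loaded, before the
klp_reloc is applied when M shows up.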

Jessica proposed some novel fixes here:

https://github.com/dynup/kpatch/issues/580#issuecomment-183001652

But I get the feeling that any fix would be quite ugly and brittle.

I think the *real* problem here (and one that we've seen before) is that
we have a feature which allows you to load a patch to a module before
loading the module itself. That really goes against the grain of how
module dependencies work. It has already given us several headaches and
it makes the livepatch code a lot more complex.

I really think we need to take another hard look at whether it's
really worth it. My current feeling is that it's not.

If we were able to get rid of that "feature", yes, the livepatch code
would be simpler, but there might be another awesome benefit: I suspect
we'd also be able to get rid of the need for specialized patch creation
tooling like kpatch-build. Instead I think we could just specify
klp_relocs info in the source code of the patch, and just use kbuild to
build the patch module. Not only would the livepatch code be simpler
(and much easier to wrap your head around), but the user space tooling
could be *vastly* simpler.

Of course, removing that feature might create some headaches for the
user. It is nice to be able to load a big cumulative patch without
having to load all the dependencies first. But maybe there are things
we could do to make the dependency problem more manageable. e.g.,
splitting up patch modules to be per-object? requiring the user to load
modules they don't need? patching or replacing the module on disk?
copying the new module to a new location and telling modprobe where to
find it?

I don't have all the answers but I think we should take a hard look at
some of these other approaches.

--
Josh