Re: [PATCH v9] livepatch: Clear relocation targets on a module removal

From: Joe Lawrence
Date: Tue Jan 24 2023 - 15:09:21 EST


On 1/20/23 01:42, Josh Poimboeuf wrote:
> On Thu, Jan 19, 2023 at 11:06:35AM -0800, Song Liu wrote:
>> On Wed, Jan 18, 2023 at 2:08 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>
>>> On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
>>>> From: Miroslav Benes <mbenes@xxxxxxx>
>>>>
>>>> Josh reported a bug:
>>>>
>>>> When the object to be patched is a module, and that module is
>>>> rmmod'ed and reloaded, it fails to load with:
>>>>
>>>> module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
>>>> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>>>> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>>>>
>>>> The livepatch module has a relocation which references a symbol
>>>> in the _previous_ loading of nfsd. When apply_relocate_add()
>>>> tries to replace the old relocation with a new one, it sees that
>>>> the previous one is nonzero and it errors out.
>>>>
>>>> On ppc64le, we have a similar issue:
>>>>
>>>> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
>>>> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>>>> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>>>
>>> Shouldn't there also be a fix for this powerpc issue?
>>
>> There was a working version, but it was not very clean. We couldn't agree
>> on the path forward for powerpc, so we are hoping to ship the fix to x86 (and
>> s390?) first [1].
>
> Sorry for coming in late, I was on leave so I missed a lot of the
> discussions on previous versions. The decision to leave powerpc broken
> wasn't clear from reading the commit message. The bug is mentioned, and
> the fix is implied, but surprisingly there's no fix.
>
> I agree that the powerpc fix should be in a separate patch, but I still
> don't feel comfortable merging the x86 fix without the corresponding
> powerpc fix.
>
> powerpc is a major arch and not a second-class citizen. If we don't fix
> it now then it'll probably never get fixed until it blows up in the real
> world.
>
> For powerpc, instead of clearing, how about just "fixing" the warning
> site, something like so (untested)?
>
>
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 1096d6b3a62c..1a12463ba674 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -499,9 +499,11 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
>
> /* We expect a noop next: if it is, replace it with instruction to
> restore r2. */
> -static int restore_r2(const char *name, u32 *instruction, struct module *me)
> +static int restore_r2(const char *name, u32 *instruction, struct module *me,
> + bool klp_sym)
> {
> u32 *prev_insn = instruction - 1;
> + u32 insn_val = *instruction;
>
> if (is_mprofile_ftrace_call(name))
> return 1;
> @@ -514,9 +516,18 @@ static int restore_r2(const char *name, u32 *instruction, struct module *me)
> if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
> return 1;
>
> - if (*instruction != PPC_RAW_NOP()) {
> + /*
> + * For a livepatch relocation, the restore r2 instruction might have
> + * been previously written if the relocation references a symbol in a
> + * module which was unloaded and is now being reloaded. In that case,
> + * skip the warning and instruction write.
> + */
> + if (klp_sym && insn_val == PPC_INST_LD_TOC)
> + return 0;

Hi Josh,

Nit: shouldn't this return 1?

And if you're willing to entertain a small refactor, wouldn't
restore_r2() be clearer if it returned -ESOMETHING on error?

Maybe converting to a boolean could work, but then I'd suggest a name
that clearly implies success/fail given true/false return. Maybe
replace_nop_with_ld_toc() or replace_nop_to_restore_r2() ... still
-ESOMETHING is more intuitive to me as there are cases like this where
the function safely returns w/o replacing anything.

> +
> + if (insn_val != PPC_RAW_NOP()) {
> pr_err("%s: Expected nop after call, got %08x at %pS\n",
> - me->name, *instruction, instruction);
> + me->name, insn_val, instruction);
> return 0;
> }
>
> @@ -649,7 +660,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
> if (!value)
> return -ENOENT;
> if (!restore_r2(strtab + sym->st_name,
> - (u32 *)location + 1, me))
> + (u32 *)location + 1, me,
> + sym->st_shndx == SHN_LIVEPATCH))
> return -ENOEXEC;
> } else
> value += local_entry_offset(sym);
>

klp-convert-tree tests* ran OK with this patch (with the nit fixed) on
top of Song's v10. LMK if you want me to push a branch with some or all
of these patches for further testing.

* I removed the tests that check for relocation clearing, only tested
module reloading

--
Joe