Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

From: Joe Lawrence
Date: Tue Dec 13 2022 - 17:20:07 EST


On 12/13/22 8:29 AM, Petr Mladek wrote:
> On Tue 2022-12-13 00:13:46, Song Liu wrote:
>> On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>
>>> On Fri 2022-12-09 11:59:35, Song Liu wrote:
>>>> On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>>> On Mon 2022-11-28 17:57:06, Song Liu wrote:
>>>>>> On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>>>>>
>>>>>>>> --- a/arch/powerpc/kernel/module_64.c
>>>>>>>> +++ b/arch/powerpc/kernel/module_64.c
>>>>>>>> +#ifdef CONFIG_LIVEPATCH
>>>>>>>> +void clear_relocate_add(Elf64_Shdr *sechdrs,
>>>>>>>> + const char *strtab,
>>>>>>>> + unsigned int symindex,
>>>>>>>> + unsigned int relsec,
>>>>>>>> + struct module *me)
>>>>>>>> +{
>>>
>>> [...]
>>>
>>>>>>>> +
>>>>>>>> + instruction = (u32 *)location;
>>>>>>>> + if (is_mprofile_ftrace_call(symname))
>>>>>>>> + continue;
>>>>>
>>>>> Why do we ignore these symbols?
>>>>>
>>>>> I can't find any counter-part in apply_relocate_add(). It looks super
>>>>> tricky. It would deserve a comment.
>>>>>
>>>>> And I have no idea how we could maintain these exceptions.
>>>>>
>>>>>>>> + if (!instr_is_relative_link_branch(ppc_inst(*instruction)))
>>>>>>>> + continue;
>>>>>
>>>>> Same here. It looks super tricky and there is no explanation.
>>>>
>>>> The two checks are from restore_r2(). But I cannot really remember
>>>> why we needed them. It is probably an updated version of an earlier
>>>> version (3 years earlier..).
>>>
>>> This is a good sign that it has to be explained in a comment.
>>> Or even better, it should not be copy-pasted.
>>>
>>>>>>>> + instruction += 1;
>>>>>>>> + patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
>>>
>>> I believe that this is not enough. apply_relocate_add() does this:
>>>
>>> int apply_relocate_add(Elf64_Shdr *sechdrs,
>>> [...]
>>> struct module *me)
>>> {
>>> [...]
>>> case R_PPC_REL24:
>>> /* FIXME: Handle weak symbols here --RR */
>>> if (sym->st_shndx == SHN_UNDEF ||
>>> sym->st_shndx == SHN_LIVEPATCH) {
>>> [...]
>>> if (!restore_r2(strtab + sym->st_name,
>>> (u32 *)location + 1, me))
>>> [...] return -ENOEXEC;
>>>
>>> ---> if (patch_instruction((u32 *)location, ppc_inst(value)))
>>> return -EFAULT;
>>>
>>> , where restore_r2() does:
>>>
>>> static int restore_r2(const char *name, u32 *instruction, struct module *me)
>>> {
>>> [...]
>>> /* ld r2,R2_STACK_OFFSET(r1) */
>>> ---> if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
>>> return 0;
>>> [...]
>>> }
>>>
>>> In other words, apply_relocate_add() modifies two instructions:
>>>
>>> + patch_instruction() called in restore_r2() writes into "location + 1"
>>> + patch_instruction() called in apply_relocate_add() writes into "location"
>>>
>>> IMHO, we have to clear both.
>>>
>>> IMHO, we need to implement a function that reverts the changes done
>>> in restore_r2(). Also we need to revert the changes done in
>>> apply_relocate_add().
>>
>> I finally got time to read all the details again and recalled what
>> happened with the code.
>>
>> The failure happens when we
>> 1) call apply_relocate_add() on klp load (or module first load,
>> if klp was loaded first);
>> 2) do nothing when the module is unloaded;
>> 3) call apply_relocate_add() on module reload, which failed.
>>
>> The failure happens at this check in restore_r2():
>>
>> if (*instruction != PPC_RAW_NOP()) {
>> pr_err("%s: Expected nop after call, got %08x at %pS\n",
>> me->name, *instruction, instruction);
>> return 0;
>> }
>>
>> Therefore, apply_relocate_add only fails when "location + 1"
>> is not NOP. And to make it not fail, we only need to write NOP to
>> "location + 1" in clear_relocate_add().
>
> Yes, this should be enough to pass the existing check.
>
>> IIUC, you want clear_relocate_add() to undo everything we did
>> in apply_relocate_add(); while I was writing clear_relocate_add()
>> to make the next apply_relocate_add() not fail.
>>
>> I agree that, based on the name, clear_relocate_add() should
>> undo everything by apply_relocate_add(). But I am not sure how
>> to handle some cases. For example, how do we undo
>>
>> case R_PPC64_ADDR32:
>> /* Simply set it */
>> *(u32 *)location = value;
>> break;
>>
>> Shall we just write zeros? I don't think this matters.
>
> I guess that it would be zeros as we do in x86_64.
>
>
>> I think this is the question we should answer first:
>> What shall clear_relocate_add() do?
>> 1) undo everything by apply_relocate_add();
>> 2) only do things needed to make the next
>> apply_relocate_add succeed;
>> 3) something between 1) and 2).
>
> Good question.
>
> Hmm, the commit a443bf6e8a7674b86221f49 ("powerpc/modules: Add REL24
> relocation support of livepatch symbols") suggests that all symbols
> in the section SHN_LIVEPATCH have the type R_PPC_REL24. AFAIK, the
> kernel livepatches are the only user of the clear_relocate_add()
> feature.
>
> If the above is correct then it might be enough to clear only
> R_PPC_REL24 type. And it might be enough to warn when clear_relocate_add()
> is called for another type so that we know when the relocations
> were not cleared properly.
>
> Good question. We might need some input from people familiar
> with the architecture and creating the livepatches.
>
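
To make the two options above concrete, here is a rough user-space model
of the R_PPC_REL24 case. It is only a sketch, not kernel code: every
model_* / MODEL_* name is hypothetical, the "text" is a plain array
instead of going through patch_instruction(), the ELFv2 value of
R2_STACK_OFFSET is assumed, and zeroing the branch offset on clear is my
assumption about what "undo" would mean. It also leaves out the
is_mprofile_ftrace_call() and instr_is_relative_link_branch() checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model values; the kernel spells these PPC_RAW_NOP() and PPC_INST_LD_TOC. */
#define MODEL_NOP     0x60000000u  /* ori r0,r0,0 */
#define MODEL_LD_TOC  0xe8410018u  /* ld r2,24(r1); ELFv2 offset assumed */
#define MODEL_LI_MASK 0x03fffffcu  /* offset field of an I-form branch */

/* Hypothetical stand-in for the ELF relocation type. */
enum model_rel_type { MODEL_REL24, MODEL_OTHER };

/*
 * Mimics the REL24 path of apply_relocate_add(): "location + 1" must
 * still be a nop, and on success both words are rewritten.
 */
static int model_apply_rel24(uint32_t *location, int32_t offset)
{
	assert(location != NULL);

	if (location[1] != MODEL_NOP)
		return -1;		/* "Expected nop after call" */

	location[1] = MODEL_LD_TOC;	/* the write done in restore_r2() */
	location[0] = (location[0] & ~MODEL_LI_MASK) |
		      ((uint32_t)offset & MODEL_LI_MASK);
	return 0;
}

/*
 * Reverts both words for REL24. Returns 1 for any other type and
 * leaves the words untouched, so that a caller could warn about it.
 */
static int model_clear(uint32_t *location, enum model_rel_type type)
{
	assert(location != NULL);

	if (type != MODEL_REL24)
		return 1;

	location[0] &= ~MODEL_LI_MASK;	/* assumption: reset offset to 0 */
	location[1] = MODEL_NOP;	/* lets the next apply pass the check */
	return 0;
}
```

With a starting pair of { bl +0, nop }, a second model_apply_rel24()
without a model_clear() in between fails on the nop check, which is the
reload failure Song described. Writing only the nop would be enough to
get past that check; resetting "location" as well is the "clear both"
variant.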

Adding Russell to the CC list, as he worked on some of the recent ppc64le
livepatch klp-relocation threads [1] [2].

Maybe it would be simpler to first organize a cleanup of the code, then add
the capability to undo the relocations? According to [2] and the last
comment on [3], it sounded like the Power folks had a "full"(er)
solution in mind, depending on our requirements.

Finally, I'll try to finish my v6.1 rebase of the klp-convert patchset
this week. That includes a bunch of kselftests that generate all manner
of klp-relocation types and sections. (More than I've ever seen out of
kpatch-build.)

[1] https://lore.kernel.org/linuxppc-dev/YX9UUBeudSUuJs01@xxxxxxxxxx/
[2] https://lore.kernel.org/linuxppc-dev/YxAc87dTmclHGCUy@xxxxxxxxxx/
[3] https://github.com/linuxppc/issues/issues/375

--
Joe