Re: [PATCH v3] kexec: Support purgatories with .text.hot sections

From: Ricardo Ribalda
Date: Mon Mar 27 2023 - 07:52:28 EST


Hi Philipp



On Fri, 24 Mar 2023 at 17:00, Philipp Rudo <prudo@xxxxxxxxxx> wrote:
>
> Hi Ricardo,
>
> On Wed, 22 Mar 2023 20:09:21 +0100
> Ricardo Ribalda <ribalda@xxxxxxxxxxxx> wrote:
>
> > Clang16 links the purgatory text in two sections:
> >
> > [ 1] .text PROGBITS 0000000000000000 00000040
> > 00000000000011a1 0000000000000000 AX 0 0 16
> > [ 2] .rela.text RELA 0000000000000000 00003498
> > 0000000000000648 0000000000000018 I 24 1 8
> > ...
> > [17] .text.hot. PROGBITS 0000000000000000 00003220
> > 000000000000020b 0000000000000000 AX 0 0 1
> > [18] .rela.text.hot. RELA 0000000000000000 00004428
> > 0000000000000078 0000000000000018 I 24 17 8
> >
> > And both of them have their range [sh_addr ... sh_addr+sh_size] on the
> > area pointed by `e_entry`.
> >
> > As a result, image->start is calculated twice, once for .text and
> > once for .text.hot. The second calculation leaves image->start
> > pointing to a random location.
> >
> > Because of this, the system crashes immediately after:
> >
> > kexec_core: Starting new kernel
>
> Great analysis!
>
> > Signed-off-by: Ricardo Ribalda <ribalda@xxxxxxxxxxxx>
> > ---
> > kexec: Fix kexec_file_load for llvm16
> >
> > When upreving llvm I realised that kexec stopped working on my test
> > platform. This patch fixes it.
> >
> > To: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> > Cc: Baoquan He <bhe@xxxxxxxxxx>
> > Cc: Philipp Rudo <prudo@xxxxxxxxxx>
> > Cc: kexec@xxxxxxxxxxxxxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > ---
> > Changes in v3:
> > - Fix initial value. Thanks Ross!
> > - Link to v2: https://lore.kernel.org/r/20230321-kexec_clang16-v2-0-d10e5d517869@xxxxxxxxxxxx
> >
> > Changes in v2:
> > - Fix if condition. Thanks Steven!.
> > - Update Philipp email. Thanks Baoquan.
> > - Link to v1: https://lore.kernel.org/r/20230321-kexec_clang16-v1-0-a768fc2c7c4d@xxxxxxxxxxxx
> > ---
> > kernel/kexec_file.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> > index f1a0e4e3fb5c..25a37d8f113a 100644
> > --- a/kernel/kexec_file.c
> > +++ b/kernel/kexec_file.c
> > @@ -901,10 +901,21 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
> > }
> >
> > offset = ALIGN(offset, align);
> > +
> > + /*
> > + * Check if the segment contains the entry point, if so,
> > + * calculate the value of image->start based on it.
> > + * If the compiler has produced more than one .text sections
> > + * (Eg: .text.hot), they are generally after the main .text
> > + * section, and they shall not be used to calculate
> > + * image->start. So do not re-calculate image->start if it
> > + * is not set to the initial value.
> > + */
> > if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
> > pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
> > pi->ehdr->e_entry < (sechdrs[i].sh_addr
> > - + sechdrs[i].sh_size)) {
> > + + sechdrs[i].sh_size) &&
> > + kbuf->image->start == pi->ehdr->e_entry) {
>
> I'm not entirely sure if this is the solution to go with. As you state
> in the comment above this solution assumes that the .text section comes
> before any other .text.* section. But this assumption isn't much
> stronger than the assumption that there is only a single .text section,
> which is used nowadays.
>
> The best solution I can come up with right now is to introduce a linker
> script for the purgatory that simply merges the .text sections into
> one. Similar to what I did for s390 in
> arch/s390/purgatory/purgatory.lds.S (although for a different reason).
> But that would require every architecture to get one. An alternative
> would be to find a way to get rid of the -r option on the LD_FLAGS,
> which IIRC is the reason why both sections overlap in the first place.


I tried removing -r from arch/x86/purgatory/Makefile, and that resulted in:

[ 115.631578] BUG: unable to handle page fault for address: ffff93224d5c8e20
[ 115.631583] #PF: supervisor write access in kernel mode
[ 115.631585] #PF: error_code(0x0002) - not-present page
[ 115.631586] PGD 100000067 P4D 100000067 PUD 1001ed067 PMD 132b58067 PTE 0
[ 115.631589] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 115.631592] CPU: 0 PID: 5291 Comm: kexec-lite Tainted: G U 5.15.103-17399-g852a928df601-dirty #19 cd159e0d6a91f03e06035a0a8eb7fc984a8f3e82
[ 115.631594] Hardware name: Google Crota/Crota, BIOS Google_Crota.14505.288.0 11/08/2022
[ 115.631595] RIP: 0010:memcpy_erms+0x6/0x10
[ 115.631599] Code: 5d 00 eb bd eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 cc cc cc cc 66 90 48 89 f8 48 89 d1 <f3> a4 c3 cc cc cc cc 0f 1f 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 115.631601] RSP: 0018:ffff93224f65fe50 EFLAGS: 00010246
[ 115.631602] RAX: ffff93224d5c8e20 RBX: 00000000ffffffea RCX: 0000000000000100
[ 115.631603] RDX: 0000000000000100 RSI: ffff9322407bd000 RDI: ffff93224d5c8e20
[ 115.631604] RBP: ffff93224f65fe88 R08: 0000000000000000 R09: ffff92133cd3ef08
[ 115.631605] R10: ffff9322407be000 R11: ffffffffa1b4f2e0 R12: 0000000000000000
[ 115.631606] R13: ffff92133cee4c00 R14: 0000000000000100 R15: ffffffffa2b6f14f
[ 115.631607] FS: 000078e8b9dbf7c0(0000) GS:ffff921437800000(0000) knlGS:0000000000000000
[ 115.631609] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.631610] CR2: ffff93224d5c8e20 CR3: 000000015be26001 CR4: 0000000000770ef0
[ 115.631611] PKRU: 55555554
[ 115.631612] Call Trace:
[ 115.631614] <TASK>
[ 115.631615] kexec_purgatory_get_set_symbol+0x82/0xd3
[ 115.631619] __se_sys_kexec_file_load+0x523/0x644
[ 115.631621] do_syscall_64+0x58/0xa5
[ 115.631623] entry_SYSCALL_64_after_hwframe+0x61/0xcb


I did not pursue that direction any further.

I also looked for an llvm flag that would avoid splitting .text,
but had no luck there either.

I will look into making a linker script for x86. We could combine it
with something like:

 if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
     pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
     pi->ehdr->e_entry < (sechdrs[i].sh_addr
-                         + sechdrs[i].sh_size) &&
-    kbuf->image->start == pi->ehdr->e_entry) {
-        kbuf->image->start -= sechdrs[i].sh_addr;
-        kbuf->image->start += kbuf->mem + offset;
+                         + sechdrs[i].sh_size)) {
+        if (!WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) {
+                kbuf->image->start -= sechdrs[i].sh_addr;
+                kbuf->image->start += kbuf->mem + offset;
+        }
 }

That way, developers get a hint of where to look if this happens again.
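
To illustrate why the guard matters, here is a rough userspace sketch of
the image->start fixup. It is not the kernel code: struct sec,
compute_start() and the offset field are hypothetical names made up for
this example, and the real loop works on Elf_Shdr entries and calls
WARN_ON() instead of silently skipping. The section addresses and sizes
can be taken from the readelf output above; the buffer address, the
section offsets and the entry point are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for an ELF section header. */
struct sec {
	uint64_t sh_addr;
	uint64_t sh_size;
	bool exec;       /* stands in for sh_flags & SHF_EXECINSTR */
	uint64_t offset; /* assumed placement of the section in the buffer */
};

/*
 * Mimics the image->start fixup. With guard == false, every executable
 * section whose range contains e_entry relocates start again. With
 * guard == true, only the first such section does. *hits counts the
 * sections that passed the range check.
 */
static uint64_t compute_start(const struct sec *s, size_t n,
			      uint64_t e_entry, uint64_t mem,
			      bool guard, int *hits)
{
	uint64_t start = e_entry;

	assert(s && hits);
	*hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (!s[i].exec || e_entry < s[i].sh_addr ||
		    e_entry >= s[i].sh_addr + s[i].sh_size)
			continue;

		(*hits)++;

		/* Already relocated once: the proposed code would WARN here. */
		if (guard && start != e_entry)
			continue;

		start -= s[i].sh_addr;
		start += mem + s[i].offset;
	}

	return start;
}
```

With two executable sections that both start at sh_addr 0 (like .text
and .text.hot. above), both pass the range check. Without the guard the
buffer address is added twice and start ends up well past the purgatory;
with the guard it stays at the value derived from the first section.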

Thanks!


>
> Thanks
> Philipp
>
> > kbuf->image->start -= sechdrs[i].sh_addr;
> > kbuf->image->start += kbuf->mem + offset;
> > }
> >
> > ---
> > base-commit: 17214b70a159c6547df9ae204a6275d983146f6b
> > change-id: 20230321-kexec_clang16-4510c23d129c
> >
> > Best regards,
>


--
Ricardo Ribalda