RE: [PATCH v6 2/5] x86/kexec: do unconditional WBINVD for bare-metal in relocate_kernel()

From: Kaplan, David
Date: Wed Sep 11 2024 - 22:21:34 EST


[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Huang, Kai <kai.huang@xxxxxxxxx>
> Sent: Tuesday, September 10, 2024 4:53 AM
> To: Hansen, Dave <dave.hansen@xxxxxxxxx>; bp@xxxxxxxxx;
> peterz@xxxxxxxxxxxxx; hpa@xxxxxxxxx; mingo@xxxxxxxxxx;
> tglx@xxxxxxxxxxxxx; Kaplan, David <David.Kaplan@xxxxxxx>;
> kirill.shutemov@xxxxxxxxxxxxxxx
> Cc: Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx>; seanjc@xxxxxxxxxx;
> x86@xxxxxxxxxx; dyoung@xxxxxxxxxx; sagis@xxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Williams, Dan J <dan.j.williams@xxxxxxxxx>;
> Lendacky, Thomas <Thomas.Lendacky@xxxxxxx>; pbonzini@xxxxxxxxxx;
> Kalra, Ashish <Ashish.Kalra@xxxxxxx>; Yamahata, Isaku
> <isaku.yamahata@xxxxxxxxx>; bhe@xxxxxxxxxx; nik.borisov@xxxxxxxx
> Subject: Re: [PATCH v6 2/5] x86/kexec: do unconditional WBINVD for
> bare-metal in relocate_kernel()
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> On Tue, 2024-09-10 at 02:46 +0000, Kaplan, David wrote:
> > > -----Original Message-----
> > > From: Huang, Kai <kai.huang@xxxxxxxxx>
> > > Sent: Monday, September 9, 2024 9:42 PM
> > > To: Kaplan, David <David.Kaplan@xxxxxxx>; Hansen, Dave
> > > <dave.hansen@xxxxxxxxx>; bp@xxxxxxxxx; tglx@xxxxxxxxxxxxx;
> > > peterz@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; hpa@xxxxxxxxx;
> > > kirill.shutemov@xxxxxxxxxxxxxxx
> > > Cc: x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > pbonzini@xxxxxxxxxx; seanjc@xxxxxxxxxx; Williams, Dan J
> > > <dan.j.williams@xxxxxxxxx>; Lendacky, Thomas
> > > <Thomas.Lendacky@xxxxxxx>; Edgecombe, Rick P
> > > <rick.p.edgecombe@xxxxxxxxx>; Yamahata, Isaku
> > > <isaku.yamahata@xxxxxxxxx>; Kalra, Ashish <Ashish.Kalra@xxxxxxx>;
> > > bhe@xxxxxxxxxx; nik.borisov@xxxxxxxx; sagis@xxxxxxxxxx;
> > > Dave Young <dyoung@xxxxxxxxxx>
> > > Subject: Re: [PATCH v6 2/5] x86/kexec: do unconditional WBINVD for
> > > bare-metal in relocate_kernel()
> > >
> > >
> > > > > > --- a/arch/x86/kernel/machine_kexec_64.c
> > > > > > +++ b/arch/x86/kernel/machine_kexec_64.c
> > > > > > @@ -322,16 +322,9 @@ void machine_kexec_cleanup(struct kimage *image)
> > > > > >  void machine_kexec(struct kimage *image)
> > > > > >  {
> > > > > >  	unsigned long page_list[PAGES_NR];
> > > > > > -	unsigned int host_mem_enc_active;
> > > > > >  	int save_ftrace_enabled;
> > > > > >  	void *control_page;
> > > > > >
> > > > > > -	/*
> > > > > > -	 * This must be done before load_segments() since if call depth tracking
> > > > > > -	 * is used then GS must be valid to make any function calls.
> > > > > > -	 */
> > > > > > -	host_mem_enc_active = cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT);
> > > > > > -
> > > >
> > > > > Functionally the patch looks fine. I would suggest keeping some
> > > > > form of this comment though, because the limitation about not
> > > > > being able to make function calls after load_segments() is
> > > > > arguably non-obvious and this comment served as a warning for
> > > > > future modifications in this area.
> > >
> > > Yeah this makes sense. Thanks.
> > >
> > > I think we can add some text to the existing comment of
> > > load_segments() to call this out. Allow me to dig more into call
> > > depth tracking to understand it better -- relocate_kernel() after
> > > load_segments() seems to be a real function call and I want to know
> > > how it interacts with call depth tracking.
> >
> > That one is explicitly ignored, see skip_addr() in
> > arch/x86/kernel/callthunks.c
> >
>
> That was what I thought too. Thanks for pointing that out.
>
> How about below?
>
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -351,6 +351,11 @@ void machine_kexec(struct kimage *image)
>  	 *
>  	 * I take advantage of this here by force loading the
>  	 * segments, before I zap the gdt with an invalid value.
> +	 *
> +	 * Note this resets GS to 0. Don't make any function call after
> +	 * here since call depth tracking uses per-cpu variables to
> +	 * operate (relocate_kernel is explicitly ignored by call depth
> +	 * tracking).
>  	 */

Looks good, thanks!

>
> Btw, it would be very helpful if you could verify that this patch doesn't
> break call depth tracking in your environment. Thanks!

Tested it and it seemed fine (kexec worked and disassembly did not show any calls after GS is cleared, as expected).
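
For anyone reading this thread later, the ordering constraint being documented can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: gs_base, call_thunk(), load_segments_model() and platform_has_host_mem_encrypt() are hypothetical stand-ins, and a counter replaces what would be a crash on real hardware. It assumes every tracked call touches per-CPU data through GS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU area reached through GS. */
struct percpu_area {
	unsigned long call_depth;
};

static struct percpu_area cpu0_area;
static struct percpu_area *gs_base = &cpu0_area; /* models a valid GS base */
static unsigned int faults; /* models "would have crashed" events */

/* Models the accounting done on every tracked call (assumption). */
static void call_thunk(void)
{
	if (gs_base == NULL) {
		faults++; /* per-CPU access with GS == 0 */
		return;
	}
	gs_base->call_depth++;
}

/* Hypothetical stand-in for a tracked call such as cc_platform_has(). */
static bool platform_has_host_mem_encrypt(void)
{
	call_thunk();
	return true;
}

/* Models load_segments() resetting GS to 0. */
static void load_segments_model(void)
{
	gs_base = NULL;
}

static void reset_model(void)
{
	gs_base = &cpu0_area;
	cpu0_area.call_depth = 0;
	faults = 0;
}

/* Safe ordering: make the call first, then only use the saved value. */
unsigned int run_good_order(void)
{
	bool enc;

	reset_model();
	enc = platform_has_host_mem_encrypt(); /* GS still valid here */
	load_segments_model();
	(void)enc; /* saved value is used from here on, no more calls */
	return faults;
}

/* Unsafe ordering: a tracked call is made after GS has been cleared. */
unsigned int run_bad_order(void)
{
	reset_model();
	load_segments_model();
	(void)platform_has_host_mem_encrypt(); /* per-CPU access with GS == 0 */
	return faults;
}
```

In this model run_good_order() records no faults while run_bad_order() records one, which is the hazard the new comment warns about. relocate_kernel() is the one call allowed after that point because call depth tracking skips it.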

--David Kaplan