Re: [PATCH v3 1/2] x86/cpu/intel: Fix MTRR verification for TME enabled platforms

From: Compostella, Jeremy
Date: Mon Oct 16 2023 - 13:03:05 EST


--
Jeremy
One Emacs to rule them all

<kirill.shutemov@xxxxxxxxxxxxxxx> writes:

> On Mon, Oct 16, 2023 at 09:14:35AM -0700, Compostella, Jeremy wrote:
>> <kirill.shutemov@xxxxxxxxxxxxxxx> writes:
>>
>> > On Fri, Oct 13, 2023 at 04:03:02PM -0700, Compostella, Jeremy wrote:
>> >> "kirill.shutemov@xxxxxxxxxxxxxxx" <kirill.shutemov@xxxxxxxxxxxxxxx> writes:
>> >> > On Tue, Oct 03, 2023 at 02:06:52AM +0000, Huang, Kai wrote:
>> >> >> On Tue, 2023-10-03 at 01:47 +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
>> >> >> > On Fri, Sep 29, 2023 at 09:14:00AM +0000, Huang, Kai wrote:
>> >> >> > > On Thu, 2023-09-28 at 15:30 -0700, Compostella, Jeremy wrote:
>> >> >> > > > On TME enabled platform, BIOS publishes MTRR taking into account Total
>> >> >> > > > Memory Encryption (TME) reserved bits.
>> >> >> > > >
>> >> >> > > > generic_get_mtrr() performs a sanity check of the MTRRs relying on the
>> >> >> > > > `phys_hi_rsvd' variable which is set using the cpuinfo_x86 structure
>> >> >> > > > `x86_phys_bits' field. But at the time the generic_get_mtrr()
>> >> >> > > > function runs, `x86_phys_bits' has not yet been updated by
>> >> >> > > > detect_tme() when TME is enabled.
>> >> >> > > >
>> >> >> > > > Since x86_phys_bits does not yet reflect the real maximal physical
>> >> >> > > > address size, generic_get_mtrr() complains by logging the following
>> >> >> > > > messages:
>> >> >> > > >
>> >> >> > > > mtrr: your BIOS has configured an incorrect mask, fixing it.
>> >> >> > > > mtrr: your BIOS has configured an incorrect mask, fixing it.
>> >> >> > > > [...]
>> >> >> > > >
>> >> >> > > > In such a situation, generic_get_mtrr() returns an incorrect size but
>> >> >> > > > no side effects were observed during our testing.
>> >> >> > > >
>> >> >> > > > For `x86_phys_bits' to be updated before generic_get_mtrr() runs,
>> >> >> > > > move the detect_tme() call from init_intel() to early_init_intel().
>> >> >> > >
>> >> >> > > Hi,
>> >> >> > >
>> >> >> > > This move looks good to me, but +Kirill who is the author of detect_tme() for
>> >> >> > > further comments.
>> >> >> > >
>> >> >> > > Also I am not sure whether it's worth considering moving this to
>> >> >> > > get_cpu_address_sizes(), which calculates the
>> >> >> > > virtual/physical address sizes.
>> >> >> > > Thus it seems anything that can impact physical address size
>> >> >> > > could be put there.
>> >> >> >
>> >> >> > Actually, I am not sure how this patch works. AFAICS after the patch we
>> >> >> > have the following callchain:
>> >> >> >
>> >> >> > early_identify_cpu()
>> >> >> >   this_cpu->c_early_init() (which is early_init_intel())
>> >> >> >     detect_tme()
>> >> >> >       c->x86_phys_bits -= keyid_bits;
>> >> >> >   get_cpu_address_sizes(c);
>> >> >> >     c->x86_phys_bits = eax & 0xff;
>> >> >> >
>> >> >> > Looks like get_cpu_address_sizes() would override what detect_tme() does.
>> >> >>
>> >> >> After this patch, early_identify_cpu() calls get_cpu_address_sizes() first and
>> >> >> then calls c_early_init(), which calls detect_tme().
>> >> >>
>> >> >> So looks no override. No?
>> >>
>> >> No override indeed, as get_cpu_address_sizes() is always called before
>> >> early_init_intel() or init_intel().
>> >>
>> >> - init/main.c::start_kernel()
>> >>   - arch/x86/kernel/setup.c::setup_arch()
>> >>     - arch/x86/kernel/cpu/common.c::early_cpu_init()
>> >>       - early_identify_cpu()
>> >>         - get_cpu_address_sizes(c)
>> >>           c->x86_phys_bits = eax & 0xff;
>> >>         - arch/x86/kernel/cpu/intel.c::early_init_intel()
>> >>           - detect_tme()
>> >>             c->x86_phys_bits -= keyid_bits;
>> >
>> > Hmm.. Do I read it wrong:
>> >
>> > static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>> > {
>> >         ...
>> >         /* cyrix could have cpuid enabled via c_identify()*/
>> >         if (have_cpuid_p()) {
>> >                 ...
>> >                 // Here we call early_init_intel()
>> >                 if (this_cpu->c_early_init)
>> >                         this_cpu->c_early_init(c);
>> >                 ...
>> >         }
>> >
>> >         get_cpu_address_sizes(c);
>> >         ...
>> > }
>> >
>> > ?
>> >
>> > As far as I can see, get_cpu_address_sizes() is called after early_init_intel().
>>
>> On `58720809f527 v6.6-rc6 6.6-rc6 2de3c93ef41b', this is what I have:
>>
>> ,----
>> |         /* cyrix could have cpuid enabled via c_identify()*/
>> |         if (have_cpuid_p()) {
>> |                 cpu_detect(c);
>> |                 get_cpu_vendor(c);
>> |                 get_cpu_cap(c);
>> |                 get_cpu_address_sizes(c); <= called first
>> |                 setup_force_cpu_cap(X86_FEATURE_CPUID);
>> |                 cpu_parse_early_param();
>> |
>> |                 if (this_cpu->c_early_init)
>> |                         this_cpu->c_early_init(c);
>> |
>> |                 c->cpu_index = 0;
>> |                 filter_cpuid_features(c, false);
>> |
>> |                 if (this_cpu->c_bsp_init)
>> |                         this_cpu->c_bsp_init(c);
>> |         } else {
>> |                 setup_clear_cpu_cap(X86_FEATURE_CPUID);
>> |         }
>> `----
>> Listing 1: arch/x86/kernel/cpu/common.c
>>
>> => get_cpu_address_sizes() is called first, which also matches my
>> experiments and instrumentation.
>
> Ah. It got patched in tip tree. See commit fbf6449f84bf.

That commit breaks the AMD code: early_init_amd() calls
early_detect_mem_encrypt() to adjust x86_phys_bits, but with the new
ordering x86_phys_bits is not yet initialized at that point, and the
adjustment is then overwritten afterwards.