Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

From: Arvind Sankar
Date: Thu Dec 03 2020 - 11:14:52 EST


On Thu, Dec 03, 2020 at 09:48:57AM +0100, Borislav Petkov wrote:
> On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> > The pfn_range_is_mapped() call just checks whether it is mapped at all
> > in the direct mapping. Is the TSEG range supposed to be marked as
> > non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> > created for non-RAM is for the 0-1Mb real-mode range, and that will
> > always use 4k pages. Above that anything not marked as RAM will create
> > an unmapped hole in the direct map, so in this case the memory just
> > below the TSEG base would already use smaller pages if needed.
> >
> > If it's possible that the E820 mapping says this range is RAM, then
> > should we also break up the direct map just after the end of the TSEG
> > range for the same reason?
>
> So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
> range:
>
> [ 1.135094] tseg: 003bf00000
>
> It is not in the E820 map either:
>
> [ 0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
> [ 0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
> [ 0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
> [ 0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]
>
> That doesn't mean that it can't happen that there might be some
> configuration where it ends up being mapped.
>
> So looking at what the code does, it kinda makes sense: you want the 2M
> range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
> *if* it is mapped.
>
> I need to find a box where it is mapped *and* not 2M aligned, though,
> for testing. Which appears kinda hard to do as all the new ones are
> aligned.

Do any of them have it mapped at all, regardless of the alignment? There
seems to be nothing else in the kernel that ever looks at the TSEG MSR,
so I would guess that it has to be non-RAM in the E820 map, otherwise
nothing would prevent the kernel from allocating and using that space.
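
For reference, the arithmetic behind the 2M range quoted above can be
sketched in userspace C. This is only an illustration: align_down(),
tseg_needs_split() and tseg_large_page_range() are names made up for this
sketch, not kernel functions, and the only input assumed is the TSEG base
from the log (0x3bf00000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)	/* size of one 2M large page */

/* Round addr down to a power-of-two boundary. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Hypothetical helper: true if the TSEG base is not 2M aligned, i.e. a
 * single 2M mapping would cover both TSEG and the memory just below it.
 */
static bool tseg_needs_split(uint64_t tseg_base)
{
	return (tseg_base & (SZ_2M - 1)) != 0;
}

/*
 * Hypothetical helper: the 2M-aligned range [*start, *end) that contains
 * tseg_base. This is the range that would have to use 4K mappings.
 */
static void tseg_large_page_range(uint64_t tseg_base,
				  uint64_t *start, uint64_t *end)
{
	*start = align_down(tseg_base, SZ_2M);
	*end = *start + SZ_2M;
}
```

With tseg_base = 0x3bf00000 this gives the range 0x3be00000-0x3c000000
mentioned above, and tseg_needs_split() returns true.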

I found the actual original commit, which does have a description of the
reasoning. It's
8346ea17aa20 ("x86: split large page mapping for AMD TSEG")

It looks like at the time, the direct mapping didn't really look at the
E820 map in any detail, and was always set up with at least 2MB pages,
or 1GB pages if they were available, from 0 to max_pfn_mapped. So the
direct mapping would have covered even holes that weren't in the E820
map.

Commit
66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")
changed the direct map setup to avoid mapping holes, because the
overmapping apparently became more serious than a performance issue: the
commit message mentions MCEs getting triggered because of it.
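
To make the effect concrete, here is a rough userspace sketch of the kind
of check pfn_range_is_mapped() performs for a single pfn, run against the
init_memory_mapping ranges from the log quoted above. It is not the kernel
implementation: struct mem_range, boot_ranges[] and pfn_is_mapped() are
names invented for this sketch, and 4K pages (PAGE_SHIFT of 12) are
assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K base pages */

/* Byte addresses, end inclusive, exactly as printed in the boot log. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* The init_memory_mapping ranges from the K8 box quoted in this thread. */
static const struct mem_range boot_ranges[] = {
	{ 0x00000000, 0x000fffff },
	{ 0x3bc00000, 0x3bdfffff },
	{ 0x20000000, 0x3bbfffff },
	{ 0x00100000, 0x1fffffff },
	{ 0x3be00000, 0x3be8ffff },
};

/* Hypothetical helper: true if pfn falls inside any of the n ranges. */
static bool pfn_is_mapped(const struct mem_range *r, size_t n, uint64_t pfn)
{
	for (size_t i = 0; i < n; i++) {
		if (pfn >= (r[i].start >> PAGE_SHIFT) &&
		    pfn <= (r[i].end >> PAGE_SHIFT))
			return true;
	}
	return false;
}
```

On these ranges the pfn for the TSEG base (0x3bf00000 >> 12) is reported
as unmapped, while the pfn for 0x3be00000 is mapped, which matches the
observation that TSEG sits in a hole of the direct map on that machine.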

>
> The above is from a K8 box which should already be dead, as a matter of
> fact.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette