Re: [PATCH] x86/mm/ident_map: Use full gbpages in identity maps except on UV platform.

From: Steve Wahl
Date: Fri Mar 22 2024 - 14:06:41 EST


On Fri, Mar 22, 2024 at 10:40:37AM -0700, Dave Hansen wrote:
> On 3/22/24 10:31, Eric W. Biederman wrote:
> >> I'd much rather add synthetic entries to the memory maps that have this
> >> information than hack around it by assuming that things are within a
> >> gigabyte.
> > So this change is a partial revert of a change that broke kexec in
> > existing configurations. To fix a regression that breaks kexec.

Hi, Dave!

> Let's back up for a second:
>
> * Mapping extra memory on UV systems causes halts[1]
> * Mapping extra memory on UV systems breaks kexec (this thread)

These are the same. The most reliable way to create the problem[1] on
UV is a kexec to a kdump kernel, because of the typical placement of
the kdump kernel active region with respect to the reserved addresses
that cause the halts. (The distros we typically run place the
crashkernel just below the highest reserved region, where a gbpage can
include both.)
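
To illustrate the overlap described above, here is a minimal sketch in
C. It is not taken from the patch: the helper name and the example
addresses in the note below are hypothetical, chosen only to show how
rounding a mapping out to 1 GiB boundaries can pull in a nearby
reserved range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GB_SIZE (UINT64_C(1) << 30)   /* size of one gbpage: 1 GiB */
#define GB_MASK (~(GB_SIZE - 1))      /* clears the offset within a gbpage */

/*
 * Hypothetical helper: if the region [start, end) is mapped with whole
 * gbpages, does the resulting mapping touch the reserved range
 * [resv_start, resv_end)?  Address wraparound near the top of the
 * 64-bit space is ignored for simplicity.
 */
static bool gbpage_mapping_touches(uint64_t start, uint64_t end,
                                   uint64_t resv_start, uint64_t resv_end)
{
    assert(start < end);
    assert(resv_start < resv_end);

    /* Round the requested region outward to gbpage boundaries. */
    uint64_t map_start = start & GB_MASK;
    uint64_t map_end = (end + GB_SIZE - 1) & GB_MASK;

    /* Standard half-open interval overlap test. */
    return map_start < resv_end && resv_start < map_end;
}
```

With a made-up crashkernel region at [0x3D0000000, 0x3F0000000) and a
made-up reserved range at [0x3F8000000, 0x400000000), the two do not
overlap as requested, but the gbpage mapping [0x3C0000000, 0x400000000)
covers both, so the helper returns true.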

What you didn't state here is the third bullet that this patch addresses.

* Neglecting to map extra memory on some (firmware buggy?) non-UV
systems breaks kexec.

> So we're in a pickle. I understand your concern for kexec. But I'm
> concerned that fixing the kexec problem will re-expose us to the [1]
> problem.

> Steve, can you explain a bit why this patch doesn't re-expose the kernel
> to the [1] bug?
>
> 1. https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl@xxxxxxx/

This patch still has UV systems avoid gbpages that go far outside
actual requested regions, but allows the full gbpages on other
systems. On UV systems, the new gbpage algorithm is followed. On
non-UV systems, gbpages are allowed even for requests that don't cover
a complete gbpage -- essentially the former algorithm but using the
new code.
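
The split between the two behaviors can be sketched roughly as follows.
This is an illustration only, not the patch's code: the function name
and the is_uv parameter are hypothetical, and it assumes that the new
algorithm uses a gbpage only when the request covers the whole aligned
1 GiB range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GB_SIZE (UINT64_C(1) << 30)   /* size of one gbpage: 1 GiB */
#define GB_MASK (~(GB_SIZE - 1))      /* clears the offset within a gbpage */

/*
 * Hypothetical helper: may the gbpage starting at gb_start be used to
 * map part of the request [start, end)?  Address wraparound near the
 * top of the 64-bit space is ignored for simplicity.
 */
static bool may_use_gbpage(uint64_t gb_start, uint64_t start, uint64_t end,
                           bool is_uv)
{
    assert(start < end);
    assert((gb_start & ~GB_MASK) == 0);   /* gb_start must be 1 GiB aligned */

    uint64_t gb_end = gb_start + GB_SIZE;

    /* The request has to touch this gbpage at all. */
    if (end <= gb_start || start >= gb_end)
        return false;

    /* Non-UV: allow the gbpage even if the request covers only part of it. */
    if (!is_uv)
        return true;

    /* UV (assumed rule): only if the request covers the complete gbpage. */
    return start <= gb_start && end >= gb_end;
}
```

For a 2 MiB request inside a gbpage, this returns true with is_uv false
and false with is_uv true; in the UV case the caller would fall back to
smaller pages so that nothing outside the request is mapped.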

Hope that makes sense.

I would probably consider this buggy firmware, but got enough reports
of this regression (from Pavin Joseph, Eric Hagberg, and Sara
Brofeldt, all of whom tested the patch to see if it cured the
regression) that it seemed everyone would want it fixed quickly and
point fingers later.

In the private debugging exchanges with Pavin, I got some printks of
regions that were mapped, and in one exchange I hard-coded the regions
not covered on his particular system back into the table; there were
four regions left out. I added all four in one patch. I
could have dived in further to diagnose which of the missing region(s)
were actually necessary to get kexec to succeed, but couldn't see what
I would do with that information once I had it, as I don't see a way
to generalize this to other platforms exhibiting the problem.

Thanks,

--> Steve

--
Steve Wahl, Hewlett Packard Enterprise