Re: [PATCH v2 RESEND 2/2] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system

From: Mike Travis
Date: Thu May 17 2018 - 10:50:39 EST




On 5/17/2018 8:06 AM, Ramsay, Frank wrote:


-----Original Message-----
From: Baoquan He [mailto:bhe@xxxxxxxxxx]
Sent: Wednesday, May 16, 2018 11:18 PM
To: Travis, Mike <mike.travis@xxxxxxx>; Anderson, Russ
<russ.anderson@xxxxxxx>; Ramsay, Frank <frank.ramsay@xxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
x86@xxxxxxxxxx; mingo@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; hpa@xxxxxxxxx;
thgarnie@xxxxxxxxxx; keescook@xxxxxxxxxxxx; akpm@linux-
foundation.org; yamada.masahiro@xxxxxxxxxxxxx; Sivanich, Dimitri
<dimitri.sivanich@xxxxxxx>; dyoung@xxxxxxxxxx
Subject: Re: [PATCH v2 RESEND 2/2] x86/mm/KASLR: Do not adapt the size of
the direct mapping section for SGI UV system

Hi Mike, Russ and Frank,

On 09/28/17 at 07:10am, Mike Travis wrote:


On 9/28/2017 2:01 AM, Ingo Molnar wrote:

* Baoquan He <bhe@xxxxxxxxxx> wrote:

@@ -123,7 +124,7 @@ void __init
kernel_randomize_memory(void)

CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
/* Adapt phyiscal memory region size based on available
memory */
- if (memory_tb < kaslr_regions[0].size_tb)
+ if (memory_tb < kaslr_regions[0].size_tb &&
+!is_early_uv_system())
kaslr_regions[0].size_tb = memory_tb;
This is really an ugly hack. Is kaslr_regions[] incorrect? If so
then it should be corrected instead of uglifying the code that uses it...

Thanks for looking into this!

If on SGI UV system, the kaslr_regions[0].size_tb, namely the size
of the direct mapping section, is incorrect.

Its direct mapping size includes two parts:
#1 RAM size of system
#2 MMIOH region size which only SGI UV system has.

However, the #2 can only be got till uv_system_init() is called in
native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
That's why I made this hack.

I checked uv_system_init() code, seems not easy to know the size
of MMIOH region before or inside kernel_randomize_memory(). I have
CCed UV devel experts, not sure if they have any idea about this.
Otherwise, this patch could be the only way I can think of.

Hi Mike and Russ,

Is there any chance we can get the size of MMIOH region before mm
KASLR code, namely before we call kernel_randomize_memory()?

The sizes of the MMIOL and MMIOH areas are tied into the HUB design
and how it is communicated to BIOS and the kernel. This is via some
of the config MMR's found in the HUB and it would be impossible to
provide any access to these registers as they change with each new UV
architecture.

The kernel does reserve the memory in the EFI memmap. I can send you
a console log of the full startup that includes the MMIOH
reservations. Note that it is dependent on what I/O devices are
actually present as UV does not map empty slots unless forced (because
we'd quickly run out of resources.) Also, the EFI memmap entries do
not specify the exact usage of the contained areas.

This one is still a regression bug in our newer rhel since I just fixed them with
rhel-only patch. Now I still need the console log which includes the MMIOH
reservations.


Does the system need to have an external IO device for this? If not you should just be able to boot one of the SGI UV systems in the beaker lab (possibly also the HPE Superdome Flex that is in beaker; hpe-flex-01.rhts.eng.bos.redhat.com)

If you have a hawks2 (UV4), you would have 4 10G ethernet devices on the base I/O. But these would only have smaller MMIOH0 regions. This would not cause MMIOH1 regions to be allocated and assigned. (MC990X/UV3 only has a single sized MMIOH regions where they are all big enough for the largest MMIOH region found on any I/O device.)


Could you help provide a console log with MMIOH info, or I need request
one from redhat's lab?

Or expert from HPE UV team can make a patch based on the finding and
analysis?

Thanks
Baoquan


I don't mind system specific quirks to hardware enumeration details,
as long as they don't pollute generic code with such special hacks.

I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be
wrong. Any other code that relies on it in the future will be wrong as well
on UV systems.

Which may come into play on other arches with the new upcoming
memory
technologies.

The right quirk would be to fix that up where it gets introduced, or
something like that.

Yes, does make sense.

Thanks,

Ingo