Re: [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()

From: Baoquan He
Date: Mon Jun 01 2020 - 09:32:21 EST


On 06/01/20 at 02:42pm, Mike Rapoport wrote:
> On Thu, May 28, 2020 at 10:15:10AM -0500, Steve Wahl wrote:
> > On Thu, May 28, 2020 at 05:07:31PM +0800, Baoquan He wrote:
> > > On 05/26/20 at 01:49pm, David Hildenbrand wrote:
> > > > On 26.05.20 13:32, Mike Rapoport wrote:
> > > > > Hello Baoquan,
> > > > >
> > > > > I do not object to your original fix with careful check for pfn validity.
> > > > >
> > > > > But I still think that the memory reserved by the firmware is still
> > > > > memory and it should be added to memblock.memory. This way the memory
> > > >
> > > > If it's really memory that could be read/written, I think I agree.
> > >
> > > I would say some of them may not be allowed to be read/written, if I
> > > understand it correctly. I roughly went through the x86 init code, there
> > > are some places where mem region is marked as E820_TYPE_RESERVED so that
> > > they are not touched after initialization. E.g:
> > >
> > > 1) pfn 0
> > > In trim_bios_range(), we set the pfn 0 as E820_TYPE_RESERVED. You can
> > > see the code comment, this is a BIOS owned area, but not kernel RAM.
> > >
> > > 2)GART reserved region
> > > In early_gart_iommu_check(), GART IOMMU firmware will reserve a region
> > > in an area, firmware designer won't map system RAM into that area.
> > >
> > > And also intel_graphics_stolen(), arch_rmrr_sanity_check(), these
> > > regions are not system RAM backed area, reading from or writting into
> > > these area may cause error.
> > >
> > > Futhermore, there's a KASLR bug found by HPE, its triggering and root
> > > cause are written into below commit log. You can see that accessing to
> > > firmware reserved region caused BIOS to halt system when cpu doing
> > > speculative.
> > >
> > > commit 2aa85f246c181b1fa89f27e8e20c5636426be624
> > > Author: Steve Wahl <steve.wahl@xxxxxxx>
> > > Date: Tue Sep 24 16:03:55 2019 -0500
> > >
> > > x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
> > >
> > > Our hardware (UV aka Superdome Flex) has address ranges marked
> > > reserved by the BIOS. Access to these ranges is caught as an error,
> > > causing the BIOS to halt the system.
> >
> > Thank you for CC'ing me. Coming into the middle of the conversation,
> > I am trying to understand the implications of what's being discussed
> > here, but not being very successful at it.
> >
> > For the HPE UV hardware, the addresses listed in the reserved range
> > must never be accessed, or (as we discovered) even be reachable by an
> > active page table entry. Any active page table entry covering the
> > region allows the processor to issue speculative accesses to the
> > region, resulting in the BIOS halt mentioned.
> >
> > I'm not sure if the discussion above about adding the region to
> > memblock.memory would result in an active mapping of the region or
> > not. If it did, we'd have problems.
>
> The discussion is whether regions marked as E820_RESERVED should be
> considered as RAM or not.

IMHO, the discussion is about whether firmware reserved regions like
E820_TYPE_RESERVED should be added into memory allocators.

>
> For the hardware that cannot tolerate acceses to these areas like HPE
> UV, it should not :)
>
> I still think that keeping them only in memblock.reserved creates more
> problems than it solves, but simply adding E820_RESERVED areas to
> memblock.memory just won't work.

Currently, we hardly add E820_RESERVED into memblock, including
memblock.memmory and memblock.reserved. But, in several places, people
add them to memblock.reserved when they have marked them as
E820_TYPE_RESERVED.

I agree with David that these pages aren't available for any later
memory allocator. "the node/zone is just completely irrelevant for
these pages". Keeping them out of the memory management system is safer
so that nobody won't touch them. I tried to mark them out with a new
type PageUnavail as David suggested, am still struggling to think of
where I should detect and exclude them.

Below is a draft patch about my attempt on marking out unavailable pages.
Any comment or suggestions are welcomed. Of course for any better idea
with code change, I would like to test and review. Hope we can finally fix
this issue after its exposure.