Re: [PATCH v2 1/2] x86/efi: don't allocate memmap through memblock after mm_init()
From: Ard Biesheuvel
Date: Thu Jan 05 2017 - 06:34:45 EST
On 5 January 2017 at 10:15, Nicolai Stange <nicstange@xxxxxxxxx> wrote:
> Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> writes:
>
>> On 5 January 2017 at 07:42, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>
>>> * Nicolai Stange <nicstange@xxxxxxxxx> wrote:
>>>
>>>> Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> writes:
>>>>
>>>> > On Thu, 22 Dec, at 11:23:39AM, Nicolai Stange wrote:
>>>> >> So, after memblock is gone, allocations should be done through
>>>> >> the "normal"
>>>> >> page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
>>>> >> it from efi_arch_mem_reserve() and from efi_free_boot_services() as well.
>>>> >>
>>>> >> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to
>>>> >> avoid copying image data")
>>>> >> Signed-off-by: Nicolai Stange <nicstange@xxxxxxxxx>
>>>>
>>>> > Could you also modify efi_fake_memmap() to use your new
>>>> > efi_memmap_alloc() function for consistency
>>>>
>>>> Sure.
>>>>
>>>> I'm planning to submit another set of patches addressing the (bounded)
>>>> memmap leaking in anything calling efi_memmap_unmap() though. In the
>>>> course of doing so, the memmap allocation sites will get touched anyway:
>>>> I'll have to store some information about how the memmap's memory has
>>>> been obtained.
>>>
>>> Will that patch be intrusive?
>
> Yes, definitely something for 4.11+.
>
>
>> Given that memblock_alloc() calls memblock_reserve() on its
>> allocations, we could simply consult the memblock_reserved table to
>> infer whether the allocation being freed was created with
>> memblock_alloc() or with alloc_pages().
>
> Not sure whether this would work with CONFIG_ARCH_DISCARD_MEMBLOCK=y.
> This is also the reason why 2/2 is needed.
>
OK, I hadn't considered that. It is rather unfortunate that on x86,
each call to efi_arch_mem_reserve() could potentially result in a new
allocation to be created for the entire memory map (consisting of
hundreds of 40-byte entries), leaking the old one. So I agree that
this code needs some work, and I also agree that it is v4.11 material
at the earliest.
>> So I don't think such a patch
>> should be that intrusive. But the normal case is that the EFI memory
>> map remains mapped during the lifetime of the system, and unmapping
>> the EFI memory map does not necessarily imply that it should be freed.
>> This is especially true on ARM systems, where the memory map is
>> allocated and populated by the stub, and never modified by the kernel
>> proper, and so any freeing logic in generic code should take this into
>> account as well (i.e., the memory map allocation is not
>> memblock_reserve()'d, nor is it allocated using alloc_pages())
>
>
>
>>> If yes then we'll need to keep this a separate urgent patch to fix the v4.9
>>> regression that Dan Williams reported. I can apply the fix to
>>> efi/urgent and get
>>> it to Linus straight away if you guys agree.
>>>
>>
>> Considering the severity of the issue it solves, and the obvious
>> correctness of the fix, my preference would be to spin a v3 of this
>> patch taking Matt's feedback into account, and merging that as a fix
>> for v4.10 with a cc stable. The 2/2 can wait a bit longer imo
>
> Matt's Feedback included that
>
> "all memblock_alloc()s should probably be PAGE_SIZE aligned like the
> fakemem code"
>
> Unfortunately, I can't see why this would be needed. Furthermore, this
> isn't currently done outside of fakemem and thus, aliging the memmap
> allocations on PAGE_SIZE would be another, quite unrelated change?
>
We use a pool allocation for the memory map on ARM, which is never
page aligned (and page aligned in EFI doesn't mean as much when the OS
page size is 16k or 64k) so I think you are correct here.
> So, are you Ok with only taking the other review comment, namely
>
> "modify efi_fake_memmap() to use your new efi_memmap_alloc() function
> for consistency"
>
> into account for a v3?
>
Yes, that should be sufficient for a fix that can be backported.
Anything else can wait for now