Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps

From: David Hildenbrand
Date: Tue Jul 24 2018 - 08:17:24 EST


On 24.07.2018 09:25, Michal Hocko wrote:
> On Mon 23-07-18 19:20:43, David Hildenbrand wrote:
>> On 23.07.2018 14:30, Michal Hocko wrote:
>>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote:
>>>> On 07/20/2018 02:34 PM, David Hildenbrand wrote:
>>>>> Dumping tools (like makedumpfile) right now don't exclude reserved pages.
>>>>> So reserved pages might be accessed by dump tools although nobody except
>>>>> the owner should touch them.
>>>>
>>>> Are you sure about that? Or maybe I understand wrong. Maybe it changed
>>>> recently, but IIRC pages that are backing memmap (struct pages) are also
>>>> PG_reserved. And you definitely do want those in the dump.
>>>
>>> You are right. reserve_bootmem_region will make all early bootmem
>>> allocations (including those backing memmaps) PageReserved. I have asked
>>> several times but I haven't seen a satisfactory answer yet. Why do we
>>> even care for kdump about those. If they are reserved then nobody should
>>> really look at those specific struct pages and manipulate them. Kdump
>>> tools are using a kernel interface to read the content. If the specific
>>> content is backed by a non-existing memory then they should simply not
>>> return anything.
>>>
>>
>> "new kernel" provides an interface to read memory from "old kernel".
>>
>> The new kernel has no idea about
>> - which memory was added/online in the old kernel
>> - where struct pages of the old kernel are and what their content is
>> - which memory is safe to touch and which not
>>
>> Dump tools figure all that out by interpreting the VMCORE. They e.g.
>> identify "struct pages" and see if they should be dumped. The "new
>> kernel" only allows reading that memory. It cannot prevent a crash of the
>> system (e.g. if a dump tool tried to read a hwpoison page).
>>
>> So how should the "new kernel" know if a page can be touched or not?
>
> I am sorry I am not familiar with kdump much. But from what I remember
> it reads from /proc/vmcore and implementation of this interface should
> simply return EINVAL or alike when you try to dump inaccessible memory
> range.

Oh, and BTW, while something like -EINVAL could work, we usually don't
want to try to read certain pages at all (e.g. ballooned pages -
accessing the page might work but involves quite some overhead in the
hypervisor).

So we should either handle this in dump tools (reserved + ...?) or during
the read itself, similar to Xen (is_ram_page()).
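
To make the two options concrete, here is a rough userspace sketch (not
kernel or makedumpfile code). Every name in it (the sk_ prefix, the flag
bits, struct sk_page) is made up for illustration. In particular,
SK_PAGE_MEMMAP assumes the dump tool has some way to tell that a reserved
page backs the memmap, which is exactly the open "reserved + ...?" part.
Zero-filling in the read path is also just one possible policy; returning
an error such as -EINVAL would be another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-page state bits; NOT the kernel's real page flags. */
enum {
	SK_PAGE_RESERVED  = 1u << 0,
	SK_PAGE_HWPOISON  = 1u << 1,
	SK_PAGE_BALLOONED = 1u << 2,
	SK_PAGE_MEMMAP    = 1u << 3, /* assumed: tool can identify memmap pages */
};

#define SK_PAGE_SIZE 4096u

/* Hypothetical stand-in for one page of the old kernel's memory. */
struct sk_page {
	unsigned int flags;
	unsigned char data[SK_PAGE_SIZE];
};

/*
 * Option 1: the dump tool filters. It has interpreted the old kernel's
 * struct pages and decides whether to request the page at all.
 */
static bool sk_tool_should_dump(const struct sk_page *p)
{
	assert(p != NULL);

	/* Never touch pages that may crash the system or are costly to read. */
	if (p->flags & (SK_PAGE_HWPOISON | SK_PAGE_BALLOONED))
		return false;

	/* Reserved alone is not enough: memmap pages are reserved but wanted. */
	if ((p->flags & SK_PAGE_RESERVED) && !(p->flags & SK_PAGE_MEMMAP))
		return false;

	return true;
}

/* Option 2 helper: a read-side check in the spirit of is_ram_page(). */
static bool sk_is_ram_page(const struct sk_page *p)
{
	return !(p->flags & SK_PAGE_BALLOONED);
}

/*
 * Option 2: the read path filters. Returns true if real page content was
 * copied, false if the page was skipped and the buffer was zero-filled.
 */
static bool sk_read_page(const struct sk_page *p, unsigned char *buf)
{
	assert(p != NULL && buf != NULL);

	if (!sk_is_ram_page(p)) {
		memset(buf, 0, SK_PAGE_SIZE); /* page itself is never accessed */
		return false;
	}

	memcpy(buf, p->data, SK_PAGE_SIZE);
	return true;
}
```

With option 1 a reserved memmap page is still dumped while a plain
reserved page is not; with option 2 the dump tool needs no extra
knowledge, but it only gets zeroes for pages the read path refuses.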

I wonder if we could convert the early allocated memory (PG_reserved) at
some point (once the buddy allocator is initialized) into ordinary
"simply allocated" memory.

--

Thanks,

David / dhildenb