Re: [RFC][Patch v8 6/7] KVM: Enables the kernel to isolate and report free pages

From: Nitesh Narayan Lal
Date: Thu Feb 07 2019 - 15:50:22 EST

On 2/7/19 12:43 PM, Alexander Duyck wrote:
> On Tue, Feb 5, 2019 at 3:21 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>> On Tue, Feb 05, 2019 at 04:54:03PM -0500, Nitesh Narayan Lal wrote:
>>> On 2/5/19 3:45 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 04, 2019 at 03:18:53PM -0500, Nitesh Narayan Lal wrote:
>>>>> This patch enables the kernel to scan the per cpu array and
>>>>> compress it by removing the repetitive/re-allocated pages.
>>>>> Once the per cpu array is completely filled with pages in the
>>>>> buddy it wakes up the kernel per cpu thread which re-scans the
>>>>> entire per cpu array by acquiring a zone lock corresponding to
>>>>> the page which is being scanned. If the page is still free and
>>>>> present in the buddy it tries to isolate the page and adds it
>>>>> to another per cpu array.
>>>>> Once this scanning process is complete and if there are any
>>>>> isolated pages added to the new per cpu array kernel thread
>>>>> invokes hyperlist_ready().
>>>>> In hyperlist_ready() a hypercall is made to report these pages to
>>>>> the host using the virtio-balloon framework. In order to do so
>>>>> another virtqueue 'hinting_vq' is added to the balloon framework.
>>>>> As the host frees all the reported pages, the kernel thread returns
>>>>> them back to the buddy.
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
>>>> This looks kind of like what early iterations of Wei's patches did.
>>>> But this has lots of issues, for example you might end up with
>>>> a hypercall per a 4K page.
>>>> So in the end, he switched over to just reporting only
>>>> MAX_ORDER - 1 pages.
>>> You mean that I should only capture/attempt to isolate pages with order
>>> MAX_ORDER - 1?
>>>> Would that be a good idea for you too?
>>> Will it help if we have a threshold value based on the amount of memory
>>> captured instead of the number of entries/pages in the array?
>> This is what Wei's patches do at least.
> So in the solution I had posted I was looking more at
> HUGETLB_PAGE_ORDER and above as the size of pages to provide the hints
> on [1]. The advantage to doing that is that you can also avoid
> fragmenting huge pages which in turn can cause what looks like a
> memory leak as the memory subsystem attempts to reassemble huge
> pages[2]. In my mind a 2MB page makes good sense in terms of the size
> of things to be performing hints on as anything smaller than that is
> going to just end up being a bunch of extra work and end up causing a
> bunch of fragmentation.
In my opinion, in any implementation the page size to store before
reporting depends on the allocation pattern of the workload running in
the guest.
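
To make the two knobs discussed above concrete (a minimum order to
capture, and a threshold on the amount of memory captured rather than
on the number of entries), here is a rough userspace sketch. It is not
code from the patch: the names MIN_REPORT_ORDER, REPORT_THRESHOLD_BYTES,
struct hint_state and capture_free_page() are all hypothetical, and it
assumes 4K base pages and MAX_ORDER = 11.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12                       /* assumption: 4K base pages */
#define MAX_ORDER 11                        /* assumption: common default */
#define MIN_REPORT_ORDER (MAX_ORDER - 1)    /* hypothetical cut-off */
#define REPORT_THRESHOLD_BYTES (16UL << 20) /* hypothetical: 16MB */

/* Hypothetical per cpu bookkeeping for captured free pages. */
struct hint_state {
	unsigned long captured_bytes;   /* memory captured so far */
	unsigned int entries;           /* number of captured blocks */
};

/*
 * Account for a freed block of the given order. Blocks below
 * MIN_REPORT_ORDER are ignored. Returns true once the captured
 * memory reaches the threshold, i.e. when reporting to the host
 * would be worth one hypercall.
 */
static bool capture_free_page(struct hint_state *s, unsigned int order)
{
	if (order < MIN_REPORT_ORDER)
		return false;

	s->captured_bytes += 1UL << (PAGE_SHIFT + order);
	s->entries++;

	return s->captured_bytes >= REPORT_THRESHOLD_BYTES;
}
```

With these values a single order-10 block accounts for 4MB, so the
threshold is reached after four captured blocks, while order-0 frees
never trigger a report. Which values make sense would still depend on
the workload.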

I am also planning to try Michael's suggestion of using MAX_ORDER - 1.
However, I am still thinking about a workload which I can use to test its

> The only issue with limiting things on an arbitrary boundary like that
> is that you have to hook into the buddy allocator to catch the cases
> where a page has been merged up into that range.
I don't think I understood your comment completely. In any case, we
have to rely on the buddy for merging the pages.
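
If the case you mean is a page freed at a low order that then merges
up past the reporting order, this is roughly how I picture it. The
sketch below is a toy userspace model, not kernel code: every toy_*
name is made up, and the only part borrowed from the buddy allocator
is the usual rule that the buddy of a block is found by flipping bit
'order' of its pfn.

```c
#include <stdbool.h>

#define TOY_MAX_ORDER 4         /* toy model: orders 0..3 */
#define TOY_NR_PAGES 8          /* toy model: 8 pages in total */
#define TOY_REPORT_ORDER 2      /* hypothetical minimum order to hint on */

/* toy_free[order][pfn] is true if a free block of 'order' starts at pfn. */
static bool toy_free[TOY_MAX_ORDER][TOY_NR_PAGES];

/* The buddy of a block differs from it only in bit 'order' of the pfn. */
static unsigned long toy_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/*
 * Free a block and merge it with its buddy for as long as the buddy
 * is free too. Returns the order the block ends up at after merging.
 */
static unsigned int toy_free_page(unsigned long pfn, unsigned int order)
{
	while (order < TOY_MAX_ORDER - 1) {
		unsigned long buddy = toy_buddy_pfn(pfn, order);

		if (!toy_free[order][buddy])
			break;
		toy_free[order][buddy] = false; /* take buddy off its list */
		pfn &= buddy;                   /* start of the merged block */
		order++;
	}
	toy_free[order][pfn] = true;
	return order;
}

/* Only a hook that sees the merged order can make this decision. */
static bool toy_should_hint(unsigned int merged_order)
{
	return merged_order >= TOY_REPORT_ORDER;
}
```

Here, freeing pfns 0, 1, 2 and 3 one at a time as order-0 pages ends
with a single order-2 block at pfn 0, even though no individual free
was at or above the reporting order. If that is the case you are
referring to, please correct me where this picture is wrong.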
> [1]
> [2]
