Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

From: David Hildenbrand
Date: Thu Feb 14 2019 - 05:44:30 EST

On 14.02.19 11:00, David Hildenbrand wrote:
> On 14.02.19 10:08, Wang, Wei W wrote:
>> On Wednesday, February 13, 2019 5:19 PM, David Hildenbrand wrote:
>>> If you have to resize/alloc/coordinate who will report, you will need locking.
>>> Especially, I doubt that there is an atomic xbitmap (prove me wrong :) ).
>> Yes, we need change xbitmap to support it.
>> Just thought of another option, which would be better:
>> - xb_preload in prepare_alloc_pages to pre-allocate the bitmap memory;
>> - xb_set/clear the bit under the zone->lock, i.e. in rmqueue and free_one_page
> And how to preload without locking?
>> will not be concurrently called to race on the same bitmap.
>> And we don't add any new locks to generate new doubts.
>> Also, we can probably remove the arch_alloc/free_page part.
>> For the first step, we could optimize VIRTIO_BALLOON_F_FREE_PAGE_HINT for the live migration optimization:
>> - just replace alloc_pages(VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG,
>> with get_free_page_hints()
>> get_free_page_hints() was designed to clear the bit, and need put_free_page_hints() to set it later after host finishes madvise. For the live migration usage, as host doesn't free the backing host pages, so we can give get_free_page_hints a parameter option to not clear the bit for this usage. It will be simpler and faster.
>> I think get_free_page_hints() to read hints via bitmaps should be much faster than that allocation function, which takes around 15us to get a 4MB block. Another big bonus is that we don't need free_pages() to return all the pages back to buddy (it's a quite expensive operation too) when migration is done.
>> For the second step, we can improve ballooning, e.g. a new feature VIRTIO_BALLOON_F_ADVANCED_BALLOON to use the same get_free_page_hints() and another put_free_page_hints(), along with the virtio-balloon's report_vq and ack_vq to wait for the host's ack before making the free page ready.
>> (I think waiting for the host ack is the overhead that the guest has to suffer for enabling memory overcommitment, and even with this v8 patch series it also needs to do that. The optimization method was described yesterday)
> As I already said, I don't like that approach, because it has the
> fundamental issue of page allocs getting blocked. That does not mean
> that it is bad, but that I think what Nitesh has is superior in that
> sense. Of course, things like "how to enable/disable", and much more
> needs to be clarified.
> If you believe in your approach, feel free to come up with a prototype.
> Especially the "no global locking" could be tricky in my opinion :)

I want to add that your approach makes sense if we expect that the
hypervisor will ask for free memory very rarely. Then, blocking during
page alloc is most probably acceptable. Depending on the setup, this
might or might not be the case. If you have some guests that are
allocating/freeing memory continuously, you might want to get back free
pages fairly often to move them to other guests.

In case the hypervisor asks for free pages, as we are not reporting
continuously, you would have to somehow report all pages currently free
to the hypervisor, making sure via the bitmap that they cannot be allocated.

You certainly don't want to track free pages in a bitmap if the
hypervisor is not asking for free pages; otherwise you will eventually
waste a large amount of memory on an xbitmap tracking page states
nobody cares about. So you would need another way to initially fill
the bitmap with free pages (when the hypervisor requests it), while
making sure to avoid races with pages getting allocated while you are
creating the bitmap.
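
To make the concern concrete, here is a rough userspace toy of what I
mean. It is not kernel code and not the xbitmap API: struct toy_zone,
hint_request(), hint_done() and friends are names made up for
illustration, and the spinlock only stands in for zone->lock. The bitmap
exists only while a request is in flight, its memory is allocated
outside the lock, and the snapshot of free pages is taken under the same
lock the allocator uses, so a page cannot be handed out while it is
being reported.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define NPAGES 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((NPAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical toy zone; "lock" stands in for zone->lock. */
struct toy_zone {
	atomic_flag lock;
	bool is_free[NPAGES];   /* stand-in for the buddy free lists */
	unsigned long *hinted;  /* NULL while nobody asked for hints */
};

static void zone_lock(struct toy_zone *z)
{
	while (atomic_flag_test_and_set_explicit(&z->lock, memory_order_acquire))
		; /* spin */
}

static void zone_unlock(struct toy_zone *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

static void bit_set(unsigned long *map, int nr)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static bool bit_test(const unsigned long *map, int nr)
{
	return map[nr / BITS_PER_WORD] & (1UL << (nr % BITS_PER_WORD));
}

/* Hypervisor asked: snapshot all currently free pages into a new bitmap.
 * Returns the number of pages reported, or -1 on failure. */
static int hint_request(struct toy_zone *z)
{
	unsigned long *map = calloc(NWORDS, sizeof(*map)); /* outside the lock */
	int n = 0;

	if (!map)
		return -1;
	zone_lock(z);
	if (z->hinted) {        /* a request is already in flight */
		zone_unlock(z);
		free(map);
		return -1;
	}
	for (int pfn = 0; pfn < NPAGES; pfn++) {
		if (z->is_free[pfn]) {
			bit_set(map, pfn);
			n++;
		}
	}
	z->hinted = map;        /* published under the lock: no alloc can race */
	zone_unlock(z);
	return n;
}

/* Hypervisor is done: drop the bitmap so nothing is tracked when idle. */
static void hint_done(struct toy_zone *z)
{
	zone_lock(z);
	unsigned long *map = z->hinted;
	z->hinted = NULL;
	zone_unlock(z);
	free(map);
}

/* Returns a free pfn that is not currently reported, or -1 if none. */
static int zone_alloc(struct toy_zone *z)
{
	int found = -1;

	zone_lock(z);
	for (int pfn = 0; pfn < NPAGES && found < 0; pfn++) {
		if (!z->is_free[pfn])
			continue;
		if (z->hinted && bit_test(z->hinted, pfn))
			continue;       /* reported page: held back */
		z->is_free[pfn] = false;
		found = pfn;
	}
	zone_unlock(z);
	return found;
}

static void zone_free(struct toy_zone *z, int pfn)
{
	zone_lock(z);
	z->is_free[pfn] = true; /* not added to an in-flight bitmap */
	zone_unlock(z);
}

/* Small walk-through of the scenario described above. */
static int toy_demo(void)
{
	struct toy_zone z = { .lock = ATOMIC_FLAG_INIT };

	zone_free(&z, 3);
	zone_free(&z, 7);
	assert(hint_request(&z) == 2);  /* pages 3 and 7 reported */
	assert(zone_alloc(&z) == -1);   /* both are held back */
	hint_done(&z);
	assert(zone_alloc(&z) == 3);    /* allocatable again */
	return 0;
}
```

Note that the toy makes the cost visible: while a request is in flight,
zone_alloc() can fail even though free pages exist, which corresponds to
the "page allocs getting blocked" issue mentioned above. A real
implementation would also have to deal with per-cpu lists, page orders
and multiple zones, none of which are modelled here.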



David / dhildenb