Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

From: Wei Wang
Date: Wed Jun 21 2017 - 04:36:20 EST


On 06/21/2017 01:29 AM, Rik van Riel wrote:
> On Tue, 2017-06-20 at 18:49 +0200, David Hildenbrand wrote:
> > On 20.06.2017 18:44, Rik van Riel wrote:
> > > Nitesh Lal (on the CC list) is working on a way
> > > to efficiently batch recently freed pages for
> > > free page hinting to the hypervisor.
> > >
> > > If that is done efficiently enough (e.g. with
> > > MADV_FREE on the hypervisor side for lazy freeing,
> > > and lazy later re-use of the pages), do we still
> > > need the harder-to-use batch interface from this
> > > patch?
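For reference, MADV_FREE is the Linux madvise(2) advice (available since Linux 4.5) that lets the kernel reclaim private anonymous pages lazily instead of freeing them at once. The following is a minimal sketch of what "MADV_FREE on the hypervisor side" could look like, assuming guest RAM is backed by a private anonymous mapping; `hint_guest_range_free()` is a hypothetical helper, not a QEMU or kernel function.

```c
#define _DEFAULT_SOURCE /* exposes MADV_FREE and MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical hypervisor-side helper: the guest has reported that
 * [addr, addr + len) of its RAM holds no useful data. With MADV_FREE
 * the host kernel does not free the pages right away; it may reclaim
 * them later, under memory pressure.
 *   - If a page is reclaimed, its next access sees a zero-filled page.
 *   - If the guest writes to a page first, the hint is cancelled for
 *     that page and the written data is kept (lazy re-use).
 * addr must be page aligned and belong to a private anonymous mapping.
 * Returns 0 on success, -1 on failure.
 */
static int hint_guest_range_free(void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0 && (uintptr_t)addr % (uintptr_t)page_size == 0);
#ifdef MADV_FREE
    return madvise(addr, len, MADV_FREE);
#else
    (void)len;
    return -1; /* MADV_FREE is not defined by this libc */
#endif
}
```

If the host never comes under memory pressure, the pages keep their old contents and no work is done at all; a guest write simply cancels the hint for that page, which is what makes the re-use lazy.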

> > David's opinion incoming:

> > No, I think proper free page hinting would be the optimum solution,
> > if done right. This would avoid the batch interface and even render
> > virtio-balloon useless in some sense.
>
> I agree with that. Let me go into some more detail of
> what Nitesh is implementing:

> 1) In arch_free_page, the page being freed is added
> to a per-cpu set of freed pages.

I have some questions here:

1. Are the pages managed one by one in the per-CPU set?
For example, if two adjacent pages are freed, are they still
added as two separate nodes on the per-CPU list, or will the
buddy algorithm be re-implemented on the per-CPU list as well?

2. It looks like this will be added to the common free function.
Most users will not need free page hinting; do they still
need to carry the added burden?


> 2) Once that set is full, arch_free_page goes into a
> slow path, which:
> 2a) Iterates over the set of freed pages, and
> 2b) Checks whether they are still free, and

The pages that have been double-checked as free here and
added to the list for the hypervisor can still be allocated
and used by the guest immediately afterwards.


> 2c) Adds the still-free pages to a list that is
> to be passed to the hypervisor, to be MADV_FREEd.
> 2d) Makes that hypercall.
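As a rough illustration of steps 1 through 2d, here is a single-threaded user-space simulation in C. It is not Nitesh's code: the batch size, `sim_free_page()`, `flush_freed_batch()` and the `hypercall_report_free()` stub are hypothetical names, and a real implementation would keep one set per CPU inside the kernel and issue an actual hypercall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_GUEST_PAGES   64 /* size of the simulated guest memory */
#define FREED_BATCH_SIZE  8 /* capacity of the per-CPU set (arbitrary) */

/* Simulated allocator view: true while a page sits on the free lists. */
static bool page_is_free[NR_GUEST_PAGES];

/* Step 1: one CPU's set of recently freed page frame numbers. */
struct freed_batch {
    unsigned long pfn[FREED_BATCH_SIZE];
    size_t count;
};
static struct freed_batch cpu_batch; /* only one CPU is simulated */

/* Stub for the hypercall: records which pages would be reported. */
static unsigned long reported[4 * NR_GUEST_PAGES];
static size_t nr_reported;

static void hypercall_report_free(const unsigned long *pfns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(nr_reported < sizeof(reported) / sizeof(reported[0]));
        reported[nr_reported++] = pfns[i];
    }
}

/* Step 2: slow path, taken once the set is full. */
static void flush_freed_batch(struct freed_batch *b)
{
    unsigned long hint_list[FREED_BATCH_SIZE];
    size_t n = 0;

    for (size_t i = 0; i < b->count; i++) {  /* 2a: iterate over the set */
        unsigned long pfn = b->pfn[i];
        if (page_is_free[pfn])                 /* 2b: still free? */
            hint_list[n++] = pfn;              /* 2c: add to the list */
    }
    if (n > 0)
        hypercall_report_free(hint_list, n);   /* 2d: make the hypercall */
    b->count = 0;
}

/* Simulated free path: the page goes back to the allocator, then the
 * hypothetical arch hook records it in the per-CPU set. */
static void sim_free_page(unsigned long pfn)
{
    assert(pfn < NR_GUEST_PAGES);
    page_is_free[pfn] = true;
    cpu_batch.pfn[cpu_batch.count++] = pfn;
    if (cpu_batch.count == FREED_BATCH_SIZE)
        flush_freed_batch(&cpu_batch);
}

/* Simulated allocation of a specific free page. */
static void sim_alloc_page(unsigned long pfn)
{
    assert(pfn < NR_GUEST_PAGES && page_is_free[pfn]);
    page_is_free[pfn] = false;
}
```

In this sketch each freed page is a separate entry, so two adjacent pages are never merged, and a page that is freed, reused and freed again within one batch is reported twice. Nothing stops a page from being allocated between step 2b and the hypercall either; that window is what the allocation-side check has to close.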

> Meanwhile, all arch_alloc_page has to do is make sure it
> does not allocate a page while it is currently being
> MADV_FREEd on the hypervisor side.
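One way to read the arch_alloc_page requirement is as a small per-page state machine. The sketch below assumes a hypothetical three-state flag per page (allocated, free, hint in flight) updated with C11 compare-and-swap, so a page claimed for hinting cannot be allocated until the hypercall has returned. All names are illustrative; this is not how Nitesh's patches track pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 16

enum page_state { PAGE_ALLOCATED, PAGE_FREE, PAGE_HINTING };

/* Zero-initialised, so every page starts out PAGE_ALLOCATED. */
static _Atomic int page_state[NR_PAGES];

static void release_page(size_t pfn)
{
    atomic_store(&page_state[pfn], PAGE_FREE);
}

/* Claim a free page for the hint list; fails if it was allocated meanwhile. */
static bool hint_begin(size_t pfn)
{
    int expected = PAGE_FREE;
    return atomic_compare_exchange_strong(&page_state[pfn], &expected, PAGE_HINTING);
}

/* Called once the hypercall covering this page has returned. */
static void hint_end(size_t pfn)
{
    int expected = PAGE_HINTING;
    bool ok = atomic_compare_exchange_strong(&page_state[pfn], &expected, PAGE_FREE);
    assert(ok && "hint_end() called on a page that was not being hinted");
    (void)ok;
}

/* Allocation side: a page may only be taken while FREE, never while HINTING. */
static bool try_alloc_page(size_t pfn)
{
    int expected = PAGE_FREE;
    return atomic_compare_exchange_strong(&page_state[pfn], &expected, PAGE_ALLOCATED);
}

/* Return the first page that can be allocated right now, or NR_PAGES if none. */
static size_t alloc_any_page(void)
{
    for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
        if (try_alloc_page(pfn))
            return pfn;
    return NR_PAGES;
}
```

Here the allocator simply skips a page whose hint is in flight rather than waiting for it. The compare-and-swap makes the check and the state change a single atomic step, so a concurrent allocation cannot slip in between them.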

Is this proposed to replace the balloon driver?


> The code Wei is working on looks like it could be
> suitable for steps (2c) and (2d) above. Nitesh already
> has code for steps 1 through 2b.


May I know the advantages of the added steps? Thanks.

Best,
Wei