Re: [PATCH v12 6/8] mm: support reporting free page blocks

From: Wei Wang
Date: Tue Jul 25 2017 - 22:19:50 EST


On 07/25/2017 10:53 PM, Michal Hocko wrote:
On Tue 25-07-17 14:47:16, Wang, Wei W wrote:
On Tuesday, July 25, 2017 8:42 PM, Michal Hocko wrote:
On Tue 25-07-17 19:56:24, Wei Wang wrote:
On 07/25/2017 07:25 PM, Michal Hocko wrote:
On Tue 25-07-17 17:32:00, Wei Wang wrote:
On 07/24/2017 05:00 PM, Michal Hocko wrote:
On Wed 19-07-17 20:01:18, Wei Wang wrote:
On 07/19/2017 04:13 PM, Michal Hocko wrote:
[...]
We don't need to do the pfn walk in the guest kernel. When the API
reports, for example, a 2MB free page block, the API caller passes
the base address of the page block and size=2MB to the hypervisor.
So you want to skip pfn walks by regularly calling into the page allocator to
update your bitmap. If that is the case then would an API that would allow you
to update your bitmap via a callback be sufficient? Something like
void walk_free_mem(int node, int min_order,
		void (*visit)(unsigned long pfn, unsigned long nr_pages));

The function will call the given callback for each free memory block on the given
node starting from the given min_order. The callback will run in a strictly atomic
and very lightweight context. You can update your bitmap from there.
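
To make the proposed callback interface concrete, here is a minimal userspace sketch. It is not kernel code: struct mock_free_block, mock_walk_free_mem() and count_pages() are hypothetical names, and a flat array stands in for the allocator's free lists. It only illustrates the contract described above, where the visitor is called once per free block of at least min_order and receives the first pfn and the number of pages.

```c
#include <stddef.h>

/* Hypothetical stand-in for one free block on a free list. */
struct mock_free_block {
	unsigned long pfn;	/* first page frame number of the block */
	int order;		/* block covers 2^order pages */
};

typedef void (*visit_fn)(unsigned long pfn, unsigned long nr_pages);

/*
 * Userspace mock of the proposed walker: calls visit() for every
 * block whose order is at least min_order. A real implementation
 * would walk the free lists under the zone lock, which is why the
 * visitor has to stay short and must not sleep.
 */
static void mock_walk_free_mem(const struct mock_free_block *blocks,
			       size_t n, int min_order, visit_fn visit)
{
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].order < min_order)
			continue;
		visit(blocks[i].pfn, 1UL << blocks[i].order);
	}
}

/* Example visitor: only accumulates counters, so it stays lightweight. */
static unsigned long total_pages;
static unsigned long total_blocks;

static void count_pages(unsigned long pfn, unsigned long nr_pages)
{
	(void)pfn;	/* unused in this example */
	total_pages += nr_pages;
	total_blocks++;
}
```

A caller would pass count_pages (or a visitor that records the block somewhere) together with the minimum order it cares about, for example order 9 for 2MB blocks with 4KB pages.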
I need to give more background here:
The hypervisor and the guest live in their own address space. The hypervisor's bitmap
isn't seen by the guest. I think we also wouldn't be able to give a callback function
from the hypervisor to the guest in this case.
How did you plan to use your original API, which exports a struct page array,
then?


That's where the virtio-balloon driver comes in. It uses a shared ring mechanism to
send the guest memory info to the hypervisor.

We didn't expose the struct page array from the guest to the hypervisor. For example, when
a 2MB free page block is reported from the free page list, the info put on the ring is just
(base address of the 2MB contiguous memory, size=2MB).
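
As a rough illustration of that (base address, size) pair, the sketch below converts a free block given as (pfn, order) into such a record. The struct free_block_record layout and the make_record() helper are hypothetical and are not the format used on the virtio ring; PAGE_SHIFT is assumed to be 12 (4KB pages).

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4KB pages */

/* Hypothetical record; not the actual layout placed on the ring. */
struct free_block_record {
	uint64_t base;	/* guest physical address of the block */
	uint64_t size;	/* block size in bytes */
};

/* Describe a free block of 2^order pages starting at pfn. */
static struct free_block_record make_record(unsigned long pfn,
					    unsigned int order)
{
	struct free_block_record r;

	r.base = (uint64_t)pfn << PAGE_SHIFT;
	r.size = (uint64_t)1 << (order + PAGE_SHIFT);
	return r;
}
```

With these assumptions, an order-9 block starting at pfn 512 becomes base=0x200000, size=2MB, and no struct page information leaves the guest.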



This would address my main concern that the allocator internals would get
outside of the allocator proper.
What issue would there be in exposing the internal for_each_zone()?
A zone is an MM-internal concept. No code outside of the MM proper should
really care about zones.

I think this is also what Andrew suggested in the previous discussion:
https://lkml.org/lkml/2017/3/16/951

Moving the code into virtio-balloon, with a little layering violation, seems acceptable.


Best,
Wei