Re: [RFC][Patch v12 1/2] mm: page_reporting: core infrastructure

From: Nitesh Narayan Lal
Date: Fri Aug 30 2019 - 11:15:45 EST



On 8/12/19 2:47 PM, Alexander Duyck wrote:
> On Mon, Aug 12, 2019 at 6:13 AM Nitesh Narayan Lal <nitesh@xxxxxxxxxx> wrote:
>> This patch introduces the core infrastructure for free page reporting in
>> virtual environments. It enables the kernel to track the free pages which
>> can be reported to its hypervisor so that the hypervisor could
>> free and reuse that memory as per its requirement.
>>
>> While the pages are getting processed in the hypervisor (e.g.,
>> via MADV_DONTNEED), the guest must not use them, otherwise, data loss
>> would be possible. To avoid such a situation, these pages are
>> temporarily removed from the buddy. The amount of pages removed
>> temporarily from the buddy is governed by the backend(virtio-balloon
>> in our case).
>>
>> To efficiently identify free pages that can to be reported to the
>> hypervisor, bitmaps in a coarse granularity are used. Only fairly big
>> chunks are reported to the hypervisor - especially, to not break up THP
>> in the hypervisor - "MAX_ORDER - 2" on x86, and to save space. The bits
>> in the bitmap are an indication whether a page *might* be free, not a
>> guarantee. A new hook after buddy merging sets the bits.
>>
>> Bitmaps are stored per zone, protected by the zone lock. A workqueue
>> asynchronously processes the bitmaps, trying to isolate and report pages
>> that are still free. The backend (virtio-balloon) is responsible for
>> reporting these batched pages to the host synchronously. Once reporting/
>> freeing is complete, isolated pages are returned back to the buddy.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
[...]
>> +static void scan_zone_bitmap(struct page_reporting_config *phconf,
>> + struct zone *zone)
>> +{
>> + unsigned long setbit;
>> + struct page *page;
>> + int count = 0;
>> +
>> + sg_init_table(phconf->sg, phconf->max_pages);
>> +
>> + for_each_set_bit(setbit, zone->bitmap, zone->nbits) {
>> + /* Process only if the page is still online */
>> + page = pfn_to_online_page((setbit << PAGE_REPORTING_MIN_ORDER) +
>> + zone->base_pfn);
>> + if (!page)
>> + continue;
>> +
> Shouldn't you be clearing the bit and dropping the reference to
> free_pages before you move on to the next bit? Otherwise you are going
> to be stuck with those aren't you?
>
>> + spin_lock(&zone->lock);
>> +
>> + /* Ensure page is still free and can be processed */
>> + if (PageBuddy(page) && page_private(page) >=
>> + PAGE_REPORTING_MIN_ORDER)
>> + count = process_free_page(page, phconf, count);
>> +
>> + spin_unlock(&zone->lock);
> So I kind of wonder just how much overhead you are taking for bouncing
> the zone lock once per page here. Especially since it can result in
> you not actually making any progress since the page may have already
> been reallocated.
>

I am wondering if there is a way to measure this overhead?
After thinking about this, I do understand your point.
One possible way which I can think of to address this is by having a
page_reporting_dequeue() hook somewhere in the allocation path.
ÂÂÂÂ
For some reason, I am not seeing this work as I would have expected
but I don't have solid reasoning to share yet. It could be simply
because I am putting my hook at the wrong place. I will continue
investigating this.

In any case, I may be over complicating things here, so please let me
if there is a better way to do this.

If this overhead is not significant we can probably live with it.

--
Thanks
Nitesh