Re: [EXTERNAL] Re: [PATCH] x86/Hyper-V: Support for free page reporting

From: Alexander Duyck
Date: Fri May 22 2020 - 16:12:12 EST


On Fri, May 22, 2020 at 9:42 AM Sunil Muthuswamy <sunilmut@xxxxxxxxxxxxx> wrote:
>

[...]

> >
> > > + order = get_order(sg->length);
> > > + range = &hint->ranges[i];
> > > + range->address_space = 0;
> >
> > I guess this means all address spaces?
>
> 'address_space' is being used here just as a zero initializer for the union. I think the use
> of 'address_space' in the definition of hv_gpa_page_range is not appropriate. This struct is
> defined in the TLFS 6.0 with the same name, if you want to look it up.
>
> >
> > > + range->page.largepage = 1;
> >
> > What effect does this have? What if the page is a 4k page?
>
> Page reporting infrastructure doesn't report 4k pages today, but only huge pages (see
> PAGE_REPORTING_MIN_ORDER in page_reporting.h). Additionally, the Hyper-V hypervisor
> only supports reporting of 2M pages and above. The current code assumes that the minimum
> order will be 9 i.e 2M pages and above.
> If we feel that this could change in the future, or an implementation detail that should be
> protected against, I can add some checks in hv_balloon.c. But, in that case, the ballon driver
> should be able to query the page reporting min order somehow, which it is currently, since it is
> private.
> Alexander, do you have any suggestions or feedback here?

For now we are keeping the limit to order 9 for a couple reasons. The
first being that we don't want to trigger the breaking apart of
transparent huge pages on the host, and the second being for
performance as it is better to report larger pages rather than smaller
ones.

If we were to enable bringing the value down lower we would likely
make it a part of the page reporting dev info structure and would have
to be set at initialization time. That way the device itself could
configure the minimum value. I don't see the value itself being
lowered without adding an option like that since it would likely cause
issues for several different reasons going forward though. If nothing
else you could do a BUILD_BUG_ON that would assert if
PAGE_REPORTING_MIN_ORDER was anything other than the same size as the
"large_page" size referenced above.

> >
> > > + range->page.additional_pages = (1ull << (order - 9)) - 1;
> >
> > What is 9 here? Is there a macro name *ORDER that you can use?
>
> It is to determine the count of 2M pages. I will define a macro.

Is this the only spot where you are using order? Instead of converting
the length to an order why not just divide it by the 2M page size? I
would think that would be faster than having to do something like
having to call get_order on the length? Also instead of using "9" it
might make more sense if you have a define somewhere that says what
"large_page" size actually is. Then you could just divide by that
which should be translated into a shift which is fast and cheap.