Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

From: Oscar Salvador
Date: Thu Feb 11 2021 - 16:39:25 EST


On Wed, Feb 10, 2021 at 05:16:29PM -0800, Mike Kravetz wrote:
> Should probably check for -EBUSY as this means someone started using
> the page while we were allocating a new one. It would complicate the
> code to try and do the 'right thing'. Right thing might be getting
> dissolving the new pool page and then trying to isolate this in use
> page. Of course things could change again while you are doing that. :(

Yeah, I kept the error handling rather low just be clear about the
approach I was leaning towards, but yes, we should definitely check
for -EBUSY on dissolve_free_huge_page().

And it might be that dissolve_free_huge_page() returns -EBUSY on the old
page, and we need to dissolve the freshly allocated one as it is not
going to be used, and that might fail as well due to reserves for
instance, or maybe someone started using it?
I have to confess that I need to check the reservation code closer to be
aware of corner cases.

We used to try to be clever in such situations in memory-failure code,
but at some point you end up thinking "ok, how many retries are
considered enough?".
That code was trickier as we were handling uncorrected/corrected memory
errors, so we strived to do our best, but here we can be more flexible
as the whole thing is racy per se, and just fail if we find too many
obstacles.

I shall resume work early next week.

Thanks for the tips ;-)

--
Oscar Salvador
SUSE L3