Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations

From: David Rientjes
Date: Wed Oct 26 2011 - 02:35:40 EST


On Tue, 25 Oct 2011, Colin Cross wrote:

> Makes sense. What about this? Official patch to follow.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fef8dc3..59cd4ff 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1786,6 +1786,13 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> return 0;
>
> /*
> + * If PM has disabled I/O, OOM is disabled and reclaim is unlikely
> + * to make any progress. To prevent a livelock, don't retry.
> + */
> + if (!(gfp_allowed_mask & __GFP_FS))
> + return 0;
> +
> + /*
> * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> * means __GFP_NOFAIL, but that may not be true in other
> * implementations.

Eek, this is precisely what we don't want and is functionally the same as
what you initially proposed except it doesn't care about __GFP_NOFAIL.

You're trying to address a suspend issue where nothing on the system can
logically make progress: without __GFP_FS, reclaim is seriously restricted
and can do little that is useful if it doesn't succeed the first time, and
kswapd isn't effective. That's why I suggested a hook into
pm_restrict_gfp_mask() to set a variable and then treat that variable
exactly as __GFP_NORETRY in should_alloc_retry().

Consider the case where nobody is using suspend and callers are allocating
with GFP_NOFS. There are potentially a lot of candidates:

$ grep -r GFP_NOFS * | wc -l
1016

and now we've just introduced a regression for all of them: an allocation
that fails immediately with your patch would otherwise have eventually
succeeded because of kswapd, a backing device that is no longer congested,
or an allocation on another cpu in a context where direct reclaim can be
more aggressive or the oom killer can at least free some memory.

So you definitely want to localize your change to suspend only, and
pm_restrict_gfp_mask() is a very easy way to do it. I'd suggest adding a
static bool, tagged __read_mostly, that pm_restrict_gfp_mask() sets and
that should_alloc_retry() tests to identify such situations.
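
To make the suggestion concrete, here is a rough userspace model of what I
have in mind, not a patch against mm/page_alloc.c. The flag values, the
name pm_gfp_restricted, and the trimmed-down should_alloc_retry() (no
__GFP_REPEAT or pages_reclaimed handling) are all assumptions made for
illustration. Checking __GFP_NOFAIL before the suspend flag is also an
assumption, so that must-not-fail callers keep retrying.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp bits; values are illustrative. */
typedef unsigned int gfp_t;
#define __GFP_IO      0x01u
#define __GFP_FS      0x02u
#define __GFP_NORETRY 0x04u
#define __GFP_NOFAIL  0x08u
#define GFP_IOFS      (__GFP_IO | __GFP_FS)
#define PAGE_ALLOC_COSTLY_ORDER 3

static gfp_t gfp_allowed_mask = ~0u;
static gfp_t saved_gfp_mask;

/* Hypothetical name; in the kernel this would be tagged __read_mostly. */
static bool pm_gfp_restricted;

/* Suspend path: forbid I/O and FS reclaim and remember that we did so. */
void pm_restrict_gfp_mask(void)
{
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~GFP_IOFS;
	pm_gfp_restricted = true;
}

/* Resume path: undo the restriction if one is in effect. */
void pm_restore_gfp_mask(void)
{
	if (pm_gfp_restricted) {
		gfp_allowed_mask = saved_gfp_mask;
		pm_gfp_restricted = false;
	}
}

/* Simplified model of the retry decision; returns 1 to retry, 0 to give up. */
int should_alloc_retry(gfp_t gfp_mask, unsigned int order)
{
	/* Do not loop if specifically requested. */
	if (gfp_mask & __GFP_NORETRY)
		return 0;

	/* Assumed ordering: callers that cannot fail always retry. */
	if (gfp_mask & __GFP_NOFAIL)
		return 1;

	/* Only while suspend has restricted the mask, behave like NORETRY. */
	if (pm_gfp_restricted)
		return 0;

	/* Low-order allocations are retried as before. */
	if (order <= PAGE_ALLOC_COSTLY_ORDER)
		return 1;

	return 0;
}
```

The point is that an ordinary GFP_NOFS allocation on a system that never
suspends takes exactly the same path as it does today; only the window
between pm_restrict_gfp_mask() and pm_restore_gfp_mask() changes behavior.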