Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations
From: David Rientjes
Date: Wed Oct 26 2011 - 02:35:40 EST
On Tue, 25 Oct 2011, Colin Cross wrote:
> Makes sense. What about this? Official patch to follow.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fef8dc3..59cd4ff 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1786,6 +1786,13 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
>  		return 0;
> 
> +	/*
> +	 * If PM has disabled I/O, OOM is disabled and reclaim is unlikely
> +	 * to make any progress. To prevent a livelock, don't retry.
> +	 */
> +	if (!(gfp_allowed_mask & __GFP_FS))
> +		return 0;
> +
>  	/*
>  	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
>  	 * means __GFP_NOFAIL, but that may not be true in other
>  	 * implementations.
Eek, this is precisely what we don't want and is functionally the same as
what you initially proposed except it doesn't care about __GFP_NOFAIL.
You're trying to address a suspend issue where nothing on the system can
logically make progress: without __GFP_FS, reclaim is severely restricted
in what it can do if it doesn't succeed the first time and kswapd isn't
effective. That's why I suggested a hook into pm_restrict_gfp_mask() that
sets a variable, which should_alloc_retry() then treats exactly like
__GFP_NORETRY.
Consider the case where nobody is using suspend and callers are allocating
with GFP_NOFS. There are potentially a lot of them:
$ grep -r GFP_NOFS * | wc -l
and now we've just introduced a regression: those allocations would
otherwise eventually succeed because of either kswapd, a backing device
that is no longer congested, or an allocation on another cpu in a context
where direct reclaim can be more aggressive or the oom killer can at least
free memory.
So you definitely want to localize your change to suspend only, and
pm_restrict_gfp_mask() is a very easy way to do that. I'd suggest adding
a static bool, tagged __read_mostly, that pm_restrict_gfp_mask() sets to
identify such situations and that should_alloc_retry() tests.