Re: [PATCH v2] mm: Warn about costly page allocation

From: Minchan Kim
Date: Mon Jul 09 2012 - 20:04:12 EST


Please ignore,
It is sent by mistake. :(
Sorry for the noise.

On Tue, Jul 10, 2012 at 08:55:53AM +0900, Minchan Kim wrote:
> Since lumpy reclaim was introduced at 2.6.23, it helped higher
> order allocation.
> Recently, we removed it at 3.4 and we didn't enable compaction
> forcingly[1]. The reason makes sense that compaction.o + migration.o
> isn't trivial for system doesn't use higher order allocation.
> But the problem is that we have to enable compaction explicitly
> while lumpy reclaim enabled unconditionally.
>
> Normally, admin doesn't know his system have used higher order
> allocation and even lumpy reclaim have helped it.
> Admin in embdded system have a tendency to minimise code size so that
> they can disable compaction. In this case, we can see page allocation
> failure we can never see in the past. It's critical on embedded side
> because...
>
> Let's think this scenario.
>
> There is QA team in embedded company and they have tested their product.
> In test scenario, they can allocate 100 high order allocation.
> (they don't matter how many high order allocations in kernel are needed
> during test. their concern is just only working well or fail of their
> middleware/application) High order allocation will be serviced well
> by natural buddy allocation without lumpy's help. So they released
> the product and sold out all over the world.
> Unfortunately, in real practice, sometime, 105 high order allocation was
> needed rarely and fortunately, lumpy reclaim could help it so the product
> doesn't have a problem until now.
>
> If they use latest kernel, they will see the new config CONFIG_COMPACTION
> which is very poor documentation, and they can't know it's replacement of
> lumpy reclaim(even, they don't know lumpy reclaim) so they simply disable
> that option for size optimization. Of course, QA team still test it but they
> can't find the problem if they don't do test stronger than old.
> It ends up release the product and sold out all over the world, again.
> But in this time, we don't have both lumpy and compaction so the problem
> would happen in real practice. A poor enginner from Korea have to flight
> to the USA for the fix a ton of products. Otherwise, should recall products
> from all over the world. Maybe he can lose a job. :(
>
> This patch adds warning for notice. If the system try to allocate
> PAGE_ALLOC_COSTLY_ORDER above page and system enters reclaim path,
> it emits the warning. At least, it gives a chance to look into their
> system before the relase.
>
> Please keep in mind. It's not a good idea to depend lumpy/compaction
> for regular high-order allocations. Both depends on being able to move
> MIGRATE_MOVABLE allocations to satisfy the high-order allocation. If used
> reregularly for high-order kernel allocations and tehy are long-lived,
> the system will eventually be unable to grant these allocations, with or
> without compaction or lumpy reclaim. Hatchet jobs that work around this problem
> include forcing MIGRATE_RESERVE to be only used for high-order allocations
> and tuning its size. It's a major hack though and is unlikely to be merged
> to mainline but might suit an embedded product.
>
> This patch avoids false positive by alloc_large_system_hash which
> allocates with GFP_ATOMIC and a fallback mechanism so it can make
> this warning useless.
>
> [1] c53919ad(mm: vmscan: remove lumpy reclaim)
>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> ---
> Changelog
>
> * from v1
> - add more description about warning failure of high-order allocation
> - use printk_ratelimited/pr_warn and dump stack - [Mel, Andrew]
>
> mm/page_alloc.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a4d3a19..710d0e90 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2276,6 +2276,29 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> return alloc_flags;
> }
>
> +#if defined(CONFIG_DEBUG_VM) && !defined(CONFIG_COMPACTION)
> +static inline void check_page_alloc_costly_order(unsigned int order, gfp_t flags)
> +{
> + if (likely(order <= PAGE_ALLOC_COSTLY_ORDER))
> + return;
> +
> + if (!printk_ratelimited())
> + return;
> +
> + pr_warn("%s: page allocation high-order stupidity: "
> + "order:%d, mode:0x%x\n", current->comm, order, flags);
> + pr_warn("Enable compaction if high-order allocations are "
> + "very few and rare.\n");
> + pr_warn("If you need regular high-order allocation, "
> + "compaction wouldn't help it.\n");
> + dump_stack();
> +}
> +#else
> +static inline void check_page_alloc_costly_order(unsigned int order)
> +{
> +}
> +#endif
> +
> static inline struct page *
> __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> struct zonelist *zonelist, enum zone_type high_zoneidx,
> @@ -2353,6 +2376,8 @@ rebalance:
> if (!wait)
> goto nopage;
>
> + check_page_alloc_costly_order(order);
> +
> /* Avoid recursion of direct reclaim */
> if (current->flags & PF_MEMALLOC)
> goto nopage;
> --
> 1.7.9.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/