Re: [PATCH v3] mm: fix tick timer stall during deferred page init

From: Michal Hocko
Date: Wed Apr 01 2020 - 11:42:23 EST


I am sorry but I have completely missed this patch.

On Wed 11-03-20 20:38:48, Shile Zhang wrote:
> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will
> initialise the deferred pages with local interrupts disabled. It is
> introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while
> initializing deferred pages").
>
> On machine with NCPUS <= 2, the 'pgdatinit' kthread could be bound to
> the boot CPU, which could caused the tick timer long time stall, system
> jiffies not be updated in time.
>
> The dmesg shown that:
>
> [ 0.197975] node 0 initialised, 32170688 pages in 1ms
>
> Obviously, 1ms is unreasonable.
>
> Now, fix it by restore in the pending interrupts for every 32*1204 pages
> (128MB) initialized, give the chance to update the systemd jiffies.
> The reasonable demsg shown likes:
>
> [ 1.069306] node 0 initialised, 32203456 pages in 894ms
>
> Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages").

I dislike this solution TBH. It effectivelly conserves the current code
and just works around the problem. Why do we hold the IRQ lock here in
the first place? This is an early init code and a very limited code is
running at this stage. Certainly nothing memory hotplug related which
should be the only path really interested in the resize lock AFAIR.

This needs a double checking but I strongly believe that the lock can be
simply dropped in this path.

> Co-developed-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
> Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
> Signed-off-by: Shile Zhang <shile.zhang@xxxxxxxxxxxxxxxxx>
> ---
> mm/page_alloc.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..a3a47845e150 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1763,12 +1763,17 @@ deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
> return nr_pages;
> }
>
> +/*
> + * Release the pending interrupts for every TICK_PAGE_COUNT pages.
> + */
> +#define TICK_PAGE_COUNT (32 * 1024)
> +
> /* Initialise remaining memory on a node */
> static int __init deferred_init_memmap(void *data)
> {
> pg_data_t *pgdat = data;
> const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> - unsigned long spfn = 0, epfn = 0, nr_pages = 0;
> + unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
> unsigned long first_init_pfn, flags;
> unsigned long start = jiffies;
> struct zone *zone;
> @@ -1779,6 +1784,7 @@ static int __init deferred_init_memmap(void *data)
> if (!cpumask_empty(cpumask))
> set_cpus_allowed_ptr(current, cpumask);
>
> +again:
> pgdat_resize_lock(pgdat, &flags);
> first_init_pfn = pgdat->first_deferred_pfn;
> if (first_init_pfn == ULONG_MAX) {
> @@ -1790,7 +1796,6 @@ static int __init deferred_init_memmap(void *data)
> /* Sanity check boundaries */
> BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
> BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
> - pgdat->first_deferred_pfn = ULONG_MAX;
>
> /* Only the highest zone is deferred so find it */
> for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> @@ -1809,9 +1814,23 @@ static int __init deferred_init_memmap(void *data)
> * that we can avoid introducing any issues with the buddy
> * allocator.
> */
> - while (spfn < epfn)
> + while (spfn < epfn) {
> nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> + /*
> + * Release the interrupts for every TICK_PAGE_COUNT pages
> + * (128MB) to give tick timer the chance to update the
> + * system jiffies.
> + */
> + if ((nr_pages - prev_nr_pages) > TICK_PAGE_COUNT) {
> + prev_nr_pages = nr_pages;
> + pgdat->first_deferred_pfn = spfn;
> + pgdat_resize_unlock(pgdat, &flags);
> + goto again;
> + }
> + }
> +
> zone_empty:
> + pgdat->first_deferred_pfn = ULONG_MAX;
> pgdat_resize_unlock(pgdat, &flags);
>
> /* Sanity check that the next zone really is unpopulated */
> --
> 2.24.0.rc2

--
Michal Hocko
SUSE Labs