Re: [PATCH] mm: avoid blocking lock_page() in kcompactd

From: Mel Gorman
Date: Fri Jan 10 2020 - 04:23:02 EST


On Thu, Jan 09, 2020 at 02:56:46PM -0800, Cong Wang wrote:
> We observed kcompactd hung at __lock_page():
>
> INFO: task kcompactd0:57 blocked for more than 120 seconds.
> Not tainted 4.19.56.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kcompactd0 D 0 57 2 0x80000000
> Call Trace:
> ? __schedule+0x236/0x860
> schedule+0x28/0x80
> io_schedule+0x12/0x40
> __lock_page+0xf9/0x120
> ? page_cache_tree_insert+0xb0/0xb0
> ? update_pageblock_skip+0xb0/0xb0
> migrate_pages+0x88c/0xb90
> ? isolate_freepages_block+0x3b0/0x3b0
> compact_zone+0x5f1/0x870
> kcompactd_do_work+0x130/0x2c0
> ? __switch_to_asm+0x35/0x70
> ? __switch_to_asm+0x41/0x70
> ? kcompactd_do_work+0x2c0/0x2c0
> ? kcompactd+0x73/0x180
> kcompactd+0x73/0x180
> ? finish_wait+0x80/0x80
> kthread+0x113/0x130
> ? kthread_create_worker_on_cpu+0x50/0x50
> ret_from_fork+0x35/0x40
>
> which faddr2line maps to:
>
> migrate_pages+0x88c/0xb90:
> lock_page at include/linux/pagemap.h:483
> (inlined by) __unmap_and_move at mm/migrate.c:1024
> (inlined by) unmap_and_move at mm/migrate.c:1189
> (inlined by) migrate_pages at mm/migrate.c:1419
>
> Sometimes kcompactd eventually got out of this situation, sometimes not.
>
> I think for memory compaction, it is a best effort to migrate the pages,
> so it doesn't have to wait for I/O to complete. It is fine to call
> trylock_page() here, which is pretty much similar to
> buffer_migrate_lock_buffers().
>
> Given MIGRATE_SYNC_LIGHT is used on compaction path, just relax the
> check for it.
>

Is this a single page being locked for a long time or multiple pages
being locked without reaching a reschedule point?

If it's a single page being locked, it's important to identify what held
the page lock for 2 minutes because that is potentially a missing
unlock_page(). The kernel in question is old -- 4.19.56. Are there any
other modifications to that kernel?

--
Mel Gorman
SUSE Labs