Re: [PATCH] mm: avoid blocking lock_page() in kcompactd

From: Yang Shi
Date: Thu Jan 09 2020 - 19:28:56 EST


On Thu, Jan 9, 2020 at 2:57 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> We observed kcompactd hung at __lock_page():
>
> INFO: task kcompactd0:57 blocked for more than 120 seconds.
> Not tainted 4.19.56.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kcompactd0 D 0 57 2 0x80000000
> Call Trace:
> ? __schedule+0x236/0x860
> schedule+0x28/0x80
> io_schedule+0x12/0x40
> __lock_page+0xf9/0x120
> ? page_cache_tree_insert+0xb0/0xb0
> ? update_pageblock_skip+0xb0/0xb0
> migrate_pages+0x88c/0xb90
> ? isolate_freepages_block+0x3b0/0x3b0
> compact_zone+0x5f1/0x870
> kcompactd_do_work+0x130/0x2c0
> ? __switch_to_asm+0x35/0x70
> ? __switch_to_asm+0x41/0x70
> ? kcompactd_do_work+0x2c0/0x2c0
> ? kcompactd+0x73/0x180
> kcompactd+0x73/0x180
> ? finish_wait+0x80/0x80
> kthread+0x113/0x130
> ? kthread_create_worker_on_cpu+0x50/0x50
> ret_from_fork+0x35/0x40
>
> which faddr2line maps to:
>
> migrate_pages+0x88c/0xb90:
> lock_page at include/linux/pagemap.h:483
> (inlined by) __unmap_and_move at mm/migrate.c:1024
> (inlined by) unmap_and_move at mm/migrate.c:1189
> (inlined by) migrate_pages at mm/migrate.c:1419
>
> Sometimes kcompactd eventually got out of this situation, sometimes not.
>
> I think for memory compaction, it is a best effort to migrate the pages,
> so it doesn't have to wait for I/O to complete. It is fine to call
> trylock_page() here, which is pretty much similar to
> buffer_migrate_lock_buffers().
>
> Given MIGRATE_SYNC_LIGHT is used on compaction path, just relax the
> check for it.

But this changed the semantics of MIGRATE_SYNC_LIGHT which means
blocking on most operations but not ->writepage. When
MIGRATE_SYNC_LIGHT is used it means compaction priority is increased
(the initial priority is ASYNC) due to whatever reason (i.e. not
enough clean, non-writeback and non-locked pages to migrate). So, it
has to wait for some pages to try to not backoff pre-maturely.

If I read the code correctly, buffer_migrate_lock_buffers() also
blocks on page lock with non-ASYNC mode.

Since v5.1 Mel Gorman improved compaction a lot. So, I'm wondering if
this happens on the latest upstream or not.

And, did you figure out who is locking the page for such long time? Or
there might be too many waiters on the list for this page?

>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx>
> ---
> mm/migrate.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 86873b6f38a7..df60026779d2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1010,7 +1010,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> bool is_lru = !__PageMovable(page);
>
> if (!trylock_page(page)) {
> - if (!force || mode == MIGRATE_ASYNC)
> + if (!force || mode == MIGRATE_ASYNC
> + || mode == MIGRATE_SYNC_LIGHT)
> goto out;
>
> /*
> --
> 2.21.1
>
>