Re: [regression -next0117] What is kcompactd and why is he eating 100% of my cpu?

From: valdis . kletnieks
Date: Sat Jan 26 2019 - 22:00:46 EST


On Sat, 26 Jan 2019 21:00:05 +0100, Pavel Machek said:

> top - 13:38:51 up 1:42, 16 users, load average: 1.41, 1.93, 1.62
> Tasks: 182 total, 3 running, 138 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 2.3 us, 57.8 sy, 0.0 ni, 39.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> KiB Mem: 3020044 total, 2429420 used, 590624 free, 27468 buffers
> KiB Swap: 2097148 total, 0 used, 2097148 free. 1924268 cached Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 608 root 20 0 0 0 0 R 99.6 0.0 11:34.38 kcompactd0
> 9782 root 20 0 0 0 0 I 7.9 0.0 0:59.02 kworker/0:
> 2971 root 20 0 46624 23076 13576 S 4.3 0.8 2:50.22 Xorg

I've noticed this as well on earlier kernels (next-20181224 to 20190115)

Some more info:

1) echo 3 > /proc/sys/vm/drop_caches unwedges kcompactd in 1-3 seconds.

2) Typical kcompactd traceback:

cat /proc/27/stack
[<0>] retint_kernel+0x1b/0x2d
[<0>] lock_is_held_type+0x1b/0x50
[<0>] ___might_sleep+0xad/0x220
[<0>] __might_sleep+0x113/0x130
[<0>] on_each_cpu_cond_mask+0x12a/0x140
[<0>] on_each_cpu_cond+0x18/0x20
[<0>] invalidate_bh_lrus+0x29/0x30
[<0>] __buffer_migrate_page+0x154/0x340
[<0>] buffer_migrate_page_norefs+0x14/0x20
[<0>] move_to_new_page+0x8e/0x360
[<0>] migrate_pages+0x3cc/0xfd8
[<0>] compact_zone+0xb70/0x1380
[<0>] kcompactd_do_work+0x15b/0x500
[<0>] kcompactd+0x74/0x340
[<0>] kthread+0x158/0x170
[<0>] ret_from_fork+0x3a/0x50
[<0>] 0xffffffffffffffff

I've also seen khugepaged hung up:

cat /proc/29/stack
[<0>] ___preempt_schedule+0x16/0x18
[<0>] page_vma_mapped_walk+0x60/0x840
[<0>] remove_migration_pte+0x67/0x390
[<0>] rmap_walk_file+0x186/0x380
[<0>] rmap_walk+0xa3/0xd0
[<0>] remove_migration_ptes+0x69/0x70
[<0>] migrate_pages+0xb6d/0xfd8
[<0>] compact_zone+0xb70/0x1370
[<0>] compact_zone_order+0xd8/0x120
[<0>] try_to_compact_pages+0xe5/0x550
[<0>] __alloc_pages_direct_compact+0x6d/0x1a0
[<0>] __alloc_pages_slowpath+0x6c9/0x1640
[<0>] __alloc_pages_nodemask+0x558/0x5b0
[<0>] khugepaged+0x499/0x810
[<0>] kthread+0x158/0x170
[<0>] ret_from_fork+0x3a/0x50
[<0>] 0xffffffffffffffff

Looks like something has gone astray with compact_zone.