Re: 5.1 kernel: khugepaged stuck at 100%

From: Max Kellermann
Date: Thu Jun 06 2019 - 13:34:41 EST


On 2019/05/16 16:14, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> Kernel: 5.1.2
>
> $ sudo cat /proc/$(pidof khugepaged)/stack
> [<0>] 0xffffffffffffffff
>
> $ perf top
>
> PerfTop: 3716 irqs/sec kernel:92.9% exact: 99.1% lost: 68/68
> drop: 0/0 [4000Hz cycles], (all, 12 CPUs)
> -------------------------------------------------------------------------------
>
> 47.53% [kernel] [k] compaction_alloc
> 38.88% [kernel] [k] __pageblock_pfn_to_page
> 6.68% [kernel] [k] nmi
> 0.58% [kernel] [k] __list_del_entry_valid
> 0.48% [kernel] [k] format_decode
> 0.39% [kernel] [k] __rb_insert_augmented
> 0.25% libdbus-1.so.3.19.9 [.] _dbus_string_hex_decode
> 0.24% [kernel] [k] entry_SYSCALL_64_after_hwframe
> 0.20% perf [.] rb_next
> 0.19% perf [.] __symbols__insert

I have the same problem (kernel 5.1.7), but over here, it's a PHP
process, not khugepaged, which is looping inside compaction_alloc.

This is from "perf report":

100.00% 0.00% php-cgi7.0 php-cgi7.0 [.] 0x000055d0e88bc5ee
|
---0x55d0e88bc5ee
page_fault
__do_page_fault
handle_mm_fault
__handle_mm_fault
do_huge_pmd_anonymous_page
__alloc_pages_nodemask
__alloc_pages_slowpath
__alloc_pages_direct_compact
try_to_compact_pages
compact_zone_order
compact_zone
migrate_pages
compaction_alloc
|
|--24.43%--__pageblock_pfn_to_page
|
--2.22%--_cond_resched
|
--0.89%--rcu_all_qs

ftrace:

<...>-263514 [012] .... 109004.793009: rcu_all_qs <-_cond_resched
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: _cond_resched <-compaction_alloc

(Repeating this sequence over and over.)

Nothing useful in /proc/263514/{stack,wchan,syscall}.

What else can I do to collect more information to aid fix this?

Max