Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

From: Qian Cai
Date: Thu Sep 05 2019 - 17:08:05 EST


Another data point: changing CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n also
fixes it.
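
To summarize the combinations tried so far, here is a rough sketch as a config
fragment. Each passing case changes a single setting relative to the hanging
baseline; the layout and comments are only for illustration, not output from
any tool.

```
# Hanging baseline (kernel command line + .config)
page_alloc.shuffle=1
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_OBJECTS_TIMERS=y

# Any one of these changes lets the boot continue:
#   page_alloc.shuffle=0
#   CONFIG_PROVE_LOCKING=n
#   CONFIG_DEBUG_OBJECTS_TIMERS=n
```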

On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
>
> Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> CONFIG_PROVE_LOCKING=y results in a hang.
>
> [1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> ...
> [  125.142689][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x2
> [  125.149687][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.165198][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: allocated 524288 entries
> for cmdq
> [  125.239425][ [  125.251484][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: option
> mask 0x2
> [  125.258233][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.282750][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> for cmdq
> [  125.320097][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> for evtq
> [  125.332667][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: option mask 0x2
> [  125.339427][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.354846][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> for cmdq
> [  125.375295][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> for evtq
> [  125.387371][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: option mask 0x2
> [  125.393955][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.522605][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> for cmdq
> [  125.543338][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> for evtq
> [  126.694742][    T1] EFI Variables Facility v0.08 2004-May-17
> [  126.799291][    T1] NET: Registered protocol family 17
> [  126.978632][    T1] zswap: loaded using pool lzo/zbud
> [  126.989168][    T1] kmemleak: Kernel memory leak detector initialized
> [  126.989191][ T1577] kmemleak: Automatic memory scanning thread started
> [  127.044079][ T1335] pcieport 0000:0f:00.0: Adding to iommu group 0
> [  127.388074][    T1] Freeing unused kernel memory: 22528K
> [  133.527005][    T1] Checked W+X mappings: passed, no W+X pages found
> [  133.533474][    T1] Run /init as init process
> [  133.727196][    T1] systemd[1]: System time before build time, advancing
> clock.
> [  134.576021][ T1587] modprobe (1587) used greatest stack depth: 27056 bytes
> left
> [  134.764026][    T1] systemd[1]: systemd 239 running in system mode. (+PAM
> +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT
> +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-
> hierarchy=legacy)
> [  134.799044][    T1] systemd[1]: Detected architecture arm64.
> [  134.804818][    T1] systemd[1]: Running in initial RAM disk.
> <...hang...>
>
> Setting either page_alloc.shuffle=0 or CONFIG_PROVE_LOCKING=n fixes it and
> allows the boot to continue successfully.
>
>
> [  121.093846][    T1] systemd[1]: Set hostname to <hpe-apollo-cn99xx>.
> [  123.157524][    T1] random: systemd: uninitialized urandom read (16 bytes
> read)
> [  123.168562][    T1] systemd[1]: Listening on Journal Socket.
> [  OK  ] Listening on Journal Socket.
> [  123.203932][    T1] random: systemd: uninitialized urandom read (16 bytes
> read)
> [  123.212813][    T1] systemd[1]: Listening on udev Kernel Socket.
> [  OK  ] Listening on udev Kernel Socket.
> ...