Re: [RFC PATCH 0/2] Avoid booting stall caused by idmap_kpti_install_ng_mappings

From: Marc Zyngier
Date: Wed Jan 20 2021 - 07:07:13 EST


Hi Justin,

On 2021-01-20 04:51, Justin He wrote:
Hi,
Kindly ping 😊

-----Original Message-----
From: Jia He <justin.he@xxxxxxx>
Sent: Wednesday, January 13, 2021 9:41 AM
To: Catalin Marinas <Catalin.Marinas@xxxxxxx>; Will Deacon
<will@xxxxxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
linux-kernel@xxxxxxxxxxxxxxx
Cc: Anshuman Khandual <Anshuman.Khandual@xxxxxxx>; Suzuki Poulose
<Suzuki.Poulose@xxxxxxx>; Justin He <Justin.He@xxxxxxx>; Mark Rutland
<Mark.Rutland@xxxxxxx>; Gustavo A. R. Silva <gustavoars@xxxxxxxxxx>;
Richard Henderson <richard.henderson@xxxxxxxxxx>; Dave P Martin
<Dave.Martin@xxxxxxx>; Steven Price <Steven.Price@xxxxxxx>; Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx>; Mike Rapoport <rppt@xxxxxxxxxx>; Ard
Biesheuvel <ardb@xxxxxxxxxx>; Gavin Shan <gshan@xxxxxxxxxx>; Kefeng Wang
<wangkefeng.wang@xxxxxxxxxx>; Mark Brown <broonie@xxxxxxxxxx>; Marc Zyngier
<maz@xxxxxxxxxx>; Cristian Marussi <Cristian.Marussi@xxxxxxx>
Subject: [RFC PATCH 0/2] Avoid booting stall caused by

There is a 10s stall in idmap_kpti_install_ng_mappings when the kernel
boots on an Ampere EMAG server.

Commit f992b4dfd58b ("arm64: kpti: Add ->enable callback to remap
swapper using nG mappings") updates the nG bit at runtime if kpti is
required.

But things get worse if rodata=full: map_mem() then requires
NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS when creating the page table
mappings, so memory is mapped entirely at pte granularity. On an Ampere
EMAG server with 256G of memory (pagesize=4k), this causes the 10s stall.
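
To give a rough sense of scale for the paragraph above, here is a minimal
back-of-the-envelope sketch. It assumes the whole 256G sits in the linear
map with no holes, 8-byte descriptors, and the usual 4k-granule block
sizes (2M and 1G). The helper name entries_needed is made up for
illustration and is not taken from the patches.

```python
# Rough count of leaf entries that must be rewritten to set the nG bit,
# depending on the granularity used for the linear map.
# Assumptions (not from the patches): 256 GiB fully mapped, 4 KiB granule,
# 8-byte descriptors, 2 MiB / 1 GiB block mappings when blocks are allowed.

KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30

DESCRIPTOR_BYTES = 8


def entries_needed(mem_bytes, mapping_size):
    """Number of leaf entries needed to map mem_bytes at mapping_size."""
    return mem_bytes // mapping_size


mem = 256 * GIB

pte_count = entries_needed(mem, 4 * KIB)    # page mappings only
pmd_count = entries_needed(mem, 2 * MIB)    # 2 MiB block mappings
pud_count = entries_needed(mem, 1 * GIB)    # 1 GiB block mappings

pte_table_bytes = pte_count * DESCRIPTOR_BYTES

print("4 KiB pages :", pte_count, "entries,",
      pte_table_bytes // MIB, "MiB of pte tables")
print("2 MiB blocks:", pmd_count, "entries")
print("1 GiB blocks:", pud_count, "entries")
```

With page-only mappings there are about 67 million ptes (512 MiB of
last-level tables) to walk and update, against a few hundred to a hundred
thousand entries when block mappings are permitted.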

After moving init_cpu_features() ahead of early_fixmap_init(), we can use
cpu_have_const_cap earlier than before. Hence we can avoid this stall
by updating arm64_use_ng_mappings.

This patch series reduces the kernel boot time from 14.7s to 4.1s:
Before:
[ 14.757569] Freeing initrd memory: 60752K
After:
[ 4.138819] Freeing initrd memory: 60752K

I have marked this as RFC because I want to resolve any points which I
may have misunderstood.

But you don't really explain *why* having the CPU Feature discovery
early helps at all. Is that so that you can bypass the idmap mapping?
I'd expect something that explains the problem instead of paraphrasing
the patches.

Another thing is whether you have tested this on some ThunderX HW
(the first version, not TX2), as this is the whole reason for this
code...

Thanks,

M.
--
Jazz is not dead. It just smells funny...