Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults

From: Marc Zyngier

Date: Thu Mar 05 2026 - 06:01:03 EST


On Wed, 04 Mar 2026 18:55:04 +0000,
Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Wed, 25 Jun 2025 11:55:48 +0100,
> Quentin Perret <qperret@xxxxxxxxxx> wrote:
> >
> > host_stage2_adjust_range() tries to find the largest block mapping that
> > fits within a memory or mmio region (represented by a kvm_mem_range in
> > this function) during host stage-2 faults under pKVM. To do so, it walks
> > the host stage-2 page-table, finds the faulting PTE and its level, and
> > then progressively increments the level until it finds a granule of the
> > appropriate size. However, the condition in the loop implementing the
> > above is broken as it checks kvm_level_supports_block_mapping() for the
> > next level instead of the current, so pKVM may attempt to map a region
> > larger than can be covered with a single block.
> >
> > This is not a security problem and is quite rare in practice (the
> > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > smaller granule), but this is clearly not the expected behaviour.
> >
> > Refactor the loop to fix the bug and improve readability.
> >
> > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > Signed-off-by: Quentin Perret <qperret@xxxxxxxxxx>
>
> This patch prevents my O6 board from booting in protected mode as of
> e728e705802fe. Reverting it on top of 7.0-rc2 makes the box work again.
>
> I haven't quite worked out why though. The hack below makes it work,
> but implies that we can get ranges that are smaller than a page. That
> feels unlikely, but I'm not sure we can rule it out (the kernel page
> size could be pretty large anyway).

Having spent a bit of time on this, I'm pretty sure that ranges smaller
than a page are indeed the cause of the issue. The memblock table on
this box looks like this:

maz@cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
0: 0x0000000080000000..0x00000000843fffff 0 NOMAP
1: 0x0000000084400000..0x00000000845fffff 0 NONE
2: 0x0000000085000000..0x000000009fffffff 0 NONE
3: 0x00000000a0000000..0x00000000a7ffffff 0 NOMAP
4: 0x00000000a8000000..0x00000000fffbffff 0 NONE
5: 0x00000000fffc0000..0x00000000fffeffff 0 NOMAP
6: 0x00000000ffff0000..0x00000000ffffdfff 0 NONE
7: 0x00000000ffffe000..0x00000000ffffffff 0 NOMAP
8: 0x0000000100000000..0x00000007fe4effff 0 NONE
9: 0x00000007fe4f0000..0x00000007fedeffff 0 NOMAP
10: 0x00000007fedf0000..0x00000007ffffffff 0 NONE
11: 0x0000008000000000..0x000000807a290fff 0 NONE
12: 0x000000807a291000..0x000000807a2927b2 0 NOMAP
13: 0x000000807a2927b3..0x000000807fffffff 0 NONE

Any access to page 0x000000807a292000 is going to blow up in your
face: regions 12 and 13 meet in the middle of that page (at
0x000000807a2927b3), so there is no way you can map it and still
respect the memblock boundary. The same goes for any region that is
smaller than PAGE_SIZE, or not aligned on PAGE_SIZE, which is even
more annoying.
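
To make the arithmetic concrete, here is a minimal userspace sketch of
the check that fails. It is not the pKVM code: it assumes 4KiB pages,
hardcodes regions 11-13 from the dump above (with exclusive end
addresses), and the names mem_range, find_range() and
page_fits_in_range() are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4KiB pages. The kernel page size may be larger. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical stand-in for a memblock region; 'end' is exclusive. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Regions 11, 12 and 13 from the memblock dump, end made exclusive. */
static const struct mem_range regions[] = {
	{ 0x0000008000000000ULL, 0x000000807a291000ULL }, /* 11: NONE  */
	{ 0x000000807a291000ULL, 0x000000807a2927b3ULL }, /* 12: NOMAP */
	{ 0x000000807a2927b3ULL, 0x0000008080000000ULL }, /* 13: NONE  */
};

/* Return the region containing addr, or NULL if there is none. */
static const struct mem_range *find_range(uint64_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
		if (addr >= regions[i].start && addr < regions[i].end)
			return &regions[i];
	}
	return NULL;
}

/*
 * True if the whole page containing addr lies inside the region that
 * contains addr, i.e. a page-sized mapping would not cross a boundary.
 */
bool page_fits_in_range(uint64_t addr)
{
	const struct mem_range *r = find_range(addr);
	uint64_t page = addr & PAGE_MASK;

	if (!r)
		return false;
	return page >= r->start && page + PAGE_SIZE <= r->end;
}
```

With these inputs, page_fits_in_range() is false for every address in
the page at 0x000000807a292000, whichever of regions 12 or 13 the
address falls in, and true for the neighbouring pages at
0x000000807a291000 and 0x000000807a293000.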

I'm starting to think that my hack is not that idiotic in the end...

M.

--
Without deviation from the norm, progress is not possible.