On Wed, 03 Apr 2024 07:49:29 +0100,
Gavin Shan <gshan@xxxxxxxxxx> wrote:
KVM/arm64 relies on the TLBI RANGE feature to flush TLBs when the dirty
bitmap is collected by the VMM and the corresponding PTEs need to be
write-protected again. Unfortunately, the operand passed to the TLBI
RANGE instruction isn't correctly computed by commit d1d3aa98b1d4
("arm64: tlb: Use the TLBI RANGE feature in arm64"). This leads to a
crash of the destination VM after live migration because some of the
dirty pages are missed.
For example, I have a VM with 8GB of memory assigned, starting from
0x40000000 (1GB). Note that the host uses 4KB as the base page size.
All TLB entries for the VM can be covered by one TLBI RANGE operation.
However, I receive 0xffff708000040000 as the operand, which is wrong;
the correct one should be 0x00007f8000040000. From the wrong operand,
we have 3 for SCALE (bits[45:44]) and 1 for NUM (bits[43:39]), so only
1GB instead of 8GB of memory is covered.
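As a quick userspace sketch of the macro arithmetic (not kernel code;
pages = 0x200000 is assumed for the 8GB/4KB example above), the old and
new expressions compare as follows:

#include <stdio.h>

#define TLBI_RANGE_MASK		0x1fUL	/* GENMASK_ULL(4, 0) */

/* old: mask first, then subtract */
#define NUM_OLD(pages, scale)	\
	((long)((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1))
/* new: subtract first, then mask */
#define NUM_NEW(pages, scale)	\
	((long)((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK))

int main(void)
{
	unsigned long pages = 0x200000;	/* 8GB of 4KB pages */
	int scale;

	for (scale = 3; scale >= 0; scale--)
		printf("scale %d: old num = %ld, new num = %ld\n",
		       scale, NUM_OLD(pages, scale), NUM_NEW(pages, scale));

	/*
	 * The old expression returns -1 at every scale for this pages
	 * value, so no range operation can describe the flush; the new
	 * one returns 31, and at scale 3 that covers all 0x200000
	 * pages in a single TLBI RANGE operation.
	 */
	return 0;
}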
Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI
RANGE operand are provided.
Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64")
Cc: stable@xxxxxxxxxx # v5.10+
Reported-by: Yihuang Yu <yihyu@xxxxxxxxxx>
Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
---
arch/arm64/include/asm/tlbflush.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3b0e8248e1a4..07c4fb4b82b4 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -166,7 +166,7 @@ static inline unsigned long get_trans_granule(void)
*/
#define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
#define __TLBI_RANGE_NUM(pages, scale) \
- ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
+ ((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK)
/*
* TLB Invalidation
This looks pretty wrong, by the very definition of the comment that's
just above:
<quote>
/*
* Generate 'num' values from -1 to 30 with -1 rejected by the
* __flush_tlb_range() loop below.
*/
</quote>
With your change, num can't ever be negative, and that breaks
__flush_tlb_range_op():
<quote>
num = __TLBI_RANGE_NUM(pages, scale); \
if (num >= 0) { \
addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
scale, num, tlb_level); \
__tlbi(r##op, addr); \
if (tlbi_user) \
__tlbi_user(r##op, addr); \
start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
pages -= __TLBI_RANGE_PAGES(num, scale); \
} \
scale--; \
</quote>
We'll then shove whatever value we've found into the TLBI operation,
leading to unknown results instead of properly adjusting the scale to
issue a smaller invalidation.
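To make that concrete, a userspace sketch (assuming the macro exactly
as posted): a request of 16 pages evaluated at scale 3 should yield -1
so the loop can drop to a smaller scale, but the subtract-then-mask
version wraps around to 31:

#include <stdio.h>

#define TLBI_RANGE_MASK		0x1fUL	/* GENMASK_ULL(4, 0) */

/* The macro as proposed in the patch: subtract first, then mask */
#define NUM_NEW(pages, scale)	\
	((long)((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK))

int main(void)
{
	/*
	 * 16 pages at scale 3: the quotient is 0, so -1 is expected
	 * (reject, retry at scale 2). Instead, (0 - 1) & 0x1f = 31,
	 * so the quoted loop would issue a TLBI covering
	 * (31 + 1) << 16 pages and let pages underflow on the
	 * subsequent subtraction.
	 */
	printf("num = %ld\n", NUM_NEW(16UL, 3));	/* prints 31 */
	return 0;
}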
I think the problem is that you are triggering NUM=31 and SCALE=3,
which the current code cannot handle as per the comment above
__flush_tlb_range_op() (we can't do NUM=30 and SCALE=4, obviously).
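For reference, the 8GB example does land exactly on that encoding (a
trivial check, assuming 4KB pages):

#include <assert.h>

int main(void)
{
	unsigned long pages = (8UL << 30) >> 12;	/* 8GB of 4KB pages */

	/*
	 * 0x200000 pages == (NUM + 1) << (5 * SCALE + 1) only for
	 * NUM = 31, SCALE = 3 -- one step beyond the maximum of 30
	 * that the current macro can return.
	 */
	assert(pages == (31UL + 1) << (5 * 3 + 1));
	return 0;
}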
Can you try the untested patch below?
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3b0e8248e1a4..b71a1cece802 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -379,10 +379,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
* 3. If there is 1 page remaining, flush it through non-range operations. Range
* operations can only span an even number of pages. We save this for last to
* ensure 64KB start alignment is maintained for the LPA2 case.
- *
- * Note that certain ranges can be represented by either num = 31 and
- * scale or num = 0 and scale + 1. The loop below favours the latter
- * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
*/
#define __flush_tlb_range_op(op, start, pages, stride, \
asid, tlb_level, tlbi_user, lpa2) \
@@ -407,6 +403,7 @@ do { \
\
num = __TLBI_RANGE_NUM(pages, scale); \
if (num >= 0) { \
+ num += 1; \
addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
scale, num, tlb_level); \
__tlbi(r##op, addr); \