[PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock

From: zhangchun
Date: Mon Jul 29 2024 - 09:59:35 EST


CPU 0:                                  CPU 1:
kmap_high() {                           kmap_xxx() {
    ...                                     irq_disable();
    spin_lock(&kmap_lock)
    ...
    map_new_virtual                         ...
      flush_all_zero_pkmaps
        flush_tlb_kernel_range              /* CPU0 holds the kmap_lock */
          smp_call_function_many            spin_lock(&kmap_lock)
          ...                               ....
    spin_unlock(&kmap_lock)
    ...

CPU 0 holds kmap_lock while waiting for CPU 1 to respond to the IPI. But
CPU 1 has disabled irqs and is spinning on kmap_lock, so it cannot answer
the IPI. Fix this by releasing kmap_lock before calling
flush_tlb_kernel_range, which avoids the kmap_lock deadlock.

	if (need_flush) {
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}
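
The interaction can be illustrated with a small user-space model. This is
only a sketch, not kernel code: the struct, its fields and the function
flush_completes() are hypothetical names, and the two CPUs are stepped in
a single loop rather than run concurrently. It shows that CPU 0's flush
never completes while it keeps the lock, and completes once the lock is
dropped around the flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the two CPUs; none of this is kernel API. */
enum { NO_OWNER = -1 };

struct model {
	int lock_owner;     /* which CPU holds the modelled kmap_lock */
	bool cpu1_irqs_on;  /* CPU 1 can only answer an IPI with irqs on */
	bool ipi_pending;   /* CPU 0 has sent the flush IPI */
	bool ipi_acked;     /* CPU 1 has answered it */
};

/*
 * Returns true if CPU 0's flush finishes within max_steps, false if both
 * CPUs are still stuck after max_steps (the modelled deadlock).
 */
static bool flush_completes(bool drop_lock_around_flush, int max_steps)
{
	struct model m = {
		.lock_owner = 0,       /* CPU 0 is in kmap_high(), lock held */
		.cpu1_irqs_on = false, /* CPU 1 disabled irqs, now wants the lock */
		.ipi_pending = false,
		.ipi_acked = false,
	};
	bool cpu1_done_with_lock = false;

	assert(max_steps > 0);

	/* CPU 0 reaches the flush. */
	if (drop_lock_around_flush)
		m.lock_owner = NO_OWNER;  /* models unlock_kmap() */
	m.ipi_pending = true;             /* the flush sends the IPI */

	for (int step = 0; step < max_steps; step++) {
		/*
		 * CPU 1 spins for the lock with irqs off. Taking the lock,
		 * doing its work and releasing it is modelled as one step,
		 * after which irqs are enabled again.
		 */
		if (!cpu1_done_with_lock && m.lock_owner == NO_OWNER) {
			cpu1_done_with_lock = true;
			m.cpu1_irqs_on = true;
		}

		/* CPU 1 answers the IPI only once irqs are on. */
		if (m.cpu1_irqs_on && m.ipi_pending && !m.ipi_acked)
			m.ipi_acked = true;

		/* CPU 0's flush returns once the IPI has been answered. */
		if (m.ipi_acked) {
			if (drop_lock_around_flush)
				m.lock_owner = 0;  /* models lock_kmap() */
			return true;
		}
	}
	return false;
}
```

In this model, flush_completes(false, N) stays false for any step budget N,
because CPU 1 can never take the lock and therefore never re-enables irqs,
while flush_completes(true, N) returns true on the first step.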

Dropping the lock like this is safe. kmap_lock protects pkmap_count,
pkmap_page_table and last_pkmap_nr (a static variable). The call
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither
reads nor modifies these variables, so leaving that data unprotected for
the duration of the call is safe.

map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr].
The kmap_lock is held whenever pkmap_count[last_pkmap_nr] is read or
modified. The check "if (!pkmap_count[last_pkmap_nr])" determines whether
pkmap_count[last_pkmap_nr] is usable. If it is not, try again.

Furthermore, the value of the static variable last_pkmap_nr is stored in
a local variable last_pkmap_nr while kmap_lock is held, so this is
thread-safe.

In an extreme case, Thread A and Thread B access the same last_pkmap_nr.
Thread A calls flush_tlb_kernel_range and releases the kmap_lock; Thread B
then acquires the kmap_lock and modifies pkmap_count[last_pkmap_nr]. After
Thread A completes flush_tlb_kernel_range, it reacquires the kmap_lock and
will check pkmap_count[last_pkmap_nr].
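
The "check after retaking the lock" argument can be sketched in user space
as follows. This is a simplified, single-threaded model under stated
assumptions, not the kernel implementation: NSLOTS, slot_count, lock_held,
claim_slot() and the hook mechanism are all hypothetical stand-ins, and
Thread B is modelled as a callback that runs while the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4              /* hypothetical, stands in for LAST_PKMAP */

static int slot_count[NSLOTS]; /* stands in for pkmap_count[] */
static int lock_held;          /* stands in for kmap_lock; 1 = held */

/* Hypothetical hook run while the lock is dropped; models Thread B. */
typedef void (*unlocked_hook)(void);

/* Models the patched flush: drop the lock, flush, retake the lock. */
static void flush_with_lock_dropped(unlocked_hook hook)
{
	lock_held = 0;   /* models unlock_kmap() */
	if (hook)
		hook();  /* another thread may change slot_count[] here */
	lock_held = 1;   /* models lock_kmap() */
}

/*
 * Claim a free slot starting the search at 'start'. The caller holds the
 * lock. Returns the slot index, or -1 if every slot is in use.
 */
static int claim_slot(unsigned int start, unlocked_hook hook)
{
	assert(lock_held);
	flush_with_lock_dropped(hook);

	for (unsigned int i = 0; i < NSLOTS; i++) {
		unsigned int nr = (start + i) % NSLOTS;

		/*
		 * This test runs with the lock held again, so a slot that
		 * was taken while the lock was dropped is simply skipped.
		 */
		if (!slot_count[nr]) {
			slot_count[nr] = 1;
			return (int)nr;
		}
	}
	return -1;
}

/* Models Thread B taking slot 0 during Thread A's flush. */
static void thread_b_takes_slot0(void)
{
	assert(!lock_held);  /* B gets in only because A dropped the lock */
	lock_held = 1;
	slot_count[0] = 1;
	lock_held = 0;
}
```

With lock_held set to 1 and all slots free, claim_slot(0, thread_b_takes_slot0)
skips slot 0, which Thread B claimed during the flush, and returns slot 1
instead of handing out an entry that is already in use.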

static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
	unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
	unsigned int color = get_pkmap_color(page);

start:
	...
			flush_all_zero_pkmaps(); // release kmap_lock, then acquire it
			count = get_pkmap_entries_count(color);
		}
		...
		if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
			break;	/* Found a usable entry */
		if (--count)
			continue;

	...
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	...
	return vaddr;
}

Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@xxxxxxx>
Co-developed-by: zhangzhansheng <zhang.zhansheng@xxxxxxx>
Signed-off-by: zhangzhansheng <zhang.zhansheng@xxxxxxx>
Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Reviewed-by: zhangzhengming <zhang.zhengming@xxxxxxx>
---
mm/highmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/highmem.c b/mm/highmem.c
index ef3189b..07f2c67 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -231,8 +231,18 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		/*
+		 * On a multi-core system, one CPU holds the kmap_lock while
+		 * waiting for other CPUs to respond to the IPI. But the other
+		 * CPUs have disabled irqs and are waiting for kmap_lock, so
+		 * they cannot answer the IPI. Release kmap_lock before calling
+		 * flush_tlb_kernel_range to avoid the kmap_lock deadlock.
+		 */
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }
 
 void __kmap_flush_unused(void)
--
1.8.3.1