[Question] race during kasan_populate_vmalloc_pte

From: Wupeng Ma
Date: Tue Jun 18 2024 - 02:40:46 EST


Hi maintainers,

During our testing, we discovered that kasan vmalloc may trigger a false
vmalloc-out-of-bounds warning due to a race between kasan_populate_vmalloc_pte
and kasan_depopulate_vmalloc_pte.

cpu0                                cpu1                                cpu2
kasan_populate_vmalloc_pte          kasan_populate_vmalloc_pte          kasan_depopulate_vmalloc_pte
                                                                        spin_lock(&init_mm.page_table_lock);
pte_none(ptep_get(ptep))
// pte is valid here, return here
                                                                        pte_clear(&init_mm, addr, ptep);
                                    pte_none(ptep_get(ptep))
                                    // pte is none here, try to alloc new pages
                                                                        spin_unlock(&init_mm.page_table_lock);
kasan_poison
// memset kasan shadow region to 0
                                    page = __get_free_page(GFP_KERNEL);
                                    __memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
                                    pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
                                    spin_lock(&init_mm.page_table_lock);
                                    set_pte_at(&init_mm, addr, ptep, pte);
                                    spin_unlock(&init_mm.page_table_lock);


As a result, the kasan shadow memory that cpu0 has just set to 0 is replaced
by the new page from cpu1, which is filled with KASAN_VMALLOC_INVALID, so the
region looks uninitialized again. Consequently, a false vmalloc-out-of-bounds
warning is triggered when a user attempts to access this memory region.
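
To make the ordering easier to follow, here is a minimal userspace sketch that
replays the interleaving above in a single thread. It is only a model, not
kernel code: model_pte, model_new_invalid_page, model_replay_race,
MODEL_PAGE_SIZE and MODEL_INVALID are hypothetical names, MODEL_INVALID is an
arbitrary marker byte rather than the kernel's constant, and letting cpu0
write to the old page after the pte is cleared is a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64   /* tiny stand-in for PAGE_SIZE */
#define MODEL_INVALID   0xAA /* arbitrary marker, NOT the kernel's value */

/* One shadow pte: NULL models pte_none(), otherwise it maps a shadow page. */
static unsigned char *model_pte;

/* Models __get_free_page() followed by the KASAN_VMALLOC_INVALID memset. */
static unsigned char *model_new_invalid_page(void)
{
	unsigned char *page = malloc(MODEL_PAGE_SIZE);

	assert(page != NULL);
	memset(page, MODEL_INVALID, MODEL_PAGE_SIZE);
	return page;
}

/* Replays the race; returns the shadow byte a later access would see. */
unsigned char model_replay_race(void)
{
	unsigned char *old_page = model_new_invalid_page();
	unsigned char *new_page;
	unsigned char seen;

	model_pte = old_page;                 /* shadow is already mapped */

	/* cpu0: unlocked check sees a valid pte and returns early */
	assert(model_pte != NULL);

	/* cpu2: depopulate clears the pte */
	model_pte = NULL;

	/* cpu1: unlocked check sees an empty pte, so it will allocate */
	assert(model_pte == NULL);

	/* cpu0: unpoison, i.e. set the shadow it believes is mapped to 0 */
	memset(old_page, 0, MODEL_PAGE_SIZE);

	/* cpu1: install a fresh page filled with the invalid marker */
	new_page = model_new_invalid_page();
	model_pte = new_page;

	/* a later access reads the shadow through the current mapping */
	seen = model_pte[0];

	free(old_page);
	free(new_page);
	model_pte = NULL;
	return seen;
}
```

In this model the final shadow byte is MODEL_INVALID rather than 0, which
corresponds to the false report described above.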

The root cause of this problem is that the pte valid check at the start of
kasan_populate_vmalloc_pte is not protected by page_table_lock. Removing that
check would close the race, but it may result in severe performance
degradation, since pages would be allocated and freed frequently.

Are there any thoughts on how to solve this issue?

Thank you.