Re: [Question] race during kasan_populate_vmalloc_pte

From: mawupeng
Date: Mon Jul 15 2024 - 21:12:47 EST

On 2024/7/16 1:19, Alexander Potapenko wrote:
> On Fri, Jul 12, 2024 at 4:08 AM mawupeng <mawupeng1@xxxxxxxxxx> wrote:
>>
>> Hi maintainers,
>>
>> Kindly ping.
>>
>> On 2024/6/18 14:40, Wupeng Ma wrote:
>>> Hi maintainers,
>>>
>>> During our testing, we discovered that kasan vmalloc may trigger a false
>>> vmalloc-out-of-bounds warning due to a race between kasan_populate_vmalloc_pte
>>> and kasan_depopulate_vmalloc_pte.
>>>
>>> cpu0: kasan_populate_vmalloc_pte
>>> cpu1: kasan_populate_vmalloc_pte
>>> cpu2: kasan_depopulate_vmalloc_pte
>>>
>>> [cpu2] spin_lock(&init_mm.page_table_lock);
>>> [cpu0] pte_none(ptep_get(ptep))
>>>        // pte is valid here, return here
>>> [cpu2] pte_clear(&init_mm, addr, ptep);
>>> [cpu1] pte_none(ptep_get(ptep))
>>>        // pte is none here, try to allocate a new page
>>> [cpu2] spin_unlock(&init_mm.page_table_lock);
>>> [cpu0] kasan_poison
>>>        // memset kasan shadow region to 0
>>> [cpu1] page = __get_free_page(GFP_KERNEL);
>>> [cpu1] __memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
>>> [cpu1] pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
>>> [cpu1] spin_lock(&init_mm.page_table_lock);
>>> [cpu1] set_pte_at(&init_mm, addr, ptep, pte);
>>> [cpu1] spin_unlock(&init_mm.page_table_lock);
>>>
>>>
>>> After the race, the kasan shadow memory that cpu0 expected to be
>>> initialized is set to 0xf0 by cpu1, which marks it as not initialized.
>>> Consequently, a false vmalloc-out-of-bounds warning is triggered when a
>>> user attempts to access this memory region.
>>>
>>> The root cause of this problem is that the pte valid check at the start of
>>> kasan_populate_vmalloc_pte is not protected by page_table_lock. Removing
>>> the check would close the race, but it may result in severe performance
>>> degradation, since pages would be frequently allocated and freed.
>>>
>>> Do you have any thoughts on how to solve this issue?
>>>
>>> Thank you.
>
> I am going to take a closer look at this issue. Any chance you have a
> reproducer for it?

Not so far. I am trying to build a reproducer, but I have made little progress.
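
In the meantime, the interleaving from the report can be replayed in a
single-threaded user-space model. This is only a sketch and not a kernel
reproducer: the model_* names, MODEL_PAGE_SIZE and the pointer standing in
for the pte are all hypothetical, locking is elided because the steps run in
a fixed order, and it assumes that cpu0's shadow write lands on the page it
saw before cpu2 cleared the pte.

```c
/* User-space model of the reported interleaving; not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64      /* hypothetical, kept small for illustration */
#define SHADOW_INVALID  0xF0    /* shadow value quoted in the report */

static unsigned char pages[2][MODEL_PAGE_SIZE]; /* stand-ins for shadow pages */
static unsigned char *pte;                      /* NULL models pte_none() */

/* Unlocked early check: nonzero means "already populated". */
static int model_pte_valid(void)
{
	return pte != NULL;
}

/* cpu2: depopulate clears the pte. */
static void model_depopulate(void)
{
	pte = NULL;
}

/* cpu1: slow path fills a fresh page with the invalid marker and installs it. */
static void model_populate(unsigned char *page)
{
	memset(page, SHADOW_INVALID, MODEL_PAGE_SIZE);
	if (pte == NULL)
		pte = page;
}

/* Replays the timeline in order; returns the shadow byte a later access sees. */
static unsigned char model_race(void)
{
	unsigned char *seen_by_cpu0;

	memset(pages[0], SHADOW_INVALID, MODEL_PAGE_SIZE);
	pte = pages[0];                 /* shadow page left by an earlier mapping */

	assert(model_pte_valid());      /* cpu0: pte is valid, skips allocation */
	seen_by_cpu0 = pte;
	model_depopulate();             /* cpu2: pte_clear */
	memset(seen_by_cpu0, 0, MODEL_PAGE_SIZE); /* cpu0: writes 0 to the old page */
	model_populate(pages[1]);       /* cpu1: sees pte_none, installs a new page */

	return pte[0];                  /* cpu0's zeroes are not visible here */
}
```

Calling model_race() returns SHADOW_INVALID in this model, which matches the
symptom described in the report: the shadow that cpu0 believed it had
initialized reads back as invalid.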
