Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support

From: Dev Jain

Date: Thu Apr 23 2026 - 02:13:53 EST

On 22/04/26 8:08 pm, Ryan Roberts wrote:
> On 22/04/2026 15:23, Dev Jain wrote:
>>
>>
>> On 22/04/26 6:51 pm, Ryan Roberts wrote:
>>> On 24/03/2026 13:26, Muhammad Usama Anjum wrote:
>>>> For allocations that will be accessed only with match-all pointers
>>>> (e.g., kernel stacks), setting tags is wasted work. If the caller
>>>> already set __GFP_SKIP_KASAN, don’t skip zeroing the pages and
>>>> don’t set KASAN_VMALLOC_PROT_NORMAL so kasan_unpoison_vmalloc()
>>>> returns early without tagging.
>>>>
>>>> Before this patch, __GFP_SKIP_KASAN wasn't used with the vmalloc
>>>> APIs, so it was never checked. Now it is checked and acted
>>>> upon. Other KASAN modes are unchanged because __GFP_SKIP_KASAN is
>>>> zero there.
>>>>
>>>> This is a preparatory patch for optimizing kernel stack allocations.
>>>>
>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@xxxxxxx>
>>>> ---
>>>> Changes since v1:
>>>> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>>>> is zero in non-hw-tags mode.
>>>> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
>>>> ---
>>>> mm/vmalloc.c | 11 ++++++++---
>>>> 1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>>> index c607307c657a6..69ae205effb46 100644
>>>> --- a/mm/vmalloc.c
>>>> +++ b/mm/vmalloc.c
>>>> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>> __GFP_NOFAIL | __GFP_ZERO |\
>>>> __GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>>>> GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
>>>> - GFP_USER | __GFP_NOLOCKDEP)
>>>> + GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>>>>
>>>> static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>> {
>>>> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>> *
>>>> * %__GFP_NOWARN can be used to suppress failure messages.
>>>> *
>>>> + * %__GFP_SKIP_KASAN can be used to skip poisoning
>>>
>>> You mean skip *un*poisoning, I think? But you would only want this to apply to
>>> the actual pages mapped by vmalloc. You wouldn't want to skip unpoisoning for
>>> any allocated metadata; I think that is currently possible since the gfp_flags
>>> that are passed into __vmalloc_node_range_noprof() are passed down to
>>> __get_vm_area_node() unmodified. You probably want to explicitly ensure
>>> __GFP_SKIP_KASAN is clear for that internal call?
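A minimal user-space sketch of the flag split suggested here, assuming prot == PAGE_KERNEL and ignoring the HW-tags branch. The caller's request is recorded once, __GFP_SKIP_KASAN is cleared for internal metadata allocations, and it is re-applied only to the pages backing the mapping. model_gfp_t, the MODEL_GFP_* bit values and model_split_gfp() are all hypothetical names for illustration, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit values for this model only; the real gfp_t bits
 * live in include/linux/gfp_types.h and are different.
 */
typedef uint32_t model_gfp_t;
#define MODEL_GFP_KERNEL     0x01u
#define MODEL_GFP_SKIP_KASAN 0x02u

struct model_gfp_split {
	model_gfp_t meta_gfp; /* for internal metadata (vm_struct etc.) */
	model_gfp_t page_gfp; /* for the pages mapped into the area */
	bool skip_kasan;      /* caller asked to skip unpoisoning */
};

/*
 * Record the caller's request, clear the flag so metadata keeps normal
 * page-allocator unpoisoning, then re-apply it to the mapped pages only.
 * Assumes prot == PAGE_KERNEL and ignores the HW-tags branch.
 */
static struct model_gfp_split model_split_gfp(model_gfp_t gfp)
{
	struct model_gfp_split s;

	s.skip_kasan = (gfp & MODEL_GFP_SKIP_KASAN) != 0;
	s.meta_gfp = gfp & ~MODEL_GFP_SKIP_KASAN;
	s.page_gfp = s.meta_gfp;
	if (s.skip_kasan)
		s.page_gfp |= MODEL_GFP_SKIP_KASAN;
	return s;
}
```

With GFP_KERNEL | __GFP_SKIP_KASAN, the metadata mask comes back as plain GFP_KERNEL while the page mask keeps the skip bit.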
>>>
>>>> + *
>>>> * Can not be called from interrupt nor NMI contexts.
>>>> * Return: the address of the area or %NULL on failure
>>>> */
>>>> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>> * kasan_unpoison_vmalloc().
>>>> */
>>>> if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>>>> - if (kasan_hw_tags_enabled()) {
>>>> + bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>>>> +
>>>> + if (kasan_hw_tags_enabled() && !skip_kasan) {
>>>> /*
>>>> * Modify protection bits to allow tagging.
>>>> * This must be done before mapping.
>>>> @@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>> }
>>>>
>>>> /* Take note that the mapping is PAGE_KERNEL. */
>>>> - kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>>>> + if (!skip_kasan)
>>>> + kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>>>
>>> It's pretty ugly to rely on the absence of this flag to make
>>> kasan_unpoison_vmalloc() skip unpoisoning. Perhaps it is preferable to just not
>>> call kasan_unpoison_vmalloc() in the skip_kasan case?
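A hypothetical user-space model contrasting the two control flows. model_unpoison_tags() stands in for kasan_unpoison_vmalloc() and, following the commit message, only tags when the PROT_NORMAL bit is present; MODEL_PROT_NORMAL and all function names are invented for illustration. Both variants leave skip_kasan allocations untagged, but only the second states that at the call site.

```c
#include <stdbool.h>

/* Invented stand-in for KASAN_VMALLOC_PROT_NORMAL; value is arbitrary. */
#define MODEL_PROT_NORMAL 0x1u

/*
 * Stand-in for kasan_unpoison_vmalloc(): per the commit message, it
 * returns early (no tagging) when PROT_NORMAL is absent.
 * Returns true if tags would be set.
 */
static bool model_unpoison_tags(unsigned int kasan_flags)
{
	return (kasan_flags & MODEL_PROT_NORMAL) != 0;
}

/* v2 patch: skipping is implied by leaving PROT_NORMAL clear. */
static bool v2_tags_set(bool skip_kasan)
{
	unsigned int kasan_flags = 0;

	if (!skip_kasan)
		kasan_flags |= MODEL_PROT_NORMAL;
	return model_unpoison_tags(kasan_flags);
}

/* Suggested: PROT_NORMAL still records the mapping type; skip the call. */
static bool explicit_tags_set(bool skip_kasan)
{
	unsigned int kasan_flags = MODEL_PROT_NORMAL;

	if (skip_kasan)
		return false; /* the unpoison call is not made at all */
	return model_unpoison_tags(kasan_flags);
}
```

The observable result is the same; the difference is that PROT_NORMAL keeps meaning "the mapping is PAGE_KERNEL" instead of doubling as a skip switch.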
>>>
>>>> }
>>>>
>>>> /* Allocate physical pages and map them into vmalloc space. */
>>>
>>> Perhaps something like this would work:
>>>
>>> ---8<---
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index c31a8615a8328..c340db141df57 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -3979,6 +3979,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>> * under moderate memory pressure.
>>> *
>>> * %__GFP_NOWARN can be used to suppress failure messages.
>>> + *
>>> + * %__GFP_SKIP_KASAN skips unpoisoning of mapped pages (when prot=PAGE_KERNEL).
>>> *
>>> * Can not be called from interrupt nor NMI contexts.
>>> * Return: the address of the area or %NULL on failure
>>> @@ -3993,6 +3995,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>> kasan_vmalloc_flags_t kasan_flags = KASAN_VMALLOC_NONE;
>>> unsigned long original_align = align;
>>> unsigned int shift = PAGE_SHIFT;
>>> + bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>>> +
>>> + gfp_mask &= ~__GFP_SKIP_KASAN;
>>
>> Okay, so this is so that metadata allocations can keep using the normal
>> page-allocator-side unpoisoning.
>
> Yes.
>
>>
>>> if (WARN_ON_ONCE(!size))
>>> return NULL;
>>> @@ -4041,7 +4046,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>> * kasan_unpoison_vmalloc().
>>> */
>>> if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>>> - if (kasan_hw_tags_enabled()) {
>>> + if (kasan_hw_tags_enabled() && !skip_kasan) {
>>
>> Why do we want to elide __GFP_SKIP_ZERO (set below) in this case?
>
> You mean why do we want to skip initializing the allocated memory to zero for
> the case where kasan HW_TAGS is enabled and we are not skipping kasan unpoisoning?
>
> Because setting tags at the same time as zeroing the memory is less expensive
> than doing them both as separate operations. So we tell page_alloc not to bother
> zeroing the memory and kasan_unpoison_vmalloc() does it at the same time as
> setting the tags instead. See kasan_unpoison() which ultimately calls
> mte_set_mem_tag_range().

I was asking the opposite question: in the skip_kasan case we also want to
avoid setting __GFP_SKIP_ZERO, because we no longer rely on the KASAN HW-tags
path to zero the memory; we rely on the page allocator instead. Got it.

>
>>
>>> /*
>>> * Modify protection bits to allow tagging.
>>> * This must be done before mapping.
>>> @@ -4054,6 +4059,12 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>> * poisoned and zeroed by kasan_unpoison_vmalloc().
>>> */
>>> gfp_mask |= __GFP_SKIP_KASAN | __GFP_SKIP_ZERO;
>>> + } else if (skip_kasan) {
>>> + /*
>>> + * Skip page_alloc unpoisoning physical pages backing
>>> + * VM_ALLOC mapping, as requested by caller.
>>> + */
>>> + gfp_mask |= __GFP_SKIP_KASAN;
>>> }
>>>
>>> /* Take note that the mapping is PAGE_KERNEL. */
>>> @@ -4078,7 +4089,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>> (gfp_mask & __GFP_SKIP_ZERO))
>>> kasan_flags |= KASAN_VMALLOC_INIT;
>>> /* KASAN_VMALLOC_PROT_NORMAL already set if required. */
>>> - area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
>>> + if (!skip_kasan)
>>> + area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
>>
>> I really think we should do some decoupling here: __GFP_SKIP_KASAN means
>> "skip KASAN when going through the page allocator". Now we reuse this flag
>> to skip vmalloc unpoisoning.
>>
>> A code path that uses __GFP_SKIP_KASAN (which is highly likely, given that
>> GFP_HIGHUSER_MOVABLE includes it) and also calls vmalloc() will unintentionally
>> skip vmalloc unpoisoning too.
>
> If a caller wants to vmalloc() memory with GFP_HIGHUSER_MOVABLE (which seems
> HIGHLY suspect to me) then surely leaving the memory poisoned is *exactly* what
> they expect?

Okay, I get your point.
>
>>
>> I think we are doing patch 1 because of patch 2 - so in patch 2, perhaps
>> instead of calling __vmalloc_node we can call __vmalloc_node_range_noprof and
>> shift this "skip vmalloc unpoisoning" functionality into vmalloc flags instead?
>
> This is exactly how Usama was doing it in v1. I suggested we should just reuse
> the existing flag since it already provides the semantic we want and is less
> confusing than introducing a new flag.
>
> I know David is keen to do a wider rework and remove/rename/change the semantics
> of __GFP_SKIP_KASAN, but I'm hoping that if we just continue to use the existing
> flag and its semantics for vmalloc then there is no reason why this series can't
> be merged independently of that wider rework.

Okay, makes sense.
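For comparison, a hypothetical sketch of the two designs debated above: reusing __GFP_SKIP_KASAN in the gfp mask (this series) versus a dedicated vmalloc-level flag (Dev's suggestion, which Ryan says is how v1 did it). MODEL_VM_SKIP_UNPOISON is an invented name, not an existing VM_* flag, and all bit values are made up; the model only shows which caller masks end up skipping unpoisoning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented bit values; not the kernel's. */
#define M_GFP_MOVABLE    0x1u
#define M_GFP_SKIP_KASAN 0x2u
#define M_GFP_KERNEL     0x4u
/* Composite mask that, like GFP_HIGHUSER_MOVABLE, carries SKIP_KASAN. */
#define M_GFP_HIGHUSER_MOVABLE (M_GFP_MOVABLE | M_GFP_SKIP_KASAN)

/* Hypothetical vmalloc-level flag; no such VM_* flag is assumed to exist. */
#define MODEL_VM_SKIP_UNPOISON 0x100ul

/* Design in this series: the gfp mask alone decides. */
static bool skips_by_gfp(uint32_t gfp)
{
	return (gfp & M_GFP_SKIP_KASAN) != 0;
}

/* Alternative: only an explicit vm flag opts out; gfp is ignored. */
static bool skips_by_vm_flag(uint32_t gfp, unsigned long vm_flags)
{
	(void)gfp;
	return (vm_flags & MODEL_VM_SKIP_UNPOISON) != 0;
}
```

Under the first design a GFP_HIGHUSER_MOVABLE caller opts out implicitly, which is Dev's concern; Ryan's counterpoint is that such a caller would expect exactly that.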

>
> Thanks,
> Ryan
>
>
>> This may not work for the nommu case (__vmalloc_node has two definitions);
>> just a line of thought.
>>
>>
>>> /*
>>> * In this function, newly allocated vm_struct has VM_UNINITIALIZED
>>>
>>> ---8<---
>>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>
>