Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment > 1

From: David Hildenbrand
Date: Fri Sep 17 2021 - 04:48:01 EST


This patch looks like a KASAN specific to me. So i would like to avoid
such changes to
the vmalloc internals in order to simplify further maintenance and
keeping it generic
instead.

There is nothing KASAN specific in there :) It's specific to exact
applications -- and KASAN may be one of the only users for now.

Well, you can name it either way :) So it should not be specific by the
design, otherwise as i mentioned the allocator would be like a peace of
code that handles corner cases what is actually not acceptable.

Let's not overstress the situation of adding essentially 3 LOC in order to fix a sane use case of the vmalloc allocator that was not considered properly by internal changes due to 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation").



Currently the find_vmap_lowest_match() adjusts the search size
criteria for any alignment
values even for PAGE_SIZE alignment. That is not optimal. Because a
PAGE_SIZE or one
page is a minimum granularity we operate on vmalloc allocations thus
all free blocks are
page aligned.

All that means that we should adjust the search length only if an
alignment request is bigger than
one page, in case of one page, that corresponds to PAGE_SIZE value,
there is no reason in such
extra adjustment because all free->va_start have a page boundary anyway.

Could you please test below patch that is a generic improvement?

I played with the exact approach below (almost exactly the same code in
find_vmap_lowest_match()), and it might make sense as a general improvement
-- if we can guarantee that start+end of ranges are always PAGE-aligned; I
was not able to come to that conclusion...
All free blocks are PAGE aligned that is how it has to be. A vstart also
should be aligned otherwise the __vunmap() will complain about freeing
a bad address:

<snip>
if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
addr))
return;
<snip>

BTW, we should add an extra check to the alloc_vmap_area(), something like
below:

<snip>
if (!PAGE_ALIGNED(ALIGN(vstart, align))) {
WARN_ONCE(1, "vmalloc: vstart or align are not page aligned (0x%lx, 0x%lx)\n",
vstart, align);
return ERR_PTR(-EBUSY);
}
<snip>

to check that passed pair of vstart and align are correct.


Yes we better should.


vmap_init_free_space() shows me that our existing alignment code/checks
might be quite fragile.

It is not important to page align a first address. As i mentioned before
vstart and align have to be correct and we should add such check.


But I mainly decided to go with my patch instead because it will also work
for exact allocations with align > PAGE_SIZE: most notably, when we try
population of hugepages instead of base pages in __vmalloc_node_range(), by
increasing the alignment. If the exact range allows for using huge pages,
placing huge pages will work just fine with my modifications, while it won't
with your approach.

Long story short: my approach handles exact allocations also for bigger
alignment, Your optimization makes sense as a general improvement for any
vmalloc allocations.

Thanks for having a look!

The problem is that there are no users(seems only KASAN) who set the range
that corresponds to exact size. If you add an aligment overhead on top of

So there is one user and it was broken by 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation").

it means that search size should include it because we may end up with exact
free block and after applying aligment it will not fit. This is how allocators
handle aligment.

This is an implementation detail of the vmalloc allocator ever since 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation").

If callers pass an exact range, and the alignment they specify applies, why should we reject such an allocation? It's leaking an implementation detail fixed easily internally into callers.


Another approach is(you mentioned it in your commit message):

urezki@pc638:~/data/raid0/coding/linux-next.git$ git diff mm/kasan/shadow.c
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..01d3bd76c851 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -200,7 +200,7 @@ static int __meminit kasan_mem_notifier(struct notifier_block *nb,
if (shadow_mapped(shadow_start))
return NOTIFY_OK;
- ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start,
+ ret = __vmalloc_node_range(shadow_size, 1, shadow_start,
shadow_end, GFP_KERNEL,
PAGE_KERNEL, VM_NO_GUARD,
pfn_to_nid(mem_data->start_pfn),
urezki@pc638:~/data/raid0/coding/linux-next.git$

why not? Also you can increase the range in KASAN code.

No, that's leaking implementation details to the caller. And no, increasing the range and eventually allocating something bigger (e.g., placing a huge page where it might not have been possible) is not acceptable for KASAN.

If you're terribly unhappy with this patch, please suggest something reasonable to fix exact allocations:
a) Fixes the KASAN use case.
b) Allows for automatic placement of huge pages for exact allocations.
c) Doesn't leak implementation details into the caller.

Thanks

--
Thanks,

David / dhildenb