Re: [RFC PATCH] mm/slab: Avoid build bug for calls to kmalloc with a large constant

From: Ryan Roberts
Date: Tue Nov 26 2024 - 10:33:24 EST


On 26/11/2024 15:27, Vlastimil Babka wrote:
> On 11/26/24 16:09, Vlastimil Babka wrote:
>> On 11/26/24 15:53, Ryan Roberts wrote:
>>> On 26/11/2024 12:36, Vlastimil Babka wrote:
>>>> On 11/26/24 13:18, Ryan Roberts wrote:
>>>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>>>> occur on a path that won't be reached.
>>>>>>
>>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>>>> would be better to keep it that way.
>>>>>>
>>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>>>
>>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>>>> (if not, please report it :)
>>>>>
>>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>>>> assumed that would be costly in terms of extra memory allocated for those higher
>>>>> order buckets that will never be used. But I have no idea how SLUB works in
>>>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>>>>> an issue in practice?
>>>>
>>>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
>>>> some overhead with the management structures (struct kmem_cache etc) but
>>>> much smaller.
>>>> To be completely honest, some extra overhead might come to be when the slabs
>>>> are allocated ans later the user frees those allocations. kmalloc_large()
>>>> wwould return them immediately, while a regular kmem_cache will keep one or
>>>> more per cpu for reuse. But if that becomes a visible problem we can tune
>>>> those caches to discard slabs more aggressively.
>>>
>>> Sorry to keep pushing on this, now that I've actually looked at the code, I feel
>>> I have a slightly better understanding:
>>>
>>> void *kmalloc_noprof(size_t size, gfp_t flags)
>>> {
>>> if (__builtin_constant_p(size) && size) {
>>>
>>> if (size > KMALLOC_MAX_CACHE_SIZE)
>>> return __kmalloc_large_noprof(size, flags); <<< (1)
>>>
>>> index = kmalloc_index(size);
>>> return __kmalloc_cache_noprof(...); <<< (2)
>>> }
>>> return __kmalloc_noprof(size, flags); <<< (3)
>>> }
>>>
>>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
>>> resolving either to a call to (1) or (2), decided at compile time. If
>>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
>>> need to be kept in the function.
>>>
>>> But intuatively, I would have guessed that given the choice between the overhead
>>> of keeping that runtime conditional vs keeping per-cpu slab caches for extra
>>> sizes between 16K and 128K, then the runtime conditional would be preferable. I
>>> would guess that quite a bit of memory could get tied up in those caches?
>>>
>>> Why is your preference the opposite? What am I not understanding?
>>
>> +CC more slab people.
>>
>> So the above is an inline function, but constructed in a way that it should,
>> without further inline code, become
>> - a call to __kmalloc_large_noprof() for build-time constant size larger
>> than KMALLOC_MAX_CACHE_SIZE
>> - a call to __kmalloc_cache_noprof() for build-time constant size smaller
>> than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with
>> compile-time calculated index
>> - call to __kmalloc_noprof() for non-constant sizes otherwise
>>
>> If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way
>> to handle it would be to #ifdef or otherwise compile out away the whole "if
>> __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so
>> we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading
>> to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof().
>
> Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX
> behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and
> additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where
> build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a
> compile-time test) is redirected to __kmalloc_noprof() for a run-time test.
>
> That seems like the optimum solution :)

Yes; that feels like the better approach to me. I'll implement this by default
unless anyone else objects.

>
>> I just don't believe we would waste so much memory with caches the extra
>> sizes for sizes between 16K and 128K, so would do that suggestion only if
>> proven wrong. But I wouldn't mind it that much if you chose it right away.
>> The solution earlier in this thread to patch __kmalloc_index() would be
>> worse than either of those two alternatives though.
>
>