Re: [PATCH v3 1/3] mm/slub: enable debugging memory wasting of kmalloc
From: Vlastimil Babka
Date: Wed Jul 27 2022 - 10:12:52 EST

On 7/27/22 12:20, Christoph Lameter wrote:
> On Wed, 27 Jul 2022, Feng Tang wrote:
>
>> @@ -2905,7 +2950,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
>> * already disabled (which is the case for bulk allocation).
>> */
>> static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> - unsigned long addr, struct kmem_cache_cpu *c)
>> + unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
>> {
>> void *freelist;
>> struct slab *slab;
>> @@ -3102,7 +3147,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> * pointer.
>> */
>> static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> - unsigned long addr, struct kmem_cache_cpu *c)
>> + unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
>> {
>> void *p;
>>
>> @@ -3115,7 +3160,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> c = slub_get_cpu_ptr(s->cpu_slab);
>> #endif
>>
>> - p = ___slab_alloc(s, gfpflags, node, addr, c);
>> + p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
>> #ifdef CONFIG_PREEMPT_COUNT
>> slub_put_cpu_ptr(s->cpu_slab);
>
> This is modifying and making execution of standard slab functions more
> expensive. Could you restrict modifications to the kmalloc subsystem?
>
> kmem_cache_alloc() and friends are not doing any rounding up to power of
> two sizes.
>
> What is happening here is that you pass kmalloc object size info through
> the kmem_cache_alloc functions so that the regular allocation functions
> debug functionality can then save the kmalloc specific object request
> size. This is active even when no debugging options are enabled.

I don't think the extra orig_size parameter (unused for non-debug caches)
adds any noticeable overhead. In slab_alloc_node() we already have the
orig_size parameter (for both kmalloc and non-kmalloc caches) before this
patch, and it remains unused in the cmpxchg based fast path. The patch adds
it to __slab_alloc() which is not the fast path, and it's still unused for
non-debug caches there. So the overhead is basically one less register
available (because of the extra param) in a slow path and that should be
immeasurable.
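
To illustrate where the parameter ends up being used, here is a minimal
userspace sketch in C. It is not kernel code: every toy_* name and struct
field is a hypothetical stand-in, and malloc() merely models getting an
object from a new slab. Only the shape is taken from the argument above: the
entry point already takes orig_size, the fast path ignores it, and the slow
path touches it only for debug caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a cache; not struct kmem_cache. */
struct toy_cache {
	size_t object_size;    /* bucket size, e.g. 64 for a kmalloc-64 style cache */
	int debug;             /* nonzero models a cache with debugging enabled */
	void *freelist;        /* models the per-cpu freelist: at most one object */
	size_t last_orig_size; /* where the debug slow path records the request */
};

/* Slow path: orig_size is an extra argument, read only for debug caches. */
static void *toy_slow_alloc(struct toy_cache *s, size_t orig_size)
{
	void *p = malloc(s->object_size);

	if (p && s->debug)
		s->last_orig_size = orig_size;
	return p;
}

/* Entry point: orig_size is a parameter here, unused on the fast path. */
static void *toy_alloc_node(struct toy_cache *s, size_t orig_size)
{
	void *p;

	assert(s != NULL);
	p = s->freelist;
	if (p) {
		/* Fast path: hand out the cached object, never look at orig_size. */
		s->freelist = NULL;
		return p;
	}
	return toy_slow_alloc(s, orig_size);
}
```

In this model a non-debug cache pays only for passing one more argument into
toy_slow_alloc(); the fast path is identical with or without it.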

> Can you avoid that? Have kmalloc do the object allocation without passing
> through the kmalloc request size and then add the original size info
> to the debug field later after execution continues in the kmalloc functions?

That approach is problematic wrt patches 2+3 if we want to use orig_size to
affect the boundaries of zero-init and redzoning.

Also it goes against the attempt to fix races wrt validation, see [1] where
the idea is to have alloc_debug_processing() including redzoning done under
n->list_lock, and for that orig_size should be passed there as well.

[1] https://lore.kernel.org/all/69462916-2d1c-dd50-2e64-b31c2b61690e@xxxxxxx/
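
A rough userspace model of the locking argument, as a sketch only. The toy_*
names, the single-object "node", the 0xcc fill byte and the atomic_flag
spinlock standing in for n->list_lock are all assumptions made for
illustration; none of this is the actual SLUB code. It shows why the
allocation-side debug step needs orig_size at the time it runs: the redzone
boundary is written while holding the same lock the validator takes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_OBJ_SIZE 64
#define TOY_REDZONE  0xcc /* assumed fill byte for this sketch */

/* Hypothetical node holding a single object, for brevity. */
struct toy_node {
	atomic_flag list_lock;           /* stands in for n->list_lock */
	unsigned char obj[TOY_OBJ_SIZE];
	size_t orig_size;                /* requested size of the live object */
	int allocated;
};

static void toy_lock(struct toy_node *n)
{
	while (atomic_flag_test_and_set_explicit(&n->list_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void toy_unlock(struct toy_node *n)
{
	atomic_flag_clear_explicit(&n->list_lock, memory_order_release);
}

/* Allocation-side debug step: redzones [orig_size, TOY_OBJ_SIZE) under the lock. */
static void *toy_alloc_debug(struct toy_node *n, size_t orig_size)
{
	void *p = NULL;

	assert(orig_size <= TOY_OBJ_SIZE);
	toy_lock(n);
	if (!n->allocated) {
		n->allocated = 1;
		n->orig_size = orig_size;
		memset(n->obj + orig_size, TOY_REDZONE, TOY_OBJ_SIZE - orig_size);
		p = n->obj;
	}
	toy_unlock(n);
	return p;
}

/* Validation under the same lock; returns 1 if the redzone is intact. */
static int toy_validate(struct toy_node *n)
{
	int ok = 1;

	toy_lock(n);
	if (n->allocated) {
		for (size_t i = n->orig_size; i < TOY_OBJ_SIZE; i++) {
			if (n->obj[i] != TOY_REDZONE) {
				ok = 0;
				break;
			}
		}
	}
	toy_unlock(n);
	return ok;
}
```

If orig_size were instead recorded after toy_alloc_debug() returned, a
concurrent toy_validate() could observe an allocated object whose redzone
boundary had not been set up yet, which is the kind of window the locking is
meant to close.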