Re: [PATCH 0/4] Introduce QPW for per-cpu operations

From: Vlastimil Babka

Date: Mon Feb 23 2026 - 13:13:00 EST

On 2/20/26 17:55, Marcelo Tosatti wrote:
>
> #include <linux/module.h>
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/timex.h>
> #include <linux/preempt.h>
> #include <linux/irqflags.h>
> #include <linux/vmalloc.h>
>
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Gemini AI");
> MODULE_DESCRIPTION("A simple kmalloc performance benchmark");
>
> static int size = 64; // Default allocation size in bytes
> module_param(size, int, 0644);
>
> static int iterations = 1000000; // Default number of iterations
> module_param(iterations, int, 0644);
>
> static int __init kmalloc_bench_init(void) {
> void **ptrs;
> cycles_t start, end;
> uint64_t total_cycles;
> int i;
> pr_info("kmalloc_bench: Starting test (size=%d, iterations=%d)\n", size, iterations);
>
> // Allocate an array to store pointers to avoid immediate kfree-reuse optimization
> ptrs = vmalloc(sizeof(void *) * iterations);
> if (!ptrs) {
> pr_err("kmalloc_bench: Failed to allocate pointer array\n");
> return -ENOMEM;
> }
>
> preempt_disable();
> start = get_cycles();
>
> for (i = 0; i < iterations; i++) {
> ptrs[i] = kmalloc(size, GFP_ATOMIC);
> }
>
> end = get_cycles();
>
> total_cycles = end - start;
> preempt_enable();

While the outer preempt_disable() simplifies things, it can misrepresent the
cost of the preempt_disable() that's part of the locking: that one becomes
nested, and a nested preempt_disable() is typically cheaper, etc.

Also, the way it kmallocs all iterations and then kfrees all of them may
skew the probabilities of hitting the fastpaths, cache hotness etc.

When introducing sheaves I had a similar microbenchmark, but with different
numbers of inner-loop iterations, no outer preempt_disable(), and a linear
vs randomized array. See:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/commit/?h=slub-percpu-sheaves-v6-benchmarking&id=04028eeffba18a4f821a7194bc9d14f7488bd7d9

(at this point the SLUB_HAS_SHEAVES parts should be removed and the
kmem_cache_print_stats() stuff also shouldn't be interesting for QPW
evaluation).
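
To illustrate the loop shape I mean, here is a rough userspace sketch. It is
not the code from the commit above. malloc()/free() stand in for
kmalloc()/kfree(), and the names run_bench(), fill_order() and xorshift32()
are made up for this example. Each outer round allocates a batch of objects
and frees them again before the next round, with the free order either
linear or shuffled. There is no outer preempt_disable() equivalent. Timing
is left out; in the module it would wrap the outer loop, e.g. with
get_cycles() as in the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: tiny xorshift PRNG, only used to shuffle indices. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill order[] with 0..n-1, optionally shuffled (Fisher-Yates). */
static void fill_order(size_t *order, size_t n, int randomize, uint32_t seed)
{
	uint32_t state = seed ? seed : 1;	/* xorshift state must be nonzero */
	size_t i;

	for (i = 0; i < n; i++)
		order[i] = i;
	if (!randomize)
		return;
	for (i = n; i > 1; i--) {
		size_t j = xorshift32(&state) % i;
		size_t tmp = order[i - 1];

		order[i - 1] = order[j];
		order[j] = tmp;
	}
}

/*
 * Run 'outer' rounds. Each round allocates 'batch' objects of 'size' bytes
 * and then frees them in the order given by order[], so alloc and free stay
 * interleaved at batch granularity instead of "allocate everything, then
 * free everything". Returns the number of objects allocated and freed, or
 * 0 if the bookkeeping arrays could not be allocated.
 */
static size_t run_bench(size_t outer, size_t batch, size_t size, int randomize)
{
	void **ptrs;
	size_t *order;
	size_t round, i, done = 0;

	assert(batch > 0);
	ptrs = calloc(batch, sizeof(*ptrs));
	order = malloc(batch * sizeof(*order));
	if (!ptrs || !order) {
		free(ptrs);
		free(order);
		return 0;
	}
	fill_order(order, batch, randomize, 12345);

	for (round = 0; round < outer; round++) {
		for (i = 0; i < batch; i++)
			ptrs[i] = malloc(size);
		for (i = 0; i < batch; i++) {
			void *p = ptrs[order[i]];

			if (p)
				done++;
			free(p);	/* free(NULL) is a no-op */
			ptrs[order[i]] = NULL;
		}
	}

	free(ptrs);
	free(order);
	return done;
}
```

Varying 'batch' (small vs large) and 'randomize' (0 vs 1) is what changes
how often the fastpaths are hit, so each combination should be measured
separately rather than averaged together.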

>
> pr_info("kmalloc_bench: Total cycles for %d allocs: %llu\n", iterations, total_cycles);
> pr_info("kmalloc_bench: Avg cycles per kmalloc: %llu\n", total_cycles / iterations);
>
> // Cleanup
> for (i = 0; i < iterations; i++) {
> kfree(ptrs[i]);
> }
> vfree(ptrs);
>
> return 0;
> }
>
> static void __exit kmalloc_bench_exit(void) {
> pr_info("kmalloc_bench: Module unloaded\n");
> }
>
>