Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1

From: David Rientjes
Date: Mon Aug 01 2011 - 22:43:49 EST


On Mon, 1 Aug 2011, Pekka Enberg wrote:

> Looking at the data (in slightly reorganized form):
>
> alloc
> =====
>
> 16 threads:
>
>     cache           alloc_fastpath          alloc_slowpath
>     kmalloc-256     4263275 (91.1%)         417445 (8.9%)
>     kmalloc-1024    4636360 (99.1%)         42091 (0.9%)
>     kmalloc-4096    2570312 (54.4%)         2155946 (45.6%)
>
> 160 threads:
>
>     cache           alloc_fastpath          alloc_slowpath
>     kmalloc-256     10937512 (62.8%)        6490753 (37.2%)
>     kmalloc-1024    17121172 (98.3%)        303547 (1.7%)
>     kmalloc-4096    5526281 (31.7%)         11910454 (68.3%)
>
> free
> ====
>
> 16 threads:
>
>     cache           free_fastpath           free_slowpath
>     kmalloc-256     210115 (4.5%)           4470604 (95.5%)
>     kmalloc-1024    3579699 (76.5%)         1098764 (23.5%)
>     kmalloc-4096    67616 (1.4%)            4658678 (98.6%)
>
> 160 threads:
>
>     cache           free_fastpath           free_slowpath
>     kmalloc-256     15469 (0.1%)            17412798 (99.9%)
>     kmalloc-1024    11604742 (66.6%)        5819973 (33.4%)
>     kmalloc-4096    14848 (0.1%)            17421902 (99.9%)
>
> it's pretty sad to see how SLUB alloc fastpath utilization drops so
> dramatically. Free fastpath utilization isn't all that great with 160
> threads either, but it seems to me that most of the performance
> regression compared to SLAB still comes from the alloc paths.
>

It's the opposite: the cumulative effect of the free slowpath is more
costly in terms of latency than the alloc slowpath because it occurs at a
greater frequency; the pattern that I described as "slab thrashing" before
is a single free to a full slab, the manipulation needed to get that slab
back on the partial list, then the alloc slowpath grabbing it for a single
allocation and requiring yet another partial slab on the next alloc.
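
To make that cycle concrete, here's a minimal userspace sketch of the
pattern (not kernel code; the struct and loop are made up for
illustration, the counter names just mirror the alloc_fastpath,
alloc_slowpath, free_fastpath, and free_slowpath statistics exported
under /sys/kernel/slab/<cache>/ with CONFIG_SLUB_STATS).  It assumes the
worst case described above: every free lands in a full slab and every
reclaimed partial slab yields only a single object.

#include <stdio.h>

struct stats {
	unsigned long alloc_fastpath, alloc_slowpath;
	unsigned long free_fastpath, free_slowpath;
};

int main(void)
{
	struct stats s = { 0 };
	unsigned long cpu_slab_free = 0;	/* free objects left in the cpu slab */
	unsigned long i, cycles = 1000000;	/* hypothetical alloc/free pairs */

	for (i = 0; i < cycles; i++) {
		/*
		 * A free lands in a remote slab that was full: slowpath,
		 * plus the work of putting that slab back on the partial
		 * list.
		 */
		s.free_slowpath++;

		/*
		 * The next alloc finds the cpu slab exhausted, so the
		 * slowpath takes that partial slab for a single object.
		 */
		if (!cpu_slab_free) {
			s.alloc_slowpath++;
			cpu_slab_free = 1;	/* only one object was free */
		} else {
			s.alloc_fastpath++;
		}
		cpu_slab_free--;
	}

	printf("alloc fast/slow: %lu/%lu, free fast/slow: %lu/%lu\n",
	       s.alloc_fastpath, s.alloc_slowpath,
	       s.free_fastpath, s.free_slowpath);
	return 0;
}

Every iteration pays both slowpaths, so the free slowpath count grows in
lockstep with the allocation count even though the workload never touches
the free fastpath, which is the behavior behind the 99.9% free_slowpath
numbers for kmalloc-256 and kmalloc-4096 above.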