Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1
From: David Rientjes
Date: Mon Aug 01 2011 - 22:43:49 EST
On Mon, 1 Aug 2011, Pekka Enberg wrote:
> Looking at the data (in slightly reorganized form):
>
> alloc
> =====
>
> 16 threads:
>
> cache            alloc_fastpath       alloc_slowpath
> kmalloc-256      4263275 (91.1%)       417445  (8.9%)
> kmalloc-1024     4636360 (99.1%)        42091  (0.9%)
> kmalloc-4096     2570312 (54.4%)      2155946 (45.6%)
>
> 160 threads:
>
> cache            alloc_fastpath       alloc_slowpath
> kmalloc-256     10937512 (62.8%)      6490753 (37.2%)
> kmalloc-1024    17121172 (98.3%)       303547  (1.7%)
> kmalloc-4096     5526281 (31.7%)     11910454 (68.3%)
>
> free
> ====
>
> 16 threads:
>
> cache            free_fastpath        free_slowpath
> kmalloc-256       210115  (4.5%)      4470604 (95.5%)
> kmalloc-1024     3579699 (76.5%)      1098764 (23.5%)
> kmalloc-4096       67616  (1.4%)      4658678 (98.6%)
>
> 160 threads:
>
> cache            free_fastpath        free_slowpath
> kmalloc-256        15469  (0.1%)     17412798 (99.9%)
> kmalloc-1024    11604742 (66.6%)      5819973 (33.4%)
> kmalloc-4096       14848  (0.1%)     17421902 (99.9%)
>
> it's pretty sad to see how SLUB alloc fastpath utilization drops so
> dramatically. Free fastpath utilization isn't all that great with 160
> threads either but it seems to me that most of the performance
> regression compared to SLAB still comes from the alloc paths.
>
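(For reference, the percentages quoted above are derived from the per-cache
CONFIG_SLUB_STATS counters exported under /sys/kernel/slab/<cache>/ --
alloc_fastpath, alloc_slowpath, free_fastpath, free_slowpath -- with the
fastpath share being fastpath / (fastpath + slowpath). Here is a rough
userspace sketch, not anything posted in this thread, that reads the totals
and computes the same ratio; it assumes CONFIG_SLUB_STATS is enabled and only
parses the leading total in each stat file, ignoring the per-cpu breakdown.)

	#include <stdio.h>

	static unsigned long read_stat(const char *cache, const char *stat)
	{
		char path[128];
		unsigned long total = 0;
		FILE *f;

		snprintf(path, sizeof(path), "/sys/kernel/slab/%s/%s",
			 cache, stat);
		f = fopen(path, "r");
		if (!f)
			return 0;
		/* First field is the summed total for the counter. */
		if (fscanf(f, "%lu", &total) != 1)
			total = 0;
		fclose(f);
		return total;
	}

	int main(int argc, char **argv)
	{
		const char *cache = argc > 1 ? argv[1] : "kmalloc-4096";
		unsigned long fast = read_stat(cache, "alloc_fastpath");
		unsigned long slow = read_stat(cache, "alloc_slowpath");

		if (fast + slow)
			printf("%s alloc fastpath: %.1f%%\n", cache,
			       100.0 * fast / (fast + slow));
		return 0;
	}
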
On the question of where the regression comes from, it's the opposite: the
cumulative effect of the free slowpath is more costly in terms of latency
than the alloc slowpath because it occurs at a greater frequency. The pattern
I described as "slab thrashing" before is a single free to a full slab, the
list manipulation needed to put that slab back on the partial list, then the
alloc slowpath grabbing it for a single allocation and requiring yet another
partial slab on the next alloc.
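To make that round trip concrete, below is a simplified userspace model of
the sequence -- it is not the mm/slub.c code, and the structure and function
names are illustrative only. It just shows the order of events: a free into a
full slab puts it on the partial list, the next slow-path alloc pulls it back
off for a single object, and the cycle repeats, paying list manipulation on
every iteration.

	#include <stdio.h>

	/* Toy stand-in for a slab page; fields are illustrative only. */
	struct slab {
		int inuse;	/* objects currently allocated */
		int objects;	/* object capacity of the slab */
		int on_partial;	/* queued on the node's partial list? */
	};

	static void free_slowpath(struct slab *s)
	{
		s->inuse--;
		if (!s->on_partial) {
			/* A previously full slab now has one free object. */
			s->on_partial = 1;
			printf("free:  slab back on partial list (%d/%d in use)\n",
			       s->inuse, s->objects);
		}
	}

	static void alloc_slowpath(struct slab *s)
	{
		if (s->on_partial) {
			/* Pulled off the partial list for one allocation. */
			s->on_partial = 0;
			printf("alloc: slab removed from partial list\n");
		}
		s->inuse++;
		printf("alloc: slab is full again (%d/%d in use)\n",
		       s->inuse, s->objects);
	}

	int main(void)
	{
		struct slab s = { .inuse = 64, .objects = 64, .on_partial = 0 };
		int i;

		/* Each free/alloc pair thrashes the partial list once. */
		for (i = 0; i < 3; i++) {
			free_slowpath(&s);
			alloc_slowpath(&s);
		}
		return 0;
	}

With free slowpath shares like the ~99.9% seen for kmalloc-4096 above,
essentially every such free/alloc pair pays that list-manipulation cost.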