Re: hackbench regression since 2.6.25-rc

From: Zhang, Yanmin
Date: Mon Mar 17 2008 - 23:31:11 EST


On Mon, 2008-03-17 at 10:32 -0700, Christoph Lameter wrote:
> On Mon, 17 Mar 2008, Zhang, Yanmin wrote:
>
> > slub_min_objects | 8 | 16 | 32 | 64
> > --------------------------------------------------------------------------------------------
> > slab(__slab_alloc+__slab_free+add_partial) cpu utilization | 88.00% | 44.00% | 13.00% | 12%
> >
> >
> > When slub_min_objects=32, we could get a reasonable value. Beyond 32, the improvement
> > is very small. 32 is just possible_cpu_number*2 on my tigerton.
>
> Interesting. What is the optimal configuration for your 8p? Could you
> figure out the optimal configuration for an 4p and a 2p configuration?
I used 8-core stoakley to do testing, and tried boot kernel with maxcpus=4 and 2.

Just ran ./hackbench 100 process 2000.

processor number\slub_min_objects | slub_min_objects=8 | 16 | 32 | 64
--------------------------------------------------------------------------------------------
8p | 60second | 30 | 28.5 | 26.5
--------------------------------------------------------------------------------------------
4p | 50second | 43 | 42 |
--------------------------------------------------------------------------------------------
2p | 92second | 79 | |


As stoakley is just multi-core machine and hasn't hyper-threading, I also tested it on an old
harwich machine which has 4 physical processors and 8 logical processors with hyperthreading.
processor number\slub_min_objects | slub_min_objects=8 | 16 | 32 | 64
--------------------------------------------------------------------------------------------
8p | 78.7second | 77.5| |


>
> > It's hard to say hackbench simulates real applications closely. But it discloses a possible
> > performance bottlebeck. Last year, we once captured the kmalloc-2048 issue by tbench. So the
> > default slub_min_objects need to be revised. In the other hand, slab is allocated by alloc_page
> > when its size is equal to or more than a half page, so enlarging slub_min_objects won't create
> > too many slab page buffers.
> >
> > As for NUMA, perhaps we could define slub_min_objects to 2*max_cpu_number_per_node.
>
> Well for a 4k cpu configu this would set min_objects to 8192.
> So I think
> we could implement a form of logarithmic scaling based on cpu
> counts comparable to what is done for the statistics update in vmstat.c
>
> fls(num_online_cpus()) = 4
num_online_cpus as the input parameter is ok. A potential issue is how to consider cpu hot-plug.

When num_online_cpus()=16, fls(num_online_cpus())=5.

>
> So maybe
>
> slub_min_objects= 8 + (2 + fls(num_online_cpus())) * 4
So slub_min_objects= 8 + (1 + fls(num_online_cpus())) * 4.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/