Re: [PATCH RFC 0/2] percpu_ida: Take into account CPU topology when stealing tags

From: Jens Axboe
Date: Tue Apr 22 2014 - 21:26:16 EST


On 2014-04-22 18:53, Ming Lei wrote:
> Hi Jens,
>
> On Tue, Apr 22, 2014 at 11:57 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 04/22/2014 08:03 AM, Jens Axboe wrote:
>>> On 2014-04-22 01:10, Alexander Gordeev wrote:
>>>> On Wed, Mar 26, 2014 at 02:34:22PM +0100, Alexander Gordeev wrote:
>>>>> But other systems (more dense?) showed an increased cache-hit rate
>>>>> of up to 20%, i.e. this one:
>>>>
>>>> Hello Gentlemen,
>>>>
>>>> Any feedback on this?
>>>
>>> Sorry for dropping the ball on this. Improvements wrt when to steal, how
>>> much, and from whom are sorely needed in percpu_ida. I'll do a bench
>>> with this on a system that currently falls apart with it.
>>
>> Ran some quick numbers with three kernels:
>>
>> stock   3.15-rc2
>> limit   3.15-rc2 + steal limit patch (attached)

> I am thinking/working on this sort of improvement too, but my
> idea is to compute tags->nr_max_cache as below:
>
>     nr_tags / hctx->max_nr_ctx
>
> hctx->max_nr_ctx means the max number of sw queues mapped to the
> hw queue, and would need to be introduced by this approach;
> effectively, its value represents the CPU topology info.
>
> It is a bit complicated to compute hctx->max_nr_ctx because
> we need to take into account CPU hotplug and a possible
> user-defined mapping callback.

We can always just update the caching info; that's not a big problem. We update the mappings on those events anyway.
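
To make the arithmetic concrete, here's a quick userspace model of the proposed sizing (only a sketch: the cpu_to_hctx[] map and the constants are invented for illustration, none of this is real blk-mq code):

#include <stdio.h>

#define NR_CPUS 4
#define NR_HCTX 4
#define NR_TAGS 128

int main(void)
{
	/* hypothetical cpu -> hw queue map; blk-mq derives this from topology */
	int cpu_to_hctx[NR_CPUS] = { 0, 1, 2, 3 };
	int max_nr_ctx[NR_HCTX] = { 0 };
	int i;

	/* max_nr_ctx = number of sw queues (CPUs) mapped to each hw queue */
	for (i = 0; i < NR_CPUS; i++)
		max_nr_ctx[cpu_to_hctx[i]]++;

	/* per-cpu cache limit as proposed: nr_tags / max_nr_ctx */
	for (i = 0; i < NR_HCTX; i++)
		if (max_nr_ctx[i])
			printf("hctx %d: max_nr_ctx %d -> nr_max_cache %d\n",
			       i, max_nr_ctx[i], NR_TAGS / max_nr_ctx[i]);
	return 0;
}

With the 1:1 map above each hctx has a single sw queue, so nr_max_cache stays at the full 128 tags; map all four CPUs to one hctx and it drops to 32.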

> If the user-defined mapping callback need not be considered,
> hctx->max_nr_ctx can be figured out before mapping sw queues in
> blk_mq_init_queue() by first supposing every CPU is online; once
> that is done, the map entries for offline CPUs are cleared, and
> then blk_mq_map_swqueue() is called.

I don't see how a user-defined mapping would change things a whole lot. It's just another point at which to update the cache. Besides, user-defined mappings will be mostly (only?) for things like multiqueue, where the caching info would likely remain static over a reconfigure.
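
The init-time ordering you describe could be modeled like this (again a userspace sketch; map_cpu() and the online[] mask are stand-ins for the real mapping callback and cpumask handling, not actual kernel API):

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4
#define NR_HCTX 2

/* stand-in for the (possibly user-defined) cpu -> hw queue mapping */
static int map_cpu(int cpu)
{
	return cpu % NR_HCTX;
}

int main(void)
{
	bool online[NR_CPUS] = { true, true, false, true }; /* cpu 2 offline */
	int max_nr_ctx[NR_HCTX] = { 0 };
	int nr_ctx[NR_HCTX] = { 0 };
	int cpu, i;

	/* pass 1: suppose every possible CPU is online -> stable upper bound */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		max_nr_ctx[map_cpu(cpu)]++;

	/* pass 2: the blk_mq_map_swqueue() step, over online CPUs only */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (online[cpu])
			nr_ctx[map_cpu(cpu)]++;

	for (i = 0; i < NR_HCTX; i++)
		printf("hctx %d: max_nr_ctx %d, live nr_ctx %d\n",
		       i, max_nr_ctx[i], nr_ctx[i]);
	return 0;
}

Because pass 1 ignores the online mask, hotplug only ever changes nr_ctx; max_nr_ctx, and with it the cache sizing, stays stable across those events.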

> In my null_blk test on a quad-core SMP VM:
>
> - 4 hw queues
> - timer mode
>
> With the above approach, tag allocation from the local CPU can be
> improved from:
>
> 5% -> 50% for the boot CPU
> 30% -> 90% for non-boot CPUs
>
> If no one objects to the idea, I'd like to post a patch for review.

Send it out, that can't hurt. I'll take a look at it, and give it a test spin as well.


--
Jens Axboe
