Re: [PATCH 0/8] IO queuing and complete affinity
From: Alan D. Brunelle
Date: Thu Feb 07 2008 - 10:17:24 EST
Jens Axboe wrote:
> Hi,
>
> Since I'll be on vacation next week, I thought I'd send this out in
> case people wanted to play with it. It works here, but I haven't done
> any performance numbers at all.
>
> Patches 1-7 are all preparation patches for #8, which contains the
> real changes. I'm not particularly happy with the arch implementation
> for raising a softirq on another CPU, but it should be fast enough
> so suffice for testing.
>
> Anyway, this patchset is mainly meant as a playground for testing IO
> affinity. It allows you to set three values per queue, see the files
> in the /sys/block/<dev>/queue directory:
>
> completion_affinity
> Only allow completions to happen on the defined CPU mask.
> queue_affinity
> Only allow queuing to happen on the defined CPU mask.
> rq_affinity
> Always complete a request on the same CPU that queued it.
>
> As you can tell, there's some overlap to allow for experimentation.
> rq_affinity will override completion_affinity, so it's possible to
> have completions on a CPU that isn't set in that mask. The interface
> is currently limited to all CPUs or a specific CPU, but the implementation
> is supports (and works with) cpu masks. The logic is in
> blk_queue_set_cpumask(), it should be easy enough to change this to
> echo a full mask, or allow OR'ing of CPU masks when a new CPU is passed in.
> For now, echo a CPU number to set that CPU, or use -1 to set all CPUs.
> The default is all CPUs for no change in behaviour.
>
> Patch set is against current git as of this morning. The code is also in
> the block git repo, branch is io-cpu-affinity.
>
> git://git.kernel.dk/linux-2.6-block.git io-cpu-affinity
>
FYI: on a kernel with this patch set, running on a 4-way ia64 (non-NUMA) w/ a FC disk, I crafted a test with 135 combinations:
o Having the issuing application pegged on each CPU - or - left alone (run on any CPU), yields 5 possibilities
o Having the queue affinity on each CPU, or any (-1), yields 5 possibilities
o Having the completion affinity on each CPU, or any (-1), yields 5 possibilities
and
o Having the issuing application pegged on each CPU - or - left alone (run on ay CPU), yields 5 possibilities
o Having rq_affinity set to 0 or 1, yields 2 possibilities.
Each test was for 10 minutes, and ran overnight just fine. The difference amongst the 135 resulting values (based upon latency per-IO seen at the application layer) was <<1% (0.32% to be exact). This would seem to indicate that there isn't a penalty for running with this code, and it seems relatively stable given this.
The application used was doing 64KiB asynchronous direct reads, and had a minimum average per-IO latency of 42.426310 milliseconds, and average of 42.486557 milliseconds (std dev of 0.0041561), and a max of 42.561360 milliseconds
I'm going to do some runs on a 16-way NUMA box, w/ a lot of disks today, to see if we see gains in that environment.
Alan D. Brunelle
HP OSLO S&P
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/