Re: CFQ timer precision

From: Jens Axboe
Date: Mon Nov 16 2015 - 11:16:16 EST

On 11/16/2015 08:11 AM, Jan Kara wrote:

Lately I was looking into a big performance hit we take when the blkio
controller is enabled and the jbd2 thread ends up in a different cgroup than
the user process. E.g. dbench4 throughput drops from ~140 MB/s to ~20 MB/s.
However artificial dbench4 is, this kind of drop will likely be clearly
visible in real-life workloads as well. With the unified cgroup hierarchy,
the above cgroup split between jbd2 and user processes is unavoidable
once you enable the blkio controller, so IMO we should accommodate that better.

I have a couple of CFQ idling improvements / fixes which I'll post later this
week once I complete a round of benchmarking. They improve the
throughput to ~40 MB/s, which helps, but clearly there's still a lot of room
for improvement. The reason for the performance drop is essentially the idling
we do to avoid starvation of CFQ queues. When idling in this context, the
current default of an 8 ms idle window is far too large: we start the timer
after the final request is completed and thus we effectively give the
process 8 ms of CPU time to submit the next IO request, which I think is
usually far too much. The problem is that more fine-grained idling is
actually problematic because e.g. SUSE distro kernels have HZ=250 and thus
1 jiffy is 4 ms. Hence my proposal: Do you think it would be OK to convert
CFQ to use highres timers and do all the accounting in microseconds?
Then we could tune the idle time to be, say, 1 ms or even autotune it based on
the process' think time, both of which I expect would get us much closer to
the original throughput (a 4 ms idle window gets us to ~70 MB/s with my
patches; disabling idling gets us to the original throughput, as expected).

Converting to a non-jiffies timer base should be quite fine. We didn't have hrtimers when CFQ was written :-)
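
To make the granularity point above concrete, here is a minimal user-space
sketch of the arithmetic. It is not CFQ code: tick_timer_us() and
highres_timer_us() are hypothetical helpers made up for illustration, and
the round-up-to-whole-ticks behaviour is an assumption about how a
tick-based timeout would be programmed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): the shortest timeout a
 * tick-based timer can represent for a requested delay. The request
 * is rounded up to a whole number of ticks, each 1000000 / hz us long
 * (integer division, so the tick length is truncated for odd HZ).
 */
static uint64_t tick_timer_us(uint64_t want_us, unsigned int hz)
{
    assert(hz > 0 && hz <= 1000000);

    uint64_t tick_us = 1000000ULL / hz;                 /* HZ=250 -> 4000 us */
    uint64_t ticks = (want_us + tick_us - 1) / tick_us; /* round up */

    return ticks * tick_us;
}

/*
 * Hypothetical helper: a timer with microsecond resolution can
 * represent the requested delay as-is.
 */
static uint64_t highres_timer_us(uint64_t want_us)
{
    return want_us;
}
```

With hz = 250, a requested 1 ms idle window (want_us = 1000) comes out of
tick_timer_us() as 4000 us, i.e. one full jiffy, while the microsecond-based
variant keeps it at 1000 us. That is the motivation for moving off jiffies
before tuning the idle time down.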

Jens Axboe
