Re: Flush requests not going through IO scheduler

From: Jens Axboe
Date: Tue Nov 03 2015 - 11:49:36 EST


On 11/03/2015 09:41 AM, Jan Kara wrote:
On Mon 02-11-15 09:58:01, Jens Axboe wrote:
On 11/02/2015 05:20 AM, Jan Kara wrote:
Hello,

when looking into a performance issue, I've noticed one interesting thing
in blktrace data:

8,0 2 0 1.745149746 0 m N cfq320SN / dispatch_insert
8,0 2 0 1.745150258 0 m N cfq320SN / dispatched a request
8,0 2 0 1.745150524 0 m N cfq320SN / activate rq, drv=10
8,0 2 2893 1.745150644 30477 D WS 495331192 + 192 [git]
8,0 1 3678 1.746851310 0 C WS 495331192 + 192 [0]

We wrote the data for transaction commit here.

8,0 1 0 1.746863220 0 m N cfq320SN / complete rqnoidle 1
8,0 1 0 1.746863801 0 m N cfq320SN / set_slice=27
8,0 1 0 1.746864439 0 m N cfq320SN / arm_idle: 8 group_idle: 0

Currently there is no IO queued from jbd2 thread so idle...

8,0 1 3679 1.746878424 320 A FWFS 495331384 + 8 <- (8,2) 478543928
8,0 1 3680 1.746879028 320 Q FWFS 495331384 + 8 [jbd2/sda2-8]
8,0 1 3681 1.746879673 320 G FWFS 495331384 + 8 [jbd2/sda2-8]
8,0 1 3682 1.746880227 320 I FWFS 495331384 + 8 [jbd2/sda2-8]

Jbd2 thread now queues the commit block.

8,0 1 0 1.754263523 0 m N cfq idle timer fired
8,0 1 0 1.754264733 0 m N cfq320SN / slice expired t=0

But it was not dispatched and we just idled until timer fired. Then we
started dispatching for other queue and got to dispatching the commit block
only much later.

I've looked into the block layer code and the reason for this behavior
(idling when there is in fact IO to dispatch) is the special handling of
flush requests. When a flush request is submitted, we insert it with
ELEVATOR_INSERT_FLUSH and blk_insert_flush() then handles it. That
eventually just ends up doing something along the lines of:

list_add_tail(&rq->queuelist, &q->queue_head);

So we add request to the list of requests to dispatch but we don't notify
IO scheduler in any way. Thus IO scheduler won't properly track the
request, won't properly account IO time for it if I'm right etc...

Ideally we should call q->elevator->type->ops.elevator_add_req_fn() to
handle the request but I'm not sure it won't break some assumptions of the
flush code. But at minimum shouldn't we at least try to dispatch the
request?

Certainly, the current behavior is undoubtedly broken. The least
intrusive fix would be to kick off scheduling when we add it to the
request, but the elevator should handle it. Are you going to be up
for hacking up a fix?

I have some trouble understanding what do you mean exactly. Do you think we
should just call __blk_run_queue() after we add the request to
q->queue_head?

No, that won't be enough, as it won't always break out of the idle logic. We need to ensure that the new request is noticed, so that CFQ knows and can decide to kick off things.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/