RE: [PATCH] block: Consider inflight IO in io accounting for high latency devices

From: Gulam Mohamed
Date: Tue Sep 12 2023 - 17:01:57 EST


Thanks Jens for reviewing this patch. Can you please look my comments inline?

Regards,
Gulam Mohamed.
-----Original Message-----
From: Jens Axboe <axboe@xxxxxxxxx>
Sent: Friday, September 8, 2023 8:04 PM
To: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] block: Consider inflight IO in io accounting for high latency devices

On 9/7/23 3:45 PM, Gulam Mohamed wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c index
> ec922c6bccbe..70e5763fb799 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1000,6 +1000,8 @@ static inline void blk_account_io_done(struct
> request *req, u64 now)
>
> static inline void blk_account_io_start(struct request *req) {
> + bool delta = false;
> +

This is an odd name for this variable...
[GULAM]: Thanks. I will change this.

> @@ -1015,7 +1017,10 @@ static inline void blk_account_io_start(struct request *req)
> req->part = req->q->disk->part0;
>
> part_stat_lock();
> - update_io_ticks(req->part, jiffies, false);
> + if (req->q->nr_hw_queues == 1) {
> + delta = !!part_in_flight(req->part);
> + }

No parens needed here. But that aside, I think this could be a lot better. You don't really care about the number of requests inflight, only if there are some. A better helper than part_in_flight() could do that ala:

static bool part_any_in_flight(struct block_device *part) {
int cpu;

for_each_possible_cpu(cpu) {
if (part_stat_local_read_cpu(part, in_flight[0], cpu) ||
part_stat_local_read_cpu(part, in_flight[1], cpu))
return true;
}

return false;
}
[GULAM]: Is there a possibility that the IO request submit and completion can happen on different CPU? I am thinking that there could be positive numbers and negative numbers from different CPUs resulting in total inflight to 0. The negative number could be due to that the IO completion could happen on another CPU.

But I do wonder if it's just missed state checking for the request itself that's missing this, and this is fixing it entirely the wrong way around.
[GULAM]: Trying to understand this comment. Can you please explore more on this?

--
Jens Axboe