Use ktime to change the granularity of IO accounting in block layer from
milli-seconds to nano-seconds to get the proper latency values for the
devices whose latency is in micro-seconds. After changing the granularity
to nano-seconds the iostat command, which was showing incorrect values for
%util, is now showing correct values.
We did not work on the patch to drop the logic for
STAT_PRECISE_TIMESTAMPS yet. Will do it if this patch is ok.
The iostat command was run after starting the fio with following command
on an NVME disk. For the same fio command, the iostat %util was showing
~100% for the disks whose latencies are in the range of microseconds.
With the kernel changes (granularity to nano-seconds), the %util was
showing correct values. Following are the details of the test and their
output:
fio command
-----------
[global]
bs=128K
iodepth=1
direct=1
ioengine=libaio
group_reporting
time_based
runtime=90
thinktime=1ms
numjobs=1
name=raw-write
rw=randrw
ignore_error=EIO:EIO
[job1]
filename=/dev/nvme0n1
Correct values after kernel changes:
====================================
iostat output
-------------
iostat -d /dev/nvme0n1 -x 1
Device r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.08 0.05 0.06 128.00 128.00 0.07 6.50
Device r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.08 0.06 0.06 128.00 128.00 0.07 6.30
Device r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.06 0.05 0.06 128.00 128.00 0.06 5.70
From fio
--------
Read Latency: clat (usec): min=32, max=2335, avg=79.54, stdev=29.95
Write Latency: clat (usec): min=38, max=130, avg=57.76, stdev= 3.25
Values before kernel changes
============================
iostat output
-------------
iostat -d /dev/nvme0n1 -x 1
Device r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.08 0.06 0.06 128.00 128.00 1.07 97.70
Device r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.08 0.06 0.06 128.00 128.00 1.08 98.80
Device r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.08 0.05 0.06 128.00 128.00 1.06 97.20
From fio
--------
Read Latency: clat (usec): min=33, max=468, avg=79.56, stdev=28.04
Write Latency: clat (usec): min=9, max=139, avg=57.10, stdev= 3.79
Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
Signed-off-by: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>
---
block/blk-core.c | 26 +++++++++++++-------------
block/blk-mq.c | 4 ++--
block/blk.h | 2 +-
block/genhd.c | 8 ++++----
drivers/block/drbd/drbd_debugfs.c | 4 ++--
drivers/block/drbd/drbd_int.h | 2 +-
drivers/block/zram/zram_drv.c | 4 ++--
drivers/md/bcache/request.c | 10 +++++-----
drivers/md/dm-core.h | 2 +-
drivers/md/dm.c | 8 ++++----
drivers/md/md.h | 2 +-
drivers/md/raid1.h | 2 +-
drivers/md/raid10.h | 2 +-
drivers/md/raid5.c | 2 +-
drivers/nvdimm/btt.c | 2 +-
drivers/nvdimm/pmem.c | 2 +-
include/linux/blk_types.h | 2 +-
include/linux/blkdev.h | 14 +++++++-------
include/linux/part_stat.h | 2 +-
19 files changed, 50 insertions(+), 50 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 5487912befe8..069c29dad3e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -912,12 +912,12 @@ int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
}
EXPORT_SYMBOL_GPL(iocb_bio_iopoll);
-void update_io_ticks(struct block_device *part, unsigned long now, bool end)
+void update_io_ticks(struct block_device *part, u64 now, bool end)
{
- unsigned long stamp;
+ u64 stamp;
again:
stamp = READ_ONCE(part->bd_stamp);
- if (unlikely(time_after(now, stamp))) {
+ if (unlikely(time_after64(now, stamp))) {