Re: [PATCH v3] blk-iocost: fix busy_level reset when no IOs complete
From: Tejun Heo
Date: Tue Mar 31 2026 - 15:08:38 EST
On Tue, Mar 31, 2026 at 10:05:09AM +0000, Jialin Wang wrote:
> When a disk is saturated, it is common for no IOs to complete within a
> timer period. In this case, rq_wait_pct and missed_ppm are currently
> calculated as 0, which iocost incorrectly interprets as meeting QoS
> targets, so it resets busy_level to 0.
>
> This reset prevents busy_level from reaching the threshold (4) needed
> to reduce vrate. On certain cloud storage, such as Azure Premium SSD,
> we observed that iocost may fail to reduce vrate for tens of seconds
> during saturation, failing to mitigate noisy neighbor issues.
>
> Fix this by tracking the number of IO completions (nr_done) in a period.
> If nr_done is 0 and there are lagging IOs, the saturation status is
> unknown, so we keep busy_level unchanged.
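
For readers following along, the rule described above can be sketched roughly
as below. This is a simplified userspace model, not the patch itself: the
struct, the missing_qos flag, the keep_when_unknown switch and both function
names are made up for illustration, and the real decision in blk-iocost
considers more state than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-period sample; not the kernel's struct ioc. */
struct period_sample {
	unsigned int nr_done;    /* IOs completed during the period */
	unsigned int nr_lagging; /* IOs still lagging at the end of the period */
	int missing_qos;         /* nonzero if the period's stats miss QoS targets */
};

/*
 * Compute the next busy_level from one period's sample.
 * keep_when_unknown selects the behavior described in the patch: with no
 * completions and lagging IOs present, the saturation status is unknown,
 * so busy_level is left unchanged instead of being reset.
 */
static int next_busy_level(int busy_level, const struct period_sample *s,
			   int keep_when_unknown)
{
	assert(s != NULL);

	if (keep_when_unknown && s->nr_done == 0 && s->nr_lagging > 0)
		return busy_level;

	if (s->missing_qos)
		return (busy_level < 0 ? 0 : busy_level) + 1;

	/* Stats look fine (or are all zero): treat as meeting QoS and reset. */
	return 0;
}

/* Feed a sequence of periods through the rule, starting from busy_level 0. */
static int run_periods(const struct period_sample *samples, size_t n,
		       int keep_when_unknown)
{
	int busy_level = 0;

	for (size_t i = 0; i < n; i++)
		busy_level = next_busy_level(busy_level, &samples[i],
					     keep_when_unknown);
	return busy_level;
}
```

In this model, if periods that miss QoS alternate with periods that have no
completions, the old behavior keeps bouncing busy_level between 1 and 0, while
keeping it unchanged on the empty periods lets it climb to the threshold of 4.
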
>
> The issue is consistently reproducible on Azure Standard_D8as_v5 (Dasv5)
> VMs with 512GB Premium SSD (P20) using the script below. It was not
> observed on GCP n2d VMs (with 100G pd-ssd and 1.5T local-ssd), and no
> regressions were found with this patch. In this script, cgA performs
> large IOs with iodepth=128, while cgB performs small IOs with iodepth=1
> rate_iops=100 rw=randrw. With iocost enabled, we expect it to throttle
> cgA: the submission latency (slat) of cgA should be significantly higher,
> cgB should reach 200 IOPS, and its completion latency (clat) should be low.
...
> Signed-off-by: Jialin Wang <wjl.linux@xxxxxxxxx>
Acked-by: Tejun Heo <tj@xxxxxxxxxx>
Thanks.
--
tejun