Re: [RFC PATCH] blk-iocost: introduce 'linear-max' cost model for cloud disk

From: Yu Kuai

Date: Fri Feb 13 2026 - 07:14:57 EST


Hi,

On 2026/2/13 15:38, Jialin Wang wrote:
> In public cloud environments, block devices usually enforce performance
> limits based on two independent token buckets: IOPS and BPS. The device
> is throttled when either the IOPS limit or the BPS limit is reached.
>
> To effectively manage "noisy neighbor" problems, we configure iocost
> model parameters (or vrate max) to approximately 95% of the cloud
> provider's provisioned limits. The goal is to strictly avoid hitting
> the storage backend's hard BPS/IOPS limits. By saturating the virtual
> budget before the physical limit, iocost engages throttling first.
> Unlike the indiscriminate throttling applied by cloud storage backends,
> iocost selectively penalizes low-weight cgroups or heavy-traffic
> perpetrators. Consequently, IO-latency-sensitive critical workloads
> remain entirely unaffected by the congestion. Extensive testing has
> verified that this approach yields excellent isolation results.
>
> However, the existing 'linear' cost model leads to significant
> performance loss in this specific configuration due to its additive
> nature.
>
> Using tools/cgroup/iocost_coef_gen.py, we measured the following
> performance data on a typical cloud disk:
>
> 8:16 rbps=173471131 rseqiops=3566 rrandiops=3566 wbps=173333269 wseqiops=3566 wrandiops=3559

It sounds like a model similar to blk-throttle would work fine for your IO
workload. What you really want is blk-throttle's absolute thresholds combined
with blk-iocost's relative throttling, correct?

>
> Dividing BPS by IOPS (173471131 / 3566) yields approximately 48607
> bytes. When running fio with bs=48607, we observed a 50% drop in
> throughput compared to running without iocost enabled.
>
> The reason is that the current 'linear' model calculates cost as:
>
> Cost = BaseCost + (Pages * PerPageCost)
>
> Expanding the internal variables relative to IOPS and BPS, this is
> effectively:
>
> Cost = VTIME_PER_SEC * ((1 / IOPS - 4096 / BPS) + size / BPS)
>
> When the I/O size is such that the IOPS cost component roughly equals
> the BPS cost component (as in the bs=48607 case above), the linear
> model sums them up. Since cloud disks throttle based on *either* IOPS
> *or* BPS (whichever is exhausted first), summing them effectively
> doubles the calculated cost. This causes iocost to drain virtual time
> twice as fast as necessary, throttling the device to 50% utilization.
>
> To solve this, this patch introduces a new 'linear-max' cost model.
> Instead of adding the components, it takes the maximum:
>
> Cost = VTIME_PER_SEC * max(1 / IOPS, size / BPS)
>
> Which translates to:
>
> Cost = max(BaseCost + PerPageCost, Pages * PerPageCost)
>
> This formula correctly models the dual-bucket behavior of cloud disks.
> It ensures that for any block size, the calculated cost aligns with the
> actual bottleneck (IOPS or BPS). This allows the system to reach close
> to the provisioned BPS/IOPS limits without premature throttling, while
> still maintaining the latency protection benefits of iocost.
>
> Signed-off-by: Jialin Wang <wjl.linux@xxxxxxxxx>
> ---
> block/blk-iocost.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index ef543d163d46..ead478d8e5bc 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -445,6 +445,7 @@ struct ioc {
> int autop_idx;
> bool user_qos_params:1;
> bool user_cost_model:1;
> + bool cost_model_linear_max:1;
> };
>
> struct iocg_pcpu_stat {
> @@ -2565,7 +2566,12 @@ static void calc_vtime_cost_builtin(struct bio *bio, struct ioc_gq *iocg,
> cost += coef_seqio;
> }
> }
> - cost += pages * coef_page;
> +
> + if (ioc->cost_model_linear_max)
> + cost = max(cost + coef_page, pages * coef_page);
> + else
> + cost += pages * coef_page;
> +
> out:
> *costp = cost;
> }
> @@ -3368,10 +3374,11 @@ static u64 ioc_cost_model_prfill(struct seq_file *sf,
> return 0;
>
> spin_lock(&ioc->lock);
> - seq_printf(sf, "%s ctrl=%s model=linear "
> + seq_printf(sf, "%s ctrl=%s model=%s "
> "rbps=%llu rseqiops=%llu rrandiops=%llu "
> "wbps=%llu wseqiops=%llu wrandiops=%llu\n",
> dname, ioc->user_cost_model ? "user" : "auto",
> + ioc->cost_model_linear_max ? "linear-max" : "linear",
> u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS],
> u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS]);
> spin_unlock(&ioc->lock);
> @@ -3412,6 +3419,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
> struct ioc *ioc;
> u64 u[NR_I_LCOEFS];
> bool user;
> + bool linear_max;
> char *body, *p;
> int ret;
>
> @@ -3442,6 +3450,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
> spin_lock_irq(&ioc->lock);
> memcpy(u, ioc->params.i_lcoefs, sizeof(u));
> user = ioc->user_cost_model;
> + linear_max = ioc->cost_model_linear_max;
>
> while ((p = strsep(&body, " \t\n"))) {
> substring_t args[MAX_OPT_ARGS];
> @@ -3464,7 +3473,11 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
> continue;
> case COST_MODEL:
> match_strlcpy(buf, &args[0], sizeof(buf));
> - if (strcmp(buf, "linear"))
> + if (!strcmp(buf, "linear"))
> + linear_max = false;
> + else if (!strcmp(buf, "linear-max"))
> + linear_max = true;
> + else
> goto einval;
> continue;
> }
> @@ -3481,8 +3494,10 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
> if (user) {
> memcpy(ioc->params.i_lcoefs, u, sizeof(u));
> ioc->user_cost_model = true;
> + ioc->cost_model_linear_max = linear_max;
> } else {
> ioc->user_cost_model = false;
> + ioc->cost_model_linear_max = false;
> }
> ioc_refresh_params(ioc, true);
> spin_unlock_irq(&ioc->lock);
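
For reference, the arithmetic in the commit message can be checked with a
small userspace sketch. This is not kernel code: the function names are
hypothetical, costs are in seconds per IO (so VTIME_PER_SEC is factored out),
and the kernel's rounding of the IO size to whole pages is ignored.

```c
#include <assert.h>

/* Assumed page size, matching the 4096 used in the quoted formula. */
#define DEMO_PAGE_SIZE 4096.0

/*
 * Additive 'linear' model from the commit message:
 *   cost = (1 / IOPS - 4096 / BPS) + size / BPS
 */
static double cost_linear(double bps, double iops, double size)
{
	assert(bps > 0.0 && iops > 0.0);
	return (1.0 / iops - DEMO_PAGE_SIZE / bps) + size / bps;
}

/*
 * Proposed 'linear-max' model:
 *   cost = max(1 / IOPS, size / BPS)
 * This also matches a device that throttles on whichever of the two
 * token buckets runs out first.
 */
static double cost_linear_max(double bps, double iops, double size)
{
	double iops_cost, bps_cost;

	assert(bps > 0.0 && iops > 0.0);
	iops_cost = 1.0 / iops;
	bps_cost = size / bps;
	return iops_cost > bps_cost ? iops_cost : bps_cost;
}

/*
 * Fraction of the device's real capacity that a cost model lets through:
 * the time the device actually needs per IO divided by the time the model
 * charges per IO.
 */
static double utilization(double model_cost, double device_cost)
{
	assert(model_cost > 0.0);
	return device_cost / model_cost;
}
```

With the quoted read coefficients (rbps=173471131, rseqiops=3566) and
size=48607, utilization(cost_linear(...), cost_linear_max(...)) comes out at
roughly one half, which agrees with the reported 50% drop. At size=4096 the
two models charge nearly the same cost, so the gap only appears as the BPS
component approaches the IOPS component.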

--
Thanks,
Kuai