Re: [PATCH v8 4/4] sched: Fix cgroup irq time for CONFIG_IRQ_TIME_ACCOUNTING
From: Michal Koutný
Date: Thu Jan 09 2025 - 05:47:14 EST
Hello Yafang.
I consider the runtimization you did in the first three patches
sensible, however, this fourth patch is a hard sell.
On Fri, Jan 03, 2025 at 10:24:09AM +0800, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> However, despite adding more threads to handle an increased workload,
> the CPU usage could not be raised.
(Is that behavior same in both CONFIG_IRQ_TIME_ACCOUNTING and
!CONFIG_IRQ_TIME_ACCOUNTING cases?)
> In other words, even though the container’s CPU usage appeared low, it
> was unable to process more workloads to utilize additional CPU
> resources, which caused issues.
Hm, I think this would be worth documenting in the context of
CONFIG_IRQ_TIME_ACCOUNTING and irq.pressure.
> The CPU usage of the cgroup is relatively low at around 55%, but this usage
> doesn't increase, even with more netperf tasks. The reason is that CPU0 is
> at 100% utilization, as confirmed by mpstat:
>
> 02:56:22 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> 02:56:23 PM 0 0.99 0.00 55.45 0.00 0.99 42.57 0.00 0.00 0.00 0.00
>
> 02:56:23 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> 02:56:24 PM 0 2.00 0.00 55.00 0.00 0.00 43.00 0.00 0.00 0.00 0.00
>
> It is clear that the %soft is excluded in the cgroup of the interrupted
> task. This behavior is unexpected. We should include IRQ time in the
> cgroup to reflect the pressure the group is under.
What is irq.pressure shown in this case?
> The system metric in cpuacct.stat is crucial in indicating whether a
> container is under heavy system pressure, including IRQ/softirq activity.
> Hence, IRQ/softirq time should be included in the cpuacct system usage,
> which also applies to cgroup2’s rstat.
But this only works for you where cgroup's workload induces IRQ on
itself but generally it'd be confusing (showing some sys time that
originates out of the cgroup). irq.pressure covers this universally (or
it should).
On Fri, Jan 03, 2025 at 10:24:05AM +0800, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> The load balancer is malfunctioning due to the exclusion of IRQ time from
> CPU utilization calculations. What's worse, there is no effective way to
> add the irq time back into the CPU utilization based on current
> available metrics. Therefore, we have to change the kernel code.
That's IMO what irq.pressure (PSI) should be useful for. Adjusting
cgroup's CPU usage with irq.pressue (perhaps not as simple as
multiplication, Johannes may step in here) should tell you info for load
balancer.
Regards,
Michal
Attachment:
signature.asc
Description: PGP signature