[PATCH v2 0/4] sched: Fix irq accounting for CONFIG_IRQ_TIME_ACCOUNTING

From: Yafang Shao
Date: Tue Oct 08 2024 - 02:39:40 EST


After enabling CONFIG_IRQ_TIME_ACCOUNTING to track IRQ pressure in our
container environment, we encountered several user-visible behavioral
changes:

- IRQ/softirq time that interrupts a task is not accounted for in the cpuacct cgroup

This breaks userspace applications that rely on CPU usage data from
cgroups to monitor CPU pressure. This patchset resolves the issue by
ensuring that IRQ/softirq time is accounted for in the cgroup of the
interrupted tasks.

- getrusage(2) does not include time during which the task was interrupted by IRQ/softirq

Some services use getrusage(2) to check if workloads are experiencing CPU
pressure. Since IRQ/softirq time is no longer charged to task runtime,
getrusage(2) can no longer reflect the CPU pressure caused by heavy
interrupts.

This patchset addresses the first issue, which is relatively
straightforward. However, the second issue remains unresolved, as there
might be debate over whether interrupted time should be considered part of
a task’s usage. Nonetheless, it is important to report interrupted time to
the user via some metric, though that is a separate discussion.

Changes:
v1->v2:
- Fix lockdep issues reported by kernel test robot <oliver.sang@xxxxxxxxx>

v1: https://lore.kernel.org/all/20240923090028.16368-1-laoar.shao@xxxxxxxxx/

Yafang Shao (4):
  sched: Define sched_clock_irqtime as static key
  sched: Don't account irq time if sched_clock_irqtime is disabled
  sched, psi: Don't account irq time if sched_clock_irqtime is disabled
  sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING

 kernel/sched/core.c    | 83 ++++++++++++++++++++++++++++++------------
 kernel/sched/cputime.c | 16 ++++----
 kernel/sched/psi.c     | 12 +-----
 kernel/sched/sched.h   |  1 +
 kernel/sched/stats.h   |  7 ++--
 5 files changed, 74 insertions(+), 45 deletions(-)

--
2.43.5