Re: [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat

From: Daniel Jordan
Date: Wed Mar 17 2021 - 18:12:06 EST


Andrey Ryabinin <arbn@xxxxxxxxxxxxxxx> writes:

> cpuacct.stat in no-root cgroups shows user time without guest time
> included int it. This doesn't match with user time shown in root
> cpuacct.stat and /proc/<pid>/stat.

Yeah, that's inconsistent.

> Make account_guest_time() to add user time to cgroup's cpustat to
> fix this.

Yep.

cgroup2's cpu.stat is broken the same way for child cgroups, and this
happily fixes it. Probably deserves a mention in the changelog.

The problem with cgroup2 was, if the workload was mostly guest time,
cpu.stat's user and system together reflected it, but it was split
unevenly across the two. I think guest time wasn't actually included in
either bucket, it was just that the little user and system time there
was got scaled up in cgroup_base_stat_cputime_show -> cputime_adjust to
match sum_exec_runtime, which did have it.

The stats look ok now for both cgroup1 and 2. Just slightly unsure
whether we want to change the way both interfaces expose the accounting
in case something out there depends on it. Seems like we should, but
it'd be good to hear more opinions.

> @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime)
>
> /* Add guest time to cpustat. */
> if (task_nice(p) > 0) {
> - cpustat[CPUTIME_NICE] += cputime;
> - cpustat[CPUTIME_GUEST_NICE] += cputime;
> + task_group_account_field(p, CPUTIME_NICE, cputime);
> + task_group_account_field(p, CPUTIME_GUEST_NICE, cputime);
> } else {
> - cpustat[CPUTIME_USER] += cputime;
> - cpustat[CPUTIME_GUEST] += cputime;
> + task_group_account_field(p, CPUTIME_USER, cputime);
> + task_group_account_field(p, CPUTIME_GUEST, cputime);
> }

Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2
actually use _GUEST and _GUEST_NICE.

Could go either way. Consistency is nice, but I probably wouldn't
change the GUEST ones so people aren't confused about why they're
accounted. It's also extra cycles for nothing, even though most of the
data is probably in the cache.