Re: [PATCH 2/2] sched: Implement interface for cgroup unified hierarchy

From: Tejun Heo
Date: Wed Aug 02 2017 - 14:16:10 EST

Hello, Peter.

On Wed, Aug 02, 2017 at 06:05:11PM +0200, Peter Zijlstra wrote:
> > * The stat file is sampling based and the usage files are calculated
> > from actual scheduling events. Is this because the latter is more
> > accurate?
>
> So I actually don't know the history of this stuff too well. But I would
> think so. This all looks rather dodgy.

I see.

> > * Why do we have user/sys breakdown in usage numbers? It tries to
> > distinguish user or sys by looking at task_pt_regs(). I can't see
> > how this would work (e.g. interrupt handlers never schedule) and w/o
> > kernel preemption, the sys part is always zero. What is this number
> > supposed to mean?
>
> For normal scheduler stuff we account the total runtime in ns and use
> the user/kernel tick samples to divide it into user/kernel time parts.
> See cputime_adjust().
>
> But looking at the cpuacct I have no clue, that looks wonky at best.
>
> Ideally we'd reuse the normal cputime code and do the same thing
> per-cgroup, but clearly that isn't happening now.
>
> I never really looked further than that cpuacct_charge() doing _another_
> cgroup iteration, even though we already account that delta to each
> cgroup (modulo scheduling class crud).
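
To make the proportional split described above concrete, here is a rough
standalone sketch. It is not the kernel's cputime_adjust(): the struct
cputime_split and the function split_runtime() are hypothetical names made
up for this illustration, and the sketch leaves out everything the real
code does beyond the basic scaling (monotonicity of the reported values,
for instance). The 128-bit multiply relies on a GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical result type for this illustration; not a kernel structure. */
struct cputime_split {
	uint64_t utime_ns;	/* share of the runtime attributed to user mode */
	uint64_t stime_ns;	/* share of the runtime attributed to kernel mode */
};

/*
 * Divide a precisely accounted runtime (in ns) into user and system parts
 * in proportion to the tick samples observed in each mode.
 *
 * rtime_ns:    total runtime measured from scheduling events
 * utime_ticks: number of tick samples that landed in user mode
 * stime_ticks: number of tick samples that landed in kernel mode
 */
static struct cputime_split split_runtime(uint64_t rtime_ns,
					  uint64_t utime_ticks,
					  uint64_t stime_ticks)
{
	struct cputime_split out = { 0, 0 };
	uint64_t total_ticks = utime_ticks + stime_ticks;

	if (total_ticks == 0) {
		/* No samples at all: this sketch attributes everything to user. */
		out.utime_ns = rtime_ns;
		return out;
	}

	/*
	 * stime = rtime * stime_ticks / total_ticks, computed in 128 bits so
	 * the intermediate product cannot overflow (GCC/Clang extension).
	 */
	out.stime_ns = (uint64_t)(((unsigned __int128)rtime_ns * stime_ticks) /
				  total_ticks);

	/* Give the remainder to user so the two parts always sum to rtime. */
	out.utime_ns = rtime_ns - out.stime_ns;
	return out;
}
```

With 1,000,000 ns of runtime and three user ticks to one kernel tick, this
yields 750,000 ns user and 250,000 ns system. The point is only that the
precise total comes from the scheduler while the ratio comes from sampling.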

Yeah, it's kinda silly. I'll see if I can just kill cpuacct for