Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group

From: çèé
Date: Mon Jan 28 2019 - 02:21:23 EST


Hi Michael
> Task competition inside a cgroup won't be considered as cgroup's
> competition, please try create another cgroup with dead loop on
> each CPU

Yes, you are right, but I don't think we just need to account for
cgroup's competition,
because this factor does not reflect cgroup internal conditions. We
still need a proper
method to evaluate CPU competition inside a cgroup.


> Running tasks doesn't means no competition, only if that cgroup occupied
> the CPU exclusively at that moment.

I care much about CPU competiton inside a cgroup. I can only read
'/proc/$pid/schedstat'
thousands of times to get every task's wait_sum time without cgroup
hierarchy wait_sum,
and it definitely tasks a real long time(40ms for 8000 tasks in a container).


> No offense but I'm afraid you misunderstand the problem we try to solve
> by wait_sum, if your purpose is to have a way to tell whether there are
> sufficient CPU inside a container, please try lxcfs + top, if there are
> almost no idle and load is high, then the CPU resource is not sufficient.

emmmm... Maybe I didn't make it clear. We need to dynamically adjust the
number of CPUs for a container based on the running state of tasks inside
the container. If we find tasks' wait_sum are really high, we will add more
CPU cores to this container, or else we will decline some CPU to this container.
In a word, we want to ensure 'co-scheduling' for high priority containers.


>Frankly speaking this sounds like a supplement rather than a missing piece,
>although we don't rely on lxcfs and modify the kernel ourselves to support
>container environment, I still don't think such kind of solutions should be
>in kernel.

I don't care if this value is considered as a supplement or a missing piece. I
only care about how can I assess the running state inside a container. I think
lxcfs is really a good solution to improve the visibility of container
resources,
but it is not good enough at the moment.

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime

we can read this procfs file inside a container,but this file still
cannot reflect
real-time information. Please think about the following scenario: a
'rabbit' process
will generate 2000 tasks in every 30ms, and these children tasks just run 1~5ms
and then exit. How can we detect this thrashing workload without
hierarchy wait_sum?

Thanks,
Yuzhoujian