Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group

From: Yuzhoujian
Date: Mon Jan 28 2019 - 02:21:23 EST

In reply to: çè: "Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group"
Next in thread: çè: "Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group"

Hi Michael,
> Task competition inside a cgroup won't be considered as cgroup's
> competition, please try create another cgroup with dead loop on
> each CPU

Yes, you are right, but I don't think we should account only for competition
between cgroups, because that figure does not reflect conditions inside a
cgroup. We still need a proper method to evaluate CPU competition inside a
cgroup.

> Running tasks doesn't means no competition, only if that cgroup occupied
> the CPU exclusively at that moment.

I care a lot about CPU competition inside a cgroup. Without a cgroup
hierarchy wait_sum, I can only read '/proc/$pid/schedstat' thousands of
times to get every task's wait_sum, and that definitely takes a really long
time (40ms for 8000 tasks in a container).
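
To make the per-task approach concrete, here is a minimal sketch of what
such a scan looks like. It assumes the documented schedstat layout (time on
CPU in ns, time waiting on a run queue in ns, number of timeslices); the
helper names and the proc_root parameter are hypothetical and only there
for illustration.

```python
import os


def parse_schedstat(text):
    """Return (run_ns, wait_ns, timeslices) from one schedstat line."""
    run_ns, wait_ns, timeslices = (int(field) for field in text.split()[:3])
    return run_ns, wait_ns, timeslices


def sum_wait_ns(task_ids, proc_root="/proc"):
    """Sum run-queue wait time over the given task IDs, skipping exited tasks."""
    total = 0
    for tid in task_ids:
        path = os.path.join(proc_root, str(tid), "schedstat")
        try:
            with open(path) as f:
                total += parse_schedstat(f.read())[1]
        except (FileNotFoundError, ProcessLookupError):
            # The task exited between listing and reading, so its wait
            # time can no longer be collected from procfs.
            continue
    return total
```

The task IDs could come, for example, from the container cgroup's task
list. Every task costs one open/read/close, which is why the cost grows
with the number of tasks, and any task that exits before it is read
contributes nothing to the sum.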

> No offense but I'm afraid you misunderstand the problem we try to solve
> by wait_sum, if your purpose is to have a way to tell whether there are
> sufficient CPU inside a container, please try lxcfs + top, if there are
> almost no idle and load is high, then the CPU resource is not sufficient.

Hmm, maybe I didn't make it clear. We need to dynamically adjust the number
of CPUs for a container based on the running state of the tasks inside the
container. If we find that the tasks' wait_sum is really high, we will add
more CPU cores to this container; otherwise we will take some CPU cores away
from it. In short, we want to ensure 'co-scheduling' for high-priority
containers.
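
As a rough sketch of that adjustment loop, the decision could look like the
following. The function name, the normalisation by interval and CPU count,
and both thresholds are assumptions made up for illustration, not values we
use or anything the kernel provides.

```python
def cpu_adjustment(prev_wait_ns, cur_wait_ns, interval_ns, ncpus,
                   high_ratio=0.20, low_ratio=0.02):
    """Return +1 to add a core, -1 to remove one, 0 to leave it unchanged.

    prev_wait_ns / cur_wait_ns are two samples of the container's total
    wait time, taken interval_ns apart. The thresholds are hypothetical.
    """
    if interval_ns <= 0 or ncpus <= 0:
        raise ValueError("interval_ns and ncpus must be positive")
    # Wait time accumulated per CPU per unit of wall-clock time.
    wait_ratio = (cur_wait_ns - prev_wait_ns) / (interval_ns * ncpus)
    if wait_ratio > high_ratio:
        return 1
    if wait_ratio < low_ratio and ncpus > 1:
        return -1
    return 0
```

The gap between the two thresholds is there so that the container does not
flip between adding and removing a core on every sample. The loop needs a
cheap, complete wait_sum for the whole container on each sample, which is
what the hierarchy wait_sum would give us.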

> Frankly speaking this sounds like a supplement rather than a missing piece,
> although we don't rely on lxcfs and modify the kernel ourselves to support
> container environment, I still don't think such kind of solutions should be
> in kernel.

I don't care whether this value is considered a supplement or a missing
piece. I only care about how I can assess the running state inside a
container. I think lxcfs is really a good solution for improving the
visibility of container resources, but it is not good enough at the moment.

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime

We can read these procfs files inside a container, but they still cannot
reflect real-time information. Please think about the following scenario: a
'rabbit' process generates 2000 tasks every 30ms, and these child tasks just
run for 1~5ms and then exit. How can we detect this thrashing workload
without a hierarchy wait_sum?
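
To put rough numbers on this scenario, the arithmetic can be sketched as
below. It uses Little's law (average tasks alive = spawn rate x lifetime),
which only gives an average and ignores the burstiness; the function name
and its parameters are hypothetical.

```python
def rabbit_load(tasks_per_burst=2000, burst_period_s=0.030,
                min_life_s=0.001, max_life_s=0.005):
    """Return (spawn rate per second, min avg alive, max avg alive)."""
    spawn_rate = tasks_per_burst / burst_period_s
    # Little's law: average number alive = arrival rate * time in system.
    return spawn_rate, spawn_rate * min_life_s, spawn_rate * max_life_s


rate, alive_min, alive_max = rabbit_load()
print(f"{rate:.0f} tasks/s created, "
      f"{alive_min:.0f} to {alive_max:.0f} alive on average")
```

With these figures roughly 66,667 tasks are created per second, but only
about 67 to 333 are alive on average at any instant. A scan of per-task
schedstat files can only see the tasks that are alive when it runs, so the
wait time of almost all of the children is never observed.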

Thanks,
Yuzhoujian
