Re: [PATCH 0/4] Finer granularity and task/cgroup irq time accounting

From: Martin Schwidefsky
Date: Wed Aug 25 2010 - 03:21:08 EST


On Tue, 24 Aug 2010 19:02:04 -0700
Venkatesh Pallipadi <venki@xxxxxxxxxx> wrote:

> On Tue, Aug 24, 2010 at 1:39 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, 2010-08-24 at 12:20 -0700, Venkatesh Pallipadi wrote:
> >> So, why not do the simple things first. Do not disturb any existing
> >> scheduling decisions, account accurate hi and si times system wide,
> >> per task, per cgroup (with as little overhead as possible). Give this
> >> info to users and admin programs and they may make a higher level
> >> sense of this.
> >>
> >> Having looked at both the options, I feel exporting these is an
> >> immediate first step.
> >
> > This is where I strongly disagree, providing an interface that cannot
> > possibly be implemented correctly just so you can fudge something (still
> > not sure what from userspace) seems a very bad idea indeed.
> >
>
> I don't think correctness is a problem. TSC is pretty good for this
> purpose on current hardware. I agree that usability is debatable.
>
> The use case I mentioned is some management application trying to find
> interference/slowness for a task/task group because some other si
> intensive task or flood ping on that CPU, getting to know that from
> si/hi time for task and what it "expects it to be". Yes this is vague.
> But, I think you agree that problem of si/hi interference on unrelated
> task exists today. And providing this interface was the quick way to
> give some hint to management apps about such a problem. But the other
> alternative of accounting si and hi time as "system time" will help this
> use case as well, as the user will notice lower exec_time in that
> case.
>
> If you strongly think that the right way is to make both si and hi
> "system time" and that will not cause unfairness and slowdown for some
> unrelated tasks, I can try to clean up the patch I had for that and
> send it out. I am afraid though, it will cause some regression and we
> will end up back at square one after a month or so. :(

But it is a correctness problem. It is wrong to account the si and hi
time to some random process. Basing any kind of decision on wrong data
is asking for trouble. If we cannot attribute the si and hi time to the
correct process (which we agree is next to impossible), then the only
thing left to do is to report the time on its own. You can still pick a
random process in your management application and add the time in user
space. That is as wrong as before, but some other application might
want to do smarter things with the data point.
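
To illustrate what consuming si and hi time "on its own" could look like
from user space, here is a minimal sketch. It assumes the existing
system-wide irq and softirq columns of the aggregate "cpu" line in
/proc/stat (values in USER_HZ ticks), not any interface proposed in this
patch series. The helper names parse_cpu_line and irq_report are
hypothetical.

```python
from typing import Dict

# Column order of the aggregate "cpu" line in /proc/stat.
# All values are cumulative tick counts in USER_HZ units.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the first ("cpu ...") line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Pad with zeros in case trailing columns are missing.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def irq_report(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Report hi/si ticks between two samples as a separate data point.

    The time is deliberately not charged to any task; what to do with it
    is left to the caller.
    """
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    hi, si = delta["irq"], delta["softirq"]
    share = (hi + si) / total if total else 0.0
    return {"hi": hi, "si": si, "total": total, "irq_share": share}
```

A management application could feed this two reads of the first line of
/proc/stat taken some interval apart and then decide for itself how, or
whether, to attribute the reported share to anything.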

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.
