Re: [PATCH 4/6] Export ns irqtimes from IRQ_TIME_ACCOUNTING through /proc/stat

From: Venkatesh Pallipadi
Date: Fri Oct 22 2010 - 19:34:42 EST


On Fri, Oct 22, 2010 at 5:23 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, 2010-10-21 at 12:25 -0700, Venkatesh Pallipadi wrote:
>> > I'd do:
>> >
>> >  - hardirq
>> >  - softirq
>> >  - user
>> >  - system
>> >     - guest
>> >     - really system
>> >  - idle
>> >
>> > Since otherwise tiny slices of softirq would need to wait for a system
>> > tick to happen before you fold them.
>> >
>> > Also, it is possible that in a single tick multiple counters overflow
>> > the jiffy boundary, so something like:
>> >
>> >  if (irqtime_account_hi_update())
>> >        cpustat->irq = ...
>> >
>> >  if (irqtime_account_si_update())
>> >        cpustat->softirq = ...
>> >
>> >  if (user_tick) {
>> >  } else if (...) {
>> >
>> >  } else ...
>> >
>> > would seem like the better approach.
>> >
>>
>> I am not sure about checking for both si and hi. That would result in
>> double accounting a tick and have some side-effects.
>
> Depends on how you look at it I guess, in order for this to occur a
> previous tick would have to be not reported, eg. consider the case where
> during two consecutive ticks the time is 50% for both sirq and hirq.
>
> Then, after the first tick, nothing will have progressed because they're
> both at 50% of a tick, after the second tick both will have reached a
> full jiffy's worth of time and need to roll over.
>
> In total two ticks happened, two ticks got accounted, {0,2}, your
> approach would make it look like {0,1,1} two ticks worth of work
> happened, two ticks got accounted, but it takes 3 ticks for that to
> happen.

Yes. But the initial 50-50 tick would not be accounted to hirq or sirq;
it would go to system/idle/user depending on other conditions. So I
think it is better to keep accounting each tick to exactly one bucket
every time we are called.
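
To make the concern concrete, here is a minimal user-space sketch of the
two policies. It is not kernel code: struct sim, tick_single(),
tick_multi() and the TICK_NS value are all made up for illustration, and
"other" stands in for user/system/idle. tick_multi() is a simplified
reading of the proposal above (it skips the user/system/idle chain when
something was folded).

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NS 1000000ULL /* assumption: HZ=1000, so one tick is 1 ms */

enum bucket { B_HARDIRQ, B_SOFTIRQ, B_OTHER, B_NR };

struct sim {
	uint64_t hi_ns, si_ns;         /* fine-grained irq time measured so far */
	uint64_t hi_folded, si_folded; /* whole ticks already reported */
	uint64_t ticks[B_NR];          /* what a /proc/stat reader would see */
};

/* True when at least one full, not yet reported tick of time is pending. */
static int hi_pending(const struct sim *s)
{
	return s->hi_ns >= (s->hi_folded + 1) * TICK_NS;
}

static int si_pending(const struct sim *s)
{
	return s->si_ns >= (s->si_folded + 1) * TICK_NS;
}

/* Policy A: every tick is charged to exactly one bucket. */
static void tick_single(struct sim *s, uint64_t hi_delta, uint64_t si_delta)
{
	assert(s);
	s->hi_ns += hi_delta;
	s->si_ns += si_delta;

	if (hi_pending(s)) {
		s->hi_folded++;
		s->ticks[B_HARDIRQ]++;
	} else if (si_pending(s)) {
		s->si_folded++;
		s->ticks[B_SOFTIRQ]++;
	} else {
		s->ticks[B_OTHER]++;
	}
}

/* Policy B: hardirq and softirq may both be folded on the same tick. */
static void tick_multi(struct sim *s, uint64_t hi_delta, uint64_t si_delta)
{
	int folded = 0;

	assert(s);
	s->hi_ns += hi_delta;
	s->si_ns += si_delta;

	if (hi_pending(s)) {
		s->hi_folded++;
		s->ticks[B_HARDIRQ]++;
		folded = 1;
	}
	if (si_pending(s)) {
		s->si_folded++;
		s->ticks[B_SOFTIRQ]++;
		folded = 1;
	}
	if (!folded)
		s->ticks[B_OTHER]++;
}

static uint64_t total_ticks(const struct sim *s)
{
	return s->ticks[B_HARDIRQ] + s->ticks[B_SOFTIRQ] + s->ticks[B_OTHER];
}
```

Feeding both policies two ticks with TICK_NS / 2 of hardirq and
TICK_NS / 2 of softirq each: policy A reports two ticks in total (one
"other", one hardirq) and carries the softirq tick over to the next
call, while policy B reports three ticks for two tick periods, because
the first tick had already been charged to "other".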

>> Regarding moving si above user: Yes. That seems good.
>> idle after system, That may not make so much of a difference, as there
>> is no special way to check for system time, other than !idle.
>
> Right, so about user and system... we have a bit of a problem there.
> There is overlap between si/hi and system. ksoftirqd time would be
> accounted as system and si.

Yes. In general this cannot be accurate unless we have fine-granularity
user/system accounting. But that involves the system call path, where
any overhead is undesirable. If only we had hardware counters for
user/system on x86.

The overlap between ksoftirqd and si again depends on the sampling
frequency. We can probably change the logic to not include ksoftirqd
time in softirq folding. Instead, we can fall back to the sampling
model: if ksoftirqd is running when we get this tick, we account it as
softirq. This still has a problem: when hirq/sirq folding happens on
the same tick, ksoftirqd may not get its chance. Anyway, this whole
stat is sampling based. I think we have some flexibility to
hand-wave stuff :-).
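
Roughly, the per-tick decision I have in mind could look like the
sketch below. This is only an illustration: struct tick_state,
classify_tick() and the field names are hypothetical, not existing
kernel interfaces, and si_fold_pending is assumed to cover only softirq
time measured outside ksoftirqd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tick_bucket {
	TICK_HARDIRQ, TICK_SOFTIRQ, TICK_USER, TICK_SYSTEM, TICK_IDLE
};

/* Hypothetical snapshot of what the tick handler knows at this tick. */
struct tick_state {
	bool hi_fold_pending;   /* a full tick of measured hardirq time is due */
	bool si_fold_pending;   /* same for softirq, ksoftirqd time excluded */
	bool curr_is_ksoftirqd; /* the interrupted task is ksoftirqd */
	bool user_tick;         /* the tick interrupted user mode */
	bool curr_is_idle;      /* the interrupted task is the idle task */
};

/* Charge the tick to exactly one bucket, in priority order. */
static enum tick_bucket classify_tick(const struct tick_state *st)
{
	assert(st != NULL);

	if (st->hi_fold_pending)
		return TICK_HARDIRQ;
	if (st->si_fold_pending)
		return TICK_SOFTIRQ;
	if (st->curr_is_ksoftirqd)
		return TICK_SOFTIRQ; /* plain sampling, no fine-grained time */
	if (st->user_tick)
		return TICK_USER;
	if (st->curr_is_idle)
		return TICK_IDLE;
	return TICK_SYSTEM;      /* system is simply "not idle" */
}
```

With this ordering, a tick that lands on ksoftirqd while a hardirq fold
is pending is charged to hardirq, so ksoftirqd loses that sample. That
is the case mentioned above.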

> Then there is the whole issue of per-task accounting not actually using
> the system/user ticks. They use the ticks as a ratio for
> se.sum_exec_runtime.
>

Yes. That was the reason why I removed hirq/sirq changing p->stime.
So we have a clearer user/system split. But again, it's not accurate.
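
As an illustration of the ratio idea (a simplified sketch, not the
kernel's actual code; split_runtime() and its parameters are made-up
names), the precise runtime is split in proportion to the sampled
user and system tick counts:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a task's precisely measured runtime into user and system parts
 * using the sampled tick counts only as a ratio.
 * Assumption of this sketch: with no samples at all, everything is
 * reported as user time. The multiplication can overflow for very
 * large inputs; a real implementation would have to guard against that.
 */
static void split_runtime(uint64_t rtime_ns,
			  uint64_t utime_ticks, uint64_t stime_ticks,
			  uint64_t *utime_ns, uint64_t *stime_ns)
{
	uint64_t total = utime_ticks + stime_ticks;

	assert(utime_ns && stime_ns);

	if (total)
		*utime_ns = rtime_ns * utime_ticks / total;
	else
		*utime_ns = rtime_ns;

	*stime_ns = rtime_ns - *utime_ns;
}
```

For example, 1000 ns of runtime with 3 user ticks and 1 system tick
splits into 750 ns user and 250 ns system. If irq ticks were also added
to the system tick count, the same runtime would be reported with a
larger system share, which is the skew I wanted to avoid.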

Thanks,
Venki