Re: [PATCH 4/6] Export ns irqtimes from IRQ_TIME_ACCOUNTING through /proc/stat
From: Venkatesh Pallipadi
Date: Thu Oct 21 2010 - 15:25:22 EST
On Thu, Oct 21, 2010 at 7:44 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2010-10-20 at 15:49 -0700, Venkatesh Pallipadi wrote:
>
>> +static int irqtime_account_hi_update(void)
>> +{
>> + struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>> + unsigned long flags;
>> + u64 latest_ns;
>> + int ret = 0;
>> +
>> + local_irq_save(flags);
>> + latest_ns = __get_cpu_var(cpu_hardirq_time);
>
> I guess this_cpu_read() would again be an improvement.. same for the SI
> version.
>
Yes.
>> + if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
>> + ret = 1;
>> + local_irq_restore(flags);
>> + return ret;
>> +}
>
>> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
>> +/*
>> + * Account a tick to a process and cpustat
>> + * @p: the process that the cpu time gets accounted to
>> + * @user_tick: is the tick from userspace
>> + * @rq: the pointer to rq
>> + *
>> + * Tick demultiplexing follows the order
>> + * - pending hardirq update
>> + * - user_time
>> + * - pending softirq update
>> + * - idle_time
>> + * - system time
>> + * - check for guest_time
>> + * - else account as system_time
>> + *
>> + * Check for hardirq is done both for system and user time as there is
>> + * no timer going off while we are on hardirq and hence we may never get an
>> + * oppurtunity to update it solely in system time.
>
> My mailer suggests you spell that as: opportunity :-)
Ah, I don't have spell checker on my editor :-). Will change.
>> + * p->stime and friends are only updated on system time and not on irq
>> + * softirq as those do not count in task exec_runtime any more.
>> + */
>> +static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>> + struct rq *rq)
>> +{
>> + cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>> + cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
>> + struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>> +
>> + if (irqtime_account_hi_update()) {
>> + cpustat->irq = cputime64_add(cpustat->irq, tmp);
>> + } else if (user_tick) {
>> + account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>> + } else if (irqtime_account_si_update()) {
>> + cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
>> + } else if (p == rq->idle) {
>> + account_idle_time(cputime_one_jiffy);
>> + } else if (p->flags & PF_VCPU) { /* System time or guest time */
>> + account_guest_time(p, cputime_one_jiffy, one_jiffy_scaled);
>> + } else {
>> + __account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
>> + &cpustat->system);
>> + }
>> +}
>
> I'd do:
>
> - hardirq
> - softirq
> - user
> - system
> - guest
> - really system
> - idle
>
> Since otherwise tiny slices of softirq would need to wait for a system
> tick to happen before you fold them.
>
> Also, it is possible that in a single tick multiple counters overflow
> the jiffy boundary, so something like:
>
> if (irqtime_account_hi_update())
> cpustat->irq = ...
>
> if (irqtime_account_si_update())
> cpustate->softirq = ...
>
> if (user_tick) {
> } else if (...) {
>
> } else ...
>
> would seem like the better approach.
>
I am not sure about checking for both si and hi. That would result in
double accounting a tick and have some side-effects.
Regarding moving si above user: Yes. That seems good.
idle after system, That may not make so much of a difference, as there
is no special way to check for system time, other than !idle.
>> /*
>> * Account for involuntary wait time.
>> * @steal: the cpu time spent in involuntary wait
>> @@ -3594,6 +3685,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
>> cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>> struct rq *rq = this_rq();
>>
>> + if (sched_clock_irqtime) {
>> + irqtime_account_process_tick(p, user_tick, rq);
>> + return;
>> + }
>> +
>> if (user_tick)
>> account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>> else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
>
> mark_tsc_unstable() can disable sched_clock_irqtime at any time, the
> accounting won't go funny due to that right?
>
That should be OK. We would just fallback to lowres tick based accounting.
Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/