Re: [PATCH 4/6] Export ns irqtimes from IRQ_TIME_ACCOUNTING through /proc/stat

From: Venkatesh Pallipadi
Date: Thu Oct 21 2010 - 15:25:22 EST


On Thu, Oct 21, 2010 at 7:44 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2010-10-20 at 15:49 -0700, Venkatesh Pallipadi wrote:
>
>> +static int irqtime_account_hi_update(void)
>> +{
>> +     struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>> +     unsigned long flags;
>> +     u64 latest_ns;
>> +     int ret = 0;
>> +
>> +     local_irq_save(flags);
>> +     latest_ns = __get_cpu_var(cpu_hardirq_time);
>
> I guess this_cpu_read() would again be an improvement.. same for the SI
> version.
>

Yes.

>> +     if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
>> +             ret = 1;
>> +     local_irq_restore(flags);
>> +     return ret;
>> +}
>
>> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
>> +/*
>> + * Account a tick to a process and cpustat
>> + * @p: the process that the cpu time gets accounted to
>> + * @user_tick: is the tick from userspace
>> + * @rq: the pointer to rq
>> + *
>> + * Tick demultiplexing follows the order
>> + * - pending hardirq update
>> + * - user_time
>> + * - pending softirq update
>> + * - idle_time
>> + * - system time
>> + *   - check for guest_time
>> + *   - else account as system_time
>> + *
>> + * Check for hardirq is done both for system and user time as there is
>> + * no timer going off while we are on hardirq and hence we may never get an
>> + * oppurtunity to update it solely in system time.
>
> My mailer suggests you spell that as: opportunity :-)

Ah, I don't have spell checker on my editor :-). Will change.

>> + * p->stime and friends are only updated on system time and not on irq
>> + * softirq as those do not count in task exec_runtime any more.
>> + */
>> +static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>> +                                             struct rq *rq)
>> +{
>> +     cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>> +     cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
>> +     struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>> +
>> +     if (irqtime_account_hi_update()) {
>> +             cpustat->irq = cputime64_add(cpustat->irq, tmp);
>> +     } else if (user_tick) {
>> +             account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>> +     } else if (irqtime_account_si_update()) {
>> +             cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
>> +     } else if (p == rq->idle) {
>> +             account_idle_time(cputime_one_jiffy);
>> +     } else if (p->flags & PF_VCPU) { /* System time or guest time */
>> +             account_guest_time(p, cputime_one_jiffy, one_jiffy_scaled);
>> +     } else {
>> +             __account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
>> +                                     &cpustat->system);
>> +     }
>> +}
>
> I'd do:
>
>  - hardirq
>  - softirq
>  - user
>  - system
>     - guest
>     - really system
>  - idle
>
> Since otherwise tiny slices of softirq would need to wait for a system
> tick to happen before you fold them.
>
> Also, it is possible that in a single tick multiple counters overflow
> the jiffy boundary, so something like:
>
>  if (irqtime_account_hi_update())
>        cpustat->irq = ...
>
>  if (irqtime_account_si_update())
>        cpustate->softirq = ...
>
>  if (user_tick) {
>  } else if (...) {
>
>  } else ...
>
> would seem like the better approach.
>

I am not sure about checking for both si and hi. That would result in
double accounting a tick and have some side-effects.
Regarding moving si above user: Yes. That seems good.
idle after system, That may not make so much of a difference, as there
is no special way to check for system time, other than !idle.

>>  /*
>>   * Account for involuntary wait time.
>>   * @steal: the cpu time spent in involuntary wait
>> @@ -3594,6 +3685,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
>>       cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>>       struct rq *rq = this_rq();
>>
>> +     if (sched_clock_irqtime) {
>> +             irqtime_account_process_tick(p, user_tick, rq);
>> +             return;
>> +     }
>> +
>>       if (user_tick)
>>               account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>>       else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
>
> mark_tsc_unstable() can disable sched_clock_irqtime at any time, the
> accounting won't go funny due to that right?
>

That should be OK. We would just fallback to lowres tick based accounting.

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/