Re: [PATCH 1/5] sched,time: count actually elapsed irq & softirq time
From: Rik van Riel
Date: Mon Jun 27 2016 - 08:50:15 EST
On Mon, 2016-06-27 at 14:25 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 22, 2016 at 10:25:47PM -0400, riel@xxxxxxxxxx wrote:
> >
> > From: Rik van Riel <riel@xxxxxxxxxx>
> >
> > Currently, if there was any irq or softirq time during 'ticks'
> > jiffies, the entire period will be accounted as irq or softirq
> > time.
> >
> > This is inaccurate if only a subset of 'ticks' jiffies was
> > actually spent handling irqs, and could conceivably mis-count
> > all of the ticks during a period as irq time, when there was
> > some irq and some softirq time.
> Good catch!
>
> Many comments following.
>
> >
> >
> > This can actually happen when irqtime_account_process_tick
> > is called from account_idle_ticks, which can pass a larger
> > number of ticks down all at once.
> >
> > Fix this by changing irqtime_account_hi_update and
> > irqtime_account_si_update to round elapsed irq and softirq
> > time to jiffies, and return the number of jiffies spent in
> > each mode, similar to how steal time is handled.
> >
> > Additionally, have irqtime_account_process_tick take into
> > account how much time was spent in each of steal, irq,
> > and softirq time.
> >
> > The latter could help improve the accuracy of timekeeping
> Maybe you meant cputime? Timekeeping is rather about jiffies and
> GTOD.
>
> >
> > when returning from idle on a NO_HZ_IDLE CPU.
> >
> > Properly accounting how much time was spent in hardirq and
> > softirq time will also allow the NO_HZ_FULL code to re-use
> > these same functions for hardirq and softirq accounting.
> >
> > Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
> >Â
> > Â local_irq_save(flags);
> > - latest_ns = this_cpu_read(cpu_hardirq_time);
> > - if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
> > - ret = 1;
> > + irq = this_cpu_read(cpu_hardirq_time) -
> > cpustat[CPUTIME_IRQ];
> cpu_hardirq_time is made of nsecs whereas cpustat is of cputime_t (in
> fact
> even cputime64_t). So you need to convert cpu_hardirq_time before
> doing the
> substract.
Doh. Good catch!
> > -static int irqtime_account_si_update(void)
> > +static unsigned long irqtime_account_si_update(unsigned long
> > max_jiffies)
> > Â{
> > Â u64 *cpustat = kcpustat_this_cpu->cpustat;
> > + unsigned long si_jiffies = 0;
> > Â unsigned long flags;
> > - u64 latest_ns;
> > - int ret = 0;
> > + u64 softirq;
> > Â
> > Â local_irq_save(flags);
> > - latest_ns = this_cpu_read(cpu_softirq_time);
> > - if (nsecs_to_cputime64(latest_ns) >
> > cpustat[CPUTIME_SOFTIRQ])
> > - ret = 1;
> > + softirq = this_cpu_read(cpu_softirq_time) -
> > cpustat[CPUTIME_SOFTIRQ];
> > + if (softirq > cputime_one_jiffy) {
> > + si_jiffies = min(max_jiffies,
> > cputime_to_jiffies(softirq));
> > + cpustat[CPUTIME_SOFTIRQ] +=
> > jiffies_to_cputime(si_jiffies);
> > + }
> > Â local_irq_restore(flags);
> > - return ret;
> > + return si_jiffies;
> So same comments apply here.
>
> [...]
> >
> > Â * Accumulate raw cputime values of dead tasks (sig->[us]time) and
> > live
> > Â * tasks (sum on group iteration) belonging to @tsk's group.
> > Â */
> > @@ -344,19 +378,24 @@ static void
> > irqtime_account_process_tick(struct task_struct *p, int user_tick,
> > Â{
> > Â cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
> > Â u64 cputime = (__force u64) cputime_one_jiffy;
> > - u64 *cpustat = kcpustat_this_cpu->cpustat;
> > + unsigned long other;
> > Â
> > - if (steal_account_process_tick(ULONG_MAX))
> > + /*
> > + Â* When returning from idle, many ticks can get accounted
> > at
> > + Â* once, including some ticks of steal, irq, and softirq
> > time.
> > + Â* Subtract those ticks from the amount of time accounted
> > to
> > + Â* idle, or potentially user or system time. Due to
> > rounding,
> > + Â* other time can exceed ticks occasionally.
> > + Â*/
> > + other = account_other_ticks(ticks);
> > + if (other >= ticks)
> > Â return;
> > + ticks -= other;
> > Â
> > Â cputime *= ticks;
> > Â scaled *= ticks;
> So instead of dealing with ticks here, I think you should rather use
> the above
> cputime as both the limit and the remaining time to account after
> steal/irqs.
>
> This should avoid some middle conversions and improve precision when
> cputime_t == nsecs granularity.
>
> If we account 2 ticks to idle (lets say HZ=100) and irq time to
> account is 15 ms. 2 ticks = 20 ms
> so we have 5 ms left to account to idle. With the jiffies granularity
> in this patch, we would account
> one tick to irqtime (1 tick = 10 ms, there will be 5 ms to account
> back later) and one tick to idle
> time whereas if you deal with cputime_t, you are going to account the
> correct amount of idle time.
>
Ahhh, so you want irqtime_account_process_tick to work with
and account fractional ticks when calling account_system_time,
account_user_time, account_idle_time, etc?
I guess that should work fine since we already pass cputime
values in, anyway.
I suppose we can do the same for get_vtime_delta, too.
They can both work with the actual remaining time (in cputime_t),
after the other time has been subtracted.
I can rework the series to do that.
--
All Rights Reversed.
Attachment:
signature.asc
Description: This is a digitally signed message part