Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation

From: Frederic Weisbecker
Date: Tue Dec 29 2020 - 08:46:11 EST


On Mon, Dec 28, 2020 at 02:15:29AM +0000, Qais Yousef wrote:
> Hi Frederic
>
> On 12/02/20 12:57, Frederic Weisbecker wrote:
> > @@ -66,9 +68,9 @@ void irqtime_account_irq(struct task_struct *curr)
> >  	 * in that case, so as not to confuse scheduler with a special task
> >  	 * that do not consume any time, but still wants to run.
> >  	 */
> > -	if (hardirq_count())
> > +	if (pc & HARDIRQ_MASK)
> >  		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> > -	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
> > +	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
>
> Noob question. Why for SOFTIRQs do we do softirq_count() & *SOFTIRQ_OFFSET*? It
> seems we're in-softirq only if the count is odd numbered.
>
> /me tries to dig more
>
> Hmm could it be because the softirq count is actually 1 bit and the rest is
> for SOFTIRQ_DISABLE_OFFSET (BH disabled)?

Exactly!

>
> IOW, 1 bit is for we're in softirq context, and the remaining 7 bits are to
> count BH disable nesting, right?
>
> I guess this would make sense; we don't nest softirq processing AFAIK. But
> I could be misreading the code too :-)

You got it right!

This is commented in softirq.c somewhere:

/*
 * preempt_count and SOFTIRQ_OFFSET usage:
 * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
 *   softirq processing.
 * - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)
 *   on local_bh_disable or local_bh_enable.
 * This lets us distinguish between whether we are currently processing
 * softirq and whether we just have bh disabled.
 */

But we should elaborate on the fact that, indeed, softirq processing can't nest,
while softirq disablement can. I should try to send a patch and comment more
thoroughly on the subtleties of the preempt mask in preempt.h.
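
For reference, the relevant pieces of include/linux/preempt.h look roughly like
this (paraphrased from memory, so check the actual header):

	#define SOFTIRQ_SHIFT		8
	#define SOFTIRQ_OFFSET		(1UL << SOFTIRQ_SHIFT)	/* 0x100 */
	#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)	/* 0x200 */
	#define SOFTIRQ_MASK		0xff00			/* the 8 softirq bits */

	#define softirq_count()		(preempt_count() & SOFTIRQ_MASK)
	#define in_softirq()		(softirq_count())
	#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)

So a non-zero softirq_count() only tells you that BHs are disabled and/or a
softirq is running; it's the SOFTIRQ_OFFSET bit alone that tells you a softirq
is actually being served, which is why the patch checks pc & SOFTIRQ_OFFSET.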

>
> >  		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> >  }
> >
> > @@ -417,11 +419,13 @@ void vtime_task_switch(struct task_struct *prev)
> >  }
> >  # endif
> >
> > -void vtime_account_irq(struct task_struct *tsk)
> > +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
> >  {
> > -	if (hardirq_count()) {
> > +	unsigned int pc = preempt_count() - offset;
> > +
> > +	if (pc & HARDIRQ_OFFSET) {
>
> Shouldn't this be HARDIRQ_MASK like above?

In the rare cases of nested hardirqs happening with broken drivers, only the outer
hardirq matters. All the time spent in the inner hardirqs is included in the outer
one.
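
To make the bit arithmetic concrete (a rough sketch assuming the usual
HARDIRQ_SHIFT == 16 layout, so HARDIRQ_OFFSET == 0x10000 and
HARDIRQ_MASK == 0xf0000):

	/* hardirq count == 1: the common, non-nested case */
	pc = 1 * HARDIRQ_OFFSET;	/* 0x10000 */
	pc & HARDIRQ_OFFSET;		/* non-zero, same answer as pc & HARDIRQ_MASK */

	/* hardirq count == 2: nested hardirq from a broken driver */
	pc = 2 * HARDIRQ_OFFSET;	/* 0x20000 */
	pc & HARDIRQ_OFFSET;		/* zero: the inner level isn't flushed separately,
					   its time stays within the outer hardirq */
	pc & HARDIRQ_MASK;		/* non-zero: would flush the inner level too */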

Thanks.

>
> >  		vtime_account_hardirq(tsk);
> > -	} else if (in_serving_softirq()) {
> > +	} else if (pc & SOFTIRQ_OFFSET) {
> >  		vtime_account_softirq(tsk);
> >  	} else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
> >  		   is_idle_task(tsk)) {
>
> Thanks
>
> --
> Qais Yousef