Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation

From: Qais Yousef
Date: Tue Dec 29 2020 - 09:16:15 EST


On 12/29/20 14:41, Frederic Weisbecker wrote:
> On Mon, Dec 28, 2020 at 02:15:29AM +0000, Qais Yousef wrote:
> > Hi Frederic
> >
> > On 12/02/20 12:57, Frederic Weisbecker wrote:
> > > @@ -66,9 +68,9 @@ void irqtime_account_irq(struct task_struct *curr)
> > > * in that case, so as not to confuse scheduler with a special task
> > > * that do not consume any time, but still wants to run.
> > > */
> > > - if (hardirq_count())
> > > + if (pc & HARDIRQ_MASK)
> > > irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> > > - else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
> > > + else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> >
> > Noob question. Why do we do softirq_count() & *SOFTIRQ_OFFSET* for SOFTIRQs? It
> > seems we're in-softirq only if the count is odd numbered.
> >
> > /me tries to dig more
> >
> > Hmm could it be because the softirq count is actually 1 bit and the rest is
> > for SOFTIRQ_DISABLE_OFFSET (BH disabled)?
>
> Exactly!
>
> >
> > IOW, 1 bit says whether we're in softirq context, and the remaining 7 bits
> > count BH disable nesting, right?
> >
> > I guess this would make sense; we don't nest softirq processing AFAIK. But
> > I could be misreading the code too :-)
>
> You got it right!
>
> This is commented in softirq.c somewhere:
>
> /*
> * preempt_count and SOFTIRQ_OFFSET usage:
> * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
> * softirq processing.
> * - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)
> * on local_bh_disable or local_bh_enable.
> * This lets us distinguish between whether we are currently processing
> * softirq and whether we just have bh disabled.
> */
>
> But we should elaborate on the fact that, indeed, softirq processing can't nest,
> while softirq disablement can. I should try to send a patch and comment more
> thoroughly on the subtleties of the preempt mask in preempt.h.

Thanks for the info!
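
To make the bit layout discussed above concrete, here is a minimal userspace
sketch. The macro values are assumed to mirror include/linux/preempt.h from
around this kernel version, and serving_softirq(), softirq_field_set() and
walk_through() are hypothetical helpers written for illustration only; this
is arithmetic on a plain integer, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror include/linux/preempt.h; check your own tree. */
#define PREEMPT_BITS            8
#define SOFTIRQ_BITS            8
#define SOFTIRQ_SHIFT           PREEMPT_BITS
#define SOFTIRQ_OFFSET          (1U << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK            (((1U << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)

/* Hypothetical helper: true only while softirq processing is running. */
static bool serving_softirq(unsigned int pc)
{
	return (pc & SOFTIRQ_OFFSET) != 0;
}

/* Hypothetical helper: true if serving a softirq or BH is disabled. */
static bool softirq_field_set(unsigned int pc)
{
	return (pc & SOFTIRQ_MASK) != 0;
}

/* Walk a made-up preempt_count value through the cases in the comment. */
static void walk_through(void)
{
	unsigned int pc = 0;

	pc += SOFTIRQ_DISABLE_OFFSET;	/* like local_bh_disable() in a task */
	assert(softirq_field_set(pc));	/* BH is disabled ... */
	assert(!serving_softirq(pc));	/* ... but no softirq is being served */
	pc -= SOFTIRQ_DISABLE_OFFSET;	/* like local_bh_enable() */

	pc += SOFTIRQ_OFFSET;		/* like entering softirq processing */
	pc += SOFTIRQ_DISABLE_OFFSET;	/* handler disables BH ... */
	pc += SOFTIRQ_DISABLE_OFFSET;	/* ... and nests it once more */
	assert(serving_softirq(pc));	/* even steps never clear the low bit */

	pc -= 2 * SOFTIRQ_DISABLE_OFFSET;
	pc -= SOFTIRQ_OFFSET;		/* leaving softirq processing */
	assert(pc == 0);
}
```

Since BH disabling always moves the count by an even multiple of
SOFTIRQ_OFFSET, the lowest bit of the softirq field changes only on softirq
entry and exit, which is why testing pc & SOFTIRQ_OFFSET is enough.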

>
> >
> > > irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> > > }
> > >
> > > @@ -417,11 +419,13 @@ void vtime_task_switch(struct task_struct *prev)
> > > }
> > > # endif
> > >
> > > -void vtime_account_irq(struct task_struct *tsk)
> > > +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
> > > {
> > > - if (hardirq_count()) {
> > > + unsigned int pc = preempt_count() - offset;
> > > +
> > > + if (pc & HARDIRQ_OFFSET) {
> >
> > Shouldn't this be HARDIRQ_MASK like above?
>
> In the rare cases of nested hardirqs happening with broken drivers, only the
> outer hardirq matters. All the time spent in the inner hardirqs is included
> in the outer one.

Ah I see. The original code was doing hardirq_count(), which apparently wasn't
right either.

Shouldn't it be pc == HARDIRQ_OFFSET then? All odd nest counts will trigger
this otherwise, and IIUC we want this to trigger once on first entry only.
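
To illustrate the difference between the three checks, here is a small
userspace sketch. The HARDIRQ_* values are assumed to match
include/linux/preempt.h, and the helper names are made up for this example.
The equality variant compares only the hardirq field, which is my reading of
the idea rather than a literal pc == HARDIRQ_OFFSET. It does not say which
check the patch should use, only how each behaves at a given nesting depth.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror include/linux/preempt.h; check your own tree. */
#define HARDIRQ_SHIFT   16
#define HARDIRQ_BITS    4
#define HARDIRQ_OFFSET  (1U << HARDIRQ_SHIFT)
#define HARDIRQ_MASK    (((1U << HARDIRQ_BITS) - 1) << HARDIRQ_SHIFT)

/* pc & HARDIRQ_MASK: true at any hardirq nesting depth >= 1. */
static bool any_hardirq(unsigned int pc)
{
	return (pc & HARDIRQ_MASK) != 0;
}

/* pc & HARDIRQ_OFFSET: true whenever the nesting depth is odd. */
static bool odd_hardirq_depth(unsigned int pc)
{
	return (pc & HARDIRQ_OFFSET) != 0;
}

/* Hardirq field equal to HARDIRQ_OFFSET: true at depth 1 only. */
static bool hardirq_depth_is_one(unsigned int pc)
{
	return (pc & HARDIRQ_MASK) == HARDIRQ_OFFSET;
}

/* Hypothetical helper: a count value with only the hardirq field set. */
static unsigned int pc_at_depth(unsigned int depth)
{
	return depth * HARDIRQ_OFFSET;
}

static void compare_checks(void)
{
	/* Depth 1: all three checks agree. */
	assert(any_hardirq(pc_at_depth(1)));
	assert(odd_hardirq_depth(pc_at_depth(1)));
	assert(hardirq_depth_is_one(pc_at_depth(1)));

	/* Depth 2: only the mask check still fires. */
	assert(any_hardirq(pc_at_depth(2)));
	assert(!odd_hardirq_depth(pc_at_depth(2)));
	assert(!hardirq_depth_is_one(pc_at_depth(2)));

	/* Depth 3: the single-bit check fires again, equality does not. */
	assert(odd_hardirq_depth(pc_at_depth(3)));
	assert(!hardirq_depth_is_one(pc_at_depth(3)));
}
```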

Thanks

--
Qais Yousef