Re: [patch V2 2/9] irqtime: Make accounting correct on RT

From: Frederic Weisbecker
Date: Sun Dec 06 2020 - 20:15:38 EST


On Mon, Dec 07, 2020 at 01:57:25AM +0100, Thomas Gleixner wrote:
> On Mon, Dec 07 2020 at 01:23, Frederic Weisbecker wrote:
> >> --- a/kernel/sched/cputime.c
> >> +++ b/kernel/sched/cputime.c
> >> @@ -60,7 +60,7 @@ void irqtime_account_irq(struct task_str
> >> cpu = smp_processor_id();
> >> delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
> >> irqtime->irq_start_time += delta;
> >> - pc = preempt_count() - offset;
> >> + pc = irq_count() - offset;
> >
> > There are many preempt_count() users all around that are waiting to hit
> > similar issues. Wouldn't it be more reasonable to have
> > current->softirq_disable_cnt simply save the softirq count on context switch?
>
> There are not that many and all of them need to be looked at.
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index d2003a7d5ab5..6c899c35d6ba 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3469,6 +3469,10 @@ static inline void prepare_task(struct task_struct *next)
> >
> > static inline void finish_task(struct task_struct *prev)
> > {
> > +#ifdef CONFIG_PREEMPT_RT
> > + prev->softirq_disable_cnt = softirq_count();
> > + __preempt_count_sub(prev->softirq_disable_cnt);
> > +#endif
>
> You fundamentally break RT with that.
>
> If local_bh_disable() fiddles with the actual preempt_count on RT, then
> softirq disabled sections and softirq processing are no longer
> preemptible.
>
> You asked me in the last round of patches to add a proper argument for
> pulling the softirq count out of the preempt count. Here is the revised
> changelog, which you agreed with:
>
> "RT requires the softirq processing and local bottomhalf disabled regions to
> be preemptible. Using the normal preempt count based serialization is
> therefore not possible because this implicitely disables preemption.
> ....
> "
>
> Full text in patch 1/9.
>
> According to the above, folding the softirq count into the actual preempt
> count cannot work at all.
>
> The current RT approach just works, except for the places which look at
> the raw preempt_count instead of using the wrappers. Those places are
> restricted to core code and there are only a few of them.
>
> Trying to do what you suggest would require major surgery all over the
> place, including a complete trainwreck in the highly optimized
> preempt_enable() --> preemption decision.
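
Just to check that I am picturing the distinction correctly, here is a toy
userspace sketch of it. This is not kernel code and all the toy_* names are
made up: the point is only that once the softirq disable state lives in a
per-task counter instead of the preempt count, a raw preempt_count() read
misses softirq context while a combined irq_count()-style view still sees it.

/*
 * Toy model, not kernel code: all toy_* names are invented for
 * illustration. In the RT-like model the softirq disable state lives in
 * a per-task counter instead of the preempt count.
 */
#include <stdio.h>

#define TOY_SOFTIRQ_OFFSET	0x00000100u
#define TOY_SOFTIRQ_MASK	0x0000ff00u

static unsigned int toy_preempt_count;		/* per-CPU preempt count */
static unsigned int toy_task_softirq_cnt;	/* per-task softirq count (RT-like) */

/* Raw view: in the RT-like model this never carries softirq bits. */
static unsigned int toy_raw_pc(void)
{
	return toy_preempt_count;
}

/* Combined view, in the spirit of irq_count(): preempt count plus the
 * per-task softirq state. This is what the accounting has to look at. */
static unsigned int toy_irq_count(void)
{
	return toy_preempt_count + toy_task_softirq_cnt;
}

int main(void)
{
	/* "Enter" softirq processing: only the per-task counter is bumped,
	 * the preempt count stays untouched so the section is preemptible. */
	toy_task_softirq_cnt += TOY_SOFTIRQ_OFFSET;

	printf("raw preempt count sees softirq:  %s\n",
	       (toy_raw_pc() & TOY_SOFTIRQ_MASK) ? "yes" : "no");	/* no  */
	printf("combined irq count sees softirq: %s\n",
	       (toy_irq_count() & TOY_SOFTIRQ_MASK) ? "yes" : "no");	/* yes */

	toy_task_softirq_cnt -= TOY_SOFTIRQ_OFFSET;
	return 0;
}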

I suspected it was more complicated than I imagined :-)
Nevermind.

Thanks.