Re: [patch V2 2/9] irqtime: Make accounting correct on RT

From: Thomas Gleixner
Date: Sun Dec 06 2020 - 19:59:03 EST


On Mon, Dec 07 2020 at 01:23, Frederic Weisbecker wrote:
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -60,7 +60,7 @@ void irqtime_account_irq(struct task_str
>> cpu = smp_processor_id();
>> delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
>> irqtime->irq_start_time += delta;
>> - pc = preempt_count() - offset;
>> + pc = irq_count() - offset;
>
> There are many preempt_count() users all around waiting for similar issues.
> Wouldn't it be more reasonable to have current->softirq_disable_cnt just saving
> the softirq count on context switch?

There are not that many and all of them need to be looked at.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d2003a7d5ab5..6c899c35d6ba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3469,6 +3469,10 @@ static inline void prepare_task(struct task_struct *next)
>
> static inline void finish_task(struct task_struct *prev)
> {
> +#ifdef CONFIG_PREEMPT_RT
> + prev->softirq_disable_cnt = softirq_count();
> + __preempt_count_sub(prev->softirq_disable_cnt);
> +#endif

You fundamentaly break RT with that.

If local_bh_disable() fiddles with the actual preempt_count on RT, then
softirq-disabled sections and softirq processing are no longer
preemptible.

You asked me in the last round of patches to add a proper argument for
pulling out the softirq count from preempt_count. Here is the revised
changelog which you agreed with:

"RT requires the softirq processing and local bottom half disabled regions
to be preemptible. Using the normal preempt count based serialization is
therefore not possible because this implicitly disables preemption.
....
"

Full text in patch 1/9.

According to the above, folding the softirq count into the actual preempt
count cannot work at all.

The current RT approach just works, except for the places which look at
the raw preempt_count instead of using the wrappers. Those places are
restricted to core code and are a pretty small number.

Trying to do what you suggest would be major surgery all over the place,
including a complete trainwreck in the highly optimized
preempt_enable() --> preemption decision.

> And I expect a few adjustments in schedule_debug() and atomic checks more
> generally but that should be it and it would probably be less error-prone.

Dream on.

> Although I'm probably overlooking other issues on the way.

I think so.

Thanks,

tglx