Re: [PATCH] softirq: Do not loop if running under a real-time task
From: Petr Malat
Date: Thu Jul 25 2024 - 04:44:15 EST
Hi Sebastian,
what is the status of this softirq issue? Looking at the current
upstream code, I think the problem is still there. I can resend my
patch or rework it to use an independent task_struct flag to decide
whether softirq processing should be skipped in the current thread.
BR,
Petr
On Thu, Mar 09, 2023 at 02:03:38PM +0100, Petr Malat wrote:
> On Wed, Mar 08, 2023 at 10:14:58AM +0100, Sebastian Andrzej Siewior wrote:
> > On 2023-03-06 16:45:48 [+0100], Petr Malat wrote:
> > > Softirq processing can be a source of scheduling jitter if it executes
> > > in a real-time task, as in that case need_resched() is false unless there
> > > is another runnable task with a higher priority. This is especially bad
> > > if the softirq processing runs in a migration thread, which has priority
> > > 99 and usually runs for a short time.
> > >
> > > One option would be to not restart the softirq processing if there is
> > > another runnable task to allow the high prio task to finish and yield the
> > > CPU, the second one is to not restart if softirq executes in a real-time
> > > task. Usually, real-time tasks don't want to be interrupted, so implement
> > > the second option.
> >
> > This affects only PREEMPT_RT, right?
> I have observed the issue on 5.15 CONFIG_PREEMPT=y arm32 kernel.
>
> > I have plans to redo parts of it. You shouldn't enter ksoftirqd to
> > begin with. There is this ktimerd in v6.1 which mitigates this to some
> > point and I plan to extend it to also cover the sched-softirq.
> > Other than that, you are right in saying that the softirq must not
> > continue with a RT prio and that need_resched() is not visible here.
> > However ksoftirqd itself must be able to do loops unless the
> > need-resched flag is seen.
> >
> > Since you mentioned the migration thread, how often do you see this or
> > how does this trigger?
> I have seen only one occurrence where I have a back trace available
> (from hundreds of systems). I think that's because on my system it may
> occur only if it hits the migration thread, otherwise there are more
> runnable threads of the same priority and need_resched() breaks the loop.
>
> I obtained the stack trace by making a debugging module which uses a
> periodic timer to monitor active tasks and dumps the stack when it finds
> something fishy. This is what I got:
> [<bf84f559>] (hogger_handler [hogger]) from [<c04850ef>] (__hrtimer_run_queues+0x13f/0x2f4)
> [<c04850ef>] (__hrtimer_run_queues) from [<c04858a5>] (hrtimer_interrupt+0xc9/0x1c4)
> [<c04858a5>] (hrtimer_interrupt) from [<c0810533>] (arch_timer_handler_phys+0x27/0x2c)
> [<c0810533>] (arch_timer_handler_phys) from [<c046de3b>] (handle_percpu_devid_irq+0x5b/0x1e4)
> [<c046de3b>] (handle_percpu_devid_irq) from [<c0469a27>] (__handle_domain_irq+0x53/0x94)
> [<c0469a27>] (__handle_domain_irq) from [<c041e501>] (axxia_gic_handle_irq+0x16d/0x1bc)
> [<c041e501>] (axxia_gic_handle_irq) from [<c0400ad3>] (__irq_svc+0x53/0x94)
> Exception stack(0xc1595ca8 to 0xc1595cf0)
> [<c0400ad3>] (__irq_svc) from [<c098e404>] (_raw_spin_unlock_irqrestore+0x1c/0x3c)
> [<c098e404>] (_raw_spin_unlock_irqrestore) from [<c0446b6d>] (try_to_wake_up+0x1d9/0x5d0)
> [<c0446b6d>] (try_to_wake_up) from [<c0483d2d>] (call_timer_fn+0x31/0x16c)
> [<c0483d2d>] (call_timer_fn) from [<c048406f>] (run_timer_softirq+0x207/0x2d4)
> [<c048406f>] (run_timer_softirq) from [<c0401293>] (__do_softirq+0xd3/0x2f8)
> [<c0401293>] (__do_softirq) from [<c042876b>] (irq_exit+0x57/0x78)
> [<c042876b>] (irq_exit) from [<c0469a2b>] (__handle_domain_irq+0x57/0x94)
> [<c0469a2b>] (__handle_domain_irq) from [<c041e501>] (axxia_gic_handle_irq+0x16d/0x1bc)
> [<c041e501>] (axxia_gic_handle_irq) from [<c0400ad3>] (__irq_svc+0x53/0x94)
> Exception stack(0xc1595e78 to 0xc1595ec0)
> [<c0400ad3>] (__irq_svc) from [<c044d37c>] (active_load_balance_cpu_stop+0x1ec/0x234)
> [<c044d37c>] (active_load_balance_cpu_stop) from [<c04ac099>] (cpu_stopper_thread+0x69/0xd8)
> [<c04ac099>] (cpu_stopper_thread) from [<c0440b53>] (smpboot_thread_fn+0x9f/0x17c)
> [<c0440b53>] (smpboot_thread_fn) from [<c043ccf9>] (kthread+0x129/0x12c)
> [<c043ccf9>] (kthread) from [<c0400131>] (ret_from_fork+0x11/0x20)
>
> I was then looking into the code to see how it could happen that softirqs
> were not offloaded to the thread, and the only explanation I have is the
> one I described in the original mail.
> BR,
> Petr