Re: [PATCH] softirq: WARN_ON !preemptible() not check softirq cnt in bh disable on RT
From: Sebastian Andrzej Siewior
Date: Fri Mar 13 2026 - 04:18:50 EST
On 2026-03-12 20:47:53 [+0800], Xin Zhao wrote:
> hi, Sebastian
Hi Xin,
> >
> > local_bh_disable() becomes an implicit RCU read lock section on
> > !PREEMPT_RT and we must preserve that semantic.
>
> My current understanding of the statement "local_bh_disable() becomes an implicit RCU read lock
> section on !PREEMPT_RT" is as follows:
> On a regular Linux system, both preemption and soft interrupts are disabled while
> local_bh_disable() is in effect, so RCU callbacks cannot be executed. This effectively means
> that the progress of the RCU grace period is stalled for the duration of the bh-disabled
> section. In a PREEMPT_RT system,
No, that is not it. A preempt-disabled section can be interrupted by a
softirq. You can also offload the RCU callbacks from CPU1 to CPU0, at
which point RCU callbacks can run while CPU1 has interrupts off.
It has not always been like that, but this is what we have now.
> RCU callbacks are executed in an RCU context and are not protected by bh disable, so it is
> necessary to explicitly mark the RCU read lock state.
> I don't know if my understanding is correct.
You do call_rcu(), at which point the pointer/callback lands on a list
for cleanup. From that point on, if you can still observe the pointer
you need to be in an RCU read section, which delays processing of the
list by delaying the grace period. Once the grace period starts, all
new callbacks land on a new list. In order to process "the previous"
list, the rcu_read_lock() counter needs to get to zero and every CPU
needs to schedule once. This ensures that no one is still in a
preempt_disable() section (or bh_disable(), or spin_lock(), or an
irq-off section). I think this is correct but very compressed.
The requirement that preempt_disable() needs to be considered is very
old and comes from "classic RCU", where rcu_read_lock() was
preempt_disable(), and some people just did preempt_disable() because
that was the thing to do before rcu_read_lock() was introduced. And
spin_lock() implied the same; some people rely on it, so it needs to be
preserved.
> > Ideally if task X queues soft interrupts, it handles them and a later
> > task does not observe them. Only a task with higher priority can add
> > additional softirq work.
> > If task X queues BLOCK and gets preempted, task Y with higher priority
> > adds NET_RX, then task Y will handle NET_RX and BLOCK. This can be
> > avoided by handling the softirqs per-task.
>
> It does sound like it could optimize quite a lot. By the way, would the
> per-task manner invoke the softirq callbacks just before a voluntary
> switch-out, or trigger task_work before returning to user space?
It wouldn't change much. It would run on exit from the outermost BH
section, in local_bh_enable(), like it does now. The difference would
only be that each task executes just the softirqs it raised itself
(including ones raised from within its own callbacks). So say TaskX
raises NET_RX and BLOCK and gets preempted; TaskY gets in and raises
NET_RX and TASKLET. TaskY now runs the callbacks and will do only NET_RX
and TASKLET. Then we schedule back to TaskX, which does NET_RX (now
empty) and BLOCK.
Currently, TaskY would also do BLOCK.
> > I am not a big fan of the BH workqueues because you queue work items
> > in the context in which they originate and then they "vanish". So all
> > the priorities and so on are gone. Also the work from lower-priority
> > tasks gets mixed with that from high-priority tasks. Not something you
> > desire in general.
> > In general you are better off remaining in the threaded interrupt,
> > completing the work.
>
> Indeed, queueing the soft interrupts triggered by tasks of different
> priorities into a single workqueue wouldn't be very appropriate. If we
> want to queue them into a bottom-half (bh) workqueue, we would also need
> to create a corresponding workqueue for each priority and queue based on
> that priority. I previously developed a patch for a real-time workqueue,
> which has been used in our project. If certain soft interrupt tasks are
> very important and do not require CPU affinity, then queueing them on
> other CPUs to execute at the actual priority needed might optimize
> performance to some extent from a real-time perspective.
>
> https://lore.kernel.org/lkml/20251205125445.4154667-1-jackzxcui1989@xxxxxxx/
This encodes too much application logic. Having a kthread for a "thing"
is usually better. You can run that kthread either per-CPU or "unbound"
and let userland deal with it by pinning it to a CPU and/or adjusting
its priority based on its setup. So it can be more important than the
network, or less important (depending on whether your critical real-time
work is network-related or not).
> Xin Zhao
Sebastian