Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

From: Uladzislau Rezki
Date: Wed May 11 2022 - 09:40:12 EST


> On Mon, 9 May 2022 20:28:26 +0200
> Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
>
> > I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> > was mainly designed for that kind of kernels. So we can align with Alison
> > patch and her decision, so i do not see any issues. So far RT folk seems
> > does not mind in having "callback-kthreads" as SCHED_FIFO :)
>
> That's because RT folks set the threads they care about to a higher RT
> priority than the kthreads. ;-)
>
That explains many things :)

I have one question that is partly related to the topic under discussion
and to this thread. I was tracing a "long" run of one of the offloading
kthreads, which invokes its callbacks one by one. From the ftrace point
of view the picture looked like this:

<snip>
rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
...
rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
<snip>

I spent some time trying to understand why the context was not switched,
even though the "rcuop" kthread was marked with TIF_NEED_RESCHED and an IPI
was sent to CPU 0 to reschedule. The last "." in the latency field suggests
that the context has not disabled preemption, so everything should be fine.

The explanation is that local_bh_disable() modifies current_thread_info()->preempt.count,
so the task becomes non-preemptible, but ftrace does not give any indication of
it. So I was fooled for some time by my tracer logs.
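
To illustrate the point, here is a minimal user-space sketch, not kernel code.
The bit layout follows include/linux/preempt.h for a non-RT kernel. The
model_* names and struct model_task are hypothetical, and the way the last
latency column is derived (from the low preempt-depth byte only) is my
assumption about the tracer, not a copy of it. The model also ignores the
IRQ state.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout as in include/linux/preempt.h (non-RT kernels). */
#define PREEMPT_BITS            8
#define SOFTIRQ_BITS            8
#define PREEMPT_SHIFT           0
#define SOFTIRQ_SHIFT           (PREEMPT_SHIFT + PREEMPT_BITS)
#define PREEMPT_MASK            (((1U << PREEMPT_BITS) - 1) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK            (((1U << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define PREEMPT_OFFSET          (1U << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET          (1U << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)

/* Hypothetical stand-in for the per-task preempt counter. */
struct model_task {
	unsigned int preempt_count;
};

static inline void model_preempt_disable(struct model_task *t)
{
	t->preempt_count += PREEMPT_OFFSET;
}

/* BH disabling bumps the softirq byte, not the preempt-depth byte. */
static inline void model_local_bh_disable(struct model_task *t)
{
	t->preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

static inline void model_local_bh_enable(struct model_task *t)
{
	assert((t->preempt_count & SOFTIRQ_MASK) >= SOFTIRQ_DISABLE_OFFSET);
	t->preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Simplified: the scheduler may preempt only when the whole counter is 0. */
static inline bool model_preemptible(const struct model_task *t)
{
	return t->preempt_count == 0;
}

/* Assumed tracer behaviour: the last column reflects the low byte only. */
static inline char model_preempt_depth_char(const struct model_task *t)
{
	unsigned int depth = t->preempt_count & PREEMPT_MASK;

	return depth ? "0123456789abcdef"[depth & 0xf] : '.';
}
```

In this model, after model_local_bh_disable() the counter is 0x200: the
depth column still reads "." while model_preemptible() returns false, which
matches what I saw in the trace above.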

Do you have any thoughts about it? Should it be solved, or signaled
somehow, that a task is in fact not preemptible if the counter > 0?

Thanks!

--
Uladzislau Rezki