Re: [PATCH 2/3] softirq: avoid spurious stalls due to need_resched()

From: Paul E. McKenney
Date: Fri Mar 03 2023 - 20:25:44 EST


On Fri, Mar 03, 2023 at 03:44:13PM -0800, Jakub Kicinski wrote:
> On Fri, 3 Mar 2023 15:36:27 -0800 Paul E. McKenney wrote:
> > On Fri, Mar 03, 2023 at 02:37:39PM -0800, Paul E. McKenney wrote:
> > > On Fri, Mar 03, 2023 at 01:31:43PM -0800, Jakub Kicinski wrote:
> > > > Now, about the max loop count. I ORed the pending softirqs every
> > > > time we get to the end of the loop. Looks like the vast majority of
> > > > the loop-counter wakeups are exclusively due to RCU:
> > > >
> > > > @looped[512]: 5516
> > > >
> > > > Where 512 is the ORed pending mask over all iterations, and
> > > > 512 == 1 << RCU_SOFTIRQ.
> > > >
> > > > And they usually take less than 100us to consume the 10 iterations.
> > > > Histogram of usecs consumed when we run out of loop iterations:
> > > >
> > > > [16, 32)         3 |                                                    |
> > > > [32, 64)      4786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > > > [64, 128)      871 |@@@@@@@@@                                           |
> > > > [128, 256)      34 |                                                    |
> > > > [256, 512)       9 |                                                    |
> > > > [512, 1K)      262 |@@                                                  |
> > > > [1K, 2K)        35 |                                                    |
> > > > [2K, 4K)         1 |                                                    |
> > > >
> > > > Paul, is this expected? Is RCU not trying too hard to be nice?
> > >
> > > This is from way back in the day, so it is quite possible that better
> > > tuning and/or better heuristics should be applied.
> > >
> > > On the other hand, 100 microseconds is a good long time from a
> > > CONFIG_PREEMPT_RT=y perspective!
> > >
> > > > # cat /sys/module/rcutree/parameters/blimit
> > > > 10
> > > >
> > > > Or should we perhaps just raise the loop limit? Breaking after less
> > > > than 100usec seems excessive :(
> > >
> > > But note that RCU also has rcutree.rcu_divisor, which defaults to 7.
> > > And an rcutree.rcu_resched_ns, which defaults to three milliseconds
> > > (3,000,000 nanoseconds). This means that RCU will do:
> > >
> > > o All the callbacks if there are less than ten.
> > >
> > > o Ten callbacks or 1/128th of them, whichever is larger.
> > >
> > > o Unless the larger of them is more than 100 callbacks, in which
> > >   case there is an additional limit of three milliseconds worth
> > >   of them.
> > >
> > > Except that if a given CPU ends up with more than 10,000 callbacks
> > > (rcutree.qhimark), that CPU's blimit is set to 10,000.
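
As a rough illustration of the limits listed above, here is a minimal
userspace sketch in C. It models only the description in this thread and
is not the kernel's rcu_do_batch(). The names batch_plan and plan_batch
are hypothetical, and the constants simply restate the defaults quoted
above.

```c
#include <stdbool.h>

/* Defaults quoted in this thread; macro names are illustrative only. */
#define BLIMIT          10L       /* rcutree.blimit */
#define RCU_DIVISOR     7         /* rcutree.rcu_divisor: 1/128th */
#define RCU_RESCHED_NS  3000000L  /* rcutree.rcu_resched_ns: 3 ms */
#define QHIMARK         10000L    /* rcutree.qhimark */
#define RAISED_BLIMIT   10000L    /* blimit once qhimark is exceeded */
#define TIME_CHECK_MIN  100L      /* time limit applies above this */

/* Hypothetical result type: how much work one batch may do. */
struct batch_plan {
	long max_cbs;        /* callbacks allowed in this batch */
	bool time_limited;   /* does the time limit also apply? */
	long time_limit_ns;  /* 0 when not time limited */
};

/* Hypothetical helper: derive the batch limits from the queue length. */
static struct batch_plan plan_batch(long pending)
{
	/* A long queue raises this CPU's blimit. */
	long blimit = pending > QHIMARK ? RAISED_BLIMIT : BLIMIT;
	/* 1/128th of the pending callbacks with the default divisor. */
	long share = pending >> RCU_DIVISOR;
	/* Whichever is larger. */
	long bl = blimit > share ? blimit : share;
	struct batch_plan p;

	/* Fewer pending callbacks than the limit means "do them all". */
	p.max_cbs = pending < bl ? pending : bl;
	p.time_limited = bl > TIME_CHECK_MIN;
	p.time_limit_ns = p.time_limited ? RCU_RESCHED_NS : 0;
	return p;
}
```

With these defaults, plan_batch(5) allows all 5 callbacks, plan_batch(5000)
allows 39 (5000 >> 7), and plan_batch(20000) allows up to 10,000 with the
three-millisecond limit also in force.
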
> >
> > Also, if in the context of a softirq handler (as opposed to ksoftirqd)
> > that interrupted the idle task with no pending task, the count of
> > callbacks is ignored and only the 3-millisecond limit counts. In the
> > context of ksoftirqd, the only limit is that which the scheduler chooses
> > to impose.
> >
> > But it sure seems like the ksoftirqd case should also pay attention to
> > that 3-millisecond limit. I will queue a patch to that effect, and maybe
> > Eric Dumazet will show me the error of my ways.
>
> Just to be sure - have you seen Peter's patches?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/softirq
>
> I think it feeds the time limit to the callback from softirq,
> so the local 3ms is no more?

I might or might not have back in September of 2020. ;-)

But either way, the question remains: Should RCU_SOFTIRQ do time checking
in ksoftirqd context? Seems like the answer should be "yes", independently
of Peter's patches.
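
To make the question concrete, here is a minimal sketch in C of the
stopping policy as described earlier in this thread, with a flag standing
in for the proposed ksoftirqd time check. The enum, the function name
batch_should_stop, and the flag are all hypothetical. This is a model of
the discussion, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESCHED_NS 3000000ULL  /* rcutree.rcu_resched_ns default: 3 ms */

/* Hypothetical labels for the three contexts discussed in this thread. */
enum batch_ctx {
	CTX_SOFTIRQ,       /* softirq handler, ordinary case */
	CTX_SOFTIRQ_IDLE,  /* softirq handler that interrupted idle, no pending task */
	CTX_KSOFTIRQD,     /* running in ksoftirqd */
};

/*
 * Hypothetical helper: should callback invocation stop now?
 * done:       callbacks invoked so far in this batch
 * bl:         callback-count limit for this batch
 * elapsed_ns: time spent in this batch so far
 * ksoftirqd_time_check: true models the proposed change
 */
static bool batch_should_stop(enum batch_ctx ctx, long done, long bl,
			      uint64_t elapsed_ns, bool ksoftirqd_time_check)
{
	bool over_time = elapsed_ns >= RESCHED_NS;

	switch (ctx) {
	case CTX_SOFTIRQ_IDLE:
		/* Count is ignored; only the time limit applies. */
		return over_time;
	case CTX_KSOFTIRQD:
		/* Today: no RCU-side limit. Proposed: honor the time limit. */
		return ksoftirqd_time_check && over_time;
	case CTX_SOFTIRQ:
	default:
		/* Count limit, plus the time limit for large batches. */
		return done >= bl || (bl > 100 && over_time);
	}
}
```

With ksoftirqd_time_check set to false, the CTX_KSOFTIRQD case never stops
on its own, which is the behavior being questioned above.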

Thanx, Paul