Re: [PATCH V4] Softirq:avoid large sched delay from the pending softirqs

From: jun qian
Date: Mon Sep 14 2020 - 22:09:45 EST


Thomas Gleixner <tglx@xxxxxxxxxxxxx> 于2020年7月27日周一 下午11:41写道:
>
> Qian,
>
> qianjun.kernel@xxxxxxxxx writes:
> > /*
> > * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> > * but break the loop if need_resched() is set or after 2 ms.
> > - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> > - * certain cases, such as stop_machine(), jiffies may cease to
> > - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> > - * well to make sure we eventually return from this method.
> > + * In the loop, if the processing time of the softirq has exceeded 2
> > + * milliseconds, we also need to break the loop to wakeup the
> > ksofirqd.
>
> You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
> have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
> gets adjusted later on. Also while sched_clock() is granular on many
> systems it still can be jiffies based and then the above problem
> persists.
>
> > @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> > }
> > h++;
> > pending >>= softirq_bit;
> > +
> > + /*
> > + * the softirq's action has been running for too much time
> > + * so it may need to wakeup the ksoftirqd
> > + */
> > + if (need_resched() && sched_clock() > end) {
> > + /*
> > + * Ensure that the remaining pending bits are
> > + * handled.
> > + */
> > + or_softirq_pending(pending << (vec_nr + 1));
>
> To or the value interrupts need to be disabled because otherwise you can
> lose a bit when an interrupt happens in the middle of the RMW operation
> and raises a softirq which is not in @pending and not in the per CPU
> local softirq pending storage.
>
> There is another problem. Assume bit 0 and 1 are pending when the
> processing starts. Now it breaks out after bit 0 has been handled and
> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> again. ksoftirqd runs and handles bit 0, which takes more than the
> timeout. As a result the bit 0 processing can starve all other softirqs.
>
> So this needs more thought.
HI Thomas, The problem as you described, i am trying to solve it, but
i found the
other problem,just like this,for example

the pending bits is 1010110110, when the bit4 was handled, and the
loop processing
time more than 2ms, in my patch, the loop will break early. In the
next time, the loop
will process the bit5, if before the soft action, the
bit6,bit8,bit0,bit3 were set, the loop
will process the bit5,6,7,8,9,0,1,2,3, that will be the bit6 and bit8
wil be processed before
bit0 and bit3.

Do we need to consider this scenario. Do you have any good suggestions.

Thanks.
>
> Thanks,
>
> tglx