Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling

From: Shrikanth Hegde
Date: Sat Jun 15 2024 - 11:05:32 EST

On 6/10/24 12:53 PM, Ankur Arora wrote:
>
>> [...]_auto.
>>
>> 6.10-rc1:
>> =========
>> 10:09:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>> 09:45:23 AM  all    4.14    0.00   77.57    0.00   16.92    0.00    0.00    0.00    0.00    1.37
>> 09:45:24 AM  all    4.42    0.00   77.62    0.00   16.76    0.00    0.00    0.00    0.00    1.20
>> 09:45:25 AM  all    4.43    0.00   77.45    0.00   16.94    0.00    0.00    0.00    0.00    1.18
>> 09:45:26 AM  all    4.45    0.00   77.87    0.00   16.68    0.00    0.00    0.00    0.00    0.99
>>
>> PREEMPT_AUTO:
>> =============
>> 10:09:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>> 10:09:56 AM  all    3.11    0.00   72.59    0.00   21.34    0.00    0.00    0.00    0.00    2.96
>> 10:09:57 AM  all    3.31    0.00   73.10    0.00   20.99    0.00    0.00    0.00    0.00    2.60
>> 10:09:58 AM  all    3.40    0.00   72.83    0.00   20.85    0.00    0.00    0.00    0.00    2.92
>> 10:10:00 AM  all    3.21    0.00   72.87    0.00   21.19    0.00    0.00    0.00    0.00    2.73
>> 10:10:01 AM  all    3.02    0.00   72.18    0.00   21.08    0.00    0.00    0.00    0.00    3.71
>>
>> Used the bcc tools hardirqs and softirqs to see whether IRQs are increasing. softirqs showed more
>> timer and sched softirqs. Numbers vary between samples, but the trend seems to be similar.
>
> Yeah, the %sys is lower and %irq, higher. Can you also see where the
> increased %irq is? For instance are the resched IPIs numbers greater?

Hi Ankur,


I used mpstat -I ALL to capture this information over 20 seconds.

HARDIRQ per second:
===================
IRQ           6.10  Preempt_auto
--------------------------------
18       417956.86     609509.69
19      1114642.30    1910413.99
22      1712683.65    1923503.52
23      2058664.99    2061876.33
48            0.00          0.00
49            0.00          0.00
50           18.30         19.14
51            0.39          0.30
LOC       31978.37      31916.59
BCT           0.00          0.00
LOC2          0.35          0.45
SPU         351.98        497.88
PMI           0.00          0.00
MCE           0.00          0.00
NMI           0.00          0.00
WDG        6405.54       6825.49
DBL      329189.45      88247.85

IRQs 18, 19, 22 and 23 are XIVE interrupts, and these are IPIs. I am not sure which type of IPI they are; I will have to see why they are increasing.
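
To put a number on the increase, the two columns above can be compared directly. The Python sketch below is only illustrative arithmetic on the figures in the table (IRQS, RATES_610, RATES_AUTO, xive_totals and report are names made up for this example, not anything from the kernel or sysstat). With these numbers the four XIVE IPI lines go up by roughly 23% combined, while DBL drops by about 73%.

```python
from typing import List, Optional, Tuple

# Per-second hardirq rates copied from the mpstat -I ALL table above.
IRQS = ["18", "19", "22", "23", "48", "49", "50", "51", "LOC", "BCT",
        "LOC2", "SPU", "PMI", "MCE", "NMI", "WDG", "DBL"]
RATES_610 = [417956.86, 1114642.30, 1712683.65, 2058664.99, 0.00, 0.00,
             18.30, 0.39, 31978.37, 0.00, 0.35, 351.98, 0.00, 0.00, 0.00,
             6405.54, 329189.45]
RATES_AUTO = [609509.69, 1910413.99, 1923503.52, 2061876.33, 0.00, 0.00,
              19.14, 0.30, 31916.59, 0.00, 0.45, 497.88, 0.00, 0.00, 0.00,
              6825.49, 88247.85]

# The XIVE lines that carry IPIs on this machine, per the note above.
XIVE_IPIS = {"18", "19", "22", "23"}


def pct_change(old: float, new: float) -> Optional[float]:
    """Relative change in percent, or None when the baseline rate is zero."""
    if old == 0:
        return None
    return (new - old) / old * 100.0


def xive_totals() -> Tuple[float, float]:
    """Combined XIVE IPI rate for 6.10 and for Preempt_auto."""
    base = sum(r for name, r in zip(IRQS, RATES_610) if name in XIVE_IPIS)
    auto = sum(r for name, r in zip(IRQS, RATES_AUTO) if name in XIVE_IPIS)
    return base, auto


def report() -> List[str]:
    """One line per IRQ: name, both rates, and the relative change."""
    lines = []
    for name, old, new in zip(IRQS, RATES_610, RATES_AUTO):
        change = pct_change(old, new)
        label = "n/a" if change is None else f"{change:+.1f}%"
        lines.append(f"{name:>5} {old:>12.2f} {new:>12.2f} {label:>8}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
    base, auto = xive_totals()
    print(f"XIVE IPIs: {base:.2f} -> {auto:.2f} ({pct_change(base, auto):+.1f}%)")
```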


SOFTIRQ per second:
===================
SOFTIRQ           6.10  Preempt_auto
------------------------------------
HI                0.00          0.00
TIMER          3966.47       4871.67
NET_TX            0.00          0.00
NET_RX           18.25         18.94
BLOCK             0.59          0.40
IRQ_POLL          0.00          0.00
TASKLET           0.34          0.25
SCHED         12811.00      13518.66
HRTIMER           0.00          0.00
RCU            9693.95      15732.77

Note: the RCU softirq rate seems to increase significantly. I am not sure which one triggers the other and am still trying to figure out why.
It may be the IRQs triggering more softirqs, or the softirqs causing more IPIs.
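
The same arithmetic for the softirq table, again as a rough Python sketch over the numbers quoted above (SOFTIRQ_RATES and ranked_changes are illustrative names only). RCU shows the largest relative increase at about 62%, followed by TIMER at about 23% and SCHED at about 5.5%. BLOCK and TASKLET fall in relative terms, but their absolute rates are well under one per second, so those percentages probably carry little meaning.

```python
from typing import Dict, List, Optional, Tuple

# Per-second softirq rates from the mpstat -I ALL table above: (6.10, Preempt_auto).
SOFTIRQ_RATES: Dict[str, Tuple[float, float]] = {
    "HI": (0.00, 0.00),
    "TIMER": (3966.47, 4871.67),
    "NET_TX": (0.00, 0.00),
    "NET_RX": (18.25, 18.94),
    "BLOCK": (0.59, 0.40),
    "IRQ_POLL": (0.00, 0.00),
    "TASKLET": (0.34, 0.25),
    "SCHED": (12811.00, 13518.66),
    "HRTIMER": (0.00, 0.00),
    "RCU": (9693.95, 15732.77),
}


def pct_change(old: float, new: float) -> Optional[float]:
    """Relative change in percent, or None when the baseline rate is zero."""
    if old == 0:
        return None
    return (new - old) / old * 100.0


def ranked_changes(rates: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """(name, percent change) pairs, largest increase first; zero baselines are skipped."""
    changes = []
    for name, (old, new) in rates.items():
        change = pct_change(old, new)
        if change is not None:
            changes.append((name, change))
    return sorted(changes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, change in ranked_changes(SOFTIRQ_RATES):
        old, new = SOFTIRQ_RATES[name]
        print(f"{name:>8} {old:>10.2f} -> {new:>10.2f}  {change:+6.1f}%")
```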



Also, I noticed the config difference below: these options are dropped with PREEMPT_AUTO, because PREEMPTION makes them N. I changed kernel/Kconfig.locks to get them
enabled, but I still see the same regression in hackbench. Do these configs still need attention?

6.10                             | PREEMPT_AUTO
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y  | CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK=y      | (not set)
CONFIG_INLINE_READ_UNLOCK_IRQ=y  | (not set)
CONFIG_INLINE_WRITE_UNLOCK=y     | (not set)
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y | (not set)
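
For comparing two .config files, the kernel tree already ships scripts/diffconfig. As a rough illustration of what such a comparison does, here is a minimal Python sketch; parse_config and diff_configs are hypothetical helpers written for this example, and they only understand the two standard .config line forms ("CONFIG_FOO=value" and "# CONFIG_FOO is not set").

```python
import re
from typing import Dict, Optional, Tuple

# The two line forms a .config file uses for option values.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text: str) -> Dict[str, str]:
    """Map each option to its value; '# ... is not set' lines map to 'n'."""
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            opts[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            opts[match.group(1)] = "n"
    return opts


def diff_configs(old_text: str, new_text: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Options whose value differs; an option absent from one side shows as None."""
    old, new = parse_config(old_text), parse_config(new_text)
    diff = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            diff[name] = (before, after)
    return diff


if __name__ == "__main__":
    base = "CONFIG_INLINE_READ_UNLOCK=y\nCONFIG_TREE_RCU=y\n# CONFIG_RCU_LAZY is not set\n"
    auto = "CONFIG_UNINLINE_SPIN_UNLOCK=y\nCONFIG_TREE_RCU=y\n# CONFIG_RCU_LAZY is not set\n"
    for name, (before, after) in diff_configs(base, auto).items():
        print(f"{name}: {before} -> {after}")
```

Options that are missing from one file entirely come out as None on that side, which is how the INLINE_*_UNLOCK options appear in the table above for the PREEMPT_AUTO build.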


>
>> 6.10-rc1:
>> =========
>> SOFTIRQ TOTAL_usecs
>> tasklet 71
>> block 145
>> net_rx 7914
>> rcu 136988
>> timer 304357
>> sched 1404497
>>
>>
>>
>> PREEMPT_AUTO:
>> ===========
>> SOFTIRQ TOTAL_usecs
>> tasklet 80
>> block 139
>> net_rx 6907
>> rcu 223508
>> timer 492767
>> sched 1794441
>>
>>
>> Would any specific setting of RCU matter for this?
>> This is what I have in config.
>
> Don't see how it could matter unless the RCU settings are changing
> between the two tests? In my testing I'm also using TREE_RCU=y,
> PREEMPT_RCU=n.
>
> Let me see if I can find a test which shows a similar trend to what you
> are seeing. And, then maybe see if tracing sched-switch might point to
> an interesting difference between x86 and powerpc.
>
>
> Thanks for all the detail.
>
> Ankur
>
>> # RCU Subsystem
>> #
>> CONFIG_TREE_RCU=y
>> # CONFIG_RCU_EXPERT is not set
>> CONFIG_TREE_SRCU=y
>> CONFIG_NEED_SRCU_NMI_SAFE=y
>> CONFIG_TASKS_RCU_GENERIC=y
>> CONFIG_NEED_TASKS_RCU=y
>> CONFIG_TASKS_RCU=y
>> CONFIG_TASKS_RUDE_RCU=y
>> CONFIG_TASKS_TRACE_RCU=y
>> CONFIG_RCU_STALL_COMMON=y
>> CONFIG_RCU_NEED_SEGCBLIST=y
>> CONFIG_RCU_NOCB_CPU=y
>> # CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set
>> # CONFIG_RCU_LAZY is not set
>> # end of RCU Subsystem
>>
>>
>> # Timers subsystem
>> #
>> CONFIG_TICK_ONESHOT=y
>> CONFIG_NO_HZ_COMMON=y
>> # CONFIG_HZ_PERIODIC is not set
>> # CONFIG_NO_HZ_IDLE is not set
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_CONTEXT_TRACKING_USER=y
>> # CONFIG_CONTEXT_TRACKING_USER_FORCE is not set
>> CONFIG_NO_HZ=y
>> CONFIG_HIGH_RES_TIMERS=y
>> # end of Timers subsystem
>
>
> --
> ankur