Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling

From: Shrikanth Hegde
Date: Thu Jun 27 2024 - 11:45:36 EST




On 6/27/24 11:26 AM, Michael Ellerman wrote:
> Ankur Arora <ankur.a.arora@xxxxxxxxxx> writes:
>> Shrikanth Hegde <sshegde@xxxxxxxxxxxxx> writes:
>>> ...
>>> This is the patch I tried in order to make preempt_count per-CPU on powerpc. It boots and runs the workload.
>>> I implemented a simpler version instead of folding need-resched into the preempt count. I avoided the
>>> tif_need_resched calls in a hacky way, since they didn't affect the throughput, and kept it simple. The patch
>>> is below for reference. It didn't help fix the regression, unless I implemented it wrongly.
>>>
>>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>>> index 1d58da946739..374642288061 100644
>>> --- a/arch/powerpc/include/asm/paca.h
>>> +++ b/arch/powerpc/include/asm/paca.h
>>> @@ -268,6 +268,7 @@ struct paca_struct {
>>> u16 slb_save_cache_ptr;
>>> #endif
>>> #endif /* CONFIG_PPC_BOOK3S_64 */
>>> + int preempt_count;
>>
>> I don't know powerpc at all. But, would this cacheline be hotter
>> than current_thread_info()::preempt_count?
>>
>>> #ifdef CONFIG_STACKPROTECTOR
>>> unsigned long canary;
>>> #endif
>
> Assuming stack protector is enabled (it is in defconfig), that cache
> line should be quite hot, because the canary is loaded as part of the
> epilogue of many functions.

Thanks Michael for taking a look at it.

Yes, CONFIG_STACKPROTECTOR=y here.
Which cacheline it should go in is still an open question if we are going to pursue this.

> Putting preempt_count in the paca also means it's a single load/store to
> access the value, just paca (in r13) + static offset. With the
> preempt_count in thread_info it's two loads, one to load current from
> the paca and then another to get the preempt_count.
>
> It could be worthwhile to move preempt_count into the paca, but I'm not
> convinced preempt_count is accessed enough for it to be a major
> performance issue.
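
To make the one-load versus two-load point concrete, here is a minimal
user-space sketch in C. The struct and function names (paca_sketch,
task_sketch, count_via_thread_info, count_via_paca) are hypothetical,
simplified stand-ins and not the real kernel definitions; in the kernel the
paca pointer lives in r13, whereas here it is just a function argument.

```c
/* Hypothetical, simplified stand-ins for the kernel structures. */
struct thread_info_sketch {
	int preempt_count;
};

struct task_sketch {
	struct thread_info_sketch ti;	/* thread_info embedded in the task */
};

struct paca_sketch {
	struct task_sketch *current_task;	/* pointer to the running task */
	int preempt_count;			/* proposed per-CPU copy */
};

/* Current layout: load current from the paca, then load the count. */
static int count_via_thread_info(const struct paca_sketch *paca)
{
	const struct task_sketch *cur = paca->current_task;	/* load 1 */

	return cur->ti.preempt_count;				/* load 2 */
}

/* Proposed layout: the count sits at a static offset from the paca. */
static int count_via_paca(const struct paca_sketch *paca)
{
	return paca->preempt_count;				/* single load */
}
```

Both helpers return the same value when the two copies are kept in sync; the
difference is only the dependent pointer load in the first one.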

With PREEMPT_COUNT enabled, it is accessed on every preempt_enable/disable.
That means every spin lock/unlock, get_cpu()/put_cpu(), etc. Those might be
quite frequent, no? But w.r.t. PREEMPT_AUTO, it didn't change the performance per se.
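
As a rough illustration of why the counter could be touched often, the
sketch below models a lock/unlock pair as one preempt-disable plus one
preempt-enable. All names (cpu_sketch, spin_lock_sketch,
run_critical_sections, ...) are hypothetical and the actual lock
acquisition is elided; it only counts writes to the counter, it does not
measure anything.

```c
#include <assert.h>

/* Hypothetical per-CPU state; illustrative only, not kernel API. */
struct cpu_sketch {
	int preempt_count;	/* nesting depth of preempt-disabled sections */
	unsigned long updates;	/* number of writes to preempt_count */
};

static void preempt_disable_sketch(struct cpu_sketch *c)
{
	c->preempt_count++;
	c->updates++;
}

static void preempt_enable_sketch(struct cpu_sketch *c)
{
	assert(c->preempt_count > 0);	/* enable must pair with a disable */
	c->preempt_count--;
	c->updates++;
}

/* With a preempt count, taking a spinlock disables preemption first. */
static void spin_lock_sketch(struct cpu_sketch *c)
{
	preempt_disable_sketch(c);
	/* lock acquisition elided */
}

static void spin_unlock_sketch(struct cpu_sketch *c)
{
	/* lock release elided */
	preempt_enable_sketch(c);
}

/* Run n lock/unlock pairs and return the total number of counter writes. */
static unsigned long run_critical_sections(struct cpu_sketch *c,
					   unsigned long n)
{
	unsigned long i;

	for (i = 0; i < n; i++) {
		spin_lock_sketch(c);
		spin_unlock_sketch(c);
	}
	return c->updates;
}
```

In this model every critical section costs two counter updates, so a
lock-heavy path touches preempt_count twice per lock/unlock pair.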

>
> cheers