Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
From: Catalin Marinas
Date: Wed Feb 18 2026 - 04:29:28 EST
Hi Prateek,
On Wed, Feb 18, 2026 at 09:31:19AM +0530, K Prateek Nayak wrote:
> On 2/17/2026 10:18 PM, Catalin Marinas wrote:
> > Yes, that would be good. It's the preempt_enable_notrace() path that
> > ends up calling preempt_schedule_notrace() -> __schedule() pretty much
> > unconditionally.
>
> What do you mean by unconditionally? We always check
> __preempt_count_dec_and_test() before calling into __schedule().
>
> On x86, we use the MSB of preempt_count to track the resched state,
> and set_preempt_need_resched() just clears this MSB.
>
> If the preempt_count() turns 0, we either go into schedule
> immediately, or the next preempt_enable() -> __preempt_count_dec_and_test()
> will see the entire preempt_count being clear and call into
> schedule.
>
> The arm64 implementation seems to be doing something similar too
> with a separate "ti->preempt.need_resched" bit which is part of
> the "ti->preempt_count"'s union so it isn't really unconditional.
Ah, yes, you are right. I got the polarity of need_resched in
thread_info wrong (we should have named it no_need_to_resched).
So in the common case, the overhead comes from the additional
pointer chase and preempt_count update, on top of the per-CPU offset read.
Not sure we can squeeze any more cycles out of these without some
large overhaul like:
https://git.kernel.org/mark/c/84ee5f23f93d4a650e828f831da9ed29c54623c5
or Yang's per-CPU page tables. Well, there are more ideas, like in-kernel
restartable sequences, but they just move the overhead elsewhere.
Thanks.
--
Catalin