Re: [linus:master] [x86] 4817f70c25: stress-ng.mmapaddr.ops_per_sec 63.0% regression
From: Rik van Riel
Date: Wed Jan 29 2025 - 10:23:56 EST

On Wed, 2025-01-29 at 16:14 +0800, Qi Zheng wrote:
> On 2025/1/29 02:35, Rik van Riel wrote:
> >
> > That looks like the RCU freeing somehow bypassing the
> > per-cpu-pages, and hitting the zone->lock at page free
> > time, while regular freeing usually puts pages in the
> > CPU-local free page cache, without the lock?
>
> Take the following call stack as an example:
>
> @[
> _raw_spin_unlock_irqrestore+5
> free_one_page+85
> tlb_remove_table_rcu+140
> rcu_do_batch+424
> rcu_core+401
> handle_softirqs+204
> irq_exit_rcu+208
> sysvec_apic_timer_interrupt+113
> asm_sysvec_apic_timer_interrupt+26
> _raw_spin_unlock_irqrestore+29
> get_page_from_freelist+2014
> __alloc_frozen_pages_noprof+364
> alloc_pages_mpol+123
> alloc_pages_noprof+14
> get_free_pages_noprof+17
> __x64_sys_mincore+141
> do_syscall_64+98
> entry_SYSCALL_64_after_hwframe+118
> , stress-ng-mmapa]: 5301
>
> It looks like the following happened:
>
> get_page_from_freelist
> --> rmqueue
>     --> rmqueue_pcplist
>         --> pcp_spin_trylock (hold the pcp lock)
>             __rmqueue_pcplist
>             --> rmqueue_bulk
>                 --> spin_lock_irqsave(&zone->lock)
>                     __rmqueue
>                     spin_unlock_irqrestore(&zone->lock)
>
>                     <run softirq at this time>
>
>                     tlb_remove_table_rcu
>                     --> free_frozen_pages
>                         --> pcp = pcp_spin_trylock (failed!!!)
>                             if (!pcp)
>                                 free_one_page
>
> It seems that the pcp lock is held when doing tlb_remove_table_rcu(),
> so the trylock fails and the free bypasses the PCP, calling
> free_one_page() directly, which leads to the hot spot on the zone lock.
>
> As for the regular freeing, since the freeing operation will not be
> performed in the softirq, the above situation will not occur.
>
> Right?

You are absolutely right!

This raises an interesting question: should we keep
RCU from running callbacks while the pcp_spinlock is
held, and what would be the best way to do that?

Are there other corner cases where RCU callbacks
should not be running from softirq context at
irq reenable time?

Should maybe the RCU callbacks only run when
the current process has no locks held,
or should they simply always run from some
kernel thread?

I'm really not sure what the right answer is...
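
To make the failure mode concrete, here is a minimal userspace
sketch of the pattern Qi describes. It is only a toy model, not
kernel code: struct toy_zone and every toy_* function are
hypothetical names invented for illustration, the "softirq" is
modelled as a plain function call made while the lock is held,
and the counters stand in for the pcp list and the zone->lock
slow path.

```c
#include <stdbool.h>

/* Toy stand-ins only; none of this is the kernel's real data layout. */
struct toy_zone {
	bool pcp_locked;      /* models the per-cpu pcp lock being held */
	int pcp_frees;        /* frees that landed on the per-cpu list */
	int zone_lock_frees;  /* frees that fell back to the zone lock path */
};

static void toy_zone_init(struct toy_zone *z)
{
	z->pcp_locked = false;
	z->pcp_frees = 0;
	z->zone_lock_frees = 0;
}

/* Returns true if the lock was taken, false if it was already held. */
static bool toy_pcp_trylock(struct toy_zone *z)
{
	if (z->pcp_locked)
		return false;
	z->pcp_locked = true;
	return true;
}

static void toy_pcp_unlock(struct toy_zone *z)
{
	z->pcp_locked = false;
}

/* Models the free path: use the pcp list if the trylock succeeds,
 * otherwise fall back to the (contended) zone lock path. */
static void toy_free_page(struct toy_zone *z)
{
	if (toy_pcp_trylock(z)) {
		z->pcp_frees++;
		toy_pcp_unlock(z);
	} else {
		z->zone_lock_frees++;
	}
}

/* Models the allocation path. softirq_frees is the number of
 * "RCU callback" frees that run on the same CPU while the pcp
 * lock is still held, as in the trace above. */
static bool toy_alloc_page(struct toy_zone *z, int softirq_frees)
{
	if (!toy_pcp_trylock(z))
		return false;
	for (int i = 0; i < softirq_frees; i++)
		toy_free_page(z);
	toy_pcp_unlock(z);
	return true;
}
```

In this model, calling toy_free_page() on its own counts a pcp
free, while every free issued through toy_alloc_page(z, n) ends
up in zone_lock_frees, because the trylock cannot succeed while
the interrupted context still holds the lock.
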
--
All Rights Reversed.