Re: [PATCH v3 07/11] mm: vmalloc: Offload free_vmap_area_lock lock
From: Uladzislau Rezki
Date: Fri Mar 22 2024 - 15:03:17 EST
On Fri, Mar 22, 2024 at 11:21:02AM -0700, Guenter Roeck wrote:
> Hi,
>
> On Tue, Jan 02, 2024 at 07:46:29PM +0100, Uladzislau Rezki (Sony) wrote:
> > Concurrent access to the global vmap space is a bottleneck.
> > We can simulate high contention by running a vmalloc test
> > suite.
> >
> > To address it, introduce an effective vmap node logic. Each
> > node behaves as an independent entity. When a node is accessed,
> > it serves a request directly (if possible) from its pool.
> >
> > This model has size-based pools for requests, i.e. pools are
> > serialized and populated based on object size and real demand.
> > The maximum object size a pool can handle is set to 256 pages.
> >
> > This technique reduces pressure on the global vmap lock.
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
>
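For context, the per-node, size-based pool model described in the commit message can be sketched roughly as follows. This is a minimal illustration only; the identifiers (MAX_POOL_PAGES, NR_NODES, the helper names) are stand-ins, not the actual mm/vmalloc.c symbols:

```c
#include <stddef.h>

#define PAGE_SHIFT      12
#define MAX_POOL_PAGES  256   /* largest object size a pool serves */
#define NR_NODES        4     /* illustrative node count */

/*
 * Map a request size (bytes) to a per-node pool slot, or -1 if the
 * request is too large and must fall back to the global free space.
 * Pools are keyed by object size: one pool per size, 1..256 pages.
 */
static int size_to_pool_index(size_t size)
{
    size_t pages = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;

    if (pages == 0 || pages > MAX_POOL_PAGES)
        return -1;
    return (int)pages - 1;
}

/*
 * Pick the node that serves this CPU; a request is satisfied from
 * that node's own pool when possible, so most allocations never
 * touch the global vmap lock.
 */
static int cpu_to_node_index(int cpu)
{
    return cpu % NR_NODES;
}
```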
> This patch results in a persistent "spinlock bad magic" message
> when booting s390 images with spinlock debugging enabled.
>
> [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
> [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
> [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
> [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
> [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
> [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
> [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
> [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
> [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
> [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40
>
> Bisect results and decoded stacktrace below.
>
> The uninitialized spinlock is &vn->busy.lock.
> Debugging shows that this lock is actually never initialized.
>
It is, once the vmalloc_init() "main entry" function is called from:
<snip>
start_kernel()
mm_core_init()
vmalloc_init()
<snip>
> [ 0.464684] ####### locking 0000000002280fb8
> [ 0.464862] BUG: spinlock bad magic on CPU#0, swapper/0
> ...
> [ 0.464684] ####### locking 0000000002280fb8
> [ 0.477479] ####### locking 0000000002280fb8
> [ 0.478166] ####### locking 0000000002280fb8
> [ 0.478218] ####### locking 0000000002280fb8
> ...
> [ 0.718250] #### busy lock init 0000000002871860
> [ 0.718328] #### busy lock init 00000000028731b8
>
> Only the initialized locks are used after the call to vmap_init_nodes().
>
Right, once the vmap space and vmalloc are initialized.
> Guenter
>
> ---
> # bad: [8e938e39866920ddc266898e6ae1fffc5c8f51aa] Merge tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
> # good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
> git bisect start 'HEAD' 'v6.8'
> # good: [e56bc745fa1de77abc2ad8debc4b1b83e0426c49] smb311: additional compression flag defined in updated protocol spec
> git bisect good e56bc745fa1de77abc2ad8debc4b1b83e0426c49
> # bad: [902861e34c401696ed9ad17a54c8790e7e8e3069] Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 902861e34c401696ed9ad17a54c8790e7e8e3069
> # good: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
> git bisect good 480e035fc4c714fb5536e64ab9db04fedc89e910
> # good: [fe46a7dd189e25604716c03576d05ac8a5209743] Merge tag 'sound-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
> git bisect good fe46a7dd189e25604716c03576d05ac8a5209743
> # bad: [435a75548109f19e5b5b14ae35b9acb063c084e9] mm: use folio more widely in __split_huge_page
> git bisect bad 435a75548109f19e5b5b14ae35b9acb063c084e9
> # good: [4d5bf0b6183f79ea361dd506365d2a471270735c] mm/mmu_gather: add tlb_remove_tlb_entries()
> git bisect good 4d5bf0b6183f79ea361dd506365d2a471270735c
> # bad: [4daacfe8f99f4b4cef562649d56c48642981f46e] mm/damon/sysfs-schemes: support PSI-based quota auto-tune
> git bisect bad 4daacfe8f99f4b4cef562649d56c48642981f46e
> # good: [217b2119b9e260609958db413876f211038f00ee] mm,page_owner: implement the tracking of the stacks count
> git bisect good 217b2119b9e260609958db413876f211038f00ee
> # bad: [40254101d87870b2e5ac3ddc28af40aa04c48486] arm64, crash: wrap crash dumping code into crash related ifdefs
> git bisect bad 40254101d87870b2e5ac3ddc28af40aa04c48486
> # bad: [53becf32aec1c8049b854f0c31a11df5ed75df6f] mm: vmalloc: support multiple nodes in vread_iter
> git bisect bad 53becf32aec1c8049b854f0c31a11df5ed75df6f
> # good: [7fa8cee003166ef6db0bba70d610dbf173543811] mm: vmalloc: move vmap_init_free_space() down in vmalloc.c
> git bisect good 7fa8cee003166ef6db0bba70d610dbf173543811
> # good: [282631cb2447318e2a55b41a665dbe8571c46d70] mm: vmalloc: remove global purge_vmap_area_root rb-tree
> git bisect good 282631cb2447318e2a55b41a665dbe8571c46d70
> # bad: [96aa8437d169b8e030a98e2b74fd9a8ee9d3be7e] mm: vmalloc: add a scan area of VA only once
> git bisect bad 96aa8437d169b8e030a98e2b74fd9a8ee9d3be7e
> # bad: [72210662c5a2b6005f6daea7fe293a0dc573e1a5] mm: vmalloc: offload free_vmap_area_lock lock
> git bisect bad 72210662c5a2b6005f6daea7fe293a0dc573e1a5
> # first bad commit: [72210662c5a2b6005f6daea7fe293a0dc573e1a5] mm: vmalloc: offload free_vmap_area_lock lock
>
> ---
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
> [ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
> [ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
> [ 0.466572] find_vm_area (mm/vmalloc.c:3150)
> [ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
> [ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
> [ 0.466651] paging_init (arch/s390/mm/init.c:97)
> [ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
> [ 0.466702] start_kernel (init/main.c:899)
> [ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
> [ 0.466811] INFO: lockdep is turned off.
>
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..0d77d171b5d9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2343,6 +2343,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	struct vmap_area *va;
 	int i, j;
 
+	if (unlikely(!vmap_initialized))
+		return NULL;
+
 	/*
 	 * An addr_to_node_id(addr) converts an address to a node index
 	 * where a VA is located. If VA spans several zones and passed
<snip>
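The guard in the diff follows a common early-boot pattern: before the subsystem's locks have been set up, a lookup simply reports "not found" instead of taking a spinlock whose magic/owner fields are still zeroed. A minimal stand-alone sketch of that pattern, with illustrative names rather than the kernel's:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the vmap node state; not kernel code. */
struct area { unsigned long addr; };

static bool subsys_initialized;           /* plays the role of vmap_initialized */
static struct area known = { 0x1000 };

/*
 * Lookup that is safe to call before init: it bails out early rather
 * than touching per-node locks that have not been initialized yet.
 */
static struct area *find_area(unsigned long addr)
{
    if (!subsys_initialized)
        return NULL;          /* early boot: nothing registered yet */
    return (addr == known.addr) ? &known : NULL;
}

static void subsys_init(void)
{
    /* in the real code: spin_lock_init() on each node's busy.lock */
    subsys_initialized = true;
}
```

With this, an arch-level caller that runs before mm_core_init() (as s390's vmem_map_init() does) gets a harmless NULL back instead of tripping the spinlock debugging checks.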
Could you please test it?
--
Uladzislau Rezki