Re: [PATCH] net/9p/trans_virtio.c: replace mutex_lock with spin_lock to protect 'virtio_chan_list'

From: Dominique Martinet
Date: Wed Jul 18 2018 - 23:36:33 EST


piaojun wrote on Thu, Jul 19, 2018:
> > piaojun wrote on Wed, Jul 18, 2018:
> > That's not a fast path operation, I don't mind changing things but I'd
> > like to understand why - these functions are only ever called at unmount
> > time or when something happens on the virtio bus (probe will happen on
> > probing on the pci bus and I'm not too sure on remove but probably pci
> > removal i.e. basically never?)
> >
> > I don't see why this wouldn't work, but I won't take this without a
> > (good?) reason.
> >
> virtio_9p_lock is responsable for protecting virtio_chan_list which has 3
> operation:
>
> 1. Add a virtio chan to virtio_chan_list. This will happen when we insmod
> 9pnet_virtio.ko:
> p9_virtio_probe
> --list_add_tail(&chan->chan_list, &virtio_chan_list);
>
> 2. Remove a virtio chan. This will happen when remnod 9pnet_virtio.ko:
> p9_virtio_remove
> --list_del(&chan->chan_list);
>
> 3. Find a unused virtio chan when mount 9p:
> mount
> --p9_virtio_create
> --list_for_each_entry(chan, &virtio_chan_list, chan_list)
>
> Multi mount process will compete for virtio_9p_lock when finding unused
> virtio chan, in which case mutex lock will cause process sleep and wake
> up. I think this a waste of CPU time. So we could use spin lock to avoid
> this.

Well, sure, that's theory; but how is that in practice?
I actually took the time to run some tests, setting up 20 virtio mount
points in qemu, and running this command with and without your patch:
# time sh -c 'for i in {1..20}; do
sh -c "for j in {1..100}; do
mount -t 9p d$i d.$i;
umount d.$i;
done" &
done;
wait'

This is quick & dirty but basically, mounts and unmounts 100 times in a
loop all 20 mount points in parallel to stress that lock.
I get these times 5 times (one run per column),
without patch:
real 0m19.357s 0m19.626s 0m19.904s 0m19.926s 0m21.321s
user 0m6.795s 0m6.874s 0m6.807s 0m6.768s 0m6.892s
sys 0m29.936s 0m31.196s 0m31.702s 0m31.914s 0m30.791s

With patch:
real 0m19.439s 0m19.849s 0m19.683s 0m19.600s 0m20.689s
user 0m6.948s 0m6.582s 0m6.706s 0m6.598s 0m6.876s
sys 0m29.364s 0m30.898s 0m30.695s 0m31.311s 0m33.391s

I honestly can't say I'm convinced with a difference either way, the
variations look more like noise than anything to me.


More to the point, while these tests ran my dmesg buffer was filled with
errors like:
FS-Cache: Duplicate cookie detected
FS-Cache: O-cookie c=0000000000368cdb [p=00000000548b03c2 fl=222 nc=0 na=1]
FS-Cache: O-cookie d=000000004cebd15f n=00000000029a0b83
FS-Cache: O-key=[10] '34323935303838343536'
FS-Cache: N-cookie c=00000000d4089478 [p=00000000548b03c2 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=000000004cebd15f n=00000000959d4d37
FS-Cache: N-key=[10] '34323935303838343536'

or
(output mangled a bit)

==================================================================
BUG: KASAN: use-after-free in p9_client_cb+0x14d/0x160 [9pnet]
Read of size 8 at addr ffff88003522a088 by task systemd-udevd/492

CPU: 1 PID: 492 Comm: systemd-udevd Tainted: G O 4.18.0-rc5+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 0>
Call Trace:
<IRQ>
dump_stack+0x7b/0xad
print_address_description+0x6a/0x209
? p9_client_cb+0x14d/0x160 [9pnet]
kasan_report.cold.7+0x242/0x2fe
__asan_report_load8_noabort+0x19/0x20
p9_client_cb+0x14d/0x160 [9pnet]
req_done+0x22f/0x280 [9pnet_virtio]
? p9_mount_tag_show+0x120/0x120 [9pnet_virtio]
vring_interrupt+0x108/0x1b0 [virtio_ring]
? vring_map_single.constprop.23+0x350/0x350 [virtio_ring]
__handle_irq_event_percpu+0xec/0x460
handle_irq_event_percpu+0x71/0x140
? __handle_irq_event_percpu+0x460/0x460
? apic_ack_irq+0xa3/0xe0
handle_irq_event+0xb9/0x14a
handle_edge_irq+0x1ea/0x7a0
? kasan_check_read+0x11/0x20
handle_irq+0x48/0x60
do_IRQ+0x67/0x140
common_interrupt+0xf/0xf
</IRQ>
RIP: 0010:finish_task_switch+0x10e/0x630
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 6d 04 00 00 41 c7 45 38 00 00 00 00 4c 89 e7 ff 14 25 28 f5 66 8e fb 66 0f >
RSP: 0018:ffff8800633e7a60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd4
RAX: 0000000000000001 RBX: ffff880036632000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006caaac00
RBP: ffff8800633e7aa0 R08: ffffed000cea15cd R09: ffffed000cea15cc
R10: ffffed000cea15cc R11: ffff88006750ae63 R12: ffff88006caaac00
R13: ffff88006558b000 R14: 0000000000000000 R15: ffff880036632000
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
__schedule+0x733/0x1c10
? __bpf_prog_run64+0xd0/0xd0
? firmware_map_remove+0x174/0x174
schedule+0x7a/0x1a0
schedule_hrtimeout_range_clock+0x306/0x3b0
? kasan_check_write+0x14/0x20
? hrtimer_nanosleep_restart+0x290/0x290
? ep_busy_loop_end+0x110/0x110
schedule_hrtimeout_range+0x13/0x20
ep_poll+0x7a7/0xb50
? __ia32_sys_epoll_ctl+0x1170/0x1170
? __fget_light+0x59/0x1f0
? __audit_syscall_entry+0x347/0x980
? __audit_free+0x8a0/0x8a0
34
? wake_up_q+0x100/0x100
39
? kasan_check_read+0x11/0x20
3230373130'
FS-Cache: O-key=[10] '34323934393230373131'
FS-Cache: N-cookie c=00000000fa69c1f9 [p=00000000887326c4 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=00000000a8f143d1 n=00000000446f741a
FS-Cache: N-key=[10] '34323934393230373131'
? __fget_light+0x59/0x1f0
do_epoll_wait+0x129/0x160
__x64_sys_epoll_wait+0x97/0xf0
do_syscall_64+0xa5/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f9099a22317
Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 d1 46 2c 00 41 89 ca 8b 00 85 c0 75 10 b8 e8 00 >
RSP: 002b:00007ffff67e1f28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
RAX: ffffffffffffffda RBX: 0000558182d9e390 RCX: 00007f9099a22317
RDX: 000000000000000b RSI: 00007ffff67e1f30 RDI: 000000000000000b
RBP: 00007ffff67e20b0 R08: 0000000006c65ded R09: 00007ffff67e1f30
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffff67e1f30 R14: ffffffffffffffff R15: 0000558182d7a4c0

Allocated by task 6390:
save_stack+0x43/0xd0
kasan_kmalloc+0xc4/0xe0
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0xe2/0x5e0
p9_client_prepare_req+0xa4/0x670 [9pnet]
p9_client_rpc+0x133/0xd20 [9pnet]
p9_client_getattr_dotl+0x102/0x910 [9pnet]
v9fs_mount+0x5a6/0x7c0 [9p]
mount_fs+0x89/0x2ad
vfs_kern_mount.part.32+0x5d/0x390
do_mount+0x379/0x2bb0
ksys_mount+0xbf/0xe0
__x64_sys_mount+0xbe/0x150
do_syscall_64+0xa5/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 6390:
save_stack+0x43/0xd0
__kasan_slab_free+0x118/0x170
kasan_slab_free+0xe/0x10
kmem_cache_free+0x49/0x160
p9_free_req+0x106/0x140 [9pnet]
p9_client_getattr_dotl+0x590/0x910 [9pnet]
v9fs_mount+0x5a6/0x7c0 [9p]
mount_fs+0x89/0x2ad
vfs_kern_mount.part.32+0x5d/0x390
do_mount+0x379/0x2bb0
ksys_mount+0xbf/0xe0
__x64_sys_mount+0xbe/0x150
do_syscall_64+0xa5/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88003522a068
which belongs to the cache p9_req_t of size 72
The buggy address is located 32 bytes inside of
72-byte region [ffff88003522a068, ffff88003522a0b0)
The buggy address belongs to the page:
page:ffffea0000d48a80 count:1 mapcount:0 mapping:ffff880064562580 index:0x0
flags: 0xffffc000000100(slab)
raw: 00ffffc000000100 ffff880035e36618 ffffea00019fa888 ffff880064562580
raw: 0000000000000000 ffff88003522a000 0000000100000027 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880035229f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88003522a000: fb fb fb fb fb fb fb fb fb fc fc fc fc fb fb fb
>ffff88003522a080: fb fb fb fb fb fb fc fc fc fc fb fb fb fb fb fb
^
ffff88003522a100: fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb fb
ffff88003522a180: fc fc fc fc fb fb fb fb fb fb fb fb fb fc fc fc
==================================================================

so if you're concerned about parallel mountings, I think there are
others, more important, bugs to fix rather than replacing a hardly-used
mutex by a spin-lock...



You've done the work now so it's not like I can't take the patch, but it
really feels pointless to me unless you can show me there is actual
improvement.

--
Dominique