Re: [PATCH] net/9p/trans_virtio.c: replace mutex_lock with spin_lock to protect 'virtio_chan_list'

From: piaojun
Date: Thu Jul 19 2018 - 03:45:27 EST




On 2018/7/19 11:36, Dominique Martinet wrote:
> piaojun wrote on Thu, Jul 19, 2018:
>>> piaojun wrote on Wed, Jul 18, 2018:
>>> That's not a fast path operation, I don't mind changing things but I'd
>>> like to understand why - these functions are only ever called at unmount
>>> time or when something happens on the virtio bus (probe will happen on
>>> probing on the pci bus and I'm not too sure on remove but probably pci
>>> removal i.e. basically never?)
>>>
>>> I don't see why this wouldn't work, but I won't take this without a
>>> (good?) reason.
>>>
>> virtio_9p_lock is responsible for protecting virtio_chan_list, which
>> has 3 operations:
>>
>> 1. Add a virtio chan to virtio_chan_list. This will happen when we insmod
>> 9pnet_virtio.ko:
>> p9_virtio_probe
>> --list_add_tail(&chan->chan_list, &virtio_chan_list);
>>
>> 2. Remove a virtio chan. This will happen when we rmmod 9pnet_virtio.ko:
>> p9_virtio_remove
>> --list_del(&chan->chan_list);
>>
>> 3. Find an unused virtio chan when mounting 9p:
>> mount
>> --p9_virtio_create
>> --list_for_each_entry(chan, &virtio_chan_list, chan_list)
>>
>> Multiple mount processes will compete for virtio_9p_lock when finding an
>> unused virtio chan, in which case the mutex will cause processes to sleep
>> and wake up. I think this is a waste of CPU time, so we could use a spin
>> lock to avoid it.
>
> Well, sure, that's theory; but how is that in practice?
> I actually took the time to run some tests, setting up 20 virtio mount
> points in qemu, and running this command with and without your patch:
> # time sh -c 'for i in {1..20}; do
> sh -c "for j in {1..100}; do
> mount -t 9p d$i d.$i;
> umount d.$i;
> done" &
> done;
> wait'
>
> This is quick & dirty but basically, mounts and unmounts 100 times in a
> loop all 20 mount points in parallel to stress that lock.
> I get these times 5 times (one run per column),
> without patch:
> real 0m19.357s 0m19.626s 0m19.904s 0m19.926s 0m21.321s
> user 0m6.795s 0m6.874s 0m6.807s 0m6.768s 0m6.892s
> sys 0m29.936s 0m31.196s 0m31.702s 0m31.914s 0m30.791s
>
> With patch:
> real 0m19.439s 0m19.849s 0m19.683s 0m19.600s 0m20.689s
> user 0m6.948s 0m6.582s 0m6.706s 0m6.598s 0m6.876s
> sys 0m29.364s 0m30.898s 0m30.695s 0m31.311s 0m33.391s
>
> I honestly can't say I'm convinced with a difference either way, the
> variations look more like noise than anything to me.
>
>
> More to the point, while these tests ran my dmesg buffer was filled with
> errors like:
> FS-Cache: Duplicate cookie detected
> FS-Cache: O-cookie c=0000000000368cdb [p=00000000548b03c2 fl=222 nc=0 na=1]
> FS-Cache: O-cookie d=000000004cebd15f n=00000000029a0b83
> FS-Cache: O-key=[10] '34323935303838343536'
> FS-Cache: N-cookie c=00000000d4089478 [p=00000000548b03c2 fl=2 nc=0 na=1]
> FS-Cache: N-cookie d=000000004cebd15f n=00000000959d4d37
> FS-Cache: N-key=[10] '34323935303838343536'
>
> or
> (output mangled a bit)
>
> ==================================================================
> BUG: KASAN: use-after-free in p9_client_cb+0x14d/0x160 [9pnet]
> Read of size 8 at addr ffff88003522a088 by task systemd-udevd/492
>
> CPU: 1 PID: 492 Comm: systemd-udevd Tainted: G O 4.18.0-rc5+ #9
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 0>
> Call Trace:
> <IRQ>
> dump_stack+0x7b/0xad
> print_address_description+0x6a/0x209
> ? p9_client_cb+0x14d/0x160 [9pnet]
> kasan_report.cold.7+0x242/0x2fe
> __asan_report_load8_noabort+0x19/0x20
> p9_client_cb+0x14d/0x160 [9pnet]
> req_done+0x22f/0x280 [9pnet_virtio]
> ? p9_mount_tag_show+0x120/0x120 [9pnet_virtio]
> vring_interrupt+0x108/0x1b0 [virtio_ring]
> ? vring_map_single.constprop.23+0x350/0x350 [virtio_ring]
> __handle_irq_event_percpu+0xec/0x460
> handle_irq_event_percpu+0x71/0x140
> ? __handle_irq_event_percpu+0x460/0x460
> ? apic_ack_irq+0xa3/0xe0
> handle_irq_event+0xb9/0x14a
> handle_edge_irq+0x1ea/0x7a0
> ? kasan_check_read+0x11/0x20
> handle_irq+0x48/0x60
> do_IRQ+0x67/0x140
> common_interrupt+0xf/0xf
> </IRQ>
> RIP: 0010:finish_task_switch+0x10e/0x630
> Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 6d 04 00 00 41 c7 45 38 00 00 00 00 4c 89 e7 ff 14 25 28 f5 66 8e fb 66 0f >
> RSP: 0018:ffff8800633e7a60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd4
> RAX: 0000000000000001 RBX: ffff880036632000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006caaac00
> RBP: ffff8800633e7aa0 R08: ffffed000cea15cd R09: ffffed000cea15cc
> R10: ffffed000cea15cc R11: ffff88006750ae63 R12: ffff88006caaac00
> R13: ffff88006558b000 R14: 0000000000000000 R15: ffff880036632000
> ? __switch_to_asm+0x34/0x70
> ? __switch_to_asm+0x40/0x70
> __schedule+0x733/0x1c10
> ? __bpf_prog_run64+0xd0/0xd0
> ? firmware_map_remove+0x174/0x174
> schedule+0x7a/0x1a0
> schedule_hrtimeout_range_clock+0x306/0x3b0
> ? kasan_check_write+0x14/0x20
> ? hrtimer_nanosleep_restart+0x290/0x290
> ? ep_busy_loop_end+0x110/0x110
> schedule_hrtimeout_range+0x13/0x20
> ep_poll+0x7a7/0xb50
> ? __ia32_sys_epoll_ctl+0x1170/0x1170
> ? __fget_light+0x59/0x1f0
> ? __audit_syscall_entry+0x347/0x980
> ? __audit_free+0x8a0/0x8a0
> 34
> ? wake_up_q+0x100/0x100
> 39
> ? kasan_check_read+0x11/0x20
> 3230373130'
> FS-Cache: O-key=[10] '34323934393230373131'
> FS-Cache: N-cookie c=00000000fa69c1f9 [p=00000000887326c4 fl=2 nc=0 na=1]
> FS-Cache: N-cookie d=00000000a8f143d1 n=00000000446f741a
> FS-Cache: N-key=[10] '34323934393230373131'
> ? __fget_light+0x59/0x1f0
> do_epoll_wait+0x129/0x160
> __x64_sys_epoll_wait+0x97/0xf0
> do_syscall_64+0xa5/0x260
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f9099a22317
> Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 d1 46 2c 00 41 89 ca 8b 00 85 c0 75 10 b8 e8 00 >
> RSP: 002b:00007ffff67e1f28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
> RAX: ffffffffffffffda RBX: 0000558182d9e390 RCX: 00007f9099a22317
> RDX: 000000000000000b RSI: 00007ffff67e1f30 RDI: 000000000000000b
> RBP: 00007ffff67e20b0 R08: 0000000006c65ded R09: 00007ffff67e1f30
> R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
> R13: 00007ffff67e1f30 R14: ffffffffffffffff R15: 0000558182d7a4c0
>
> Allocated by task 6390:
> save_stack+0x43/0xd0
> kasan_kmalloc+0xc4/0xe0
> kasan_slab_alloc+0x12/0x20
> kmem_cache_alloc+0xe2/0x5e0
> p9_client_prepare_req+0xa4/0x670 [9pnet]
> p9_client_rpc+0x133/0xd20 [9pnet]
> p9_client_getattr_dotl+0x102/0x910 [9pnet]
> v9fs_mount+0x5a6/0x7c0 [9p]
> mount_fs+0x89/0x2ad
> vfs_kern_mount.part.32+0x5d/0x390
> do_mount+0x379/0x2bb0
> ksys_mount+0xbf/0xe0
> __x64_sys_mount+0xbe/0x150
> do_syscall_64+0xa5/0x260
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Freed by task 6390:
> save_stack+0x43/0xd0
> __kasan_slab_free+0x118/0x170
> kasan_slab_free+0xe/0x10
> kmem_cache_free+0x49/0x160
> p9_free_req+0x106/0x140 [9pnet]
> p9_client_getattr_dotl+0x590/0x910 [9pnet]
> v9fs_mount+0x5a6/0x7c0 [9p]
> mount_fs+0x89/0x2ad
> vfs_kern_mount.part.32+0x5d/0x390
> do_mount+0x379/0x2bb0
> ksys_mount+0xbf/0xe0
> __x64_sys_mount+0xbe/0x150
> do_syscall_64+0xa5/0x260
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The buggy address belongs to the object at ffff88003522a068
> which belongs to the cache p9_req_t of size 72
> The buggy address is located 32 bytes inside of
> 72-byte region [ffff88003522a068, ffff88003522a0b0)
> The buggy address belongs to the page:
> page:ffffea0000d48a80 count:1 mapcount:0 mapping:ffff880064562580 index:0x0
> flags: 0xffffc000000100(slab)
> raw: 00ffffc000000100 ffff880035e36618 ffffea00019fa888 ffff880064562580
> raw: 0000000000000000 ffff88003522a000 0000000100000027 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff880035229f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff88003522a000: fb fb fb fb fb fb fb fb fb fc fc fc fc fb fb fb
>> ffff88003522a080: fb fb fb fb fb fb fc fc fc fc fb fb fb fb fb fb
> ^
> ffff88003522a100: fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb fb
> ffff88003522a180: fc fc fc fc fb fb fb fb fb fb fb fb fb fc fc fc
> ==================================================================
>
> so if you're concerned about parallel mounts, I think there are
> other, more important bugs to fix rather than replacing a hardly-used
> mutex by a spin-lock...
>
That makes sense, and bug fixes come first. I will look into the bug you
hit during your tests.
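
As a rough check on the timings quoted above, the five "real" values in
each set can be averaged with a few lines of shell. This is only a
sketch: stats is a helper name made up here, and the numbers are copied
from the two sets of results above.

```shell
#!/bin/sh
# Mean, minimum and maximum of a list of wall-clock times in seconds.
# "stats" is a hypothetical helper, not part of any tool used in this thread.
stats() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = max = $1 }
        { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
        END { printf "mean=%.3f min=%.3f max=%.3f\n", sum / NR, min, max }'
}

# "real" times from the five runs of each configuration quoted above.
echo "without patch: $(stats 19.357 19.626 19.904 19.926 21.321)"
echo "with patch:    $(stats 19.439 19.849 19.683 19.600 20.689)"
```

The two means end up less than 0.2s apart, while each set spans more
than a second between its fastest and slowest run, which fits the
reading that the difference is within the noise.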

Thanks,
Jun

>
>
> You've done the work now so it's not like I can't take the patch, but it
> really feels pointless to me unless you can show me there is actual
> improvement.
>