Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected)

From: asmadeus
Date: Tue Apr 12 2022 - 19:24:53 EST


Christian Schoenebeck wrote on Mon, Apr 11, 2022 at 03:41:45PM +0200:
> I'm getting more convinced that it's a bug on the Linux kernel side. When
> the guest is booted I always immediately get a read("/var/log/wtmp") = EBADF
> error on the guest. And the 9p command sequence sent to the QEMU 9p server
> was:

Yes, I'm not pointing fingers, just trying to understand :)

>
> ...
> v9fs_clunk tag 0 id 120 fid 568
> v9fs_walk tag 0 id 110 fid 1 newfid 568 nwnames 1
> v9fs_rerror tag 0 id 110 err 2
> v9fs_walk tag 0 id 110 fid 26 newfid 568 nwnames 1
> v9fs_rerror tag 0 id 110 err 2
> v9fs_readlink tag 0 id 22 fid 474
> v9fs_readlink_return tag 0 id 22 name /run
> v9fs_readlink tag 0 id 22 fid 474
> v9fs_readlink_return tag 0 id 22 name /run
> v9fs_readlink tag 0 id 22 fid 474
> v9fs_readlink_return tag 0 id 22 name /run
> v9fs_readlink tag 0 id 22 fid 474
> v9fs_readlink_return tag 0 id 22 name /run
> v9fs_walk tag 0 id 110 fid 633 newfid 568 nwnames 1
> v9fs_rerror tag 0 id 110 err 2
> v9fs_walk tag 0 id 110 fid 875 newfid 568 nwnames 0
> v9fs_walk_return tag 0 id 110 nwnames 0 qids (nil)
> v9fs_open tag 0 id 12 fid 568 mode 32769
> v9fs_open_return tag 0 id 12 qid={type 0 version 0 path 820297} iounit 507904
> v9fs_walk tag 0 id 110 fid 875 newfid 900 nwnames 0
> v9fs_walk_return tag 0 id 110 nwnames 0 qids (nil)
> v9fs_open tag 0 id 12 fid 900 mode 2
> v9fs_open_return tag 0 id 12 qid={type 0 version 0 path 820297} iounit 507904
> v9fs_lock tag 0 id 52 fid 568 type 1 start 0 length 0
> v9fs_lock_return tag 0 id 52 status 0
> v9fs_xattrwalk tag 0 id 30 fid 568 newfid 901 name security.capability
> v9fs_rerror tag 0 id 30 err 95
> v9fs_read tag 0 id 116 fid 568 off 192512 max_count 256
>
> So the guest opens /var/log/wtmp with fid=568 mode=32769, which is
> write-only mode, and then it tries to read that fid 568, which eventually
> causes the read() call on the host to fail with EBADF. That makes sense: as
> the file was opened in write-only mode, read() is not possible with that
> file descriptor.

Oh! That's something we can work on. The vfs code has different caches
for read-only and read-write fids; perhaps the new netfs code just used
the wrong one somewhere. I'll have a look.
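
For what it's worth, mode 32769 is 0x8001, i.e. O_WRONLY|O_LARGEFILE in the
Linux open-flag encoding that 9p2000.L reuses, so the host-side EBADF is
exactly what read(2) on a write-only descriptor gives you. A minimal
userspace sketch (not the kernel code involved, and /tmp/wtmp-demo is just a
placeholder path) that reproduces the symptom QEMU sees:

/* read(2) on a descriptor opened O_WRONLY fails with EBADF, which is
 * what QEMU's read() on the wtmp fd runs into when the client reads
 * through a fid it opened write-only.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[16];
    int fd = open("/tmp/wtmp-demo", O_WRONLY | O_CREAT, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (read(fd, buf, sizeof(buf)) < 0)
        printf("read: %s (errno=%d)\n", strerror(errno), errno); /* EBADF */
    close(fd);
    return 0;
}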

> The other things I noticed when looking at the 9p command sequence above:
> there is an earlier Twalk to newfid 568, which is not clunked before fid
> 568 is reused with Topen later. And before that Twalk on fid 568 there is
> a Tclunk on fid 568, but apparently that fid was not used before.

This one though is just weird; I don't see where Linux would make up a fid
to clunk like this... Could the messages be ordered oddly through
multithreading?
E.g. thread 1 opens, thread 2 clunks almost immediately afterwards, and
they get printed the other way around?
It should still be serialized through the virtio ring buffer, so I don't
believe what I'm saying myself... It might be worth digging further as
well.

> > Perhaps backing filesystem dependant? qemu version? virtfs access options?
>
> I tried with different hardware and different file systems (ext4, btrfs), same
> misbehaviours.
>
> QEMU is latest git version. I also tried several different QEMU versions, same
> thing.
>
> QEMU command line used:
>
> ~/git/qemu/build/qemu-system-x86_64 \
> -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 16384 \
> -smp 8,sockets=8,cores=1,threads=1 -rtc base=utc -boot strict=on \
> -kernel ~/vm/bullseye/boot/vmlinuz \
> -initrd ~/vm/bullseye/boot/initrd.img \
> -append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=4186112,cache=loose console=ttyS0' \
> -fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fs0,path=$HOME/vm/bullseye/ \
> -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fsRoot \
> -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -nographic
>
> Important for reproducing this issue:
>
> * cache=loose
> * -smp N (with N>1)
> * Guest booted with Linux kernel containing commit eb497943fa21
> (uname >= 5.16)
>
> I'm pretty sure that you can reproduce this issue with the QEMU 9p rootfs
> setup HOWTO linked before.

Yes, I'm not sure why I can't reproduce it... All my computers are pretty
slow, but the conditions should be met.
I'll try again with a command line closer to the one you just gave.


> > It's all extremely slow though... like the final checkout counting files
> > at less than 10/s
>
> It is VERY slow. And the weird thing is that cache=loose got much slower
> than cache=mmap. My worst-case expectation would be cache=loose at least
> not performing worse than cache=mmap.

Yes, some profiling is also in order; it didn't use to be that slow, so it
must not be reusing previously opened fids as it should, or something
along those lines...
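
As a concrete starting point for that profiling, something like the crude
loop below, run inside the guest against the 9p mount under cache=loose and
then cache=mmap, would show whether per-file open/stat cost is where the
time goes. It's only a sketch; the default path and iteration count are
arbitrary placeholders.

/* Time a tight open+fstat+close loop on one file to compare per-file
 * overhead between cache modes.  Pass a path on the 9p mount as argv[1].
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/var/log/wtmp";
    struct stat st;
    struct timespec t0, t1;
    int i, n = 1000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < n; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(path);
            return 1;
        }
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d open+fstat+close in %.3fs (%.1f/s)\n", n, secs, n / secs);
    return 0;
}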

--
Dominique