procfs: mnt namespace behaviour with block devices (resend)

From: Craig Small
Date: Mon May 09 2022 - 06:44:45 EST


(resending as plain text as the first got bounced)

Hi,
I'm the maintainer of the psmisc package, which provides system tools
such as fuser and killall. I am trying to establish whether something
I have found with the proc filesystem is intended (knowing why would
be nice) or is a strange corner-case bug.

Apologies to the non-procfs maintainers, but these two lists are where
the MAINTAINERS file said to go. If you could CC me on replies, that
would be great.

When a process in a different mount namespace has a block device open,
stat()ing its /proc/<pid>/fd/<n> entry shows the device id as seen in
that other namespace, not the device id as seen in the namespace of the
process calling stat().

The issue came up in fuser not finding certain processes that were
directly accessing a block device; see
https://gitlab.com/psmisc/psmisc/-/issues/39
Programs such as lsof are caught by this too.
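
To make the failure mode concrete, here is a minimal sketch of the kind of
device/inode comparison involved. It is not fuser's actual code; it assumes
GNU coreutils stat, and dev_ino and fd_matches_path are hypothetical helper
names used only for illustration.

```shell
#!/bin/sh
# Print "device:inode" (both decimal) for a path, following symlinks.
# Assumes GNU coreutils stat; dev_ino is a hypothetical helper name.
dev_ino() {
    stat -L -c '%d:%i' -- "$1"
}

# Succeed (exit 0) if fd $2 of process $1 has the same device/inode
# pair as path $3; exit 1 on a mismatch, 2 if either stat fails.
fd_matches_path() {
    fd_id=$(dev_ino "/proc/$1/fd/$2") || return 2
    path_id=$(dev_ino "$3") || return 2
    [ "$fd_id" = "$path_id" ]
}
```

With the values in the transcript below, "fd_matches_path 136775 23 /dev/dm-8"
run from the bash namespace would compare 44:9 against 5:348 and report a
mismatch, even though both refer to the same block device.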

My question is: when I am in the bash mount namespace (4026531840 below),
shouldn't all the device IDs be from that namespace? In other words, I
would expect the dereferenced symlink and the file it points to to have
the same device id (5), rather than the symlink having 44 and /dev/dm-8
having 5.

I get that if I look at the device IDs from within qemu's namespace
(for example by using nsenter to switch to it), the device id should be
44 for both the symlink and the device; it is, and that seems correct
to me.

How to replicate
================
# uname -a
Linux elmo 5.16.0-5-amd64 #1 SMP PREEMPT Debian 5.16.14-1 (2022-03-15)
x86_64 GNU/Linux

The easiest way to replicate this is to make a qemu virtual machine and
have it mount a block device. I suspect there are other ways, but I
don't have many things that mount a device and switch namespaces. The
qemu process (here it is 136775) will have a different mount namespace.

# ps -o pid,mntns,comm $$ 136775
PID MNTNS COMMAND
136775 4026532762 qemu-system-x86
142359 4026531840 bash
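
The same check can be scripted without ps. This is a small sketch that reads
the /proc/<pid>/ns/mnt symlinks; mntns_of and same_mntns are hypothetical
helper names, and reading the link for another user's process generally
needs root.

```shell
#!/bin/sh
# Print the mount namespace identifier of a PID, e.g. "mnt:[4026531840]".
# mntns_of is a hypothetical helper name.
mntns_of() {
    readlink "/proc/$1/ns/mnt"
}

# Succeed (exit 0) if both PIDs are in the same mount namespace;
# exit 1 if they differ, 2 if either link cannot be read.
same_mntns() {
    a=$(mntns_of "$1") || return 2
    b=$(mntns_of "$2") || return 2
    [ "$a" = "$b" ]
}
```

For the two processes above, "same_mntns $$ 136775" would be expected to
fail, since ps reports 4026531840 for bash and 4026532762 for qemu.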

File descriptor 23 is the one qemu is using for the block device:
# ls -l /proc/136775/fd/23
lrwx------ 1 libvirt-qemu libvirt-qemu 64 Apr 12 16:34 /proc/136775/fd/23 -> /dev/dm-8

However, the dereferenced symlink and the path the symlink points to
report different device and inode numbers:

# stat -L /proc/136775/fd/23
File: /proc/136775/fd/23
Size: 0 Blocks: 0 IO Block: 4096 block special file
Device: 2ch/44d Inode: 9 Links: 1 Device type: fd,8
Access: (0660/brw-rw----) Uid: (64055/libvirt-qemu) Gid: (64055/libvirt-qemu)
Access: 2022-04-12 16:34:25.687147886 +1000
Modify: 2022-04-12 16:34:25.519151533 +1000
Change: 2022-04-12 16:34:25.595149882 +1000
Birth: -

# stat /dev/dm-8
File: /dev/dm-8
Size: 0 Blocks: 0 IO Block: 4096 block special file
Device: 5h/5d Inode: 348 Links: 1 Device type: fd,8
Access: (0660/brw-rw----) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-04-12 16:15:12.684434884 +1000
Modify: 2022-04-12 16:15:12.684434884 +1000
Change: 2022-04-12 16:15:12.684434884 +1000
Birth: -

If we change to the qemu process's mount namespace, then we do see that
/dev/dm-8 has the same device/inode as the symlink:

# nsenter -m -t 136775 stat /dev/dm-8
File: /dev/dm-8
Size: 0 Blocks: 0 IO Block: 4096 block special file
Device: 2ch/44d Inode: 9 Links: 1 Device type: fd,8
Access: (0660/brw-rw----) Uid: (64055/libvirt-qemu) Gid: (64055/libvirt-qemu)
Access: 2022-04-12 16:34:25.687147886 +1000
Modify: 2022-04-12 16:34:25.519151533 +1000
Change: 2022-04-12 16:34:25.595149882 +1000
Birth: -
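
One field is identical in all three outputs above: the device type (fd,8),
which is the major,minor of the block device itself (st_rdev) rather than
of the filesystem holding the device node. As a sketch only, and not a claim
about how fuser or lsof should behave, block devices could be compared on
that field. This assumes GNU coreutils stat, where %t and %T print the hex
major and minor; rdev_of and same_rdev are hypothetical helper names.

```shell
#!/bin/sh
# Print "major:minor" (hex) of the device a special file refers to.
# Only meaningful for block/character special files.
# Assumes GNU coreutils stat; rdev_of is a hypothetical helper name.
rdev_of() {
    stat -L -c '%t:%T' -- "$1"
}

# Succeed (exit 0) if both paths refer to the same major:minor device;
# exit 1 if they differ, 2 if either stat fails.
same_rdev() {
    a=$(rdev_of "$1") || return 2
    b=$(rdev_of "$2") || return 2
    [ "$a" = "$b" ]
}
```

Given the outputs above, "same_rdev /proc/136775/fd/23 /dev/dm-8" would be
expected to succeed from either namespace, since both sides report fd,8.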

Thanks for your time.

- Craig