Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

From: Theodore Ts'o
Date: Mon Mar 13 2023 - 22:27:34 EST


On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>
> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

Modifying the block device while the file system is mounted is
something that we have to allow for now because tune2fs uses it to
modify the superblock. It has historically also been used (rarely) by
people who know what they are doing to do surgery on a mounted file
system. If we create a way for tune2fs to be able to update the
superblock via some kind of ioctl, we could disallow modifying the
block device while the file system is mounted. Of course, it would
require waiting at least 5-6 years since sometimes people will update
the kernel without updating userspace. We'd also need to check to
make sure there aren't boot loader installers (such as grub-install)
that depend on being able to modify the block device while the root
file system is mounted, at least in some rare cases.

The "how" of excluding mounted file systems is relatively easy. The
kernel already knows when the file system is mounted, and it is
already a supported feature that a userspace application that wants to
be careful can open a block device with O_EXCL: if the device is in use
by the kernel (mounted by a file system, used by dm-thin, etc.), the
open(2) system call will fail. From the open(2) man page:

    In general, the behavior of O_EXCL is undefined if it is used without
    O_CREAT. There is one exception: on Linux 2.6 and later, O_EXCL can
    be used without O_CREAT if pathname refers to a block device. If the
    block device is in use by the system (e.g., mounted), open() fails
    with the error EBUSY.
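To make the man page semantics concrete, here is a rough sketch of how a
careful tool might check whether a block device is held by the kernel.
The helper name probe_blockdev() and the bdev_state values are made up
for illustration; only open(2) with O_EXCL, stat(2) and the EBUSY
behaviour come from the man page quoted above. Note there is a window
between stat() and open(), so treat this as a sketch, not a hardened
implementation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for probe_blockdev(). */
enum bdev_state {
    BDEV_FREE,      /* block device, exclusive open succeeded */
    BDEV_BUSY,      /* block device in use by the system (EBUSY) */
    BDEV_NOT_BLOCK, /* path exists but is not a block device */
    BDEV_ERROR      /* stat() or open() failed for another reason */
};

/*
 * Hypothetical helper: report whether a block device is in use by the
 * kernel (mounted, held by dm-thin, LVM, ...), relying on the Linux
 * behaviour of O_EXCL without O_CREAT on block devices.
 */
enum bdev_state probe_blockdev(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return BDEV_ERROR;
    /* O_EXCL without O_CREAT is only defined for block devices. */
    if (!S_ISBLK(st.st_mode))
        return BDEV_NOT_BLOCK;

    /* Here O_EXCL means "claim exclusively", not "create exclusively". */
    fd = open(path, O_RDONLY | O_EXCL);
    if (fd < 0)
        return errno == EBUSY ? BDEV_BUSY : BDEV_ERROR;

    /* The exclusive claim lasts only while fd is open; release it. */
    close(fd);
    return BDEV_FREE;
}
```

Run as root against a mounted partition, the expectation is BDEV_BUSY;
without sufficient privileges the open() fails with EACCES and the
sketch reports BDEV_ERROR.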

Something syzbot could do today is simply use O_EXCL whenever it
opens a block device. This would avoid a class of syzbot false
positives, since modifying a block device while it is mounted and/or
in use by LVM normally requires root privileges and/or an experienced
sysadmin.

- Ted

P.S. Trivia note: Approximately a month after I started work at VA Linux
Systems, a sysadmin intern who was given the root password to
sourceforge.net, while trying to fix a disk-to-disk backup, ran
mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
RAID 0 setup on which open source code critical to the community
(including, for example, OpenGL) was mounted and serving. The intern
got about 50% of the way through zeroing the inode table on /dev/hdXX
before the file system noticed and threw an error, at which point
wiser heads stopped what the intern was doing and tried to clean up
the mess. Of course, there were no backups, since that was what the
intern was trying to fix!

There are a couple of things that we could learn from this incident.
One was that giving the root password to an untrained intern not
familiar with the setup on the serving system was... an unfortunate
choice. Another was that adding the above-mentioned O_EXCL feature
and teaching mkfs to use it was an obvious post-mortem action item to
prevent this kind of problem in the future...