Re: [syzbot] [hardening?] [mm?] BUG: bad usercopy in vfs_readlink

From: Mateusz Guzik
Date: Tue Feb 04 2025 - 16:25:50 EST


On Tue, Feb 4, 2025 at 9:48 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> On Tue, Feb 4, 2025 at 9:31 PM Theodore Ts'o <tytso@xxxxxxx> wrote:
> >
> > On Tue, Feb 04, 2025 at 05:49:48PM +0100, Mateusz Guzik wrote:
> > > I'm going to restate: the original behavior can be restored by
> > > replacing i_size usage with a strlen call. However, as is I have no
> > > basis to think that the disparity between the two is legitimate. If an
> > > ext4 person (Ted cc'ed) tells me the disparity can legally happen and
> > > is the expected way, I'm going to patch it up no problem.
> >
> > There are two kinds of symlinks in ext4. "Fast symlinks", where the
> > symlink target can fit in i_block[]. And normal, sometimes called
> > "slow" symlinks, where i_block[] contains a pointer to data block
> > (either i_blocks[0] for a traditional inode, or the root of an extent
> > tree that will fit in i_block[] if EXTENTS_FL is set). In the case
> > of a fast symlink, the maximum size of the file system is
> > sizeof(i_block[])-1. For a normal/slow symlink, the maximum size is
> > super->s_blocksize.
> >
> > We determine whether a symlink is "fast" or not by checking whether
> > i_size is less than sizeof(i_blocks[]). In other words. if i_size
> > less than 60 (and it's not an encrypted inode), it's a fast symlink
> > and we set i_link to point to the i_block array. From __ext4_iget()
> > in fs/ext4/inode.c:
> >
> > if (IS_ENCRYPTED(inode)) {
> > inode->i_op = &ext4_encrypted_symlink_inode_operations;
> > } else if (ext4_inode_is_fast_symlink(inode)) {
> > inode->i_link = (char *)ei->i_data;
> > inode->i_op = &ext4_fast_symlink_inode_operations;
> > nd_terminate_link(ei->i_data, inode->i_size,
> > sizeof(ei->i_data) - 1);
> > } else {
> > inode->i_op = &ext4_symlink_inode_operations;
> > }
> >
> >
> > The call to nd_terminate_link() guarantees that inode->i_link is
> > NUL-terminated, although the on-disk formatshould have already been
> > NUL-terminated when the symlink is set.
> >
>
> It is nul-terminated, but inode->i_size does not line up with it -- as
> in the inode size pulled from the disk image is different than what
> strlen would suggest.
>
> My question is if that's legitimate, I'm guessing not. If not, then
> ext4 should complain about it.
>
> On stock kernel this happens to work because strlen finds the "right" size.
>

So it occurred to me to check what fsck thinks about it.

I ran it twice in a row, it *removed* the problematic symlink.
e2fsck 1.47.0 (5-Feb-2023)
One or more block group descriptor checksums are invalid. Fix<y>? yes
Group descriptor 0 checksum is 0xd7c5, should be 0xa2fe. FIXED.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /file0/file1 (inode #14) is invalid.
Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
[ERROR] ../../../../lib/support/quotaio_tree.c:546:check_reference:
Illegal reference (4294967295 >= 6) in user quota file
Update quota info for quota type 0<y>? yes
[QUOTA WARNING] Usage inconsistent for ID 4294967295:actual (0, 0) !=
expected (458752, 8)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (458752, 8) !=
expected (0, 0)
[ERROR] ../../../../lib/support/quotaio_tree.c:546:check_reference:
Illegal reference (256 >= 6) in group quota file
Update quota info for quota type 1<y>? yes

syzkaller: ***** FILE SYSTEM WAS MODIFIED *****
syzkaller: 16/32 files (0.0% non-contiguous), 160/512 blocks

So I stand by my assessment that the symlink fetching code should
check for the problem of size vs strlen disparity.

> > Also note that there really is no point to caching inodes where
> > inode->i_link is not a NUL pointer, since in those cases we never call
> > file system's get_link() function; the VFS will just fetch it straight
> > from i_link. And in those cases, it's because it's a "fast symlink"
> > (many file systems have this concept; not just ext[234]) and the
> > symlink target was read in as part of the on-disk inode, and so there
> > is no separate disk block read read the symlink. And so if you are
> > attempting to cache symlinks where inode->i_link is non-NULL, you're
> > just wasting a small amount of memory, and wasting the CPU time to do
> > the strcpy.
> >
>
> My code does not allocate any extra memory or do any extra copies.
>
> The code prior to my change would grab i_link, strlen on it and pass
> that to copy_to_user every single time readlink is issued.
>
> My code stores the size in the inode (unioned with a field not used
> for symlinks, so no again no increase in memory usage) so that the
> strlen call can be avoided. Past that it's the same thing.
>
> --
> Mateusz Guzik <mjguzik gmail.com>



--
Mateusz Guzik <mjguzik gmail.com>