Re: [syzbot] [hardening?] [mm?] BUG: bad usercopy in vfs_readlink

From: Theodore Ts'o
Date: Tue Feb 04 2025 - 15:31:38 EST


On Tue, Feb 04, 2025 at 05:49:48PM +0100, Mateusz Guzik wrote:
> I'm going to restate: the original behavior can be restored by
> replacing i_size usage with a strlen call. However, as is I have no
> basis to think that the disparity between the two is legitimate. If an
> ext4 person (Ted cc'ed) tells me the disparity can legally happen and
> is the expected way, I'm going to patch it up no problem.

There are two kinds of symlinks in ext4. "Fast symlinks", where the
symlink target can fit in i_block[]. And normal, sometimes called
"slow" symlinks, where i_block[] contains a pointer to data block
(either i_blocks[0] for a traditional inode, or the root of an extent
tree that will fit in i_block[] if EXTENTS_FL is set). In the case
of a fast symlink, the maximum size of the file system is
sizeof(i_block[])-1. For a normal/slow symlink, the maximum size is
super->s_blocksize.

We determine whether a symlink is "fast" or not by checking whether
i_size is less than sizeof(i_blocks[]). In other words. if i_size
less than 60 (and it's not an encrypted inode), it's a fast symlink
and we set i_link to point to the i_block array. From __ext4_iget()
in fs/ext4/inode.c:

if (IS_ENCRYPTED(inode)) {
inode->i_op = &ext4_encrypted_symlink_inode_operations;
} else if (ext4_inode_is_fast_symlink(inode)) {
inode->i_link = (char *)ei->i_data;
inode->i_op = &ext4_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
} else {
inode->i_op = &ext4_symlink_inode_operations;
}


The call to nd_terminate_link() guarantees that inode->i_link is
NUL-terminated, although the on-disk formatshould have already been
NUL-terminated when the symlink is set.

Also note that there really is no point to caching inodes where
inode->i_link is not a NUL pointer, since in those cases we never call
file system's get_link() function; the VFS will just fetch it straight
from i_link. And in those cases, it's because it's a "fast symlink"
(many file systems have this concept; not just ext[234]) and the
symlink target was read in as part of the on-disk inode, and so there
is no separate disk block read read the symlink. And so if you are
attempting to cache symlinks where inode->i_link is non-NULL, you're
just wasting a small amount of memory, and wasting the CPU time to do
the strcpy.

It's also true that the vast majority of symlink targets are less than
60 bytes, which is why I was surprised that your symlink caching was
atually making a difference or many systems. I guess if we have lots
of unicode file names with characters like ❤ and ❤️ and symlinks to
these files, you might have a bunch of slow symlinks. But in the case
where you have symlinks like:

0 lrwxrwxrwx 1 root root 7 Dec 19 2020 /bin -> usr/bin/

Symlink caching won't do any good.

Cheers,

- Ted