Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation

From: Theodore Y. Ts'o
Date: Fri Dec 28 2018 - 21:15:09 EST


On Fri, Dec 28, 2018 at 11:18:18AM +0000, Peter Maydell wrote:
> In general inodes and offsets start from 0 and work up --
> so almost all of the time they don't actually overflow.
> The problem with ext4 directory hash "offsets" is that they
> overflow all the time and immediately, so instead of "works
> unless you have a weird edge case" like all the other filesystems,
> it's "never works".

Actually, XFS uses the inode number to encode the location of the
inode (it doesn't have a fixed inode table, so it's effectively the
block number shifted left by 3 or 4 bits, with the low bits indicating
the slot in the 4k block). It has a hack to provide backwards
compatibility for 32-bit APIs, but it also has the same sort of "oh,
we're on a non-paleolithic CPU, let's use the full 64 bits" logic
that ext4 has.
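
To make the "block number shifted left by 3 or 4 bits" idea concrete,
here is a rough sketch in C. It is a simplified illustration only, not
XFS's actual on-disk format (which has more structure than this), and
the helper names slot_bits(), ino_encode() and ino_fits_32() are made
up for the example. It assumes a 4k block holding 512-byte or 256-byte
inodes, which gives 8 or 16 slots and hence 3 or 4 low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of low bits needed to name a slot,
 * i.e. log2 of the number of inodes that fit in one block. */
static unsigned slot_bits(unsigned block_size, unsigned inode_size)
{
    unsigned per_block = block_size / inode_size;
    unsigned bits = 0;

    while ((1u << bits) < per_block)
        bits++;
    return bits;
}

/* Hypothetical helper: build an inode number from the block that holds
 * the inode and the slot within that block. */
static uint64_t ino_encode(uint64_t block, unsigned slot, unsigned bits)
{
    assert(slot < (1u << bits));    /* slot must fit in the low bits */
    return (block << bits) | slot;
}

/* Nonzero if the inode number can be reported through a 32-bit API. */
static int ino_fits_32(uint64_t ino)
{
    return ino <= UINT32_MAX;
}
```

With 3 slot bits, any inode living at block 2^29 or beyond already
needs more than 32 bits, so whether the number "fits" depends on where
the inode happens to sit on disk rather than on how many inodes exist.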

> The problem is that there is no 32-bit API in some cases
> (unless I have misunderstood the kernel code) -- not all
> host architectures implement compat syscalls or allow them
> to be called from 64-bit processes or implement all the older
> syscall variants that had smaller offsets. If there was a guaranteed
> "this syscall always exists and always gives me 32-bit offsets"
> we could use it.

Are there going to be cases where a process or a thread will sometimes
want the 64-bit interface, and sometimes want the 32-bit interface?
Or is it always going to be one or the other? I wonder if we could
simply add a new flag to the process personality(2) flags.
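
As a sketch of what that might look like from the emulator's side,
assuming it is always one or the other for the whole process: the
flag name PER_HYPOTHETICAL_32BIT_DOFF and its value below are invented
purely for illustration. No such flag exists in the kernel today, and
the helper names are made up as well. The only real interface used is
personality(2) itself, where passing 0xffffffff queries the current
persona without changing it.

```c
#include <sys/personality.h>

/* HYPOTHETICAL: not a real kernel flag; the name and the bit value are
 * assumptions made up for this example. */
#define PER_HYPOTHETICAL_32BIT_DOFF 0x0010000UL

/* Pure helper: the persona value with the hypothetical flag added. */
static unsigned long persona_with_32bit_doff(unsigned long current)
{
    return current | PER_HYPOTHETICAL_32BIT_DOFF;
}

/* Ask for 32-bit d_off values for this process, keeping whatever
 * persona bits are already set. Returns 0 on success, -1 on failure. */
static int request_32bit_doff(void)
{
    int cur = personality(0xffffffffUL);    /* query only, no change */

    if (cur == -1)
        return -1;
    if (personality(persona_with_32bit_doff((unsigned long)cur)) == -1)
        return -1;
    return 0;
}
```

An emulator would call something like request_32bit_doff() once at
startup, before it opens any directories on behalf of the guest.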

> Yes, that has been suggested, but it seemed a bit dubious
> to bake in knowledge of ext4's internal implementation details.
> Can we rely on this as an ABI promise that will always work
> for all versions of all file systems going forwards?

Yeah, that seems dubious because I'm pretty sure there are other file
systems that may have their own 32/64-bit quirks.

- Ted