Re: [PATCH] tmpfs: use ida to get inode number

From: Matthew Wilcox
Date: Fri Nov 22 2019 - 17:13:41 EST


On Fri, Nov 22, 2019 at 09:23:30AM +0800, zhengbin (A) wrote:
> On 2019/11/22 3:53, Hugh Dickins wrote:
> > On Thu, 21 Nov 2019, zhengbin (A) wrote:
> >> On 2019/11/21 12:52, Hugh Dickins wrote:
> >>> Just a rushed FYI without looking at your patch or comments.
> >>>
> >>> Internally (in Google) we do rely on good tmpfs inode numbers more
> >>> than on those of other get_next_ino() filesystems, and carry a patch
> >>> to mm/shmem.c for it to use 64-bit inode numbers (and separate inode
> >>> number space for each superblock) - essentially,
> >>>
> >>> ino = sbinfo->next_ino++;
> >>> /* Avoid 0 in the low 32 bits: might appear deleted */
> >>> if (unlikely((unsigned int)ino == 0))
> >>> ino = sbinfo->next_ino++;
> >>>
> >>> Which I think would be faster, and need less memory, than IDA.
> >>> But whether that is of general interest, or of interest to you,
> >>> depends upon how prevalent 32-bit executables built without
> >>> __FILE_OFFSET_BITS=64 still are these days.
> >> So what does Google do about this? Inode numbers wider than 32 bits, but
> >> 32-bit executables cannot handle them?
> > Google is free to limit what executables are run on its machines,
> > and how they are built, so little problem here.
> >
> > A general-purpose 32-bit Linux distribution does not have that freedom,
> > does not want to limit what the user runs. But I thought that by now
> > they (and all serious users of 32-bit systems) were building their own
> > executables with _FILE_OFFSET_BITS=64 (I was too generous with the
> > underscores yesterday); and I thought that defined __USE_FILE_OFFSET64,
> > and that typedef'd ino_t to be __ino64_t. And the 32-bit kernel would
> > have __ARCH_WANT_STAT64, which delivers st_ino as unsigned long long.
> >
> > So I thought that a modern, professional 32-bit executable would be
> > dealing in 64-bit inode numbers anyway. But I am not a system builder,
> > so perhaps I'm being naive. And of course some users may have to support
> > some old userspace, or apps that assign inode numbers to "int" or "long"
> > or whatever. I have no insight into the extent of that problem.
>
> So how do we solve this problem?
>
> 1. tmpfs uses an IDA or some other data structure
>
> 2. tmpfs uses 64-bit inode numbers, with a separate inode number space for each superblock
>
> 3. Do nothing; if somebody hits this bug, let them solve it themselves
>
> 4. Change last_ino to 64-bit in get_next_ino(), so other filesystems will be ok too, but that was rejected before

5. Extend the sbitmap API to allow for growing the bitmap. I had a
look at doing that, and it looks hard. There are a lot of things which
are set up at initialisation and changing them mid-use seems tricky.
Ccing Jens in case he has an opinion.

6. Create a percpu IDA. This doesn't seem too hard. We need a percpu
pointer to an IDA leaf (128 bytes, so 1024 IDs per leaf), and a percpu
integer which is the current base for this CPU. At allocation time, find
and set the first free bit in the leaf, and add on the current base.
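
The allocation step described above can be sketched in plain user-space C.
This is only an illustration, not kernel code: struct cpu_ino_cache,
cache_alloc() and cache_free() are hypothetical names, and a real version
would use the kernel's percpu and bitmap helpers rather than open-coded
loops.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define LEAF_BITS  1024                      /* 128 bytes, as in the mail */
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define LEAF_WORDS (LEAF_BITS / WORD_BITS)

/* Hypothetical stand-in for the per-CPU state: one leaf plus its base ID. */
struct cpu_ino_cache {
	unsigned long leaf[LEAF_WORDS];  /* set bit = ID in use */
	unsigned long base;              /* ID represented by bit 0 of leaf */
};

/*
 * Find and set the first free bit in the leaf, then add the base.
 * Returns -1 when the leaf is full; the caller would then have to go
 * to the shared structure for another leaf.
 */
static long cache_alloc(struct cpu_ino_cache *c)
{
	for (size_t w = 0; w < LEAF_WORDS; w++) {
		if (c->leaf[w] == ULONG_MAX)
			continue;                /* every bit in this word is taken */
		for (size_t b = 0; b < WORD_BITS; b++) {
			unsigned long mask = 1UL << b;

			if (!(c->leaf[w] & mask)) {
				c->leaf[w] |= mask;
				return (long)(c->base + w * WORD_BITS + b);
			}
		}
	}
	return -1;
}

/* Clear the bit for an ID previously returned by cache_alloc(). */
static void cache_free(struct cpu_ino_cache *c, unsigned long id)
{
	unsigned long bit = id - c->base;

	assert(id >= c->base && bit < LEAF_BITS);
	c->leaf[bit / WORD_BITS] &= ~(1UL << (bit % WORD_BITS));
}
```

Because the search always starts from bit 0, a freed ID in the current leaf
is handed out again before any higher one, which is the "reuse IDs"
behaviour a plain counter cannot give.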

If the percpu leaf is full, set the XA_MARK_1 bit on the entry in
the XArray. Then look for any leaves which have both the XA_MARK_0
and XA_MARK_1 bits set; if there is one, claim it by clearing the
XA_MARK_1 bit. If not, kzalloc a new one and find a free spot for it
in the underlying XArray.
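
A rough user-space model of that hand-over, under one reading of the marks:
XA_MARK_0 means "this leaf still has free bits" (how the IDA already uses
the free mark) and XA_MARK_1 means "no CPU currently holds this leaf". That
reading is an assumption, and struct leaf_pool, pool_note_free() and
pool_switch_leaf() are hypothetical names; a fixed array of flags stands in
for the XArray and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 8      /* arbitrary size for this sketch */

/* Hypothetical stand-in for one XArray entry and its two marks. */
struct leaf_slot {
	bool present;      /* a leaf has been allocated at this index */
	bool has_free;     /* plays the role of XA_MARK_0 */
	bool unowned;      /* plays the role of XA_MARK_1 */
};

struct leaf_pool {
	struct leaf_slot slot[POOL_SLOTS];
};

/* An ID in leaf 'idx' was freed, so the leaf has space again. */
static void pool_note_free(struct leaf_pool *p, int idx)
{
	assert(idx >= 0 && idx < POOL_SLOTS && p->slot[idx].present);
	p->slot[idx].has_free = true;
}

/*
 * Called when the CPU's current leaf (index 'cur') is full.
 * Returns the index of the leaf the CPU now owns, or -1 if the
 * pool has no room left.
 */
static int pool_switch_leaf(struct leaf_pool *p, int cur)
{
	int i;

	assert(cur >= 0 && cur < POOL_SLOTS && p->slot[cur].present);
	p->slot[cur].has_free = false;   /* the leaf is full */
	p->slot[cur].unowned = true;     /* release it: set "XA_MARK_1" */

	/* Prefer an existing leaf that has free bits and no owner. */
	for (i = 0; i < POOL_SLOTS; i++) {
		struct leaf_slot *s = &p->slot[i];

		if (s->present && s->has_free && s->unowned) {
			s->unowned = false;      /* claim it: clear "XA_MARK_1" */
			return i;
		}
	}

	/* Otherwise install a fresh leaf at the first empty index. */
	for (i = 0; i < POOL_SLOTS; i++) {
		struct leaf_slot *s = &p->slot[i];

		if (!s->present) {
			s->present = true;
			s->has_free = true;
			s->unowned = false;
			return i;
		}
	}
	return -1;
}
```

In this model a leaf only becomes claimable again after pool_note_free() has
been called on it, so a CPU never picks up a leaf it cannot allocate from.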

Freeing an ID is simply ida_free(). That will involve changing the
users of get_next_ino() to call put_ino(), or something.

This should generally result in similar contention between threads as
the current scheme -- accessing a shared resource every 1024 allocations.
Maybe more often as we try to avoid leaving gaps in the data structure,
or maybe less as we reuse IDs.

(I've tried to explain what I want here, but appreciate it may be
inscrutable. I can try to explain more, or maybe I should just write
the code myself.)