Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit
From: Darrick J. Wong
Date: Sat Dec 21 2019 - 13:05:52 EST

On Sat, Dec 21, 2019 at 10:43:05AM +0200, Amir Goldstein wrote:
> On Fri, Dec 20, 2019 at 11:33 PM Darrick J. Wong
> <darrick.wong@xxxxxxxxxx> wrote:
> >
> > On Fri, Dec 20, 2019 at 02:49:36AM +0000, Chris Down wrote:
> > > In Facebook production we are seeing heavy inode number wraparounds on
> > > tmpfs. On affected tiers, in excess of 10% of hosts show multiple files
> > > with different content and the same inode number, with some servers even
> > > having as many as 150 duplicated inode numbers with differing file
> > > content.
> > >
> > > This causes actual, tangible problems in production. For example, we
> > > have complaints from those working on remote caches that their
> > > application is reporting cache corruptions because it uses (device,
> > > inodenum) to establish the identity of a particular cache object, but
> >
> > ...but you cannot delete the (dev, inum) tuple from the cache index when
> > you remove a cache object??
> >
> > > because it's not unique any more, the application refuses to continue
> > > and reports cache corruption. Even worse, sometimes applications may not
> > > even detect the corruption but may continue anyway, causing phantom and
> > > hard to debug behaviour.
> > >
> > > In general, userspace applications expect that (device, inodenum) should
> > > be enough to uniquely point to one inode, which seems fair enough.
> >
> > Except that it's not. (dev, inum, generation) uniquely points to an
> > instance of an inode from creation to the last unlink.
> >
>
> Yes, but also:
> There should not exist two live inodes on the system with the same (dev, inum).
> The problem is that ino 1 may still be alive when wraparound happens,
> and then two different inodes with ino 1 exist on the same dev.

*OH* that's different then. Most sane filesystems <cough>btrfs<cough>
should never have the same inode numbers for different files. Sorry for
the noise, I misunderstood what the issue was. :)

> Take the 'diff' utility for example: it will report that those files are
> identical if they have the same dev, ino, size, and mtime. I suspect that
> 'mv' will not let you move one over the other, assuming they are hardlinks.
> Generation is not even exposed to legacy applications using stat(2).

Yeah, I was surprised to see it's not even in statx. :/

--D

> Thanks,
> Amir.