Re: tmpfs inode leakage when opening file with O_TMP_FILE

From: Hugh Dickins
Date: Mon Feb 18 2019 - 23:23:35 EST


On Fri, 15 Feb 2019, Hugh Dickins wrote:
> On Thu, 14 Feb 2019, Darrick J. Wong wrote:
> > > On Mon, 11 Feb 2019 15:18:11 +0100 Matej Kupljen <matej.kupljen@xxxxxxxxx> wrote:
> > > >
> > > > it seems that when opening file on file system that is mounted on
> > > > tmpfs with the O_TMPFILE flag and using linkat call after that, it
> > > > uses 2 inodes instead of 1.
...
> >
> > Heh, tmpfs and its weird behavior where each new link counts as a new
> > inode because "each new link needs a new dentry, pinning lowmem, and
> > tmpfs dentries cannot be pruned until they are unlinked."
>
> That's very much a peculiarity of tmpfs, so agreed: it's what I expect
> to be the cause, but I've not actually tracked it through and fixed yet.
...
>
> > I /think/ the proper fix is to change shmem_link to decrement ifree only
> > if the inode has nonzero nlink, e.g.
> >
> > /*
> > * No ordinary (disk based) filesystem counts links as inodes;
> > * but each new link needs a new dentry, pinning lowmem, and
> > * tmpfs dentries cannot be pruned until they are unlinked. If
> > * we're linking an O_TMPFILE file into the tmpfs we can skip
> > * this because there's still only one link to the inode.
> > */
> > if (inode->i_nlink > 0) {
> > ret = shmem_reserve_inode(inode->i_sb);
> > if (ret)
> > goto out;
> > }
> >
> > Says me who was crawling around poking at O_TMPFILE behavior all morning.
> > Not sure if that's right; what happens to the old dentry?

Not sure what you mean by "what happens to the old dentry?"
But certainly the accounting feels a bit like a shell game,
and my attempts to explain it have not satisfied even me.

The way I'm finding it helpful to think, is to imagine tmpfs'
count of inodes actually to be implemented as a count of dentries.
And the 1 for the last remaining goes away in the shmem_free_inode()
at the end of shmem_evict_inode(). Does that answer "what happens"?

Since applying the patch, I have verified (watching "dentry" and
"shmem_inode_cache" in /proc/slabinfo) that doing Matej's sequence
repeatedly does not leak any "df -i" nor dentries nor inodes.

>
> I'm relieved to see your "/think/" above and "Not sure" there :)
> Me too. It is so easy to get these counting things wrong, especially
> when distributed between the generic and the specific file system.
>
> I'm not going to attempt a pronouncement until I've had time to
> sink properly into it at the weekend, when I'll follow your guide
> and work it through - thanks a lot for getting this far, Darrick.

I have now sunk into it, and sure that I agree with your patch,
filled out below (I happen to have changed "inode->i_nlink > 0" to
"inode->i_nlink" just out of some personal preference at the time).
One can argue that it's not technically quite the right place, but
it is the place where we can detect the condition without getting
into unnecessary further complications, and does the job well enough.

May I change "Suggested-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>"
to your "Signed-off-by" before sending on to Andrew "From" you?

Thanks!
Hugh

[PATCH] tmpfs: fix link accounting when a tmpfile is linked in

tmpfs has a peculiarity of accounting hard links as if they were separate
inodes: so that when the number of inodes is limited, as it is by default,
a user cannot soak up an unlimited amount of unreclaimable dcache memory
just by repeatedly linking a file.

But when v3.11 added O_TMPFILE, and the ability to use linkat() on the fd,
we missed accommodating this new case in tmpfs: "df -i" shows that an
extra "inode" remains accounted after the file is unlinked and the fd
closed and the actual inode evicted. If a user repeatedly links tmpfiles
into a tmpfs, the limit will be hit (ENOSPC) even after they are deleted.

Just skip the extra reservation from shmem_link() in this case: there's
a sense in which this first link of a tmpfile is then cheaper than a
hard link of another file, but the accounting works out, and there's
still good limiting, so no need to do anything more complicated.

Fixes: f4e0c30c191 ("allow the temp files created by open() to be linked to")
Reported-by: Matej Kupljen <matej.kupljen@xxxxxxxxx>
Suggested-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---

mm/shmem.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

--- 5.0-rc7/mm/shmem.c 2019-01-06 19:15:45.764805103 -0800
+++ linux/mm/shmem.c 2019-02-18 13:56:48.388032606 -0800
@@ -2854,10 +2854,14 @@ static int shmem_link(struct dentry *old
* No ordinary (disk based) filesystem counts links as inodes;
* but each new link needs a new dentry, pinning lowmem, and
* tmpfs dentries cannot be pruned until they are unlinked.
+ * But if an O_TMPFILE file is linked into the tmpfs, the
+ * first link must skip that, to get the accounting right.
*/
- ret = shmem_reserve_inode(inode->i_sb);
- if (ret)
- goto out;
+ if (inode->i_nlink) {
+ ret = shmem_reserve_inode(inode->i_sb);
+ if (ret)
+ goto out;
+ }

dir->i_size += BOGO_DIRENT_SIZE;
inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);