Re: [syzbot] WARNING: refcount bug in memfd_secret

From: Matthew Wilcox
Date: Sun Oct 24 2021 - 16:16:29 EST


On Sun, Oct 24, 2021 at 09:54:22AM -1000, Linus Torvalds wrote:
> On Sat, Oct 23, 2021 at 9:35 AM syzbot
> <syzbot+75639e6a0331cd61d3e2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 9c0c4d24ac00 Merge tag 'block-5.15-2021-10-22' of git://gi..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=115a0328b00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=59f3ef2b4077575
> > dashboard link: https://syzkaller.appspot.com/bug?extid=75639e6a0331cd61d3e2
> > compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13a035c2b00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14ae869f300000
> >
> > The issue was bisected to:
> >
> > commit 110860541f443f950c1274f217a1a3e298670a33
>
> I think that commit is actually just buggy.
>
> "secretmem_users" is not actually a reference count. There's no "magic
> happens when it goes down to zero".
>
> It's purely a count of the number of existing users, and incrementing
> it from zero is not a problem at all - it is in fact expected.
>
> Sure, zero means "we can hibernate", so zero and overflow are somewhat
> special, but not special enough to cause these kinds of issues.
>
> I have reverted this commit in my tree, because honestly, the whole
> "try to overflow exactly, and hibernate" threat model just isn't worth
> this all.
>
> If people really care, I can suggest
>
> - use "atomic_long_t" instead. Let's face it, 32-bit isn't
> interesting any more, and 64-bit doesn't overflow.
>
> - make up some new "atomic_inc_nooverflow()" thing or whatever.
>
> but for now this is just reverted.

There was a separate thread on an earlier version of this report.
https://lore.kernel.org/linux-mm/YXU7%2FiRjf9v77gon@xxxxxxxxxxxxxxxxxxxx/
I agree with you, and suggested there that if anybody really cares
(I mean, you need a multi-TB machine to produce this problem), we simply
do what we did with the page refcount:

+++ b/mm/secretmem.c
@@ -203,6 +203,8 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)

 	if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC))
 		return -EINVAL;
+	if (atomic_read(&secretmem_users) < 0)
+		return -ENFILE;

 	fd = get_unused_fd_flags(flags & O_CLOEXEC);
 	if (fd < 0)

Mike didn't particularly like that answer though.
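
For anyone who wants to poke at the idea outside the kernel, here is a
rough userspace sketch of that check. It is only an illustration and
makes some assumptions: it uses C11 <stdatomic.h> rather than the
kernel's atomic_t, and "users", secretmem_open_sketch() and
secretmem_release_sketch() are made-up names, not anything in
mm/secretmem.c. The point is that the counter stays a plain count (going
from zero to one is fine), and new users are refused once it has wrapped
into the negative range.

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's secretmem_users. */
static atomic_int users;

/* Hypothetical helper modelling the proposed check in memfd_secret(). */
static int secretmem_open_sketch(void)
{
	/* Refuse new users once the count has wrapped to a negative value. */
	if (atomic_load(&users) < 0)
		return -ENFILE;

	/*
	 * C11 atomic arithmetic on signed integers wraps silently, so
	 * incrementing past INT_MAX yields a negative value, not UB.
	 */
	atomic_fetch_add(&users, 1);
	return 0;
}

/* Hypothetical helper modelling the decrement on release. */
static void secretmem_release_sketch(void)
{
	atomic_fetch_sub(&users, 1);
}
```

The check and the increment are not one atomic step, so a few racing
callers can slip past the check. That should be harmless here, since the
whole negative half of the range is headroom before the count could get
back to zero.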