Re: [PATCH v2] binfmt_misc: Fix possible deadlock in bm_register_write

From: Andrew Morton
Date: Wed Mar 03 2021 - 06:06:41 EST


On Sun, 28 Feb 2021 14:44:14 -0800 Lior Ribak <liorribak@xxxxxxxxx> wrote:

> There is a deadlock in bm_register_write:
> First, at the beginning of the function, bm_register_write takes an
> exclusive lock on the binfmt_misc root inode with
> inode_lock(d_inode(root)).
> Then, if the user passed the 'F' flag (MISC_FMT_OPEN_FILE), the
> function calls open_exec on the user-provided interpreter.
> open_exec performs a path lookup. If that lookup passes through the
> binfmt_misc root, it tries to take a shared lock on the same inode.
> That inode is already locked by this task, so the task blocks forever
> and deadlocks.
>
> To reproduce the bug:
> $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register
>
> Backtrace of where the second lock attempt blocks (#5):
> #0 schedule () at ./arch/x86/include/asm/current.h:15
> #1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992
> #2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213
> #3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222
> #4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355
> #5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783
> #6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
> #7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
> #8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
> #9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913
> #10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948
> #11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682
> #12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF\n", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
> #13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF\n", count=19) at fs/read_write.c:658
> #14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
> #15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
>
> To fix this, move the open_exec call so that it runs before
> bm_register_write takes the write lock.
>

Looks good to me.

I assume this is an ancient bug and that a backport to -stable trees
(with a cc:stable) is warranted?