Re: [PATCH v2 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex

From: Qian Cai
Date: Wed Mar 11 2020 - 09:18:50 EST


On Sun, 2020-03-08 at 16:38 -0500, Eric W. Biederman wrote:
> The cred_guard_mutex is problematic. The cred_guard_mutex is held
> over the userspace accesses as the arguments from userspace are read.
> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other
> threads are killed. The cred_guard_mutex is held over
> "put_user(0, tsk->clear_child_tid)" in exit_mm().
>
> Any of those can result in deadlock, as the cred_guard_mutex is held
> over a possible indefinite userspace waits for userspace.
>
> Add exec_update_mutex that is only held over exec updating process
> with the new contents of exec, so that code that needs not to be
> confused by exec changing the mm and the cred in ways that can not
> happen during ordinary execution of a process.
>
> The plan is to switch the users of cred_guard_mutex to
> exec_udpate_mutex one by one. This lets us move forward while still
> being careful and not introducing any regressions.
>
> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@xxxxxxxxxxxxxx/
> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@xxxxxxxxxx/
> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@xxxxxxxxxx/
> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@xxxxxxxxxx/
> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.")
> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

This patch will trigger a warning during boot,

[ÂÂÂ19.707214][ÂÂÂÂT1] pci 0035:01:00.0: enabling device (0545 -> 0547)
[ÂÂÂ19.707287][ÂÂÂÂT1] EEH: Capable adapter found: recovery enabled.
[ÂÂÂ19.732541][ÂÂÂÂT1] cpuidle-powernv: Default stop: psscr =
0x0000000000000330,mask=0x00000000003003ff
[ÂÂÂ19.732567][ÂÂÂÂT1] cpuidle-powernv: Deepest stop: psscr =
0x0000000000300375,mask=0x00000000003003ff
[ÂÂÂ19.732598][ÂÂÂÂT1] cpuidle-powernv: First stop level that may lose SPRs =
0x4
[ÂÂÂ19.732617][ÂÂÂÂT1] cpuidle-powernv: First stop level that may lose timebase
= 0x10
[ÂÂÂ19.769784][ÂÂÂÂT1] HugeTLB registered 2.00 MiB page size, pre-allocated 0
pages
[ÂÂÂ19.769810][ÂÂÂÂT1] HugeTLB registered 1.00 GiB page size, pre-allocated 0
pages
[ÂÂÂ19.789344][ÂÂT718]Â
[ÂÂÂ19.789367][ÂÂT718] =====================================
[ÂÂÂ19.789379][ÂÂT718] WARNING: bad unlock balance detected!
[ÂÂÂ19.789393][ÂÂT718] 5.6.0-rc5-next-20200311+ #4 Not tainted
[ÂÂÂ19.789414][ÂÂT718] -------------------------------------
[ÂÂÂ19.789426][ÂÂT718] kworker/u257:0/718 is trying to release lock (&sig-
>exec_update_mutex) at:
[ÂÂÂ19.789459][ÂÂT718] [<c0000000004c6770>] free_bprm+0xe0/0xf0
[ÂÂÂ19.789481][ÂÂT718] but there are no more locks to release!
[ÂÂÂ19.789502][ÂÂT718]Â
[ÂÂÂ19.789502][ÂÂT718] other info that might help us debug this:
[ÂÂÂ19.789537][ÂÂT718] 1 lock held by kworker/u257:0/718:
[ÂÂÂ19.789558][ÂÂT718]ÂÂ#0: c000001fa8842808 (&sig->cred_guard_mutex){+.+.}, at:
__do_execve_file.isra.33+0x1b0/0xda0
[ÂÂÂ19.789611][ÂÂT718]Â
[ÂÂÂ19.789611][ÂÂT718] stack backtrace:
[ÂÂÂ19.789645][ÂÂT718] CPU: 8 PID: 718 Comm: kworker/u257:0 Not tainted 5.6.0-
rc5-next-20200311+ #4
[ÂÂÂ19.789681][ÂÂT718] Call Trace:
[ÂÂÂ19.789703][ÂÂT718] [c000000dad8cfa70] [c000000000979b40]
dump_stack+0xf4/0x164 (unreliable)
[ÂÂÂ19.789742][ÂÂT718] [c000000dad8cfac0] [c0000000001c1d78]
print_unlock_imbalance_bug+0x118/0x140
[ÂÂÂ19.789780][ÂÂT718] [c000000dad8cfb40] [c0000000001ceaa0]
lock_release+0x270/0x520
[ÂÂÂ19.789817][ÂÂT718] [c000000dad8cfbf0] [c0000000009a2898]
__mutex_unlock_slowpath+0x68/0x400
[ÂÂÂ19.789854][ÂÂT718] [c000000dad8cfcc0] [c0000000004c6770] free_bprm+0xe0/0xf0
[ÂÂÂ19.789900][ÂÂT718] [c000000dad8cfcf0] [c0000000004c845c]
__do_execve_file.isra.33+0x44c/0xda0
__do_execve_file at fs/exec.c:1904
[ÂÂÂ19.789938][ÂÂT718] [c000000dad8cfde0] [c0000000001391d8]
call_usermodehelper_exec_async+0x218/0x250
[ÂÂÂ19.789977][ÂÂT718] [c000000dad8cfe20] [c00000000000b748]
ret_from_kernel_thread+0x5c/0x74

> ---
> fs/exec.c | 9 +++++++++
> include/linux/sched/signal.h | 9 ++++++++-
> init/init_task.c | 1 +
> kernel/fork.c | 1 +
> 4 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index d820a7272a76..ffeebb1f167b 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm)
> {
> struct task_struct *tsk;
> struct mm_struct *old_mm, *active_mm;
> + int ret;
>
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm)
> return -EINTR;
> }
> }
> +
> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
> + if (ret)
> + return ret;
> +
> task_lock(tsk);
> active_mm = tsk->active_mm;
> membarrier_exec_mmap(mm);
> @@ -1438,6 +1444,8 @@ static void free_bprm(struct linux_binprm *bprm)
> {
> free_arg_pages(bprm);
> if (bprm->cred) {
> + if (!bprm->mm)
> + mutex_unlock(&current->signal->exec_update_mutex);
> mutex_unlock(&current->signal->cred_guard_mutex);
> abort_creds(bprm->cred);
> }
> @@ -1487,6 +1495,7 @@ void install_exec_creds(struct linux_binprm *bprm)
> * credentials; any time after this it may be unlocked.
> */
> security_bprm_committed_creds(bprm);
> + mutex_unlock(&current->signal->exec_update_mutex);
> mutex_unlock(&current->signal->cred_guard_mutex);
> }
> EXPORT_SYMBOL(install_exec_creds);
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 88050259c466..a29df79540ce 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -224,7 +224,14 @@ struct signal_struct {
>
> struct mutex cred_guard_mutex; /* guard against foreign influences on
> * credential calculations
> - * (notably. ptrace) */
> + * (notably. ptrace)
> + * Deprecated do not use in new code.
> + * Use exec_update_mutex instead.
> + */
> + struct mutex exec_update_mutex; /* Held while task_struct is being
> + * updated during exec, and may have
> + * inconsistent permissions.
> + */
> } __randomize_layout;
>
> /*
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..bd403ed3e418 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -26,6 +26,7 @@ static struct signal_struct init_signals = {
> .multiprocess = HLIST_HEAD_INIT,
> .rlim = INIT_RLIMITS,
> .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
> + .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex),
> #ifdef CONFIG_POSIX_TIMERS
> .posix_timers = LIST_HEAD_INIT(init_signals.posix_timers),
> .cputimer = {
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 60a1295f4384..12896a6ecee6 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
> sig->oom_score_adj_min = current->signal->oom_score_adj_min;
>
> mutex_init(&sig->cred_guard_mutex);
> + mutex_init(&sig->exec_update_mutex);
>
> return 0;
> }