Re: [PATCH] Revert "exec: make de_thread() freezable (was: Re: Linux 4.20-rc4)
From: Oleg Nesterov
Date: Mon Dec 03 2018 - 06:56:07 EST
On 12/03, Ingo Molnar wrote:
>
> So there's a new regression in v4.20-rc4, my desktop produces this
> lockdep splat:
>
> [ 1772.588771] WARNING: pkexec/4633 still has locks held!
> [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted
> [ 1772.588775] ------------------------------------
> [ 1772.588776] 1 lock held by pkexec/4633:
> [ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70
> [ 1772.588786] stack backtrace:
> [ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1
> [ 1772.588792] Call Trace:
> [ 1772.588800] dump_stack+0x85/0xcb
> [ 1772.588803] flush_old_exec+0x116/0x890
> [ 1772.588807] ? load_elf_phdrs+0x72/0xb0
> [ 1772.588809] load_elf_binary+0x291/0x1620
> [ 1772.588815] ? sched_clock+0x5/0x10
> [ 1772.588817] ? search_binary_handler+0x6d/0x240
> [ 1772.588820] search_binary_handler+0x80/0x240
> [ 1772.588823] load_script+0x201/0x220
> [ 1772.588825] search_binary_handler+0x80/0x240
> [ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60
> [ 1772.588832] ? strncpy_from_user+0x40/0x180
> [ 1772.588835] __x64_sys_execve+0x34/0x40
> [ 1772.588838] do_syscall_64+0x60/0x1c0
My fault.
Michal didn't like this patch, but I managed to confuse him ;)
I even mentioned that freezable_schedule() is obviously not right with
->cred_guard_mutex held, but I somehow I thought that this patch "doesn't
make things worse".
> I reviewed the ->cred_guard_mutex code, and the mutex is held across all
> of exec() - and we always did this.
Yes, and this was always wrong. For example, this test-case hangs:
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <pthread.h>
void *thread(void *arg)
{
ptrace(PTRACE_TRACEME, 0,0,0);
return NULL;
}
int main(void)
{
int pid = fork();
if (!pid) {
pthread_t pt;
pthread_create(&pt, NULL, thread, NULL);
pthread_join(pt, NULL);
execlp("echo", "echo", "passed", NULL);
}
sleep(1);
// or anything else which needs ->cred_guard_mutex,
// say open(/proc/$pid/mem)
ptrace(PTRACE_ATTACH, pid, 0,0);
kill(pid, SIGCONT);
return 0;
}
we really need to narrow the (huge) scope of ->cred_guard_mutex in exec paths.
my attempt to fix this was nacked, and nobody suggested a better solution so far.
> This reverts commit c22397888f1eed98cd59f0a88f2a5f6925f80e15.
Thanks.
> ---
> fs/exec.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index acc3a5536384..fc281b738a98 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -62,7 +62,6 @@
> #include <linux/oom.h>
> #include <linux/compat.h>
> #include <linux/vmalloc.h>
> -#include <linux/freezer.h>
>
> #include <linux/uaccess.h>
> #include <asm/mmu_context.h>
> @@ -1084,7 +1083,7 @@ static int de_thread(struct task_struct *tsk)
> while (sig->notify_count) {
> __set_current_state(TASK_KILLABLE);
> spin_unlock_irq(lock);
> - freezable_schedule();
> + schedule();
> if (unlikely(__fatal_signal_pending(tsk)))
> goto killed;
> spin_lock_irq(lock);
> @@ -1112,7 +1111,7 @@ static int de_thread(struct task_struct *tsk)
> __set_current_state(TASK_KILLABLE);
> write_unlock_irq(&tasklist_lock);
> cgroup_threadgroup_change_end(tsk);
> - freezable_schedule();
> + schedule();
> if (unlikely(__fatal_signal_pending(tsk)))
> goto killed;
> }