Re: mips qemu test failures in -next due to "kthread: Fix use-after-free if kthread fork fails"

From: Guenter Roeck
Date: Sun May 28 2017 - 11:00:58 EST


On 05/28/2017 07:49 AM, Vegard Nossum wrote:
On 05/28/17 13:45, Vegard Nossum wrote:
On 05/27/17 19:56, Guenter Roeck wrote:
Hi,

My qemu tests of mips images are failing in -next. Symptom is a hang during
boot; see http://kerneltests.org/builders/qemu-mips-next for some examples.

I bisected the problem in next-20170526. It points to commit 4d6501dce079c
("kthread: Fix use-after-free if kthread fork fails"). Reverting that patch
fixes the problem.

Bisect log is attached.

Hi,

Thanks for the report and sorry for the breakage :-/

I can't immediately spot what's going wrong, but I am able to reproduce
it on mips so I will try to debug.

Are you sure it's this commit, though? I checked out linus/master and
I get a boot hang even after reverting it.

My mistake; I ran into a different bug which made me think it was
hanging when it wasn't.

However, I think I found the problem; does this patch fix it for you too?


I'll give it a try.

I tried my qemu emulation on mainline after reverting your patch; it passed.
What kernel configuration and qemu command line did you use in your test?

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 918d4c73e951..5351e1f3950d 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -120,7 +120,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
struct thread_info *ti = task_thread_info(p);
struct pt_regs *childregs, *regs = current_pt_regs();
unsigned long childksp;
- p->set_child_tid = p->clear_child_tid = NULL;

childksp = (unsigned long)task_stack_page(p) + THREAD_SIZE - 32;

The problem is that when we moved the p->{set,clear}_child_tid
assignments inside copy_process(), the above assignments in
copy_thread_tls(), which runs later, would clear them out. The assignments only exist on mips and openrisc (which would
need the same patch), which explains why I didn't see it in my x86

Interestingly, my openrisc test passed. Of course, that is just a boot test,
so it may not hit the problem.

Thanks,
Guenter

testing. I think the patch above should be safe given that we're now
always setting these fields in copy_process() at an appropriate moment.

Looks like those assignments came from commit 3c37026d43c47 ("NPTL,
round one."); Ralf?

Oleg?


Vegard