Re: [PATCH 3/3] exec: Allow do_coredump to wait for user space pipe readers to complete (v4)
From: Neil Horman
Date: Wed Jul 01 2009 - 06:31:31 EST
On Wed, Jul 01, 2009 at 07:52:57AM +0200, Oleg Nesterov wrote:
> On 06/30, Neil Horman wrote:
> >
> > void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> > {
> > struct core_state core_state;
> > char corename[CORENAME_MAX_SIZE + 1];
> > struct mm_struct *mm = current->mm;
> > struct linux_binfmt * binfmt;
> > - struct inode * inode;
> > - struct file * file;
> > + struct inode * inode = NULL;
> > + struct file * file = NULL;
>
> why this change?
>
It's part of a cosmetic change; see below.
> > @@ -1824,6 +1860,17 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> > corename);
> > goto fail_dropcount;
> > }
> > +
> > + /*
> > + * This lets us wait on a pipe after we close the writing
> > + * end. The extra reader count prevents the pipe_inode_info
> > + * from getting freed.
>
> but it can't be freed until we close file?
>
Damn, that's a leftover comment from a previous version; it needs to be removed.
> > This extra count is reclaimed in
> > + * wait_for_dump_helpers
> > + */
> > + pipe = file->f_path.dentry->d_inode->i_pipe;
> > + pipe_lock(pipe);
> > + pipe->readers++;
> > + pipe_unlock(pipe);
>
> why should we inc ->readers in advance?
>
Read the comment immediately above it and look at the filp_close path. We
increment ->readers in advance to prevent the pipe_inode_info from being freed
between the time we write out the core file and the time we wait on the pipe.
If the user space helper exits between those two points, inode->i_pipe will be
NULL by the time we get to wait_for_dump_helpers. A simple NULL check in
wait_for_dump_helpers isn't sufficient either, since it still leaves a window
between the check and the alternative of incrementing ->readers inside the
loop, leading to a use-after-free/corruption case.
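To make the lifetime argument concrete, here is a minimal userspace model in C (not kernel code). It assumes the shared pipe state is freed and i_pipe cleared once both ->readers and ->writers reach zero, and that the dumper gives up its writer reference before it waits; struct pipe_model, struct inode_model and the helper functions are hypothetical stand-ins for pipe_inode_info and its release path, and locking is elided because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the pipe's shared counts; NOT kernel code. */
struct pipe_model {
	int readers;
	int writers;
};

/* Hypothetical model of an inode that owns the pipe state via i_pipe. */
struct inode_model {
	struct pipe_model *i_pipe;
};

/* Drop one reference; free the state and clear i_pipe once both counts hit 0. */
static void model_release(struct inode_model *inode, bool reader)
{
	struct pipe_model *pipe = inode->i_pipe;

	assert(pipe != NULL);
	if (reader)
		pipe->readers--;
	else
		pipe->writers--;
	if (pipe->readers == 0 && pipe->writers == 0) {
		free(pipe);
		inode->i_pipe = NULL;
	}
}

/*
 * The helper exits before the dumper waits. pin_reader selects whether the
 * dumper took an extra ->readers reference in advance. Returns true if
 * i_pipe is still valid when the dumper gets around to waiting.
 */
static bool pipe_survives_helper_exit(bool pin_reader)
{
	struct inode_model inode;

	inode.i_pipe = calloc(1, sizeof(*inode.i_pipe));
	assert(inode.i_pipe != NULL);
	inode.i_pipe->readers = 1;	/* usermode helper's read end */
	inode.i_pipe->writers = 1;	/* do_coredump's write end */

	if (pin_reader)
		inode.i_pipe->readers++;	/* extra reference taken early */

	model_release(&inode, true);	/* helper exits: drops its reader */
	model_release(&inode, false);	/* dumper drops its writer (assumed) */

	bool survives = inode.i_pipe != NULL;

	if (survives)
		model_release(&inode, true);	/* wait path drops the pin */
	return survives;
}
```

Without the early pin, the last release frees the state and i_pipe is already NULL when the wait would start; with it, the state outlives the helper and the pinned reference is the one that finally frees it.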
> > + wait_for_dump_helpers(file);
>
> why do we call it unconditionally and then check ISFIFO? We only need to wait
> when ispipe = T, and in that case we know that this file is pipe.
>
It's cosmetic. I can call it unconditionally here and then check whether it's a
FIFO inside the function, so that in do_coredump I don't have to do the following:
	if (is_pipe)
		wait_for_dump_helpers(file);
out_unlock:
	filp_close(...);
	if (is_pipe)
		atomic_dec(&core_dump_count);
This is exactly the sort of crap your cleanups to do_coredump attempted to
remove. I thought it best not to undo that work :)
I also do a NULL check in wait_for_dump_helpers, so that if the helper fails to
start properly, it's a fall-through case.
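The "call it unconditionally, bail out inside" pattern can be sketched in userspace C as below. This is a hypothetical analogue, not the kernel function: a negative file descriptor stands in for a NULL struct file (the helper never started), and S_ISFIFO() on the descriptor's fstat() mode stands in for checking the dump file's inode mode. The name should_wait_for_helpers is made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical analogue of the early-return checks; NOT kernel code. */
static bool should_wait_for_helpers(int fd)
{
	struct stat st;

	if (fd < 0)
		return false;		/* nothing opened: fall through */
	if (fstat(fd, &st) != 0)
		return false;		/* cannot inspect it: do not wait */
	return S_ISFIFO(st.st_mode);	/* only pipe dumps have a helper */
}

/* Call it unconditionally on each kind of target, as do_coredump would. */
static void demo_unconditional_calls(void)
{
	int fds[2];
	int rc = pipe(fds);

	assert(rc == 0);
	assert(should_wait_for_helpers(fds[1]));	/* pipe: wait */
	assert(!should_wait_for_helpers(-1));		/* "NULL" file: skip */

	int devnull = open("/dev/null", O_WRONLY);

	assert(devnull >= 0);
	assert(!should_wait_for_helpers(devnull));	/* not a FIFO: skip */

	close(devnull);
	close(fds[0]);
	close(fds[1]);
}
```

The caller never needs an is_pipe branch; the callee decides, which keeps the exit path of the caller flat.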
> IOW, could you explain why the (much simpler) patch I sent doesn't work ?
>
In short, because the much simpler patch you sent is broken. I did in fact try
it as-is, and ran into exactly the race I described above: the user space
helper exited before we waited on it, resulting in an oops when we tried to
manipulate the i_pipe pointer, which had become NULL.
>
> Hmm. And in fact that pipe->readers++ above doesn't look right. What if
> the core_pattern task exits? Since we incremented ->readers we can't notice
> the fact there are no readers, and f_op->write() will hang forever.
>
But if we don't, we can lose the inode->i_pipe pointer. I suppose what we need
to do is increment ->writers immediately, then decrement ->writers and increment
->readers after ->core_dump returns.
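A minimal userspace model in C comparing the two pinning strategies follows. It is a sketch under stated assumptions, not kernel code: the shared state is freed once both counts reach zero, a write "notices" a missing reader when ->readers is 0 (the real pipe_write returns -EPIPE in that case), and the dumper gives up its own writer reference before waiting. All struct, enum and function names are hypothetical, and locking is elided because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the pipe's shared counts; NOT kernel code. */
struct pipe_model {
	int readers;
	int writers;
};

struct inode_model {
	struct pipe_model *i_pipe;
};

/* Drop one reference; free the state once both counts hit zero. */
static void put_ref(struct inode_model *inode, bool reader)
{
	struct pipe_model *pipe = inode->i_pipe;

	assert(pipe != NULL);
	if (reader)
		pipe->readers--;
	else
		pipe->writers--;
	if (pipe->readers == 0 && pipe->writers == 0) {
		free(pipe);
		inode->i_pipe = NULL;
	}
}

enum pin_kind { PIN_READER_EARLY, PIN_WRITER_THEN_SWAP };

struct outcome {
	bool writer_sees_no_readers;	/* a write would fail instead of blocking */
	bool pipe_alive_for_wait;	/* i_pipe still valid when we go to wait */
};

/* The helper exits while the core is still being written. */
static struct outcome simulate_helper_exit(enum pin_kind kind)
{
	struct inode_model inode;
	struct outcome out;

	inode.i_pipe = calloc(1, sizeof(*inode.i_pipe));
	assert(inode.i_pipe != NULL);
	inode.i_pipe->readers = 1;	/* helper's read end */
	inode.i_pipe->writers = 1;	/* do_coredump's write end */

	if (kind == PIN_READER_EARLY)
		inode.i_pipe->readers++;	/* the v4 patch's approach */
	else
		inode.i_pipe->writers++;	/* pin with an extra writer instead */

	put_ref(&inode, true);		/* helper exits mid-dump */
	out.writer_sees_no_readers = inode.i_pipe->readers == 0;

	if (kind == PIN_WRITER_THEN_SWAP) {
		/* after ->core_dump returns: trade the extra writer for a reader */
		inode.i_pipe->writers--;
		inode.i_pipe->readers++;
	}

	put_ref(&inode, false);		/* dumper drops its write end */
	out.pipe_alive_for_wait = inode.i_pipe != NULL;
	if (out.pipe_alive_for_wait)
		put_ref(&inode, true);	/* wait path releases the pinned reader */
	return out;
}
```

In this model both strategies keep i_pipe alive until the wait, but only the writer-then-swap variant lets the dump observe that the reader is gone, which is the property Oleg's objection is about.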
Neil