Re: [PATCH 1/4] introduce complete_vfork_done()

From: Andrew Morton
Date: Thu Feb 16 2012 - 19:35:45 EST


On Thu, 16 Feb 2012 18:26:47 +0100
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> No functional changes.
>
> Move the clear-and-complete-vfork_done code into the new trivial
> helper, complete_vfork_done().
>
> ...
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1915,7 +1915,6 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
> {
> struct task_struct *tsk = current;
> struct mm_struct *mm = tsk->mm;
> - struct completion *vfork_done;
> int core_waiters = -EBUSY;
>
> init_completion(&core_state->startup);
> @@ -1934,11 +1933,8 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
> * Make sure nobody is waiting for us to release the VM,
> * otherwise we can deadlock when we wait on each other
> */
> - vfork_done = tsk->vfork_done;
> - if (vfork_done) {
> - tsk->vfork_done = NULL;
> - complete(vfork_done);
> - }
> + if (tsk->vfork_done)
> + complete_vfork_done(tsk);
>
> if (core_waiters)
> wait_for_completion(&core_state->startup);
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -667,6 +667,14 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
> return mm;
> }
>
> +void complete_vfork_done(struct task_struct *tsk)
> +{
> + struct completion *vfork_done = tsk->vfork_done;
> +
> + tsk->vfork_done = NULL;
> + complete(vfork_done);
> +}
> +
> /* Please note the differences between mmput and mm_release.
> * mmput is called whenever we stop holding onto a mm_struct,
> * error success whatever.
> @@ -682,8 +690,6 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
> */
> void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> {
> - struct completion *vfork_done = tsk->vfork_done;
> -
> /* Get rid of any futexes when releasing the mm */
> #ifdef CONFIG_FUTEX
> if (unlikely(tsk->robust_list)) {
> @@ -703,11 +709,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> /* Get rid of any cached register state */
> deactivate_mm(tsk, mm);
>
> - /* notify parent sleeping on vfork() */
> - if (vfork_done) {
> - tsk->vfork_done = NULL;
> - complete(vfork_done);
> - }
> + if (tsk->vfork_done)
> + complete_vfork_done(tsk);

This all looks somewhat smelly.

- Why do we zero tsk->vfork_done in this manner? It *looks* like
it's done to prevent the kernel from running complete() twice against
a single task in a race situation. If this is the case then it's
pretty lame, isn't it? We'd need external locking to firm that up
and I'm not seeing it.

- Moving the test for non-null tsk->vfork_done into
complete_vfork_done() would simplify things a bit?

- The complete_vfork_done() interface isn't wonderful. What prevents
tsk from getting freed? Presumably the caller must have pinned it in
some fashion? Or must hold some lock? Or it's always run against
`current', in which case it would be clearer to not pass the
task_struct arg at all?
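Taken together, the three points above suggest a helper shaped roughly like the following sketch. It is a userspace model in C11, not kernel code. `struct completion`, `complete()`, `struct task` and `current_task` are hypothetical stand-ins for the kernel's completion type, its `complete()` call, `task_struct` and `current`. The helper takes no task argument, so there is no question of the task being freed underneath it. It folds the NULL test inside and clears the pointer with an atomic exchange, so even two racing callers complete the waiter at most once. Whether an exchange like this or the external locking mentioned above is the right fix in the kernel is left open here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct completion:
 * it only counts how many times it has been completed. */
struct completion {
	atomic_int done;
};

/* Hypothetical stand-in for the kernel's complete(). */
static void complete(struct completion *x)
{
	atomic_fetch_add(&x->done, 1);
}

/* Hypothetical stand-in for task_struct, reduced to the one field
 * under discussion. */
struct task {
	_Atomic(struct completion *) vfork_done;
};

/* Hypothetical model of `current`: the task running this code. */
static struct task *current_task;

/*
 * Sketch of the suggested interface:
 *  - no task argument, it always acts on the current task;
 *  - the NULL test lives inside the helper, not in every caller;
 *  - the pointer is fetched and cleared in one atomic exchange, so
 *    only one caller can ever see the non-NULL value.
 */
static void complete_vfork_done(void)
{
	struct completion *vfork_done;

	assert(current_task != NULL);
	vfork_done = atomic_exchange(&current_task->vfork_done, NULL);
	if (vfork_done)
		complete(vfork_done);
}
```

With this shape, calling the helper twice leaves the completion count at 1 and the pointer NULL, and calling it for a task that has no vfork waiter is simply a no-op, so callers such as coredump_wait() and mm_release() would no longer need their own `if (tsk->vfork_done)` test.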
