Re: [PATCH 3/3] introduce task_rcu_dereference()

From: Oleg Nesterov
Date: Wed May 18 2016 - 14:23:28 EST


On 05/18, Peter Zijlstra wrote:
>
> So I keep coming back to this, and every time I look at it my brain
> explodes.

Heh ;) I forgot about this completely.

> On Mon, Oct 27, 2014 at 08:54:46PM +0100, Oleg Nesterov wrote:
> > +struct task_struct *task_rcu_dereference(struct task_struct **ptask)
> > +{
> > + struct task_struct *task;
> > + struct sighand_struct *sighand;
>
> I think that needs: ' = NULL', because if
>
> > +
> > + /*
> > + * We need to verify that release_task() was not called and thus
> > + * delayed_put_task_struct() can't run and drop the last reference
> > + * before rcu_read_unlock(). We check task->sighand != NULL, but
> > + * we can read the already freed and reused memory.
> > + */
> > + retry:
> > + task = rcu_dereference(*ptask);
> > + if (!task)
> > + return NULL;
> > +
> > + probe_slab_address(&task->sighand, sighand);
>
> this fails because of DEBUG_PAGE_ALLOC, we might not write to sighand at
> all, and
>
> > + /*
> > + * Pairs with atomic_dec_and_test(usage) in put_task_struct(task).
> > + * If this task was already freed we can not miss the preceding
> > + * update of this pointer.
> > + */
> > + smp_rmb();
> > + if (unlikely(task != ACCESS_ONCE(*ptask)))
> > + goto retry;
> > +
> > + /*
> > + * Either this is the same task and we can trust sighand != NULL, or
> > + * its memory was re-instantiated as another instance of task_struct.
> > + * In the latter case the new task can not go away until another rcu
> > + * gp pass, so the only problem is that sighand == NULL can be false
> > + * positive but we can pretend we got this NULL before it was freed.
> > + */
> > + if (sighand)
>
> this will be inspecting random values on the stack.

Yes, but thi_s doesn't differ from the case when we inspect the random value
if this task_struct was freed, then reallocated as another thing, then freed
and reallocated as task_struct again.

Please note the comment about the false positive above, and another one below:

> > + * We could even eliminate the false positive mentioned above:
> > + *
> > + * probe_slab_address(&task->sighand, sighand);
> > + * if (sighand)
> > + * goto retry;

this one.

IOW. We can never know if we have a garbage in "sighand" or the real value,
this task_struct can be freed/reallocated when we do probe_slab_address().

And this is fine. We re-check that "task == *ptask" after that. Now we have
two different cases:

1. This is actually the same task/task_struct. In this case
sighand != NULL tells us it is still alive.

2. This is another task which got the same memory for task_struct.
We can't know this of course, and we can not trust sighand != NULL.

In this case we actually return a random value, but this is correct.

If we return NULL - we can pretend that we actually noticed that
*ptask was updated when the previous task has exited. Or pretend
that probe_slab_address(&sighand) reads NULL.

If we return the new task (because sighand is not NULL for any
reason) - this is fine too. This (new) task can't go away before
another gp pass.

And please note again the "We could even eliminate the false positive"
comment above (hmm, it should probably say false negative). We could
re-read task->sighand once again to avoid the falsely NULL.

But this case is very unlikely so I think we do not really care.

And perhaps this idea can find more users...

> This thing does more than rcu_dereference; because it also guarantees
> that task->usage > 0 for the entire RCU section you do this under.
> Because delayed_put_task_struct() will be delayed until at least the
> matching rcu_read_unlock().

Yes, yes, exactly.

> Should we instead create a primitive like try_get_task_struct() which
> returns true if it got a reference? Because that is typically what
> people end up doing..

Well, I won't really argue... but personally I'd prefer to leave it to
the caller. Note that task_rcu_dereference() doesn't even do rcu_read_lock().
It really acts as "rcu dereference".

I agree, try_get_task_struct(ptask) makes sense too, but it should be
implemented as a trivial wrapper on top of task_rcu_dereference().

Oleg.