Re: [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock
From: Heiko Carstens
Date: Tue Mar 22 2016 - 07:41:33 EST
On Tue, Mar 22, 2016 at 11:21:53AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 21, 2016 at 11:16:22AM -0700, Davidlohr Bueso wrote:
>
> > +/*
> > + * Helpers for modifying the state of either the current task, or a foreign
> > + * task. Each of these calls come in both full barrier and weak flavors:
> > + *
> > + * Weak
> > + * set_task_state() __set_task_state()
> > + * set_current_state() __set_current_state()
> > + *
> > + * Where set_current_state() and set_task_state() includes a full smp barrier
> > + * -after- the write of ->state is correctly serialized with the later test
> > + * of whether to actually sleep:
> > + *
> > + * for (;;) {
> > + * set_current_state(TASK_UNINTERRUPTIBLE);
> > + * if (event_indicated)
> > + * break;
> > + * schedule();
> > + * }
> > + *
> > + * This is commonly necessary for processes sleeping and waking through flag
> > + * based events. If the caller does not need such serialization, then use
> > + * weaker counterparts, which simply writes the state.
> > + *
> > + * Refer to Documentation/memory-barriers.txt
> > + */
>
> I would prefer to pretend set_task_state() does not exist, using it on
> anything other than task==current is very very tricky.
>
> With the below patch; we're only left with:
>
> arch/s390/mm/fault.c: __set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> arch/s390/mm/fault.c: __set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> drivers/md/bcache/btree.c: set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
> kernel/exit.c: set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> kernel/exit.c: __set_task_state(tsk, TASK_RUNNING);
>
> exit most probably also has tsk==current, but I didn't check.
>
> bacache seems to rely on the fact that the task is not running after
> kthread_create() to change the state. But I've no idea why; the only
> think I can come up with is because load accounting, a new thread blocks
> in UNINTERRUPTIBLE which adds to load. But by setting it to
> INTERRUPTIBLE before waking up it can actually mess that up. This really
> should be fixed.
>
> And s390 does something entirely vile, no idea what.
For the two s390 usages tsk equals current. So it could be easily replaced
with set_current_state().