Re: [BISECTED] rcu_sched self-detected stall since 3.17

From: Oleg Nesterov
Date: Tue Dec 15 2015 - 11:56:34 EST


Sorry again for the huge delay.

And all I can say is that I am completely confused.

On 12/01, Peter Zijlstra wrote:
>
> On Fri, Nov 20, 2015 at 03:35:38PM +0000, Vladimir Murzin wrote:
> > commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
> > Author: NeilBrown <neilb@xxxxxxx>
> > Date: Mon Jul 7 15:16:04 2014 +1000

That patch still looks correct to me.

> > and if I apply following diff I don't see stalls anymore.
> >
> > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> > index a104879..2d68cdb 100644
> > --- a/kernel/sched/wait.c
> > +++ b/kernel/sched/wait.c
> > @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);
> >
> >  __sched int bit_wait_io(void *word)
> >  {
> > +	io_schedule();
> > +
> >  	if (signal_pending_state(current->state, current))
> >  		return 1;
> > -	io_schedule();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(bit_wait_io);

I can't understand why this change helps. But note that it effectively removes
the signal_pending_state() check from bit_wait_io(): current->state is always
TASK_RUNNING after schedule() returns, so signal_pending_state() will always
return zero.
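
To make that concrete, here is a rough userspace model of the check. It is
only a sketch, not the kernel code: struct model_task and its two fields are
made up for the example and stand in for signal_pending() and
__fatal_signal_pending(); the state values are the ones I believe sched.h
used at the time.

```c
/* Userspace model of signal_pending_state(); not kernel code. */
#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define TASK_WAKEKILL		128
#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for the signal bits of task_struct. */
struct model_task {
	int sig_pending;	/* models signal_pending(p) */
	int fatal_pending;	/* models __fatal_signal_pending(p) */
};

static int signal_pending_state(long state, const struct model_task *p)
{
	/* A state without INTERRUPTIBLE or WAKEKILL never reports a signal. */
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!p->sig_pending)
		return 0;

	/* Interruptible sleeps see any signal, killable ones only fatal ones. */
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}
```

With state == TASK_RUNNING the first test fails, so the result is zero no
matter which signals are pending.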

This means that after this change wait_on_page_bit_killable() will spin in a
busy-wait loop if the caller is killed.

> The reason this is broken is that schedule() will no-op when there is a
> pending signal, while raising a signal will also issue a wakeup.

But why is this wrong? We should notice signal_pending_state() on the next
iteration.
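
A toy simulation of the wait loop shows what I mean. This is a sketch under
assumptions, not the kernel code: all sim_* names are invented, the waited-for
bit is assumed to stay set forever, and the only event is a fatal signal that
arrives (and wakes the task) during the first real sleep.

```c
#define TASK_RUNNING		0
#define TASK_UNINTERRUPTIBLE	2
#define TASK_WAKEKILL		128
#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical miniature of a task: just its state and a SIGKILL flag. */
struct sim_task {
	long state;
	int fatal_pending;
};

/* Simplified: only the TASK_WAKEKILL case matters for a killable wait. */
static int sim_signal_pending_state(long state, const struct sim_task *p)
{
	return (state & TASK_WAKEKILL) && p->fatal_pending;
}

/*
 * Models io_schedule(): no sleep if a fatal signal is already pending,
 * otherwise "sleep" until the signal arrives and wakes us. Either way
 * the task is TASK_RUNNING on return.
 */
static void sim_io_schedule(struct sim_task *p)
{
	if (!sim_signal_pending_state(p->state, p))
		p->fatal_pending = 1;	/* signal delivered while asleep */
	p->state = TASK_RUNNING;
}

/* check_after == 0: original order; check_after == 1: the quoted diff. */
static int sim_bit_wait_io(struct sim_task *p, int check_after)
{
	if (check_after)
		sim_io_schedule(p);
	if (sim_signal_pending_state(p->state, p))
		return 1;
	if (!check_after)
		sim_io_schedule(p);
	return 0;
}

/* Returns the iteration on which the wait was aborted, or -1 if never. */
static int sim_wait_on_bit(int check_after, int max_iters)
{
	struct sim_task t = { TASK_RUNNING, 0 };
	int i;

	for (i = 1; i <= max_iters; i++) {
		t.state = TASK_KILLABLE;	/* models prepare_to_wait() */
		if (sim_bit_wait_io(&t, check_after))
			return i;
	}
	return -1;
}
```

In this model sim_wait_on_bit(0, 1000) returns 2: the original order sees the
signal on the iteration after the wakeup. sim_wait_on_bit(1, 1000) returns -1,
because with the check moved after the schedule the state is already
TASK_RUNNING and the loop never terminates.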

> Thus the right thing to do is check for the signal state after,

I think this check should work on either side of schedule(). The only
difference is that you obviously can't use current->state after schedule().

I still can't understand the problem.

Oleg.
