Re: [BISECTED] rcu_sched self-detected stall since 3.17
From: Oleg Nesterov
Date: Tue Dec 15 2015 - 11:56:34 EST
Sorry again for the huge delay.
And all I can say is that I am thoroughly confused.
On 12/01, Peter Zijlstra wrote:
>
> On Fri, Nov 20, 2015 at 03:35:38PM +0000, Vladimir Murzin wrote:
> > commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
> > Author: NeilBrown <neilb@xxxxxxx>
> > Date: Mon Jul 7 15:16:04 2014 +1000
That patch still looks correct to me.
> > and if I apply following diff I don't see stalls anymore.
> >
> > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> > index a104879..2d68cdb 100644
> > --- a/kernel/sched/wait.c
> > +++ b/kernel/sched/wait.c
> > @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);
> >
> > __sched int bit_wait_io(void *word)
> > {
> > + io_schedule();
> > +
> > if (signal_pending_state(current->state, current))
> > return 1;
> > - io_schedule();
> > return 0;
> > }
> > EXPORT_SYMBOL(bit_wait_io);
I can't understand why this change helps. But note that it actually removes
the signal_pending_state() check from bit_wait_io(): current->state is always
TASK_RUNNING after schedule() returns, so signal_pending_state() will always
return zero.
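
To make this concrete, here is a small userspace model of the check. It is
only a sketch: struct model_task and model_signal_pending_state() are
hypothetical stand-ins for task_struct and signal_pending_state(), and the
numeric state values are illustrative (only the bit relationships matter).

```c
#include <stdbool.h>

/* Illustrative values; only the bit relationships matter for the model. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define TASK_WAKEKILL         128
#define TASK_KILLABLE         (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for the few task_struct fields the check reads. */
struct model_task {
	long state;
	bool sig_pending;	/* models signal_pending(p) */
	bool fatal_pending;	/* models __fatal_signal_pending(p) */
};

/* Models the logic of signal_pending_state(state, p). */
static int model_signal_pending_state(long state, const struct model_task *p)
{
	/* A state that cannot be woken by signals never reports one. */
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!p->sig_pending)
		return 0;
	/* Interruptible sleeps see any signal, killable ones only fatal signals. */
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}
```

With state == TASK_RUNNING the first test already fails, so the function
returns 0 even when a fatal signal is pending.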
This means that after this change wait_on_page_bit_killable() will spin in a
busy-wait loop if the caller is killed.
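
A rough userspace simulation of that loop, under stated assumptions: the bit
never clears, model_io_schedule() only counts whether the task would have
slept, and model_wait_on_bit() mimics the prepare/action/retry shape of
__wait_on_bit() with an iteration cap so the model terminates. All model_*
names and the two bit_wait_io_* variants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; only the bit relationships matter for the model. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define TASK_WAKEKILL         128
#define TASK_KILLABLE         (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

struct model_task {
	long state;
	bool sig_pending;
	bool fatal_pending;
	int sleeps;		/* how often the task would really have blocked */
};

static int model_signal_pending_state(long state, const struct model_task *p)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!p->sig_pending)
		return 0;
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}

/* Models schedule(): no-op if a matching signal is pending, else "sleep". */
static void model_io_schedule(struct model_task *t)
{
	if (t->state != TASK_RUNNING && !model_signal_pending_state(t->state, t))
		t->sleeps++;
	t->state = TASK_RUNNING;	/* always the state seen after return */
}

/* Original order: check first, then schedule. */
static int bit_wait_io_before(struct model_task *t)
{
	if (model_signal_pending_state(t->state, t))
		return 1;
	model_io_schedule(t);
	return 0;
}

/* Order from the proposed diff: schedule first, then check. */
static int bit_wait_io_after(struct model_task *t)
{
	model_io_schedule(t);
	if (model_signal_pending_state(t->state, t))
		return 1;
	return 0;
}

/* Retry loop in the shape of __wait_on_bit(); the bit stays set forever. */
static int model_wait_on_bit(struct model_task *t,
			     int (*action)(struct model_task *),
			     long mode, int max_iters, int *iters)
{
	int ret = 0;

	assert(max_iters > 0);
	*iters = 0;
	do {
		t->state = mode;	/* models prepare_to_wait() */
		ret = action(t);
		(*iters)++;
	} while (!ret && *iters < max_iters);
	t->state = TASK_RUNNING;	/* models finish_wait() */
	return ret;
}
```

In this model a killed TASK_KILLABLE caller leaves the loop on the first pass
with the original order, while with the reordered check it runs until the cap
without ever sleeping.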
> The reason this is broken is that schedule() will no-op when there is a
> pending signal, while raising a signal will also issue a wakeup.
But why is this wrong? We should notice signal_pending_state() on the next
iteration.
> Thus the right thing to do is check for the signal state after,
I think this check should work on both sides. The only difference is that
you obviously can't use current->state after schedule().
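
One way to picture a check placed after schedule() is to hand the sleep mode
to the action explicitly instead of reading current->state. This is only a
sketch in a userspace model: the extra mode argument, struct model_task and
the model_* names are hypothetical, not the existing bit_wait_io() interface.

```c
#include <stdbool.h>

/* Illustrative values; only the bit relationships matter for the model. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define TASK_WAKEKILL         128
#define TASK_KILLABLE         (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

struct model_task {
	long state;
	bool sig_pending;
	bool fatal_pending;
};

static int model_signal_pending_state(long state, const struct model_task *p)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!p->sig_pending)
		return 0;
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}

/* Hypothetical variant: check after scheduling, against the caller's mode. */
static int model_bit_wait_io_mode(struct model_task *t, long mode)
{
	t->state = TASK_RUNNING;	/* stands in for io_schedule() returning */
	if (model_signal_pending_state(mode, t))
		return 1;
	return 0;
}
```

Here the killed caller is still noticed, because the test uses the mode it
slept in rather than the TASK_RUNNING state it has after waking.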
I still can't understand the problem.
Oleg.