Re: kernel BUG at kernel/sched/core.c:3490!

From: Oleg Nesterov
Date: Mon Jan 07 2019 - 12:56:50 EST


On 01/07, Qian Cai wrote:
>
>
> On 1/7/19 8:52 AM, Peter Zijlstra wrote:
> > On Tue, Jan 01, 2019 at 12:44:35AM -0500, Qian Cai wrote:
> >> Running some mmap() workloads to put the system in a low-memory situation with
> >> swapping and OOM, and then it triggers this BUG(),
> >>
> >> void __noreturn do_task_dead(void)
> >> {
> >> 	/* Causes final put_task_struct in finish_task_switch(): */
> >> 	set_special_state(TASK_DEAD);
> >>
> >> 	/* Tell freezer to ignore us: */
> >> 	current->flags |= PF_NOFREEZE;
> >>
> >> 	__schedule(false);
> >> 	BUG();
> >>
> >> 	/* Avoid "noreturn function does return" - but don't continue if BUG() is a NOP: */
> >> 	for (;;)
> >> 		cpu_relax();
> >> }
> >
> > This would mean that we somehow lose the TASK_DEAD state before hitting
> > schedule(), but that is something that should be avoided by
> > set_special_state(), which is supposed to serialize against concurrent
> > wake-ups.

or maybe pick_next_task() somehow returns the deactivated TASK_DEAD task?
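To make the first hypothesis concrete, here is a minimal user-space sketch of why set_special_state() is supposed to prevent a lost TASK_DEAD: the state store and the wake-up path's "check state, then set TASK_RUNNING" both happen under the same per-task lock, so a wake-up can never read the old sleeping state and then overwrite TASK_DEAD. Everything here (struct toy_task, the toy_* helpers, the atomic_flag spinlock standing in for p->pi_lock) is hypothetical and only models the locking idea; the state constants are assumed to match include/linux/sched.h around v4.20.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Subset of task states; values assumed from include/linux/sched.h (~v4.20). */
#define TASK_RUNNING         0x0000L
#define TASK_INTERRUPTIBLE   0x0001L
#define TASK_UNINTERRUPTIBLE 0x0002L
#define TASK_DEAD            0x0080L
#define TASK_NORMAL          (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for task_struct: just the lock and the state. */
struct toy_task {
	atomic_flag pi_lock; /* models p->pi_lock (a raw spinlock) */
	long state;          /* models p->state */
};

static void toy_lock(struct toy_task *p)
{
	while (atomic_flag_test_and_set_explicit(&p->pi_lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_task *p)
{
	atomic_flag_clear_explicit(&p->pi_lock, memory_order_release);
}

/* Models set_special_state(): the special state is stored under pi_lock. */
static void toy_set_special_state(struct toy_task *p, long state)
{
	assert(state & TASK_DEAD); /* this sketch only handles TASK_DEAD */
	toy_lock(p);
	p->state = state;
	toy_unlock(p);
}

/*
 * Models the state check in try_to_wake_up(): the check and the store of
 * TASK_RUNNING happen under the same lock, so they cannot straddle the
 * TASK_DEAD store above. Returns true if the task was "woken".
 */
static bool toy_try_to_wake_up(struct toy_task *p, long mask)
{
	bool woke = false;

	toy_lock(p);
	if (p->state & mask) {
		p->state = TASK_RUNNING;
		woke = true;
	}
	toy_unlock(p);
	return woke;
}
```

In either ordering (wake-up before or after the TASK_DEAD store) the task ends up in TASK_DEAD, because a TASK_NORMAL wake-up does not match TASK_DEAD. If the BUG() is still reached, either this serialization is being bypassed somewhere or the scheduler is picking the dead task again.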

> > How readily does this reproduce?
>
> Running LTP oom01 [1] has triggered it at least once in every five attempts so
> far on v4.20+. Have not tried much on v5.0-rc1 yet.

Can you add

pr_crit("XXX: %ld %d\n", current->state, current->on_rq);

before that BUG() and reproduce?
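One possible way to read the resulting line, sketched as a small hypothetical helper (classify_xxx_line() and the dead_diag enum are illustrative names, not kernel code). It assumes TASK_DEAD is 0x80 (printed as 128 by %ld) and that __schedule() sets prev->on_rq = 0 after deactivating a task that is no longer TASK_RUNNING, as it does around v4.20. Under those assumptions, a state other than TASK_DEAD points at the "lost TASK_DEAD" theory, while TASK_DEAD with on_rq == 0 points at the dequeued task being picked again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_DEAD 0x0080L /* assumed value from include/linux/sched.h (~v4.20) */

enum dead_diag {
	DIAG_UNPARSED,              /* no parsable "XXX:" line found */
	DIAG_STATE_LOST,            /* ->state is no longer TASK_DEAD */
	DIAG_PICKED_WHILE_DEQUEUED, /* TASK_DEAD, yet on_rq == 0 and we ran again */
	DIAG_OTHER,                 /* TASK_DEAD but still marked queued */
};

/* Classify one dmesg line; tolerates a "[ timestamp ]" prefix before "XXX:". */
static enum dead_diag classify_xxx_line(const char *line)
{
	long state;
	int on_rq;
	const char *p;

	assert(line != NULL);
	p = strstr(line, "XXX:");
	if (!p || sscanf(p, "XXX: %ld %d", &state, &on_rq) != 2)
		return DIAG_UNPARSED;
	if (state != TASK_DEAD)
		return DIAG_STATE_LOST;
	if (on_rq == 0)
		return DIAG_PICKED_WHILE_DEQUEUED;
	return DIAG_OTHER;
}
```

For example, a line like "[  123.456] XXX: 128 0" would classify as DIAG_PICKED_WHILE_DEQUEUED, and "XXX: 0 1" as DIAG_STATE_LOST.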

Oleg.