[PATCH 0/1] stop the unbound recursion in preempt_schedule_context()

From: Oleg Nesterov
Date: Sun Oct 05 2014 - 16:26:41 EST


On 10/04, Oleg Nesterov wrote:
>
> On 10/03, Linus Torvalds wrote:
> >
> > On Fri, Oct 3, 2014 at 5:01 PM, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > The real fix would appear to be to use
> > > "preempt_enable_no_resched_notrace()", which your patch did, but
> > > without the loop.
> >
> > Actually, the real fix would be to not be stupid, and just make the
> > code do something like
> >
> > > if (likely(!preemptible()))
> > > return;
> > >
> > > __preempt_count_add(PREEMPT_ACTIVE);
> > > prev_ctx = exception_enter();
> > >
> > > __schedule();
> > >
> > > exception_exit(prev_ctx);
> > > __preempt_count_sub(PREEMPT_ACTIVE);
> >
> > and *not* enable preemption around the scheduling at all.

Yes, I think you are right. And I hate to admit that I didn't think about
this simplification. The only complication is that we should move this
function from context_tracking.c to sched/core.c.

However, I still think we need the need_resched() loop, we can't do this
only once.

And in fact the simplest fix could just turn it into

local_irq_disable();
preempt_schedule_irq();
local_irq_enable();

> Again, it is too late for me... Most probably I am wrong, but somehow
> it seems to me that the real fix should try to kill preempt_schedule_context()
> altogether and teach preempt_schedule() to play well with CONTEXT_TRACKING.

Yes, the very fact that

preempt_enable() != trace_preempt_on() + preempt_enable_notrace()

looks simply wrong imo. And preempt_enable_notrace() has users outside of
the tracing code which can run in IN_USER state. For example, why should
__perf_sw_event() worry about context_tracking.state?

OTOH, if the caller of preempt_enable_notrace() actually needs to take care
of potential IN_USER state, then it probably has other problems. And indeed,
ftrace_ops_control_func() has to check rcu_is_watching() anyway.

IOW, imho 29bb9e5a75 "tracing/context-tracking: Add preempt_schedule_context()
for tracing" was not a right solution. Perhaps the tracing functions can be
changed to switch to IN_KERNEL mode, or at least perhaps we can add the
special preempt_disable_ftrace() helper. But this needs another discussion.

Either way, I do think that preempt_schedule_context() in its current form
must die.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/