Re: INFO: rcu detected stall in do_idle

From: Juri Lelli
Date: Thu Oct 18 2018 - 04:28:49 EST


On 16/10/18 16:03, Peter Zijlstra wrote:
> On Tue, Oct 16, 2018 at 03:24:06PM +0200, Thomas Gleixner wrote:
> > It does reproduce here but with a kworker stall. Looking at the reproducer:
> >
> > *(uint32_t*)0x20000000 = 0;
> > *(uint32_t*)0x20000004 = 6;
> > *(uint64_t*)0x20000008 = 0;
> > *(uint32_t*)0x20000010 = 0;
> > *(uint32_t*)0x20000014 = 0;
> > *(uint64_t*)0x20000018 = 0x9917;
> > *(uint64_t*)0x20000020 = 0xffff;
> > *(uint64_t*)0x20000028 = 0;
> > syscall(__NR_sched_setattr, 0, 0x20000000, 0);
> >
> > which means:
> >
> > struct sched_attr {
> > .size = 0,
> > .policy = 6,
> > .flags = 0,
> > .nice = 0,
> > .priority = 0,
> > .deadline = 0x9917,
> > .runtime = 0xffff,
> > .period = 0,
> > }
> >
> > policy 6 is SCHED_DEADLINE
> >
> > That makes the thread hog the CPU and prevents all kind of stuff to run.
> >
> > Peter, is that expected behaviour?
>
> Sorta, just like FIFO-99 while(1);. Except we should be rejecting the
> above configuration, because of the rule:
>
> runtime <= deadline <= period
>
> Juri, where were we supposed to check that?

OK, looks like the "which means" part above had me fooled, as
we actually have ([1], where the comment is wrong)

struct sched_attr {
.size = 0,
.policy = 6,
.flags = 0,
.nice = 0,
.priority = 0,
.runtime = 0x9917,
.deadline = 0xffff,
.period = 0,
}

So, we seem to be correctly (in theory, see below) accepting the task.
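
The corrected decoding can be double-checked by replaying the reproducer's stores against the field offsets. This is a minimal user-space sketch: struct sched_attr_model is a local mirror of the first 48 bytes of the uapi struct (field order as in [1]), and decode_reproducer() is a hypothetical helper name, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the first 48 bytes of the uapi struct sched_attr.
 * Deliberately named differently so it cannot clash with a libc or
 * kernel header definition.
 */
struct sched_attr_model {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* offset 0x18 */
	uint64_t sched_deadline;  /* offset 0x20 */
	uint64_t sched_period;    /* offset 0x28 */
};

/* Replay the reproducer's non-zero stores into a buffer and read it back. */
static struct sched_attr_model decode_reproducer(void)
{
	unsigned char buf[0x30] = { 0 };
	struct sched_attr_model attr;
	uint32_t policy = 6;          /* SCHED_DEADLINE */
	uint64_t at_0x18 = 0x9917;
	uint64_t at_0x20 = 0xffff;

	memcpy(buf + 0x04, &policy, sizeof(policy));
	memcpy(buf + 0x18, &at_0x18, sizeof(at_0x18));
	memcpy(buf + 0x20, &at_0x20, sizeof(at_0x20));
	memcpy(&attr, buf, sizeof(attr));
	return attr;
}
```

The value stored at 0x18 lands in sched_runtime and the one at 0x20 in sched_deadline, so runtime <= deadline holds for this task.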

What seems to cause the problem here is that CONFIG_HZ=100 and the
reproducer task has "tiny" runtime (~40us) and deadline (~66us)
parameters, a combination that "bypasses" the enforcement mechanism
(performed at each tick).
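
To put numbers on the mismatch, here is a small arithmetic sketch. The helper names are hypothetical, and it assumes that enforcement only happens at the periodic tick (no HRTICK and no other scheduling event in between).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Length of one scheduler tick in nanoseconds for a given CONFIG_HZ. */
static uint64_t tick_period_ns(unsigned int hz)
{
	return NSEC_PER_SEC / hz;
}

/* Number of full relative deadlines that elapse within a single tick. */
static uint64_t deadlines_per_tick(unsigned int hz, uint64_t deadline_ns)
{
	return tick_period_ns(hz) / deadline_ns;
}
```

With hz = 100 the tick is 10ms, while the reproducer's deadline is 0xffff ns (65535ns), so deadlines_per_tick(100, 0xffff) comes out at 152: the task can blow through its budget many times over before the tick gets a chance to look at it.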

Another side problem seems to be that with such tiny parameters we
spend a lot of time in the while (dl_se->runtime <= 0) loop of
replenish_dl_entity() (actually uselessly, as the deadline is most
probably still going to be in the past when runtime eventually becomes
positive again), as delta_exec is huge w.r.t. runtime and runtime has to
catch up in tiny increments of dl_runtime. I guess we could ameliorate
things here by limiting the number of times we execute the loop before
bailing out.
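
A user-space model of that loop shows the effect and what a bounded variant might look like. This is only a sketch: struct dl_model and replenish_model() are hypothetical stand-ins, the max_iter cap is an assumption (nothing like it exists in the kernel today), and what to do after bailing out is left open.

```c
#include <stdint.h>

/* Reduced stand-in for the fields of sched_dl_entity used by the loop. */
struct dl_model {
	int64_t  runtime;     /* remaining budget, negative after an overrun */
	uint64_t deadline;    /* absolute deadline */
	uint64_t dl_runtime;  /* budget added per replenishment, must be > 0 */
	uint64_t dl_period;   /* amount the deadline is pushed each time */
};

/*
 * Same shape as the while (dl_se->runtime <= 0) loop, returning the
 * number of iterations executed. max_iter is a hypothetical cap;
 * 0 means unbounded, which is the current behaviour.
 */
static unsigned long replenish_model(struct dl_model *se, unsigned long max_iter)
{
	unsigned long n = 0;

	while (se->runtime <= 0) {
		if (max_iter && n == max_iter)
			break; /* bail out; the caller would have to reset the entity */
		se->deadline += se->dl_period;
		se->runtime += (int64_t)se->dl_runtime;
		n++;
	}
	return n;
}
```

With the reproducer's parameters (dl_runtime = 39191, dl_period = 65535) and a single 10ms tick of overrun, the unbounded loop needs 255 iterations before runtime turns positive again. Longer overruns scale linearly.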

Enabling HRTICK makes a difference [2]. I played a bit with several
combinations and could verify that parameters in the ~50us range seem
usable. It is still worth mentioning, however, that when runtime gets
close to deadline (very high bandwidth) enforcement could be tricked
again: hrtick overheads might make the task effectively execute for more
than its runtime, past the replenish instant (old deadline), so the
replenish timer is not set and the task keeps executing after a
replenishment.

This is all however very much platform and config dependent, of course.

So, I tend to think that we might want to play safe and put some higher
minimum value for dl_runtime (it's currently at 1ULL << DL_SCALE).
Guess the problem is to pick a reasonable value, though. Maybe link it
somehow to HZ? Then we might add a sysctl (or similar) thing with which
knowledgeable users can do whatever they think their platform/config can
support?
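
As a rough illustration of the idea, here is a user-space model of the parameter check with a tunable runtime floor. It is a sketch under assumptions: dl_params_ok() and its min_runtime argument are hypothetical names, DL_SCALE is taken to be 10 as in kernel/sched/sched.h, and the 100us floor used in the note below is an arbitrary example, not a proposed value.

```c
#include <stdbool.h>
#include <stdint.h>

#define DL_SCALE 10  /* assumed value; today's floor is 1ULL << DL_SCALE ns */

/*
 * Model of the rule runtime <= deadline <= period, plus a configurable
 * lower bound on runtime. period == 0 means "period equals deadline".
 * All values are in nanoseconds.
 */
static bool dl_params_ok(uint64_t runtime, uint64_t deadline,
			 uint64_t period, uint64_t min_runtime)
{
	if (deadline == 0)
		return false;
	if (runtime < min_runtime)
		return false;
	if (period != 0 && period < deadline)
		return false;
	return runtime <= deadline;
}
```

With the current floor, dl_params_ok(0x9917, 0xffff, 0, 1ULL << DL_SCALE) accepts the reproducer, while a floor of, say, 100000ns would reject it. The hard part remains choosing that floor per platform/config.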

Thoughts?

I'm adding more people on Cc as I'm not sure they are following this.
Thread starts here [3].

1 - https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/sched/types.h#L70
2 - noticed that we don't actually start hrtick on setup_new_dl_entity()
and think we should
3 - https://lore.kernel.org/lkml/000000000000a4ee200578172fde@xxxxxxxxxx/