Re: INFO: rcu detected stall in do_idle

From: Juri Lelli
Date: Tue Oct 16 2018 - 10:41:36 EST


On 16/10/18 16:03, Peter Zijlstra wrote:
> On Tue, Oct 16, 2018 at 03:24:06PM +0200, Thomas Gleixner wrote:
> > It does reproduce here but with a kworker stall. Looking at the reproducer:
> >
> > *(uint32_t*)0x20000000 = 0;
> > *(uint32_t*)0x20000004 = 6;
> > *(uint64_t*)0x20000008 = 0;
> > *(uint32_t*)0x20000010 = 0;
> > *(uint32_t*)0x20000014 = 0;
> > *(uint64_t*)0x20000018 = 0x9917;
> > *(uint64_t*)0x20000020 = 0xffff;
> > *(uint64_t*)0x20000028 = 0;
> > syscall(__NR_sched_setattr, 0, 0x20000000, 0);
> >
> > which means:
> >
> > struct sched_attr {
> > .size = 0,
> > .policy = 6,
> > .flags = 0,
> > .nice = 0,
> > .priority = 0,
> > .deadline = 0x9917,
> > .runtime = 0xffff,
> > .period = 0,
> > }
> >
> > policy 6 is SCHED_DEADLINE
> >
> > That makes the thread hog the CPU and prevents all kinds of stuff from running.
> >
> > Peter, is that expected behaviour?
>
> Sorta, just like FIFO-99 while(1);. Except we should be rejecting the
> above configuration, because of the rule:
>
> runtime <= deadline <= period
>
> Juri, where were we supposed to check that?

We do check that rule, but not if period == 0:

https://elixir.bootlin.com/linux/latest/source/kernel/sched/deadline.c#L2632
https://elixir.bootlin.com/linux/latest/source/kernel/sched/deadline.c#L2515

Now, maybe we should also be checking against the default 95% cap?
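
A rough sketch of what such a per-task check could look like, under stated assumptions: the cap is taken as 950000us out of every 1000000us (the default sched_rt_runtime_us / sched_rt_period_us values), a zero period falls back to the deadline, and inputs are small enough that the shift below does not overflow. BW_SHIFT, bw_ratio() and dl_under_default_cap() are names made up for this example, and real admission control sums bandwidth over all tasks rather than looking at one task in isolation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed-point fraction bits; an assumption made for this sketch. */
#define BW_SHIFT 20

/* Hypothetical helper: runtime / period as a fixed-point ratio. */
static uint64_t bw_ratio(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;

	/* Assumes runtime < 2^44 so the shift cannot overflow. */
	return (runtime << BW_SHIFT) / period;
}

/* Hypothetical check of a single task against a 95% cap. */
static bool dl_under_default_cap(uint64_t runtime, uint64_t deadline,
				 uint64_t period)
{
	/* Assumption: a zero period means "use the deadline as period". */
	uint64_t eff_period = period ? period : deadline;
	uint64_t cap = bw_ratio(1000000, 950000);

	if (eff_period == 0)
		return false;

	return bw_ratio(eff_period, runtime) <= cap;
}
```

With the values as decoded earlier in the thread (runtime 0xffff, deadline 0x9917, period 0) this sketch would refuse the task, since the ratio is well above 95%.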

Best,

- Juri