Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage

From: Cong Wang
Date: Wed Apr 05 2017 - 19:56:30 EST


On Tue, Apr 4, 2017 at 11:12 PM, Mike Galbraith <efault@xxxxxx> wrote:
> On Tue, 2017-04-04 at 22:25 -0700, Cong Wang wrote:
>> On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith <efault@xxxxxx> wrote:
>> > -       while (some_qdisc_is_busy(dev))
>> > -               yield();
>> > +       swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
>> >  }
>>
>> I don't see why this is an improvement, even setting aside the
>> hardcoded timeout for now... Why can the scheduler make a better
>> decision with swait_event_timeout() than with cond_resched()?
>
> Because sleeping gets you out of the way? There is no other decision
> the scheduler can make while a SCHED_FIFO task is trying to yield when
> it is the one and only task at its priority. The scheduler is doing
> exactly what it is supposed to do, problem is people calling yield()
> tend to think it does something it does not do, which is why it is
> decorated with "if you think you want yield(), think again".
>
> Yes, yield semantics suck rocks, basically don't exist. Hop in your
> time machine and slap whoever you find claiming responsibility :)

I am not trying to defend yield(); I am trying to understand when
cond_resched() is the right replacement for yield() and when it is not.
To me, the dev_deactivate_many() case is one where it is, because I
interpret "be nice" differently.

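For reference, the scenario Mike describes above can be sketched as a toy
userspace model. This is not kernel code and not the actual scheduler: the
names simulate_wait() and wait_style are made up for illustration, and the
model assumes one CPU, strict priority scheduling, a higher-priority waiter
and a lower-priority worker that has to finish some units of work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum wait_style { WAIT_YIELD, WAIT_SLEEP };

/*
 * Returns the tick at which the waiter observes that the worker is done,
 * or -1 if that does not happen within max_ticks.
 */
int simulate_wait(enum wait_style style, int work_units, int max_ticks)
{
    assert(work_units >= 0 && max_ticks >= 0);

    int remaining = work_units;
    bool waiter_sleeping = false;

    for (int tick = 0; tick < max_ticks; tick++) {
        if (!waiter_sleeping) {
            /* Waiter is runnable and has the highest priority: it runs. */
            if (remaining == 0)
                return tick;
            if (style == WAIT_SLEEP)
                waiter_sleeping = true; /* off the runqueue for one tick */
            /*
             * WAIT_YIELD: the waiter moves to the tail of its own
             * priority list, where it is the only entry, so it is
             * picked again on the next tick.
             */
        } else {
            /* Waiter is asleep: the lower-priority worker gets the CPU. */
            if (remaining > 0)
                remaining--;
            waiter_sleeping = false; /* timeout expires, waiter wakes */
        }
    }
    return -1;
}
```

Under these assumptions, simulate_wait(WAIT_YIELD, 3, 1000) returns -1
because the worker never gets to run, while simulate_wait(WAIT_SLEEP, 3, 1000)
returns 6, since the waiter and the worker alternate ticks.
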
Thanks.