Re: [PATCH 0/5] sched: Lazy preemption muck
From: Thomas Gleixner
Date: Wed Oct 09 2024 - 19:16:40 EST
On Wed, Oct 09 2024 at 17:19, Steven Rostedt wrote:
> On Wed, 09 Oct 2024 23:06:00 +0200
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> For the transition phase we obviously need to do:
>>
>>	while ($cond) {
>>		spin_lock(L);
>>		do_stuff();
>>		spin_unlock(L);
>>		cond_resched();
>>	}
>
> But if $cond needs to be protected by spin_lock(), what then?
>
>	spin_lock();
>	while ($cond) {
>		do_stuff();
>		spin_unlock();
>		spin_lock();
>	}
>	spin_unlock();
>
Seriously? The proper pattern for this is to do either:
	while (READ_ONCE($cond)) {
		scoped_guard(spinlock, L)
			do_stuff();
		cond_resched();	// To be removed
	}

or, in case $cond is more complex and needs lock protection (a break inside scoped_guard() would only leave the guarded scope, so the exit check has to sit outside the guard):
	while (true) {
		bool progress;

		scoped_guard(spinlock, L) {
			progress = $cond;
			if (progress)
				do_stuff();
		}

		if (!progress)
			break;
		cond_resched();	// To be removed
	}

You get the idea, no?
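
Just to make that concrete, here is a minimal, completely made up sketch of the second variant using the guard infrastructure from <linux/cleanup.h>. work_lock, work_items, struct work_item and drain_work() are invented stand-ins for L and whatever sits behind $cond; the only point is that the lock is released at the end of every guard scope, which is where preemption can happen once the new model is in place, so there is no explicit scheduling point left to remove later:

	#include <linux/cleanup.h>
	#include <linux/list.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	/* Illustrative stand-ins for L and the data behind $cond */
	struct work_item {
		struct list_head node;
	};

	static DEFINE_SPINLOCK(work_lock);
	static LIST_HEAD(work_items);

	static void drain_work(void)
	{
		while (true) {
			struct work_item *item;

			/* work_lock is released when this scope ends */
			scoped_guard(spinlock, &work_lock) {
				item = list_first_entry_or_null(&work_items,
						struct work_item, node);
				if (item)
					list_del(&item->node);
			}

			if (!item)
				break;

			/* The "do_stuff()" part, here done outside the lock */
			kfree(item);
		}
	}

Whether the actual work runs inside or outside the lock obviously depends on the usage site; the loop structure is what matters here.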
>> Seriously, this crap preserving mindset has to stop. If we go new ways
>> then we go them all the way.
>
> It's not about "crap preserving" but more of taking smaller steps. Then we
> can see where a regression happened if one does come up. Kind of like how
> you did the x86 64bit/32bit merge. Do steps that keep things as close to
> what they were at the start and slowly move toward your goals.
Correct. But if you want to take small steps, then you have to look at all
the usage sites first and prepare them _before_ introducing the new
model. That's what I have done for the past 20 years.
The comparison to the 32/64bit merge is completely bogus because that
was just the purely mechanical collocation of the files to make it easy
to consolidate them afterwards. The consolidation was the real effort.
If you want a proper example then look at the CPU hotplug cleanup. There
was a large pile of preparatory patches before we even started to
convert to the state machine concept.
Look at all the other things we've done in the past 20 years of
refactoring to make RT possible. They all follow the same scheme:
1) Analyze _all_ usage sites, i.e. make an inventory
2) Define a migration path, i.e. come up with proper abstractions
3) Convert the usage sites over to the new abstractions
4) Replace the mechanics in the new abstractions
I certainly tried to short-circuit things in the early days, but I learned
very quickly that the short-circuit approach is doomed, especially when
you are changing the most fundamental parts of an OS.
Your idea of taking smaller steps is fundamentally flawed as it fails
to look at the bigger picture first and just tries to emulate the status
quo without doing the preparatory steps upfront.
Peter's approach is perfectly fine because it provides the opportunity to
do the analysis (#1) not only by inspection, but also by observation,
without being disruptive.
That seems to be the more painful approach, but I can assure you it's
less painful than the 'emulate crap just to make progress' approach.
Why?
It forces people to actually analyze the problems instead of working around
them with yet another magic duct-tape solution, which is just as ill-defined
as cond_resched() or the hideous might_sleep() hack.
The goal is to reduce technical debt and not to replace it with a
different variant.
Thanks,
tglx