Re: [PATCH 00/24] Complete EEVDF

From: Sven Schnelle
Date: Tue Sep 10 2024 - 10:53:34 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Tue, Sep 10, 2024 at 02:21:05PM +0200, Sven Schnelle wrote:
>> Sven Schnelle <svens@xxxxxxxxxxxxx> writes:
>>
>> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>> >
>> >> Hi all,
>> >>
>> >> So after much delay this is hopefully the final version of the EEVDF patches.
>> >> They've been sitting in my git tree forever, it seems, and people have been
>> >> testing it and sending fixes.
>> >>
>> >> I've spent the last two days testing and fixing cfs-bandwidth, and as far
>> >> as I know that was the very last issue holding it back.
>> >>
>> >> These patches apply on top of queue.git sched/dl-server, which I plan on merging
>> >> in tip/sched/core once -rc1 drops.
>> >>
>> >> I'm hoping to then merge all this (+- the DVFS clock patch) right before -rc2.
>> >>
>> >>
>> >> Aside from a ton of bug fixes -- thanks all! -- new in this version is:
>> >>
>> >> - split up the huge delay-dequeue patch
>> >> - tested/fixed cfs-bandwidth
>> >> - PLACE_REL_DEADLINE -- preserve the relative deadline when migrating
>> >> - SCHED_BATCH is equivalent to RESPECT_SLICE
>> >> - propagate min_slice up cgroups
>> >> - CLOCK_THREAD_DVFS_ID
>> >
>> > I'm seeing crashes/warnings like the following on s390 with linux-next 20240909:
>> >
>> > Sometimes the system doesn't manage to print an oops; this one is the best I got:
>> >
>> > [..]
>> > This happens when running the strace test suite. The system normally has
>> > 128 CPUs. With this configuration the crash doesn't happen, but when
>> > disabling all but four CPUs and running 'make check -j16' in the strace
>> > test suite the crash is almost always reproducible.
>
> I noted: Comm: prctl-sched-cor, which is testing core scheduling, right?
>
> Only today I've merged a fix for that:
>
> c662e2b1e8cf ("sched: Fix sched_delayed vs sched_core")
>
> Could you double check if merging tip/sched/core into your next tree
> helps anything at all?

Yes, that fixes the issue. Thanks!