Re: [PATCH 00/24] Complete EEVDF

From: Peter Zijlstra
Date: Tue Sep 10 2024 - 10:07:56 EST


On Tue, Sep 10, 2024 at 02:21:05PM +0200, Sven Schnelle wrote:
> Sven Schnelle <svens@xxxxxxxxxxxxx> writes:
>
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> >
> >> Hi all,
> >>
> >> So after much delay this is hopefully the final version of the EEVDF patches.
> >> They've been sitting in my git tree forever, it seems, and people have been
> >> testing them and sending fixes.
> >>
> >> I've spent the last two days testing and fixing cfs-bandwidth, and as far
> >> as I know that was the very last issue holding it back.
> >>
> >> These patches apply on top of queue.git sched/dl-server, which I plan on merging
> >> in tip/sched/core once -rc1 drops.
> >>
> >> I'm hoping to then merge all this (+- the DVFS clock patch) right before -rc2.
> >>
> >>
> >> Aside from a ton of bug fixes -- thanks all! -- new in this version is:
> >>
> >> - split up the huge delay-dequeue patch
> >> - tested/fixed cfs-bandwidth
> >> - PLACE_REL_DEADLINE -- preserve the relative deadline when migrating
> >> - SCHED_BATCH is equivalent to RESPECT_SLICE
> >> - propagate min_slice up cgroups
> >> - CLOCK_THREAD_DVFS_ID
> >
> > I'm seeing crashes/warnings like the following on s390 with linux-next 20240909:
> >
> > Sometimes the system doesn't manage to print an oops, this one is the best I got:
> >
> > [..]
> > This happens when running the strace test suite. The system normally has
> > 128 CPUs. With this configuration the crash doesn't happen, but when
> > disabling all but four CPUs and running 'make check -j16' in the strace
> > test suite the crash is almost always reproducible.

I noted: Comm: prctl-sched-cor, which is testing core scheduling, right?

Only today I've merged a fix for that:

c662e2b1e8cf ("sched: Fix sched_delayed vs sched_core")

Could you double check if merging tip/sched/core into your next tree
helps anything at all?