Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF

From: Marcel Ziswiler
Date: Mon Dec 02 2024 - 14:25:30 EST


Sorry for my late reply. I was traveling back from Manchester to Switzerland, but I am all settled in again.

On Fri, 2024-11-29 at 10:08 +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 12:37:14PM +0100, Marcel Ziswiler wrote:
>
> > > Oooh, that's something. So far the few reports have not been (easily)
> > > reproducible. If this is readily reproducible on arm64 that would
> > > help a lot. Juri, do you have access to an arm64 test box?
> >
> > As mentioned above, so far our scheduler stress test is not yet open source but Codethink is eager to share
> > anything which helps in resolving this.
>
> I was hoping you could perhaps share a binary with Juri privately or
> with RHT (same difference etc), such that he can poke at it too.

Sure, there is nothing secret about it, it is just that we have not gotten around to open sourcing all parts
of it just yet.

The UEFI aarch64 embedded Linux image I am using may be found here [1]. A matching bmap file is available
should you fancy using it [2]. The SSH key may help when interacting with the system (e.g. that is how I
trigger the failure, as the console is quite busy with tracing) [3]. However, that image was built by CI and
does not yet contain a kernel with the below patch applied. For the case where I provide the dump below, I
manually dumped the kernel config, compiled v6.12.1 with your patch applied and deployed it (to /lib/modules,
/usr/lib/kernel et al.).

> Anyway, if you don't mind a bit of back and forth, 

Sure.

> would you mind adding
> the below patch to your kernel and doing:
>
> (all assuming your kernel has ftrace enabled)
>
>   echo 1 > /sys/kernel/debug/tracing/options/stacktrace
>   echo 1 > /proc/sys/kernel/traceoff_on_warning
>
> running your test to failure and then dumping the trace into a file
> like:
>
>   cat /sys/kernel/debug/tracing/trace > ~/trace

Unfortunately, once I trigger the failure the system is completely dead and won't allow me to dump the trace
buffer any longer. So I did the following instead on the serial console terminal:

tail -f /sys/kernel/debug/tracing/trace

Not sure whether there is any better way to go about this. Plus, even though we run the serial console at 1.5
megabaud, I am not fully sure whether it was able to keep up with logging what you are looking for.
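
For a rough feel of the headroom, here is a back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on
the wire per byte) and an average trace line of about 120 bytes. Both figures are assumptions for
illustration, not measurements from this system:

```shell
#!/bin/sh
# Rough upper bound on what the serial console can carry.
baud=1500000          # 1.5 megabaud, as configured on the console
bits_per_byte=10      # assumed 8N1: 1 start + 8 data + 1 stop bit
avg_line_bytes=120    # assumed average length of one trace line

bytes_per_sec=$((baud / bits_per_byte))
lines_per_sec=$((bytes_per_sec / avg_line_bytes))

echo "${bytes_per_sec} bytes/s, roughly ${lines_per_sec} trace lines/s"
```

With the stacktrace option enabled every event expands to many lines, so a burst of scheduler events could
well exceed such a budget and parts of the trace may be missing.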

> Then compress the file (bzip2 or whatever is popular these days)

xz or zstd (;-p)

> and
> send it my way along with a dmesg dump (private is fine -- these things
> tend to be large-ish).

As mentioned before, there is nothing secret about it. Please find it here [4].

> Hopefully, this will give us a little clue as to where the double
> enqueue happens.

Yes, and do not hesitate to ask for any additional information or the like; we are happy to help. Thanks!

> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index d9d5a702f1a6..b9cd9b40a19f 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1203,6 +1203,11 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
>   scoped_guard (rq_lock, rq) {
>   struct rq_flags *rf = &scope.rf;
>  
> + if (dl_se == &rq->fair_server) {
> + trace_printk("timer fair server %d throttled %d\n",
> +      cpu_of(rq), dl_se->dl_throttled);
> + }
> +
>   if (!dl_se->dl_throttled || !dl_se->dl_runtime)
>   return HRTIMER_NORESTART;
>  
> @@ -1772,6 +1777,9 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
>   rq_lock(rq, &rf);
>   }
>  
> + if (dl_se == &rq->fair_server)
> + trace_printk("inactive fair server %d\n", cpu_of(rq));
> +
>   sched_clock_tick();
>   update_rq_clock(rq);
>  
> @@ -1967,6 +1975,12 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
>  static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
>  {
>   struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> + struct rq *rq = rq_of_dl_se(dl_se);
> +
> + if (dl_se == &rq->fair_server) {
> + trace_printk("enqueue fair server %d h_nr_running %d\n",
> +      cpu_of(rq), rq->cfs.h_nr_running);
> + }
>  
>   WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node));
>  
> @@ -1978,6 +1992,12 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
>  static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
>  {
>   struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> + struct rq *rq = rq_of_dl_se(dl_se);
> +
> + if (dl_se == &rq->fair_server) {
> + trace_printk("dequeue fair server %d h_nr_running %d\n",
> +      cpu_of(rq), rq->cfs.h_nr_running);
> + }
>  
>   if (RB_EMPTY_NODE(&dl_se->rb_node))
>   return;

[1] https://drive.codethink.co.uk/s/N8CQipaNNN45gYM

[2] https://drive.codethink.co.uk/s/mpcPawXpCjPL8D3

[3] https://drive.codethink.co.uk/s/8RjHNTQQRpYgaLc

[4] https://drive.codethink.co.uk/s/MWtzWjLDtdD3E5i

Cheers

Marcel