Re: SCHED_DEADLINE tasks missing their deadline with SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)
From: luca abeni
Date: Fri May 02 2025 - 10:10:54 EST
Hi all,
On Fri, 2 May 2025 15:55:42 +0200
Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
> Hi Marcel,
>
> On 28/04/25 20:04, Marcel Ziswiler wrote:
> > Hi
> >
> > As part of our trustable work [1], we also run a lot of real time
> > scheduler (SCHED_DEADLINE) tests on the mainline Linux kernel.
> > Overall, the Linux scheduler proves quite capable of scheduling
> > deadline tasks down to a granularity of 5ms on both of our test
> > systems (amd64-based Intel NUCs and aarch64-based RADXA ROCK5Bs).
> > However, recently, we noticed a lot of deadline misses if we
> > introduce overrunning jobs with reclaim mode enabled
> > (SCHED_FLAG_RECLAIM) using GRUB (Greedy Reclamation of Unused
> > Bandwidth). E.g. from hundreds of millions of test runs over the
> > course of a full week where we usually see absolutely zero deadline
> > misses, we see 43 million deadline misses on NUC and 600 thousand
> > on ROCK5B (which also has double the CPU cores). This is with
> > otherwise exactly the same test configuration, which adds exactly
> > the same two overrunning jobs to the job mix, but once without
> > reclaim enabled and once with reclaim enabled.
> >
> > We are wondering whether there are any known limitations to GRUB or
> > what exactly could be the issue.
> >
> > We are happy to provide more detailed debugging information but are
> > looking for suggestions on how/what exactly to look at.
>
> Could you add details of the taskset you are working with? The number
> of tasks, their reservation parameters (runtime, period, deadline)
> and how much they are running (or trying to run) each time they wake
> up. Also which one is using GRUB and which one maybe is not.
>
> Adding Luca in Cc so he can also take a look.
Thanks for cc-ing me, Juri!
Marcel, are your tests on a multi-core machine with global scheduling?
If yes, we should check if the taskset is schedulable.
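As a rough first check, something like the sketch below could be used. It
assumes implicit deadlines (deadline == period) and identical CPUs, and it
applies the Goossens/Funk/Baruah utilization bound for global EDF, which is
only a sufficient test: a taskset that fails it is not necessarily
unschedulable. The function name and the example tasksets are hypothetical,
since the real reservation parameters have not been posted yet.

```python
from fractions import Fraction


def gfb_schedulable(tasks, cpus):
    """Sufficient global-EDF test (GFB bound), implicit deadlines assumed.

    tasks: list of (runtime, period) pairs in the same time unit.
    cpus:  number of identical CPUs the tasks may migrate across.
    Returns True if U_sum <= cpus - (cpus - 1) * U_max.
    """
    if not tasks:
        return True  # nothing to schedule
    # Exact rational arithmetic avoids rounding surprises near the bound.
    utils = [Fraction(runtime, period) for runtime, period in tasks]
    total = sum(utils)
    u_max = max(utils)
    return total <= cpus - (cpus - 1) * u_max


# Hypothetical tasksets (NOT the ones from the report), times in ms.
light = [(1, 5)] * 4                         # U_sum = 0.8, U_max = 0.2
heavy = [(2, 5), (2, 5), (3, 10), (4, 5)]    # U_sum = 1.9, U_max = 0.8

print(gfb_schedulable(light, cpus=4))
print(gfb_schedulable(heavy, cpus=4))
```

Note that the second taskset uses less than half of the four CPUs in total
and still does not pass the bound, because of the single task with
utilization 0.8. That is why the per-task parameters matter here, not just
the sum.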
Thanks,
Luca