Re: Linux warns `sched: DL replenish lagged too much`

From: Paul Menzel
Date: Fri Feb 21 2025 - 17:14:07 EST


Dear Juri,


On 20.02.25 at 21:18, Paul Menzel wrote:

On 20.02.25 at 14:35, Juri Lelli wrote:

On 20/02/25 11:47, Paul Menzel wrote:

On the Intel Kaby Lake laptop Dell XPS 13 with Linux
6.14.0-rc3-00060-g6537cfb395f3, waking it up from ACPI S3 with an LMP USB-C
mini dock connected, that had an Ethernet cable and a power adapter plugged
in, everything was lagging, and also the video in the opened Firefox Nightly
browser lagged quite a bit. This has happened in the past, but not that bad
and long. Today for the first time, Linux logged the warning below:

     sched: DL replenish lagged too much

(This is from `kernel/sched/deadline.c`.)

I have no idea whether it’s related to the hardware itself causing the
lag, which a suspend/resume cycle fixes, or to the USB-C controller, which
has bugs in that early generation, or to GNOME/Mutter (*mutter-common*
48~beta-3), Firefox, or the Web video player used by the site.

As is often the case with this, I have no way to reliably reproduce the lag,
or, in this case, the warning. I can only say that this warning has not been
logged in the available log files, which go back to September 2024. Linux
“6.11-rc0” was used then. Please find the log messages attached.

In case this information is not useful, should this happen again, it’d be
great if you could suggest what and how I should collect debugging
information next time.

Assuming no explicit usage of SCHED_DEADLINE, I would say the warning
message might be related to the recently introduced deadline servers:
5f6bd380c7bd ("sched/rt: Remove default bandwidth control") and related
commits.
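
One way to check the "no explicit usage of SCHED_DEADLINE" assumption is to look at the scheduling policy of every thread. A minimal sketch, assuming the layout documented in proc(5) (the policy is field 41 of `/proc/<pid>/task/<tid>/stat`) and that SCHED_DEADLINE is policy number 6; the function names are made up for illustration:

```python
import glob

# Assumption: SCHED_DEADLINE is policy 6, as in the kernel's uapi sched.h.
SCHED_DEADLINE = 6


def policy_from_stat(stat_line):
    """Return the scheduling policy (field 41) from a /proc stat line."""
    # The comm field (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split only after the last ')'.
    rest = stat_line[stat_line.rfind(")") + 2:].split()
    # rest[0] is field 3 (state), so field 41 is rest[38].
    return int(rest[38])


def find_deadline_tasks(proc_root="/proc"):
    """List the stat paths of all threads running under SCHED_DEADLINE."""
    hits = []
    for path in glob.glob(f"{proc_root}/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                line = f.read()
            if policy_from_stat(line) == SCHED_DEADLINE:
                hits.append(path)
        except (OSError, IndexError, ValueError):
            # The thread exited while scanning, or the line was malformed.
            continue
    return hits
```

Calling `find_deadline_tasks()` and getting an empty list would suggest that no user task asked for SCHED_DEADLINE at that moment. The deadline servers are kernel-internal entities, not tasks, so they would not show up in such a scan.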

They were merged in v6.12 (IIRC), though,

Indeed, it first appeared in v6.12-rc1.

so I would expect you had noticed already before if they introduced
issues on your setup? That said, it might also be the case that
something else changed more recently that now triggers a corner
case.

On this device, I sometimes experience lags after resuming from ACPI S3, and, although I cannot prove it, using USB-C mini docks (LMP USB-C mini Dock [1] or Dell DA300) seems to increase the chances of hitting the problem. Re-plugging and doing ACPI S3 suspend/resume again often helps.

I guess I didn’t see this before, as I haven’t been using the USB-C charger with the USB-C mini dock until now, because the laptop also has a barrel jack for charging.

The warning message per se is not fatal, it just warns that the kernel
is recovering from an unexpected situation. Did you notice that things
went back to normal (no lag from a user perspective) right after that
message was printed?

As far as I remember, the lagging stayed even after the log was printed. In this case, I had to reboot the device, and until now – with no ACPI S3 suspend/resume – it works fine.

I was able to reproduce this today. The same behavior after resuming from ACPI S3.

Feb 21 17:20:19 abreu gnome-shell[1775]: meta_wayland_buffer_process_damage: assertion 'buffer->resource' failed
Feb 21 17:20:19 abreu gnome-shell[1775]: (../src/wayland/meta-wayland-buffer.c:709):meta_wayland_buffer_inc_use_count: runtime check failed: (buffer->resource)
Feb 21 17:20:21 abreu rtkit-daemon[1017]: Supervising 7 threads of 5 processes of 1 users.
Feb 21 17:20:21 abreu rtkit-daemon[1017]: Supervising 7 threads of 5 processes of 1 users.
Feb 21 17:20:59 abreu kernel: sched: DL replenish lagged too much
Feb 21 17:21:01 abreu systemd[1588]: Started app-gnome-ptyxis-26577.scope - Application launched by gsd-media-keys.
[…]
Feb 21 17:23:43 abreu gnome-shell[1775]: libinput error: event11 - DLL075B:01 06CB:76AF Touchpad: client bug: event processing lagging behind by 31ms, your system is too slow

The mouse lags sometimes, and switching GNOME virtual desktops or just opening a new window takes several seconds.

Is there an option I can set so that more information is logged once the scheduler warning has been logged?
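
Independent of any such option, the journal lines around the warning can at least be collected after the fact. A minimal sketch, assuming the journal has been saved as text first (for example with `journalctl -k -b -o short-precise > journal.txt`); the function name and the window sizes are made up for illustration:

```python
from typing import Iterable, List

# The exact message from the log excerpt above.
NEEDLE = "sched: DL replenish lagged too much"


def context_around(lines: Iterable[str], needle: str = NEEDLE,
                   before: int = 5, after: int = 5) -> List[List[str]]:
    """Return one window of surrounding lines per occurrence of needle."""
    lines = list(lines)
    windows = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)  # do not run past the start of the log
            windows.append(lines[start:i + after + 1])
    return windows
```

With the saved file, `context_around(open("journal.txt"))` returns one list of lines per occurrence of the warning, which is easier to attach to a report than a full log.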


Kind regards,

Paul


[1]: https://lmp-adapter.com/product/lmp-usb-c-mini-dock/