On 14.04.25 14:04, Aaron Lu wrote:
Hi Florian,
On Mon, Apr 14, 2025 at 10:54:48AM +0200, Florian Bezdeka wrote:
Hi Aaron, Hi Valentin,
On Wed, 2025-04-09 at 20:07 +0800, Aaron Lu wrote:
This is a continuation of the work based on Valentin Schneider's posting here:
Subject: [RFC PATCH v3 00/10] sched/fair: Defer CFS throttle to user entry
https://lore.kernel.org/lkml/20240711130004.2157737-1-vschneid@xxxxxxxxxx/
Valentin has described the problem very well in the above link. We also
see hung task problems from time to time in our environment due to CFS quota.
It is mostly visible with rwsem: when a reader is throttled, a writer comes in
and has to wait; the writer also makes all subsequent readers wait,
causing priority inversion or even a whole-system hang.
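
The wait chain described above can be illustrated with a toy model. This is
only a sketch, not kernel code: the function name `snapshot` and the task
names are made up for illustration, and it assumes a lock that queues new
readers behind any waiting writer, observed during a window in which the
current holder (the throttled reader) never gets to release.

```python
from collections import deque


def snapshot(attempts):
    """Toy model of a reader/writer lock while no holder releases.

    attempts: list of (task, mode) in arrival order, mode is "read" or "write".
    Returns (holders, waiters) as lists of task names.
    Assumption: a new reader queues behind any waiter, as a fair rwsem would.
    """
    holders = []          # (task, mode) pairs currently holding the lock
    waiters = deque()     # tasks blocked in arrival order
    for task, mode in attempts:
        writer_holds = any(m == "write" for _, m in holders)
        if mode == "read" and not writer_holds and not waiters:
            holders.append((task, mode))   # readers may share the lock
        elif mode == "write" and not holders and not waiters:
            holders.append((task, mode))   # writer needs exclusive access
        else:
            waiters.append(task)           # everyone else blocks
    return [t for t, _ in holders], list(waiters)


# R1 takes the lock for reading and is then throttled (never releases here).
with_writer = snapshot([("R1", "read"), ("W1", "write"),
                        ("R2", "read"), ("R3", "read")])
without_writer = snapshot([("R1", "read"), ("R2", "read"), ("R3", "read")])
print(with_writer)
print(without_writer)
```

With the writer present, every later reader ends up queued behind it even
though the lock is only held for reading; without the writer, all readers
share the lock and nobody waits on the throttled task.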
For testing purposes I backported this series to 6.14. We're currently
hunting for a sporadic bug with PREEMPT_RT enabled. We see RCU stalls
and complete system freezes after a couple of days with some container
workload deployed. See [1].
I tried to put together a setup last week to reproduce the RT/CFS throttle
deadlock issue Valentin described but haven't succeeded yet...
Attached are the bits with which we succeeded, sometimes. Setup: Debian 12,
RT kernel, VM with 2-4 cores, 1-5 instances of the test, 2 min - 2 h of
patience. As we have to succeed with at least 3 race conditions in a
row, that is still not bad... But maybe someone has an idea how to
increase the probability further.
Jan