Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user

From: Jan Kiszka
Date: Tue Apr 15 2025 - 01:30:10 EST


On 14.04.25 14:04, Aaron Lu wrote:
> Hi Florian,
>
> On Mon, Apr 14, 2025 at 10:54:48AM +0200, Florian Bezdeka wrote:
>> Hi Aaron, Hi Valentin,
>>
>> On Wed, 2025-04-09 at 20:07 +0800, Aaron Lu wrote:
>>> This is a continuation of Valentin Schneider's work posted here:
>>> Subject: [RFC PATCH v3 00/10] sched/fair: Defer CFS throttle to user entry
>>> https://lore.kernel.org/lkml/20240711130004.2157737-1-vschneid@xxxxxxxxxx/
>>>
>>> Valentin has described the problem very well in the above link. We also
>>> see hung-task problems from time to time in our environment due to CFS quota.
>>> It is most visible with rwsem: when a reader is throttled, a writer comes in
>>> and has to wait, and the writer in turn makes all subsequent readers wait,
>>> causing priority inversion or even a whole-system hang.
>>
>> For testing purposes, I backported this series to 6.14. We're currently
>> hunting for a sporadic bug with PREEMPT_RT enabled. We see RCU stalls
>> and complete system freezes after a couple of days with some container
>> workload deployed. See [1].
>
> Last week I tried to build a setup to reproduce the RT/CFS throttle
> deadlock issue Valentin described, but haven't succeeded yet...
>

Attached are the bits with which we succeeded, sometimes. Setup: Debian 12,
RT kernel, a VM with 2-4 cores, 1-5 instances of the test, and 2 min to 2 h
of patience. As at least 3 race conditions have to be won in a row, that
is still not bad... But maybe someone has an idea how to increase the
probabilities further.

Jan

--
Siemens AG, Foundational Technologies
Linux Expert Center

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>

int main(int argc, char *argv[])
{
	int pipe_fd, timerfd, epoll;
	struct epoll_event ev[2];
	struct itimerspec it;
	int ret;

	/* argv[1]: path of the FIFO that the writer feeds */
	assert(argc == 2);
	pipe_fd = open(argv[1], O_RDONLY);
	assert(pipe_fd >= 0);

	/* Periodic timer: first expiry after 1 ns, then every 50 us */
	timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
	assert(timerfd >= 0);
	it.it_value.tv_sec = 0;
	it.it_value.tv_nsec = 1;
	it.it_interval.tv_sec = 0;
	it.it_interval.tv_nsec = 50000;
	ret = timerfd_settime(timerfd, 0, &it, NULL);
	assert(ret == 0);

	/* Wait on both the FIFO and the timer from a single epoll set */
	epoll = epoll_create1(0);
	assert(epoll >= 0);

	ev[0].events = EPOLLIN;
	ev[0].data.fd = pipe_fd;
	ret = epoll_ctl(epoll, EPOLL_CTL_ADD, pipe_fd, &ev[0]);
	assert(ret == 0);

	ev[1].events = EPOLLIN;
	ev[1].data.fd = timerfd;
	ret = epoll_ctl(epoll, EPOLL_CTL_ADD, timerfd, &ev[1]);
	assert(ret == 0);

	printf("starting loop\n");
	while (1) {
		struct epoll_event event;
		char buffer[8];
		size_t size;
		ssize_t n;

		ret = epoll_wait(epoll, &event, 1, -1);
		assert(ret == 1);
		/* A timerfd read must be 8 bytes (uint64_t expiration count) */
		if (event.data.fd == timerfd)
			size = sizeof(uint64_t);
		else
			size = 1;
		n = read(event.data.fd, buffer, size);
		assert(n == (ssize_t)size);
	}
}
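
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int pipe_fd;
	ssize_t ret;

	/* argv[1]: path of the FIFO that the reader drains */
	assert(argc == 2);
	pipe_fd = open(argv[1], O_WRONLY);
	assert(pipe_fd >= 0);

	printf("starting writer\n");
	/* Single bytes in a tight loop; the reader consumes one per epoll_wait() */
	while (1) {
		ret = write(pipe_fd, "x", 1);
		assert(ret == 1);
	}
}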

Attachment: run.sh
Description: application/shellscript