Re: [PATCH 2/2] bcachefs: set rebalance thread to SCHED_BATCH and nice 19

From: Florian Schmaus
Date: Tue Jan 14 2025 - 11:45:48 EST

On 14/01/2025 16.25, Kent Overstreet wrote:
> On Tue, Jan 14, 2025 at 03:32:14PM +0100, Peter Zijlstra wrote:
>> On Tue, Jan 14, 2025 at 01:47:28PM +0100, Florian Schmaus wrote:
>>> While the rebalance thread is usually not compute bound, it does cause
>>> a considerable amount of I/O. Since "reducing" the nice level from 0
>>> to 19 also implicitly lowers the thread's best-effort I/O priority
>>> from level 4 to level 7, the rebalance thread's I/O will be
>>> deprioritized relative to normal I/O.
>>>
>>> Furthermore, we set the rebalance thread's scheduling class to BATCH,
>>> which means that it may experience higher scheduling latency, making
>>> room for threads that need low scheduling latency (e.g., interactive
>>> ones).

>> sorta.. what worries me most about these patches are the claims without
>> backing numbers.
>>
>> Supposedly there is a problem, and this here fixes it, but it doesn't
>> really get quantified much here.
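For reference, the "4 to 7" figures in the patch description come from how Linux derives a default best-effort I/O priority level from the nice value when no I/O priority has been set explicitly. As far as I know, the kernel's task_nice_ioprio() computes this as (nice + 20) / 5. A stand-alone sketch of that arithmetic (the helper name is hypothetical, not a kernel function):

```c
/* Hypothetical helper mirroring (as I understand it) the kernel's
 * task_nice_ioprio(): map a nice value in [-20, 19] to a best-effort
 * I/O priority level in [0, 7], where 0 is the highest priority. */
int nice_to_be_ioprio_level(int nice)
{
    /* Integer division: nice 0 -> 20/5 = 4, nice 19 -> 39/5 = 7. */
    return (nice + 20) / 5;
}
```

So moving from nice 0 to nice 19 moves the thread from the default level 4 to the lowest best-effort level 7.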

I am sorry, Peter; I know that changes should be motivated by some data, but I unfortunately don't have any in this case.

As you wrote, the difference between BATCH and NORMAL tasks is that the former will not immediately kick a running task from the CPU.

With that in mind, it made sense to me that janitorial tasks that run in the background and do not require low scheduling latency should run under BATCH rather than NORMAL. Bcachefs' rebalance thread is a prime example of such a task.
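For anyone who wants to try the same policy on an ordinary process, here is a userspace sketch. It is only an analogue of what the patch does for the in-kernel rebalance thread (the patch itself is kernel code and is not reproduced here); it uses the standard sched_setscheduler(2) and setpriority(2) calls, and the function name is hypothetical.

```c
#define _GNU_SOURCE /* glibc only exposes SCHED_BATCH with _GNU_SOURCE */
#include <sched.h>
#include <sys/resource.h>

/* Hypothetical helper: switch the calling thread to SCHED_BATCH and
 * nice 19. Returns 0 on success, -1 on failure. */
int make_current_batch_nice19(void)
{
    /* SCHED_BATCH is a non-real-time policy, so sched_priority must be 0. */
    struct sched_param param = { .sched_priority = 0 };

    if (sched_setscheduler(0, SCHED_BATCH, &param) != 0)
        return -1;

    /* On Linux the nice value is per-thread, so PRIO_PROCESS with
     * who == 0 affects the calling thread. Raising nice needs no privilege. */
    if (setpriority(PRIO_PROCESS, 0, 19) != 0)
        return -1;

    return 0;
}
```

If no I/O priority was set explicitly (e.g. with ionice), the thread's best-effort I/O level should then follow the nice value as described in the patch message.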

Additionally, I believe (but please correct me if I am wrong) that running such tasks under BATCH instead of NORMAL gives the scheduler more flexibility to provide latency-sensitive tasks with lower scheduling latency. But you are right, I should have run some experiments to check whether this is really the case.
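One cheap way to get such numbers would be to measure how late a latency-sensitive thread wakes up from a periodic absolute timer while the rebalance thread is busy, once with the rebalance thread under NORMAL and once under BATCH. The following is only a crude sketch in the spirit of tools like cyclictest (from rt-tests), not a substitute for them; the function name and parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

static long long ts_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical helper: sleep `iterations` times until successive absolute
 * deadlines `period_ns` apart on CLOCK_MONOTONIC, and return the largest
 * observed overshoot (actual wakeup minus deadline) in nanoseconds.
 * Returns -1 on error (including EINTR, to keep the sketch simple). */
long long max_wakeup_latency_ns(int iterations, long period_ns)
{
    long long worst = 0;
    struct timespec next;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* Advance the absolute deadline, normalizing tv_nsec. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }

        /* clock_nanosleep() returns an error number, not -1/errno. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;

        struct timespec now;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        long long late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

For example, max_wakeup_latency_ns(10000, 1000000) runs 10,000 periods of 1 ms; comparing the result across the two configurations, over several runs, would say more than a single run on an otherwise idle machine.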

> yeah, it was explained to me and made sense at the time, but things
> somehow keep falling out of my overflowing brain.
>
> Florian, could you update the patch message with that? Was it intended
> as a partial workaround for the rebalance spinning issue some users have
> been hitting?

I have not run into that issue myself, but the patch would probably help somewhat to mitigate the effects of the periods during which the rebalance thread is CPU bound.

- Florian
