Re: [PATCH] ext4: rralloc - (former rotalloc) improved round-robin allocation policy

From: Theodore Tso

Date: Wed Mar 04 2026 - 21:48:24 EST


On Tue, Mar 03, 2026 at 02:28:47PM +0100, Mario Lohajner wrote:
> RRALLOC targets sustained parallel overwrite-heavy workloads such as
> scratch disks, rendering outputs, database storage and VM image storage.
> ....
> It is not intended to improve write-once/read-many workloads and remains
> disabled by default.

First of all, databases and VM images are use cases that are almost
the definition of write-once / read-many workloads.

As for your first two examples, I've been part of teams that have
built storage systems for scratch disks and rendering outputs at
*extremely* large scale. (See a public estimate from 2013[1]; I make
no comment on how accurate it was back then, but it's fair to say
that at least a few more data centers have been built since, and that
disks and SSDs have gotten somewhat more efficient in terms of
storage density since then. :-)

[1] https://what-if.xkcd.com/63/

Having built and supported systems for those first two use cases, I
can quite confidently tell you that the problems you are trying to
solve weren't even *close* to the real-world issues that we had to
overcome.

Now, it may be that you are doing something very different (or
perhaps something very dumb; I can't say, given how few details
you've provided). But what you've described is so vague and
scatter-shot that it could have come from the output of a Large
Language Model given a very sloppily written prompt. (In other
words, what is commonly called "AI slop".)

If you want to be convincing, you'll need to give a lot more specific
detail about the nature of the workloads. How many petabytes (or
whatever the appropriate unit is in your case) of data are being
written per hour? What kind of storage devices are you using? How
many of them? Attached to how many servers? How many files are being
written in parallel? At what throughput rate?

When you use stock ext4 for this workload, what are you seeing? What
sort of benchmarking did you use to convince yourself that the
bottleneck is indeed the block allocation algorithm? What kind of
percentage improvement did your replacement algorithm show for this
specific workload?
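
As a concrete starting point (every number below is a placeholder;
substitute your actual devices, file sizes, and degree of
parallelism), an fio job approximating a sustained parallel-overwrite
workload might look something like:

```ini
; Illustrative fio job: sustained parallel overwrites on ext4.
; All sizes, counts, and paths are placeholders -- tune to match
; the workload you are actually claiming to improve.
[global]
directory=/mnt/ext4test   ; mount point of the filesystem under test
ioengine=libaio
direct=1                  ; bypass page cache to stress the allocator
rw=randwrite              ; overwrite-in-place, not append
bs=64k
size=10g                  ; per-job file size
overwrite=1               ; preallocate files, then overwrite them
numjobs=16                ; parallel writers
iodepth=32
time_based=1
runtime=600
group_reporting=1

[overwriters]
```

Run something like that once on stock ext4 and once with your patch
enabled, and report both the fio summaries and evidence (e.g., a
profile showing the block allocator hot) that allocation, and not the
devices or the journal, was the bottleneck.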

If you want to see examples of well-written papers describing various
performance improvements, I'll refer you to papers from the USENIX
Conference on File and Storage Technologies (FAST)[2] for examples of
how to write a convincing paper when you're not free to share *all*
of the details of the workload or the specific storage devices you
are using. The problem is that right now, you've shared nothing
about your specific workload.

[2] https://www.usenix.org/conferences/byname/146

Cheers,

- Ted