[RFC PATCH v2 0/2] ext4: speed up fast commit on random writes

From: Daejun Park

Date: Tue Jun 23 2026 - 04:23:39 EST

Fast commit is meant to make fsync cheap, but on random-write workloads it
defeats itself. ext4 still tracks a single coalesced [min,max] logical range
per inode (i_fc_lblk_start/len). When an inode is dirtied at several disjoint
offsets between two commits, that span widens to cover them all, and at commit
time ext4_fc_snapshot_inode_data() walks the whole span through the extent
status tree -- emitting an ADD_RANGE per mapped segment and a DEL_RANGE per
hole inside it. For scattered writes that is hundreds to thousands of ranges
even though only a handful of regions were actually modified.
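
For illustration only, a user-space sketch of the single-span tracking
described above. The names fc_span and fc_span_track are made up for this
example and are not kernel symbols; only the start/len pair they imitate
(i_fc_lblk_start/len) comes from ext4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-inode i_fc_lblk_start/len pair. */
struct fc_span {
	uint32_t start;	/* first tracked logical block */
	uint32_t len;	/* blocks covered; 0 means nothing is tracked yet */
};

/* Widen the span so that it also covers [lblk, lblk + len). */
static void fc_span_track(struct fc_span *s, uint32_t lblk, uint32_t len)
{
	uint32_t old_end, new_end;

	assert(len > 0);

	if (s->len == 0) {
		s->start = lblk;
		s->len = len;
		return;
	}

	old_end = s->start + s->len;	/* exclusive end of current span */
	new_end = lblk + len;		/* exclusive end of the new write */

	if (lblk < s->start)
		s->start = lblk;
	if (old_end > new_end)
		new_end = old_end;

	s->len = new_end - s->start;
}
```

Tracking three single-block writes at logical blocks 1000, 10 and 200000
leaves the span at start=10, len=199991, so a commit-time walk over it has to
cover almost 200000 blocks to record three modified ones.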

The recently merged fast-commit snapshot work did not change this: it caps a
snapshot at EXT4_FC_SNAPSHOT_MAX_RANGES (2048) and falls back to a full commit
when walking the span would emit more ranges than that. Measured on dev
(7.1.0-rc4) with a sparse random-write workload (1 GiB span, R disjoint 4 KiB
writes per fsync, 300 fsyncs):

R=16 regions/fsync: ranges/commit 1095, full-commit fallback 76%
(snap_fail_ranges_cap on 226 of 300 fsyncs)

Roughly three out of four fsyncs on this workload fall back to a full commit,
so fast commit barely functions here.

This series tracks the actually-modified disjoint ranges instead of one span,
and snapshots only those:

1/2 replaces the single [min,max] range with a small, bounded set of sorted,
    disjoint ranges (up to EXT4_FC_MAX_RANGES = 16; the two closest are
    merged on overflow, so the worst case degrades to the old single-span
    behaviour). ext4_fc_snapshot_inode_data() then walks only the tracked
    ranges. The on-disk TLV format is unchanged.

2/2 allocates the range array lazily: the first range stays inline, the
    array is allocated only when a second disjoint range appears, and on an
    allocation failure we fall back to the inline single range. Per-inode
    fast-commit footprint stays ~20 bytes.
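
As a rough user-space sketch of the idea in 1/2 (not the code from the
patch): a sorted, bounded set of disjoint ranges that merges the two closest
neighbours on overflow. All names here (range_set, range_set_add, merge_pair,
MAX_RANGES) are hypothetical, and "closest" is assumed to mean the smallest
gap between adjacent ranges. The lazy allocation from 2/2 is left out; the
array is simply embedded with one spare slot.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RANGES 16	/* assumed to mirror EXT4_FC_MAX_RANGES */

struct range {
	uint32_t start, end;	/* inclusive logical block numbers */
};

struct range_set {
	int nr;
	struct range r[MAX_RANGES + 1];	/* spare slot used while overflowing */
};

/* Fold r[i + 1] into r[i] and close the hole in the array. */
static void merge_pair(struct range_set *s, int i)
{
	if (s->r[i + 1].end > s->r[i].end)
		s->r[i].end = s->r[i + 1].end;
	for (int j = i + 1; j < s->nr - 1; j++)
		s->r[j] = s->r[j + 1];
	s->nr--;
}

static void range_set_add(struct range_set *s, uint32_t start, uint32_t end)
{
	int i, pos = 0;

	assert(start <= end);

	/* Insert at the position that keeps the array sorted by start. */
	while (pos < s->nr && s->r[pos].start < start)
		pos++;
	for (i = s->nr; i > pos; i--)
		s->r[i] = s->r[i - 1];
	s->r[pos].start = start;
	s->r[pos].end = end;
	s->nr++;

	/* Coalesce neighbours that overlap or touch. */
	for (i = 0; i < s->nr - 1; ) {
		if ((uint64_t)s->r[i].end + 1 >= s->r[i + 1].start)
			merge_pair(s, i);
		else
			i++;
	}

	/* Overflow: merge the adjacent pair separated by the smallest gap. */
	if (s->nr > MAX_RANGES) {
		int best = 0;
		uint32_t best_gap = s->r[1].start - s->r[0].end;

		for (i = 1; i < s->nr - 1; i++) {
			uint32_t gap = s->r[i + 1].start - s->r[i].end;

			if (gap < best_gap) {
				best_gap = gap;
				best = i;
			}
		}
		merge_pair(s, best);
	}
}
```

With sixteen single-block ranges at blocks 0, 100, ..., 1500 already
tracked, adding block 1502 overflows the set, and the sketch merges it with
its nearest neighbour into [1500, 1502] while the other fifteen ranges stay
untouched. Repeated overflow keeps widening ranges, which is how the worst
case approaches the old single-span behaviour.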

Result on the same workload (dev, patched):

R=16: ranges/commit 1095 -> 16, fallback 76% -> 0.7%,
snap_fail_ranges_cap 226 -> 0

Testing (on dev, patched):
- crash recovery: deterministic writes + fsync, kill -9 QEMU (power loss),
  reboot -> fast-commit replay -> verify every fsync'd block, e2fsck -fn.
  9600/9600 blocks verified, 0 mismatches, e2fsck clean. Run with R=64, so
  the overflow-merge path is exercised.
- ext4/generic fast-commit xfstests (ext4/044 ext4/045 generic/455 456 457
  470 482): ext4/044, ext4/045 and generic/456 pass; generic/457 and 470 are
  skipped (reflink/dax unsupported on ext4); generic/455 fails identically
  on unpatched dev (pre-existing, unrelated to this series); generic/482
  failed once in a combined run but did not reproduce (3/3 passes in
  isolation on the patched kernel).

Changes since v1 [1]:
- Rebased from v6.17-rc3 onto ext4.git dev; re-implemented on top of the
merged fast-commit snapshot model (v1 targeted the old
ext4_fc_write_inode_data(), which no longer exists).

[1] v1: https://lore.kernel.org/linux-ext4/20260611044733epcms2p38013ae683a283555526f70e4eab6d2a9@epcms2p3/

Daejun Park (2):
  ext4: fast commit: track disjoint modified ranges per inode
  ext4: fast commit: allocate the range array lazily

 fs/ext4/ext4.h        |  42 ++++++--
 fs/ext4/fast_commit.c | 219 +++++++++++++++++++++++++++++++++++-------
 fs/ext4/super.c       |   1 +
 3 files changed, 222 insertions(+), 40 deletions(-)

base-commit: c143957520c6c9b5cd72e0de8b52b814f0c576fe
--
2.43.0