Re: [RFC] block/nvme: exploring asynchronous durability notification semantics

From: Christoph Hellwig

Date: Tue Apr 07 2026 - 01:49:38 EST


On Thu, Apr 02, 2026 at 06:22:36PM -0300, Esteban Cerutti wrote:
> Today, a successful write completion indicates command execution,
> but not necessarily physical persistence to non-volatile media unless
> FUA or Flush is used. This forces the kernel and filesystems to assume
> worst-case durability behavior and rely on synchronous flushes and
> barriers for safety.

Nothing relies on synchronous flushes, and we killed barriers a long
time ago. FUA does, as you say, provide persistence notifications and
is heavily used for the (relatively rare) case where it matters.
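
As a minimal sketch of what a per-write persistence guarantee looks like
from userspace today, assuming Linux 4.7 or later and glibc 2.26 or
later: pwritev2() with RWF_DSYNC returns only once the data is durable
per O_DSYNC semantics. The helper name write_durable is hypothetical,
and whether the kernel issues a FUA write or a write followed by a cache
flush depends on the filesystem and the device.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not from this thread: write len bytes at offset
 * off and return only after the data is durable (O_DSYNC semantics for
 * this one write). Returns the number of bytes written, or -1 with
 * errno set on failure.
 */
static ssize_t write_durable(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = {
		.iov_base = (void *)buf,	/* pwritev2() does not modify it */
		.iov_len  = len,
	};

	/* RWF_DSYNC applies data-sync behaviour to this write only. */
	return pwritev2(fd, &iov, 1, off, RWF_DSYNC);
}
```

A caller that gets len back from write_durable() can treat that write as
persistent without issuing a separate fdatasync().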

> - Normal completion continues to signal execution.
> - The device assigns a persistence token ID.
> - When the data is physically committed to non-volatile media,
> the device emits an asynchronous durability confirmation
> referencing that token.
>
> This would decouple execution throughput from durability
> confirmation and potentially allow filesystems to close journal
> transactions only upon confirmed persistence, without forcing
> synchronous flush fences.

This is so complex that it's not going to work in practice.

You've also failed to explain where you think your model is actually
helping to improve clearly identifiable workloads. Note that all of
this would be limited to consumer hardware anyway, as volatile write
caches aren't really a thing for higher-end hardware.