Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO failure

From: Zhang Yi

Date: Wed Mar 25 2026 - 10:42:31 EST

Hi, Diangang,

On 3/25/2026 7:13 PM, Diangang Li wrote:

Hi Andreas,

BH_Read_EIO is cleared on successful read or write.

I think what Andreas means is this: since you modified the ext4_read_bh() interface, if the bh to be read already has the Read_EIO flag set, subsequent reads through this interface will return failure directly without issuing any read I/O. At the same time, because the buffer is also not uptodate, no write request will be issued for an existing block either. So how can this Read_EIO flag be cleared? IIRC, relying solely on ext4_read_bh_nowait() doesn't seem sufficient to achieve this.

Thanks,
Yi.

In practice bad blocks are typically repaired/remapped on write, so we
expect recovery after a successful rewrite. If the block is never
rewritten, repeatedly issuing the same failing read does not help.

We clear the flag on successful reads so the buffer can recover
immediately if the error was transient. Since read-ahead reads are not
blocked, a later successful read-ahead will clear the flag and allow
subsequent synchronous readers to proceed normally.

Best,
Diangang

On 3/25/26 6:15 PM, Andreas Dilger wrote:

On Mar 25, 2026, at 03:33, Diangang Li <diangangli@xxxxxxxxx> wrote:

From: Diangang Li <lidiangang@xxxxxxxxxxxxx>

ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,
the buffer remains !Uptodate. With concurrent callers, each waiter can
retry the same failing read after the previous holder drops BH_Lock. This
amplifies device retry latency and may trigger hung tasks.

In the normal read path the block driver already performs its own retries.
Once those retries have failed, re-submitting the same metadata read from
the filesystem only amplifies the latency, because the waiters are
serialized on BH_Lock.

Remember read failures on buffer_head and fail fast for ext4 metadata reads
once a buffer has already failed to read. Clear the flag on successful
read/write completion so the buffer can recover. ext4 read-ahead uses
ext4_read_bh_nowait(), so it does not set the failure flag and remains
best-effort.
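
The behaviour described above can be illustrated with a small user-space
model. This is only a sketch of the semantics as stated in this thread, not
the patch itself: struct model_bh, model_submit_read(), model_read_bh() and
model_read_bh_nowait() are hypothetical names, and the device_bad parameter
is an assumption that stands in for a failing device.

```c
/* User-space model only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bh {
	bool uptodate;	/* models BH_Uptodate */
	bool read_eio;	/* models the proposed BH_Read_EIO */
	int submits;	/* reads actually sent to the "device" */
};

/* Stand-in for the device: the read fails while device_bad is true. */
static int model_submit_read(struct model_bh *bh, bool device_bad)
{
	bh->submits++;
	if (device_bad)
		return -EIO;
	bh->uptodate = true;
	return 0;
}

/* Synchronous metadata read: fail fast once a read has already failed. */
static int model_read_bh(struct model_bh *bh, bool device_bad)
{
	int err;

	assert(bh != NULL);
	if (bh->uptodate)
		return 0;
	if (bh->read_eio)
		return -EIO;		/* no I/O is issued */
	err = model_submit_read(bh, device_bad);
	bh->read_eio = (err != 0);	/* remember failure, clear on success */
	return err;
}

/* Read-ahead: not blocked by the flag, never sets it, clears it on success. */
static int model_read_bh_nowait(struct model_bh *bh, bool device_bad)
{
	int err;

	assert(bh != NULL);
	if (bh->uptodate)
		return 0;
	err = model_submit_read(bh, device_bad);
	if (!err)
		bh->read_eio = false;
	return err;
}
```

In this model, once a synchronous read has failed, later calls to
model_read_bh() return -EIO without touching the device, even after the
device has recovered. Only a successful model_read_bh_nowait() (or a
successful write, which is not modelled here) clears the flag again.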

Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer
and it prevents other tasks from reading that block again, how would the
buffer ever become Uptodate to clear the flag? There isn't enough state
in a 1-bit flag to have any kind of expiry and later retry.

Cheers, Andreas