Re: [PATCH] mm/readahead: no PG_readahead on EOF
From: Frederick Mayle
Date: Tue May 12 2026 - 19:44:54 EST
On Fri, May 8, 2026 at 4:53 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 8 May 2026 11:12:31 -0700 Frederick Mayle <fmayle@xxxxxxxxxx> wrote:
>
> > When readahead pulls in all the remaining pages for a file, setting the
> > readahead bit is counter productive. The async readahead it would
> > trigger would almost certainly be a no-op. Additionally, for mmap'd file
> > IO, the readahead bit limits the fault around [0],
>
> There is no "[0]".
Sorry, that should be "[1]".
>
> > causing an extra
> > minor fault when the page is accessed.
> >
> > This was discovered when looking at /sys/kernel/tracing/events/readahead
> > traces for a simple program. With the patch applied, fewer
> > page_cache_ra_unbounded calls are observed.
> >
> > [1] do_fault_around calls filemap_map_pages, which finds eligible pages
> > by calling next_uptodate_folio [2]. next_uptodate_folio skips pages
> > with PG_readahead set [3].
> >
> > Link: https://github.com/torvalds/linux/blob/v7.0/mm/filemap.c#L3921-L3939 [2]
> > Link: https://github.com/torvalds/linux/blob/v7.0/mm/filemap.c#L3721-L3722 [3]
> > Cc: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> > Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > Signed-off-by: Frederick Mayle <fmayle@xxxxxxxxxx>
>
> AI review found a little race:
> https://sashiko.dev/#/patchset/20260508181237.670645-1-fmayle@xxxxxxxxxx
Tricky. My understanding is that previous code technically has the same issue
sashiko is pointing out. An expression like `min(limit, index + ra->size - 1)`
only mentions the field once but the compiler or hardware can still decide to
perform multiple loads from the field's address, resulting in incoherent
results.
I see from past discussions that the lack of locking is intentional for
file_ra_state fields and also there has been pushback against requiring
READ_ONCE and WRITE_ONCE for accesses, for example in the following recent
thread:
https://lore.kernel.org/r/2xzc3lp6ehtjwbzip4i5muh4g6oep4l72zh3j6sablfghbvbau@kh7famgorzrh/
There is another one of these just below (copied here) that can result in an
underflow if ra->size gets smaller after the condition is checked. Looks to be
benign right now because do_page_cache_ra puts a limit on the param.
if (ra->size > index - start)
do_page_cache_ra(ractl, ra->size - (index - start),
ra->async_size);
I think a fully correct solution requires READ_ONCE, but, in the meantime, I
can make it less wrong by reading ra->size into a variable and reusing that
variable through the function (+ re-reading before the fallback branch).