[PATCH AUTOSEL 7.0-6.1] orangefs_readahead: don't overflow the bufmap slot.
From: Sasha Levin
Date: Mon Apr 20 2026 - 09:55:12 EST
From: Mike Marshall <hubcap@xxxxxxxxxxxx>
[ Upstream commit 415e507cdefc510c01de8ab6644163327ee9a5d0 ]
generic/340 showed that this caller of wait_for_direct_io was
sometimes asking for more than a bufmap slot could hold. This splits
the calls up if needed.
Signed-off-by: Mike Marshall <hubcap@xxxxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
LLM Generated explanations, may be completely bogus:
I now have enough information to provide a comprehensive analysis.
## Analysis
### Phase 1: Commit Message Forensics
**Step 1.1 — Subject parsing:**
Record: subsystem = `fs/orangefs`, function = `orangefs_readahead`;
action = "don't overflow" (prevention/fix verb); claim = bufmap slot
overflow is avoided by splitting up large wait_for_direct_io calls.
**Step 1.2 — Tags:**
Record: Only `Signed-off-by: Mike Marshall <hubcap@xxxxxxxxxxxx>`
(author is the OrangeFS subsystem maintainer per MAINTAINERS). No
Fixes:, Cc: stable:, Reviewed-by:, Tested-by:, Reported-by:, or Link:
tags. The absence of these tags is expected for autoselection review.
**Step 1.3 — Body analysis:**
Record: The bug manifests via xfstests `generic/340` (a known mmap
writing race test using the `holetest` helper — verified by fetching the
test source). The author states the caller was "sometimes asking for
more than a bufmap slot could hold," i.e. the `total_size` passed to
`wait_for_direct_io` could exceed the bufmap descriptor's slot size.
**Step 1.4 — Hidden bug fix detection:**
Record: The verb "don't overflow" and the claim of "sometimes asking for
more than a bufmap slot could hold" is explicit: this is an out-of-
bounds / overflow fix, not a cleanup.
### Phase 2: Diff Analysis
**Step 2.1 — Inventory:**
Record: Single file (`fs/orangefs/inode.c`), single function
(`orangefs_readahead`). +27/−9, ~18 net added lines. Single-file
surgical fix.
**Step 2.2 — Code flow change:**
Record:
- BEFORE: One call `wait_for_direct_io(..., readahead_length(rac), ...)`
using the full readahead length which can exceed 4 MB.
- AFTER: A loop that chunks the request into pieces of at most 4194304
bytes (4 MB), advancing `offset` and draining `remaining`.
**Step 2.3 — Bug mechanism:**
Record: Category (f/h) bounds/size fix. I inspected
`fs/orangefs/orangefs-bufmap.c` and `orangefs_bufmap_copy_to_iovec`:
```497:521:fs/orangefs/orangefs-bufmap.c
int orangefs_bufmap_copy_to_iovec(struct iov_iter *iter,
int buffer_index,
size_t size)
{
struct orangefs_bufmap_desc *from;
int i;
from = &__orangefs_bufmap->desc_array[buffer_index];
...
for (i = 0; size; i++) {
struct page *page = from->page_array[i];
size_t n = size;
if (n > PAGE_SIZE)
n = PAGE_SIZE;
n = copy_page_to_iter(page, 0, n, iter);
```
Each `desc.page_array` is a pointer into a shared larger
`bufmap->page_array` sliced by `pages_per_desc` (`desc_size /
PAGE_SIZE`, i.e. 1024 pages for a 4 MB slot). If `size` exceeds one
slot, the loop index `i` walks off the end of the slot into the next
slot's pages — or, for the last slot, off the end of
`bufmap->page_array` entirely. This produces either data corruption
(mixing data destined for a different concurrent I/O) or a wild out-of-
bounds dereference of an uninitialized page pointer.
**Step 2.4 — Fix quality:**
Record: The fix is obviously correct in shape: it bounds each call to ≤
4 MB and correctly advances both `offset` and the loop counter. `iter`
is naturally advanced inside `wait_for_direct_io` via
`orangefs_bufmap_copy_to_iovec`. There is an existing precedent for the
same pattern in the same file in `orangefs_direct_IO` (lines 519–565),
which uses `orangefs_bufmap_size_query()` to cap the per-call size —
this is literally the same idea. Minor concern: the fix hardcodes
`4194304` rather than using `orangefs_bufmap_size_query()`, but the same
function already hardcodes `4194304` for `readahead_expand` above, so
the fix is internally consistent with surrounding code. Regression risk
is very low; worst-case is a marginal performance change (two 4 MB
round-trips instead of one >4 MB round-trip).
### Phase 3: Git History Investigation
**Step 3.1 — Blame:**
Record: `git blame` shows the buggy `readahead_length(rac)` call site
has been unchanged since commit `0c4b7cadd1ade1` "Orangef: implement
orangefs_readahead." authored by Mike Marshall on 2021-03-28 (kernel
v5.13 merge window). The only cosmetic change was `iov_iter_xarray`
signature churn in v6.1 by Al Viro (`de4eda9de2d957`).
**Step 3.2 — Fixes tag:**
Record: No Fixes: tag in the commit, but blame reveals the introducing
commit is `0c4b7cadd1ade1` (present in v5.13+).
**Step 3.3 — File history:**
Record: Recent touches to the readahead code are `cd01049d9ca37` (folio
conversion, 2023) and `121a83ce6fe69` ("orangefs: Bufmap deadcoding").
None of them alter the bounds behavior; all stable trees retain the
unsafe pattern.
**Step 3.4 — Author:**
Record: Mike Marshall (`hubcap@xxxxxxxxxxxx`) is the OrangeFS
maintainer. The commit arrived through the `for-linus-7.1-ofs1` pull
request (verified by fetching the PR from lore.kernel.org archive search
results). Author credibility is high.
**Step 3.5 — Dependencies:**
Record: Fix is standalone. It does not touch any function signatures,
nor depend on the companion series commit `e61bc5e4d8743` (bufmap-as-
folios) that was pulled alongside.
### Phase 4: Mailing List Research
**Step 4.1 — b4 dig:**
Record: `b4 dig -c 415e507cdefc5...` searched lore by patch-id, subject,
and author — all three attempts returned no match. This is consistent
with OrangeFS patches that are often applied directly from the
maintainer tree without appearing on a public list as a standalone patch
thread. The PR containing the fix (`for-linus-7.1-ofs1`, 2026-04-17) is
visible in the lore archive but does not contain a per-patch discussion
thread.
**Step 4.2/4.3 — Discussion/bug report:**
Record: No separate review thread was found; the change flowed through
Mike Marshall's maintainer tree to Linus in the 7.1-rc window.
`generic/340` is the reproducer cited by the author and is a documented
mmap write race test in xfstests (verified by fetching its source).
**Step 4.4/4.5 — Related patches/stable discussion:**
Record: No related stable mailing list discussion found. The sibling
commit `e61bc5e4d8743` ("bufmap: manage as folios, V2") confirms a slot
size of 4 MB in ten-slot configurations — establishing the `4194304`
constant matches the real slot size.
### Phase 5: Code Semantic Analysis
**Step 5.1/5.2 — Functions/callers:**
Record: Only `orangefs_readahead` is modified. It is registered as
`.readahead` in `orangefs_address_operations` (verified via grep in each
stable branch). It is invoked by the VFS readahead machinery
(`page_cache_ra_*`), reachable from any buffered read of an OrangeFS
file, including `read(2)` and `mmap(2)` page faults — i.e. the normal
user-facing path.
**Step 5.3 — Callees:**
Record: Calls `wait_for_direct_io` (fs/orangefs/file.c), which allocates
a bufmap slot via `orangefs_bufmap_get()`, then uses
`orangefs_bufmap_copy_to_iovec` to fetch up to `total_size` bytes — the
OOB occurs here.
**Step 5.4 — Reachability:**
Record: Path is reachable from userspace with a normal read/mmap of any
file on OrangeFS. `generic/340` triggers it via `holetest`. Reproducer
exists.
**Step 5.5 — Similar patterns:**
Record: `orangefs_direct_IO` already chunks I/O using
`orangefs_bufmap_size_query()` (lines 519–565). The readahead path was
simply missing this safety loop; the fix adds the analogous defense.
### Phase 6: Stable Tree Analysis
**Step 6.1 — Buggy code in stable:**
Record: Checked `stable-push/linux-5.15.y`, `6.1.y`, `6.6.y`, `6.12.y`,
`6.17.y`, `6.18.y`, `6.19.y`. All have the identical unbounded single-
call pattern. The bug exists in every active LTS and rolling-stable
branch.
**Step 6.2 — Backport difficulty:**
Record: The modified hunk itself is identical across all stable trees —
only the surrounding "clean up" block differs
(`readahead_page`/`page_endio` in 5.15/6.1 vs.
`readahead_folio`/`folio_*` from 6.6 onward). The fix inserts its loop
before that block and does not touch it, so application should be clean
or essentially clean on every LTS.
**Step 6.3 — Prior fixes:**
Record: No earlier or alternative fix has been applied to stable for
this issue.
### Phase 7: Subsystem Context
**Step 7.1/7.2:**
Record: Subsystem is `fs/orangefs` — a distributed filesystem.
Criticality: PERIPHERAL (limited user base compared to ext4/xfs) but
still a real filesystem with real data-integrity expectations. Activity
is low-to-moderate; mostly maintenance.
### Phase 8: Impact and Risk
**Step 8.1 — Affected population:**
Record: Filesystem-specific — only OrangeFS users. But for those users,
the bug fires on ordinary reads, not a rare configuration.
**Step 8.2 — Trigger:**
Record: Fires any time the readahead window exceeds 4 MB, which the
existing `readahead_expand(... 4194304)` plus kernel heuristics make
fairly likely for sequential reads of large files. An unprivileged user
reading a >4 MB file can trigger it.
**Step 8.3 — Failure severity:**
Record: Out-of-bounds `page_array[i]` dereference → at best reads pages
belonging to another concurrent OrangeFS I/O (silent data corruption /
info leak between slots); at worst walks off the end of
`bufmap->page_array` (NULL/garbage page pointer, crash in
`copy_page_to_iter`). Classification: HIGH.
**Step 8.4 — Risk/benefit:**
Record:
- Benefit: Prevents data corruption and possible OOB read on a user-
triggerable path in OrangeFS.
- Risk: Very low. ~18 net lines in a single function. Identical chunking
pattern exists elsewhere in the file. No API or locking changes. The
only downside is slightly more round-trips for >4 MB readaheads.
- Ratio: Favorable.
### Phase 9: Synthesis
**Step 9.1 — Evidence:**
FOR: Real OOB/corruption fix reachable from userspace (reads, mmap);
reproduced by xfstests generic/340; surgical scope; author is subsystem
maintainer; identical pattern already used elsewhere in the same file;
bug present in every stable tree since v5.13.
AGAINST: No explicit Cc: stable or Fixes: tag (expected for
autoselection); fix uses a hardcoded constant instead of
`orangefs_bufmap_size_query()` (cosmetic concern, consistent with
neighbouring code).
**Step 9.2 — Stable rules:**
1. Obviously correct? Yes — clean chunking, matches existing precedent.
2. Real bug? Yes — OOB in bufmap page array.
3. Important? Yes — data corruption / OOB read, HIGH severity.
4. Small and contained? Yes — ~18 net lines, single function, single
file.
5. No new features/APIs? Correct.
6. Applies to stable? Yes — identical hunk context in all LTS branches.
**Step 9.3 — Exception categories:** Not applicable; this is a
straightforward bug fix.
**Step 9.4 — Decision:** YES.
## Verification
- [Phase 1] Parsed tags: only `Signed-off-by: Mike Marshall` present; no
Fixes:/Cc:stable/Link:/Reported-by: (confirmed via `git log -1
--format='%B'`).
- [Phase 1] Verified `generic/340` is xfstests mmap-write race via
`holetest` by fetching
`github.com/kdave/xfstests/raw/master/tests/generic/340`.
- [Phase 2] Inspected `orangefs_bufmap_copy_to_iovec` in
`fs/orangefs/orangefs-bufmap.c`: confirmed unchecked
`from->page_array[i]` indexing driven by `size`.
- [Phase 2] Inspected `orangefs_bufmap_map`: confirmed
`desc_array[i].page_array` are slices of the shared
`bufmap->page_array` (line 279–285), so an over-sized `size` walks
into the next slot or off the end.
- [Phase 2] Inspected `orangefs_direct_IO` (lines 519–565): confirmed
existing per-call size cap using `orangefs_bufmap_size_query()` — the
readahead fix mirrors this pattern.
- [Phase 3] `git blame` on lines 242–247 in `fs/orangefs/inode.c`: buggy
pattern originates in `0c4b7cadd1ade1` (2021-03-28, v5.13 merge
window).
- [Phase 3] `git log --oneline -- fs/orangefs/inode.c`: confirmed only
cosmetic changes (`de4eda9de2d957`, `cd01049d9ca37`) since
introduction.
- [Phase 3] `git log --author="Mike Marshall" --oneline`: confirmed
author is the long-time OrangeFS maintainer.
- [Phase 4] `b4 dig -c 415e507cdefc5...`: no lore match (by patch-id, by
author/subject, or by in-body From). Falls back to pull-request
channel.
- [Phase 4] `WebFetch https://yhbt.net/lore/linux-fsdevel/?q=for-
linus-7.1-ofs1`: located the pull request for 7.1-ofs1 by Mike
Marshall (2026-04-17) and pr-tracker-bot confirmation.
- [Phase 4] Searched lore for "orangefs_readahead bufmap slot" and
related phrases: no standalone review thread found — patch flowed
through the maintainer tree.
- [Phase 5] `Grep` for `.readahead = orangefs_readahead` in all stable
branches: confirmed it is wired into `orangefs_address_operations` and
thus reachable from ordinary VFS reads.
- [Phase 5] Inspected `wait_for_direct_io` (`fs/orangefs/file.c` lines
49–280): confirmed `total_size` is passed unchecked as
`new_op->upcall.req.io.count` and is used to drive
`orangefs_bufmap_copy_to_iovec`'s loop.
- [Phase 6] `git show stable-
push/linux-{5.15,6.1,6.6,6.12,6.17,6.18,6.19}.y:fs/orangefs/inode.c`:
confirmed identical unbounded `wait_for_direct_io(...,
readahead_length(rac), ...)` call in every branch.
- [Phase 6] Compared context lines: modified hunk is identical; only the
post-loop cleanup differs (page vs folio) and is not touched by the
fix — clean apply expected.
- [Phase 8] Verified bufmap slot size: sibling commit `e61bc5e4d8743`
("bufmap: manage as folios, V2.") explicitly documents ten 4 MB slots;
matches the `4194304` constant.
- UNVERIFIED: Exact mainline release that will first contain this commit
(it is in the 7.1-ofs1 PR, so 7.1-rc onward); not yet merged into the
tree HEAD I have available — irrelevant to the YES decision.
- UNVERIFIED: Whether `generic/340` was observed to reproduce on every
stable branch (not directly tested, but the code path is identical in
every branch so the same failure applies).
The fix addresses a real, userspace-triggerable out-of-bounds memory
access in a filesystem read path, is minimal, self-contained, authored
by the subsystem maintainer, validated by an xfstests reproducer, and
applies cleanly to all active stable trees where the bug has been
present since v5.13.
**YES**
fs/orangefs/inode.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 2d4710d0e05e1..af7c9432e141b 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -224,6 +224,8 @@ static void orangefs_readahead(struct readahead_control *rac)
loff_t new_start = readahead_pos(rac);
int ret;
size_t new_len = 0;
+ size_t this_size;
+ size_t remaining;
loff_t bytes_remaining = inode->i_size - readahead_pos(rac);
loff_t pages_remaining = bytes_remaining / PAGE_SIZE;
@@ -239,17 +241,33 @@ static void orangefs_readahead(struct readahead_control *rac)
offset = readahead_pos(rac);
i_pages = &rac->mapping->i_pages;
- iov_iter_xarray(&iter, ITER_DEST, i_pages, offset, readahead_length(rac));
+ iov_iter_xarray(&iter, ITER_DEST, i_pages,
+ offset, readahead_length(rac));
- /* read in the pages. */
- if ((ret = wait_for_direct_io(ORANGEFS_IO_READ, inode,
- &offset, &iter, readahead_length(rac),
- inode->i_size, NULL, NULL, rac->file)) < 0)
- gossip_debug(GOSSIP_FILE_DEBUG,
- "%s: wait_for_direct_io failed. \n", __func__);
- else
- ret = 0;
+ remaining = readahead_length(rac);
+ while (remaining) {
+ if (remaining > 4194304)
+ this_size = 4194304;
+ else
+ this_size = remaining;
+
+ /* read in the pages. */
+ if ((ret = wait_for_direct_io(ORANGEFS_IO_READ, inode,
+ &offset, &iter, this_size,
+ inode->i_size, NULL, NULL, rac->file)) < 0) {
+ gossip_debug(GOSSIP_FILE_DEBUG,
+ "%s: wait_for_direct_io failed. :%d: \n",
+ __func__, ret);
+ goto cleanup;
+ } else {
+ ret = 0;
+ }
+
+ remaining -= this_size;
+ offset += this_size;
+ }
+cleanup:
/* clean up. */
while ((folio = readahead_folio(rac))) {
if (!ret)
--
2.53.0