[PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool

From: Breno Leitao

Date: Fri Jun 26 2026 - 06:27:16 EST

TL;DR: This simplifies the pipe code, unifies the page pools, reduces the
code by 11 lines, and improves the microbenchmark by up to 23%, so it's
probably wrong (!?).

Summary:
========

I've spent some time converging tmp_page[] and the on-stack
anon_pipe_prealloc pool of pages into a single per-pipe pool, as
discussed previously in a few places, most recently at:

https://lore.kernel.org/all/ajLA_zxsYyKISkwp@xxxxxxxxxx/

Problem:
========

1) We have two types of page caches in the pipe mechanism today:
   * tmp_page[]
   * anon_pipe_prealloc

2) They operate in different ways:
   * tmp_page[] is protected by the pipe lock
     * per-pipe, persistent, 2 pages
   * anon_pipe_prealloc is an on-stack pool, not lock protected
     * burst, up to 8 pages

Proposal/Design:
================

1) Keep the same page budget as today
   a) up to two per-pipe persistent pages
   b) burst of up to 8 pages

2) No pages are allocated unless necessary
   * Pages are _ONLY_ allocated based on the length of the write,
     minus the pages already available in the pool.
   * No page is allocated and then left unused

3) Keep allocation and freeing outside of the lock
   * Only the assignment of pages stays lock-protected
   * Currently, tmp_page[] pages are allocated under the lock, so
     this series improves on that (hence the performance numbers)
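
To make the budget rule in (2) concrete, here is a minimal userspace sketch,
not code from the series. The helper name pages_to_allocate() and the 4 KiB
MODEL_PAGE_SIZE are assumptions for illustration; the value 8 for
PIPE_PREALLOC_MAX is taken from the "burst of up to 8 pages" budget above.

```c
#include <stddef.h>

/* Assumed values for illustration; the series defines the real ones. */
#define MODEL_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define PIPE_PREALLOC_MAX	8u	/* burst cap, per "up to 8 pages" */

/*
 * Hypothetical helper: number of pages a write of 'len' bytes still needs
 * allocated, given 'pooled' pages already available in the per-pipe pool.
 */
static unsigned int pages_to_allocate(size_t len, unsigned int pooled)
{
	/* Round the write length up to whole pages without overflowing. */
	size_t want = len / MODEL_PAGE_SIZE + (len % MODEL_PAGE_SIZE != 0);

	/* Never prepare more than the burst cap for a single write. */
	if (want > PIPE_PREALLOC_MAX)
		want = PIPE_PREALLOC_MAX;

	/* Pages already in the pool cover part (or all) of the demand. */
	if (pooled >= want)
		return 0;
	return (unsigned int)(want - pooled);
}
```

With these assumed values, a 64 KiB write against a pool holding 2 pages
asks for 6 new pages, and a 4 KiB write against the same pool asks for none.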

How:
====

1) replace tmp_page[] with anon_pipe_prealloc in pipe_inode_info
2) at write (anon_pipe_write()), allocate the pages outside the lock in a
   helper called anon_pipe_prefill()
   a) the assignment into the pool must be lock protected
      * anon_pipe_prefill() does it
   b) anon_pipe_prefill() can populate up to PIPE_PREALLOC_MAX pages in the
      pool
3) once anon_pipe_write() is done, the pool is trimmed back to at most
   PIPE_PREALLOC_KEEP (2) pages by anon_pipe_trim_pool()
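
The locking pattern in the steps above can be modelled in userspace roughly
as follows. This is a sketch under stated assumptions, not the code in the
series: struct model_pipe, model_prefill() and model_trim_pool() are
hypothetical stand-ins, a pthread mutex plays the role of the pipe lock,
malloc()/free() play the role of page allocation, and PIPE_PREALLOC_MAX is
assumed to be 8.

```c
#include <pthread.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE		4096	/* assumption: 4 KiB pages */
#define PIPE_PREALLOC_MAX	8	/* assumed burst cap */
#define PIPE_PREALLOC_KEEP	2	/* pages kept after a write */

struct model_pipe {
	pthread_mutex_t lock;		/* stands in for the pipe lock */
	void *pool[PIPE_PREALLOC_MAX];
	unsigned int count;		/* pages currently in the pool */
};

/* Top the pool up to 'want' pages; allocation and freeing run unlocked. */
static void model_prefill(struct model_pipe *p, unsigned int want)
{
	void *fresh[PIPE_PREALLOC_MAX];
	unsigned int have, n = 0, i = 0;

	if (want > PIPE_PREALLOC_MAX)
		want = PIPE_PREALLOC_MAX;

	/* Sample the current pool size, then drop the lock again. */
	pthread_mutex_lock(&p->lock);
	have = p->count;
	pthread_mutex_unlock(&p->lock);

	/* Allocate only the shortfall, without holding the lock. */
	while (have + n < want) {
		void *pg = malloc(MODEL_PAGE_SIZE);

		if (!pg)
			break;
		fresh[n++] = pg;
	}

	/* Only the assignment into the pool is lock-protected. */
	pthread_mutex_lock(&p->lock);
	while (i < n && p->count < want)
		p->pool[p->count++] = fresh[i++];
	pthread_mutex_unlock(&p->lock);

	/* Pages that no longer fit (pool grew meanwhile) are freed unlocked. */
	while (i < n)
		free(fresh[i++]);
}

/* After the write, trim the pool back to PIPE_PREALLOC_KEEP pages. */
static void model_trim_pool(struct model_pipe *p)
{
	void *spare[PIPE_PREALLOC_MAX];
	unsigned int n = 0;

	/* Detach the excess pages under the lock... */
	pthread_mutex_lock(&p->lock);
	while (p->count > PIPE_PREALLOC_KEEP)
		spare[n++] = p->pool[--p->count];
	pthread_mutex_unlock(&p->lock);

	/* ...and free them after dropping it. */
	while (n > 0)
		free(spare[--n]);
}
```

In this model the lock is only ever held for pointer moves between a local
array and the pool, which is the property item 2a) asks for.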

Testing:
========

Tested on a bare-metal Intel(R) Xeon(R) Platinum 8321HC (52 CPUs) using the
pipe_bench selftest (tools/testing/selftests/pipe/pipe_bench).

Two kernels were built from the same configuration (no debug options),
differing only by this series:

- baseline: on-stack anon_pipe_prealloc pool + tmp_page[]
  Commit 4e5dfb7c84012 ("Add linux-next specific files for 20260623")
- patched: this series (unified per-pipe pool)

Each kernel was booted on the same host and benchmarked with 5 writers /
5 readers, 64 KiB messages, 5s per run, with and without memory pressure
(stress-ng --vm 4 --vm-bytes 80%). Comparing writes/s and average write
latency:

- no memory pressure: ~+11% throughput, ~-10% avg write latency
- under memory pressure: ~+23% throughput, ~-18% avg write latency

The improvement comes from the larger persistent cache (up to 8 reusable
pages vs the old 2-page tmp_page cache), which reduces alloc_page()/
free_page() traffic; the effect is largest when reclaim is active.

Future:
=======

If this approach is accepted, we could keep all allocated pages in the pool
and rely on a shrinker to trim it under memory pressure.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
Breno Leitao (4):
      fs/pipe: make the prealloc pool per-pipe infrastructure
      fs/pipe: add per-pipe pool push, prefill and trim helpers
      fs/pipe: switch the write path to the per-pipe pool
      fs/pipe: remove the old on-stack prealloc helpers and tmp_page[2]

 fs/pipe.c                 | 162 +++++++++++++++++++---------------------------
 include/linux/pipe_fs_i.h |  21 +++++-
 2 files changed, 86 insertions(+), 97 deletions(-)
---
base-commit: 4e5dfb7c84012007c3c7061126491bbc92d71bf1
change-id: 20260625-b4-pipe-unification-aba7b8525de7

Best regards,
--
Breno Leitao <leitao@xxxxxxxxxx>