[PATCH v3] proc: Use seq_read_iter for /proc/*/maps

From: Arkadiusz Kozdra (Arusekk)
Date: Tue May 04 2021 - 07:56:20 EST


Since seq_read_iter looks mature enough to be used for /proc/<pid>/maps,
re-allow applications to perform zero-copy data forwarding from it.

Some executable-inspecting tools (e.g. pwntools) rely on patching the
entry point instructions with minimal machine code that uses sendfile to
copy /proc/self/maps to stdout. Using sendfile lets them do this without
extra allocations, which would change the mappings and therefore distort
the information.

This is inspired by the series by Christoph Hellwig (linked).

Link: https://lore.kernel.org/lkml/20201104082738.1054792-1-hch@xxxxxx/
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Arkadiusz Kozdra (Arusekk) <arek_koz@xxxxx>
---
v3:
- Only commit message changed.
- Clarify what tools use this.
- Do not mention performance.

The average execution time of a patched static ELF outputting to a pipe
(the use case of pwntools inspecting mappings of an executable) was
about 3.43 ms +- 0.05 ms both before and after the change, so I decided
that the performance impact is not worth mentioning in the commit
message.

I think the change should marginally improve speed. It will most likely
also affect the memory footprint and thereby slightly decrease power
consumption (I suppose this would only be measurable when the mappings
description grows many pages long).
Speed might be affected more in pathological cases such as a
close-to-OOM scenario, but I was unable to test this reliably.
I ran my tests under qemu-system-x86_64 on a Ryzen 4500 host without
kvm, with the default kernel config.

If someone wants to also test this feature of pwntools for themselves,
it can be used as follows and should print something other than None:

$ pip install pwntools
$ python3
>>> from pwn import *
>>> print(ELF("/bin/true").libc)

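The sendfile path can also be exercised directly, without pwntools.
Below is a minimal sketch, assuming Python 3 on Linux. The helper name
sendfile_maps and its parameters are hypothetical and are not part of
pwntools or the kernel tree. On a kernel that has 36e2c7421f02 but not
this patch it should return None (sendfile fails with EINVAL). With the
patch applied it should return the number of bytes forwarded.

```python
import errno
import os


def sendfile_maps(out_fd, path="/proc/self/maps", chunk=1 << 16):
    """Forward `path` to `out_fd` using only sendfile(2).

    Returns the number of bytes forwarded, or None if the kernel
    rejects sendfile on this file with EINVAL.
    """
    in_fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        while True:
            try:
                # offset=None: read from in_fd's current position (Linux).
                sent = os.sendfile(out_fd, in_fd, None, chunk)
            except OSError as e:
                if e.errno == errno.EINVAL:
                    return None  # splice/sendfile not supported here
                raise
            if sent == 0:  # end of file
                return total
            total += sent
    finally:
        os.close(in_fd)
```

Calling sendfile_maps(1) with stdout redirected to a pipe or a regular
file mirrors what the patched entry point does. Note that sendfile also
fails with EINVAL if the output descriptor was opened with O_APPEND, so
a None result is only meaningful when stdout is not in append mode.
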
Sorry for the delay; it took me a long time to work out some
low-overhead timing methods.

Does this change need selftests? Since it only uses common code that is
hopefully tested elsewhere, it looks unlikely to break again.

fs/proc/task_mmu.c | 3 ++-
fs/proc/task_nommu.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e862cab69583..06282294ddb8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -351,7 +351,8 @@ static int pid_maps_open(struct inode *inode, struct file *file)

const struct file_operations proc_pid_maps_operations = {
.open = pid_maps_open,
- .read = seq_read,
+ .read_iter = seq_read_iter,
+ .splice_read = generic_file_splice_read,
.llseek = seq_lseek,
.release = proc_map_release,
};
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index a6d21fc0033c..e55e79fd0175 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -295,7 +295,8 @@ static int pid_maps_open(struct inode *inode, struct file *file)

const struct file_operations proc_pid_maps_operations = {
.open = pid_maps_open,
- .read = seq_read,
+ .read_iter = seq_read_iter,
+ .splice_read = generic_file_splice_read,
.llseek = seq_lseek,
.release = map_release,
};
--
2.31.1