On Fri, Mar 7, 2025 at 1:12 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
On Thu, Mar 06, 2025 at 01:49:48PM -0800, Yang Shi wrote:Ok, so the real question (to Yang) is: who are these /dev/zero users
LKP reported 800% performance improvement for small-allocs benchmarkYeah sorry about that, but unfortunately something we really do have to
from vm-scalability [1] with patch ("/dev/zero: make private mapping
full anonymous mapping") [2], but the patch was nack'ed since it changes
the output of smaps somewhat.
think about (among other things, the VMA edge cases are always the source
of weirdness...)
The profiling shows one of the major sources of the performanceGreat work tracking that down! Sorry I lost track of the other thread.
improvement is the less contention to i_mmap_rwsem.
The small-allocs benchmark creates a lot of 40K size memory maps byUghhhh god haha.
mmap'ing private /dev/zero then triggers page fault on the mappings.
When creating private mapping for /dev/zero, the anonymous VMA is
created, but it has valid vm_file. Kernel basically assumes anonymous
VMAs should have NULL vm_file, for example, mmap inserts VMA to the file
rmap tree if vm_file is not NULL. So the private /dev/zero mapping
will be inserted to the file rmap tree, this resulted in the contention
to i_mmap_rwsem. But it is actually anonymous VMA, so it is pointless
to insert it to file rmap tree.
Skip anonymous VMA for this case. Over 400% performance improvement wasThat's insane. Amazing work.
reported [3].
that require an insane degree of scalability, and why didn't they
switch to regular MAP_ANONYMOUS? Are they in the room with us?
... And this is probably a non-issue in 99% of !/dev/zero mmaps unlessIt is not on par with the 800% improvement from the original patch. It isInteresting, I guess you'll take a look at this also?
because page fault handler needs to access some members of struct file
if vm_file is not NULL, for example, f_mode and f_mapping. They are in
the same cacheline with file refcount. When mmap'ing a file the file
refcount is inc'ed and dec'ed, this caused bad cache false sharing
problem. The further debug showed checking whether the VMA is anonymous
or not can alleviate the problem. But I'm not sure whether it is the
best way to handle it, maybe we should consider shuffle the layout of
struct file.
it's something like libc.so.6 at an insane rate of execs/second.
This seems like a patch in search of a problem and I really don't see
why we should wart up the mmap code otherwise. Not that I have a huge
problem with this patch, which is somewhat simple and obvious.
It'd be great if there was a real workload driving this rather than
useless synthetic benchmarks.