Re: [PATCH] /dev/zero: make private mapping full anonymous mapping

From: Yang Shi
Date: Tue Jan 14 2025 - 11:53:24 EST





On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
+ Willy for the fs/weirdness elements of this.

On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
When creating private mapping for /dev/zero, the driver makes it an
anonymous mapping by calling set_vma_anonymous(). But it just sets
vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
Hm yikes.

This is a special case and the VMA doesn't look like either anonymous VMA
or file VMA. It confused other kernel subsystem, for example, khugepaged [1].

It seems pointless to keep such special case. Making private /dev/zero
mapping a full anonymous mapping doesn't change the semantic of
/dev/zero either.
My concern is that ostensibly there _is_ a file right? Are we certain that by
not setting this we are not breaking something somewhere else?

Are we not creating a sort of other type of 'non-such-beast' here?

But the file is /dev/zero. I don't see this could break the semantic of /dev/zero. The shared mapping of /dev/zero is not affected by this change, kernel already treated private mapping of /dev/zero as anonymous mapping, but with some weird settings in VMA. When reading the mapping, it returns 0 with zero page, when writing the mapping, a new anonymous folio is allocated.


I mean already setting it anon and setting vm_file non-NULL is really strange.

The user visible effect is the mapping entry shown in /proc/<PID>/smaps
and /proc/<PID>/maps.

Before the change:
ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8 /dev/zero

After the change:
ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0

Yeah this seems like it might break somebody to be honest, it's really
really really strange to map a file then for it not to be mapped.

Yes, it is possible if someone really care whether the anonymous-like mapping is mapped by /dev/zero or just created by malloc(). But I don't know who really do...


But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
file but for it to be marked anonymous.

God what a mess.

[1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@xxxxxxxxxx/
I kind of hate that we have to mitigate like this for a case that should
never ever happen so I'm inclined towards your solution but a lot more
inclined towards us totally rethinking this.

Do we _have_ to make this anonymous?? Why can't we just reference the zero
page as if it were in the page cache (Willy - feel free to correct naive
misapprehension here).

TBH, I don't see why page cache has to be involved. When reading, 0 is returned by zero page. When writing a CoW is triggered if page cache is involved, but the content of the page cache should be just 0, so we copy 0 to the new folio then write to it. It doesn't make too much sense. I think this is why private /dev/zero mapping is treated as anonymous mapping in the first place.


Signed-off-by: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
---
drivers/char/mem.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 169eed162a7f..dae113f7fc1b 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
if (vma->vm_flags & VM_SHARED)
return shmem_zero_setup(vma);
vma_set_anonymous(vma);
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
+ vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
Hmm, this might have been mremap()'d _potentially_ though? And then now
this will be wrong? But then we'd have no way of tracking it correctly...

I'm not quite familiar with the subtle details and corner cases of meremap(). But mmap_zero() should be called by mmap(), so the VMA has not been visible to user yet at this point IIUC. How come mremap() could move it?


I've not checked the function but do we mark this as a special mapping of
some kind?

+
return 0;
}

--
2.47.0