Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
From: Lorenzo Stoakes
Date: Mon Apr 17 2023 - 15:00:57 EST
On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
>
> > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > io_uring open coding this kind of stuff.
> > >
> >
> > How would the semantics of this work? What is broken? It is a little
> > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > FOLL_ANON_OR_HUGETLB was another consideration...
>
> It says "historically this user has accepted file backed pages and we
> think there may actually be users doing that, so don't break the
> uABI"
>
> Without the flag GUP would refuse to return file backed pages that can
> trigger kernel crashes or data corruption.
>
> Eg we'd want most places to not specify the flag and the few that do
> to have some justification.
>
> We should consider removing FOLL_ANON, I'm not sure it really makes
> sense these days for what proc is doing with it. All that proc stuff
> could likely be turned into a kthread_use_mm() and a simple
> copy_to/from user?
>
> I suspect that eliminates the need to check for FOLL_ANON?
>
> Jason

The proc invocations that use FOLL_ANON are get_mm_proctitle(),
get_mm_cmdline() and environ_read(), each of which passes it to
access_remote_vm(). These are called from process context, i.e. with
tsk->mm != NULL, but kthread_use_mm() explicitly disallows the (slightly
mind-boggling) idea of swapping out an established mm.

So I don't think this route is plausible, unless you were thinking of
somehow offloading the work to a thread?
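
For illustration, a simplified userspace model of the two preconditions
that kthread_use_mm() warns about on entry: the caller must be a kthread,
and it must not already have an mm. The struct names and the flag value
are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; the real PF_KTHREAD lives in
 * include/linux/sched.h and need not match. */
#define PF_KTHREAD_MODEL 0x00200000u

struct mm_model {
	int id;
};

/* Hypothetical stand-in for the two task_struct fields that matter. */
struct task_model {
	unsigned int flags;	/* task flags, e.g. PF_KTHREAD_MODEL */
	struct mm_model *mm;	/* NULL unless the task owns an mm */
};

/*
 * Roughly mirrors the two WARN_ON_ONCE() checks at the top of
 * kthread_use_mm(): returns true only if the task is a kthread
 * AND has no mm of its own yet.
 */
static bool can_use_mm(const struct task_model *tsk)
{
	if (!(tsk->flags & PF_KTHREAD_MODEL))
		return false;	/* ordinary process context */
	if (tsk->mm != NULL)
		return false;	/* would replace an established mm */
	return true;
}
```

A process-context caller such as environ_read() fails the first check
(and normally the second too), which is why this path would need the
work handed off to a separate kernel thread.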

In any case, if we introduce a FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag, can
we not just drop FOLL_ANON altogether, since its behaviour would then be
implied and hugetlb should work here too?
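
A rough userspace model contrasting the two checks, for illustration
only. The FOLL_ANON side follows the simplified view that GUP rejects any
VMA for which vma_is_anonymous() is false, which includes hugetlb. The
proposed side assumes hugetlb is treated as safe by default, which is
exactly the open question. All flag names and values below are
hypothetical; FOLL_ALLOW_BROKEN_FILE_MAPPINGS does not exist upstream.

```c
#include <errno.h>

/* Hypothetical bits for this model only; not the kernel's FOLL_* values. */
#define FOLL_ANON_MODEL		(1u << 0)
#define FOLL_ALLOW_BROKEN_MODEL	(1u << 1)

enum vma_kind { VMA_ANON, VMA_HUGETLB, VMA_FILE };

/* Current scheme: FOLL_ANON rejects anything that is not plain anon. */
static int check_foll_anon(enum vma_kind kind, unsigned int gup_flags)
{
	if ((gup_flags & FOLL_ANON_MODEL) && kind != VMA_ANON)
		return -EFAULT;
	return 0;
}

/*
 * Proposed scheme (assumption): file-backed VMAs are refused unless the
 * caller opts in; anon and hugetlb are accepted with no extra flag.
 */
static int check_proposed(enum vma_kind kind, unsigned int gup_flags)
{
	if (kind == VMA_FILE && !(gup_flags & FOLL_ALLOW_BROKEN_MODEL))
		return -EFAULT;
	return 0;
}
```

Under the proposed check a hugetlb VMA passes without any flag, so the
proc callers would get the behaviour FOLL_ANON was approximating, minus
the hugetlb exclusion.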

Separately, I find the semantics of access_remote_vm() kind of weird, and
a possible mmap_lock-free future does make me wonder whether something
better could be done there.

(Section where I sound like I might be going mad) Perhaps it would be
useful to have some means of context switching into the kernel portion of
the remote process, as if it were a system call or soft interrupt handler,
and have that actually perform the uaccess operation?

I'm guessing nothing like that exists yet?