Re: GUP guarantees wrt to userspace mappings redesign
From: Kirill A. Shutemov
Date: Mon May 02 2016 - 14:03:47 EST
On Mon, May 02, 2016 at 06:22:11PM +0200, Oleg Nesterov wrote:
> On 05/02, Kirill A. Shutemov wrote:
> >
> > On Mon, May 02, 2016 at 04:15:38PM +0200, Oleg Nesterov wrote:
> > > >
> > > > - I don't see any check page_count() around __replace_page() in uprobes,
> > > > so it can easily replace pinned page.
> > >
> > > Why it should? even if it races with get_user_pages_fast()... this doesn't
> > > differ from the case when an application writes to MAP_PRIVATE non-anonymous
> > > region, no?
> >
> > < I know nothing about uprobes or ptrace in general >
> >
> > I think the difference is that the write is initiated by the process
> > itself, but IIUC __replace_page() can be initiated by other process, so
> > it's out of control of the application.
>
> Yes. Just like gdb can insert a breakpoint into the read-only executable vma.
>
> > So we have pages pinned by a driver and the driver expects the pinned
> > pages to be mapped into userspace, then __replace_page() kicks in and put
> > different page there -- driver's expectation is broken.
>
> Yes... but I don't understand the problem space. I mean, I do not know why
> this driver should expect this, how it can be broken, etc.
>
> I do not even understand why "initiated by other process" can make any
> difference... Unless this driver somehow controls all threads which could
> have this page mapped.
Okay, my understanding is following:
Some drivers (i.e. vfio) rely on get_user_page{,_fast}() to pin the memory
and expect pinned pages to be mapped into userspace until the pin is gone.
This memory is used to communicate between kernel and userspace.
This kinda works unless an application does something that can change page
tables in the area: fork() (due CoW), mremap(), munmap(), MADV_DONTNEED,
etc.
This model is really fragile, but the application has (kinda) control over
the situation: if it's very careful, it can keep the mapping intact.
With __replace_page() situation is different: it can be triggered from
outside of process and therefore out of control of the application.
I don't think there's something to fix on uprobe side. It's part of
debugging interface. Debuggers can be destructive, nothing new there.
I just tried to find the cases why this usage of GUP is not correct...
--
Kirill A. Shutemov