Re: [GIT PULL] kdbus for 4.1-rc1
From: Andy Lutomirski
Date: Mon Apr 27 2015 - 16:11:31 EST
[resent without HTML]
On Apr 27, 2015 5:46 AM, "Michal Hocko" <mhocko@xxxxxxx> wrote:
>
> On Wed 22-04-15 12:36:12, Andy Lutomirski wrote:
> > On Apr 22, 2015 7:57 AM, "Michal Hocko" <mhocko@xxxxxxx> wrote:
> > >
> > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote:
> > > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
> > > > > On Tue 21-04-15 16:01:01, David Herrmann wrote:
> > > > >> Hi
> > > > >>
> > > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@xxxxxxx> wrote:
> > > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote:
> > > > >> >> Hi
> > > > >> >>
> > > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
> > > > >> >> <gnomes@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > >> >> >> On top of that, I think that someone into resource management needs to
> > > > >> >> >> seriously consider whether having a broadcast send do get_user_pages
> > > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!)
> > > > >> >> >> is a good idea.
> > > > >> >> >
> > > > >> >> > Oh, but it's so much fun if you pass pages belonging to a device driver,
> > > > >> >> > or pass bits of a GEM object, thereby keeping entire graphics textures
> > > > >> >> > referenced 8)
> > > > >> >>
> > > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is
> > > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
> > > > >> >> copy_from_user() internally relies on GUP or not is an orthogonal
> > > > >> >> issue that does not belong here.
> > > > >> >
> > > > >> > It kind of does AFAIU.
> > > > >>
> > > > >> No, it does not. The issue with GUP is that you elevate the page
> > > > >> ref-count and thus prevent LRU isolation, sealing, and so on.
> > > > >
> > > > > The point was that such memory might not be present yet and would need a
> > > > > page fault, with all the side effects - memory reclaim, memcg charging...
> > > > >
> > > > >> I cannot see how it is related to kdbus. However, ...
> > > > >>
> > > > >> > If for nothing else, then for the memcg reasons mentioned in the
> > > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
> > > > >> > untrusted user is allowed to hand over a shmem-backed buffer which hasn't
> > > > >> > been charged yet (read: faulted in), and kdbus is then forced to fault it
> > > > >> > in in a different user's context, then you basically allow memory
> > > > >> > allocations to be hidden from the memcg. That is a clear show stopper.
> > > > >> >
> > > > >> > Or have I misunderstood the way how shmem buffers are used here?
> > > > >>
> > > > >> ...as you mentioned memcg, let's figure that out here. shmem buffers are
> > > > >> used as receive-buffers by kdbus peers. They are read-only to
> > > > >> user-space. All allocations are done by the kernel on message passing.
> > > > >
> > > > > OK, so the shmem buffer is allocated on the kernel's behalf and under
> > > > > its control, and no userspace can hand one over to kdbus. Do I get
> > > > > that right? If so, then the memcg escape I was describing above is
> > > > > of course not possible. This wasn't clear to me from the previous
> > > > > discussion. Thanks for the clarification!
> > > >
> > > > I'm still missing something here, I think. At the time of pool
> > > > creation, the kernel calls shmem_file_setup in the context of the
> > > > untrusted user. Then, when a privileged daemon broadcasts, the kernel
> > > > calls vfs_iter_write or similar, thus allocating the page, right? I
> > > > don't see why the page would be allocated early or why vfs_iter_write
> > > > and the associated shmem code would care what memcg created the shmem
> > > > file -- all of that code seems to use current's memcg on brief
> > > > inspection.
> > >
> > > Yes, it is the current task on the first charge, or the original memcg on
> > > swap-in. But my understanding from the above (I haven't read the
> > > code yet) is that the untrusted userspace is only a reader of the buffer
> > > and isn't allowed to modify it.
> > >
> > > > Bear in mind that the bad guy gets to use madvise, etc to mess around
> > > > with the page cache state.
> > >
> > > How can an untrusted user play with shmem when it is read-only?
> > > shmem_file_setup should create an unlinked file, so AFAIU no process can
> > > access it via tmpfs and potentially fault the memory in before the
> > > producer fills it up (so it gets faulted in in the trusted context). I
> > > have no idea how the receiver gets to the buffer, though.
> > >
> >
> > The receiver gets to mmap the buffer. I'm not sure what protection they get.
>
> OK, so I've checked the code. kdbus_pool_new sets up a shmem file
> (unlinked), so it is not visible externally. The consumer gets it via mmap
> on the endpoint file through kdbus_pool_mmap, which refuses VM_WRITE and
> clears VM_MAYWRITE. The receiver doesn't even have access to the shmem
> file directly.
>
> It is ugly that kdbus_pool_mmap replaces the original vm_file and makes
> it point to the shmem file. I am not sure whether this is safe all the
> time, and it would deserve a big fat comment. On the other hand, it seems
> some drivers are doing this already (e.g. dma_buf_mmap).
What happens to map_files in proc? It seems unlikely that CRIU would
ever work on dma_buf, but this could be a problem for CRIU with kdbus.
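For reference, the pattern under discussion looks roughly like this
(reconstructed from Michal's description, not copied from the kdbus
tree; the names example_pool / pool->f are just illustrative):

#include <linux/fs.h>
#include <linux/mm.h>

struct example_pool {
	struct file *f;		/* unlinked shmem file from shmem_file_setup() */
	size_t size;
};

static int example_pool_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct example_pool *pool = file->private_data;

	/* refuse writable mappings and forbid a later mprotect(PROT_WRITE) */
	if (vma->vm_flags & VM_WRITE)
		return -EPERM;
	vma->vm_flags &= ~VM_MAYWRITE;

	/* don't allow mapping beyond the pool */
	if (vma->vm_end - vma->vm_start > pool->size)
		return -EFAULT;

	/*
	 * Swap the endpoint file for the backing shmem file (the
	 * dma_buf_mmap()-style trick) and let shmem install its vm_ops.
	 */
	fput(vma->vm_file);
	vma->vm_file = get_file(pool->f);

	return pool->f->f_op->mmap(pool->f, vma);
}

The receiver only ever sees a read-only mapping, but after the vm_file
swap /proc/<pid>/maps and map_files presumably point at the unlinked
shmem file rather than the kdbus endpoint, which is what makes me wonder
about CRIU.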
>
> > The thing I'm worried about is that the receiver might deliberately
> > avoid faulting in a bunch of pages and instead wait for the producer
> > to touch them, causing pages that logically belong to the receiver to
> > be charged to the producer instead.
>
> Hmm, now that I am looking into the code, it seems you are right. E.g.
> kdbus_cmd_send runs in the context of the sender AFAIU. This gets down
> to kdbus_pool_slice_copy_iovec, which does vfs_iter_write, and this
> is where the memory gets charged. AFAIU the terminology, all the
> receivers will share the same shmem file when mmapping the endpoint.
>
> This, however, doesn't seem to be exploitable to hide memory charges,
> because the receiver cannot make the buffer writable. A nasty process
> with a small memcg limit could still pre-fault the memory before any
> writer gets to send a message and slow down the whole endpoint's traffic.
> But that wouldn't be a completely new thing, because processes can hammer
> on memory even without memcg... It is just that this would be somewhat
> easier with memcg.
> If that is the concern, then the buffer should be pre-charged at the time
> it is created.
The attack I had in mind was that the nasty process with a small memcg
creates one or many of these pools and doesn't pre-fault them. Then a
sender (systemd?) sends messages, and they get charged, possibly once for
each copy sent, to the root memcg. So kdbus should probably pre-charge the
creator of the pool.
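Something along these lines at pool creation time would move the charge
to the pool's creator. This is only a sketch of the idea, not a patch:
the shmem_file_setup() call stands in for what kdbus_pool_new() already
does (the name string is made up), and the vfs_fallocate() call is the
suggested addition:

struct file *f;
int ret;

/* 'size' is the requested pool size from the surrounding function. */

/* created in the context of the (possibly untrusted) pool owner */
f = shmem_file_setup("kdbus-pool", size, 0);
if (IS_ERR(f))
	return PTR_ERR(f);

/*
 * Allocate the whole backing file up front.  shmem charges the pages
 * to current's memcg here, i.e. to the pool's creator, instead of to
 * whichever sender later writes into the pool.
 */
ret = vfs_fallocate(f, 0, 0, size);
if (ret < 0) {
	fput(f);
	return ret;
}

That way a broadcast never allocates memory on behalf of a receiver; if
the receiver's memcg cannot afford its own pool, creating the pool fails
right there instead.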