Re: [GIT PULL] kdbus for 4.1-rc1

From: Michal Hocko
Date: Wed Apr 29 2015 - 13:24:47 EST


On Mon 27-04-15 13:11:03, Andy Lutomirski wrote:
> [resent without HTML]
>
> On Apr 27, 2015 5:46 AM, "Michal Hocko" <mhocko@xxxxxxx> wrote:
> >
> > On Wed 22-04-15 12:36:12, Andy Lutomirski wrote:
[...]
> > > The receiver gets to mmap the buffer. I'm not sure what protection they get.
> >
> > OK, so I've checked the code. kdbus_pool_new sets up a shmem file
> > (unlinked), so it is not visible externally. The consumer gets it via
> > mmap on the endpoint file through kdbus_pool_mmap, which refuses
> > VM_WRITE and clears VM_MAYWRITE. The receiver doesn't even have direct
> > access to the shmem file.
> >
> > It is ugly that kdbus_pool_mmap replaces the original vm_file and makes
> > it point to the shmem file. I am not sure whether this is safe all the
> > time, and it would deserve a big fat comment. On the other hand, some
> > drivers are already doing this (e.g. dma_buf_mmap).
>
> What happens to map_files in proc? It seems unlikely that CRIU would
> ever work on dma_buf, but this could be a problem for CRIU with kdbus.

I am not familiar with CRIU, and likewise with the map_files directory.
This is actually the first time I've heard of it.
[looking...]

So proc_map_files_readdir will iterate all VMAs, including the one backed
by the buffer, and it will see its vm_file, which will point to the shmem
file AFAICS. So it doesn't seem like CRIU would care, because all parties
mapping the buffer would see the same inode. Whether that is really
enough, I dunno.
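
To spell out the vm_file swap we are talking about (a rough sketch of
the pattern, not the actual kdbus code; example_pool and its
shmem_file member are made-up names), the driver's mmap handler does
something like:

	struct example_pool {
		struct file *shmem_file; /* from shmem_file_setup() */
	};

	static int example_pool_mmap(struct file *file,
				     struct vm_area_struct *vma)
	{
		struct example_pool *pool = file->private_data;

		/* refuse writable mappings and forbid a later
		 * mprotect(PROT_WRITE) */
		if (vma->vm_flags & VM_WRITE)
			return -EPERM;
		vma->vm_flags &= ~VM_MAYWRITE;

		/* swap the endpoint file for the backing shmem file;
		 * vma->vm_file (and so /proc/<pid>/map_files) now
		 * points to the shmem inode */
		fput(vma->vm_file);
		vma->vm_file = get_file(pool->shmem_file);

		return pool->shmem_file->f_op->mmap(pool->shmem_file,
						    vma);
	}

So anything that walks the VMAs after mmap, map_files included, only
ever sees the shmem file, never the endpoint file.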

> > > The thing I'm worried about is that the receiver might deliberately
> > > avoid faulting in a bunch of pages and instead wait for the producer
> > > to touch them, causing pages that logically belong to the receiver to
> > > be charged to the producer instead.
> >
> > Hmm, now that I am looking into the code it seems you are right. E.g.
> > kdbus_cmd_send runs in the context of the sender AFAIU. This gets down
> > to kdbus_pool_slice_copy_iovec, which does vfs_iter_write, and this
> > is where the memory gets charged. AFAIU the terminology, all the
> > receivers will share the same shmem file when mmapping the endpoint.
> >
> > This, however, doesn't seem to be exploitable to hide memory charges,
> > because the receiver cannot make the buffer writable. A nasty process
> > with a small memcg limit could still pre-fault the memory before any
> > writer sends a message and slow down the whole endpoint traffic. But
> > that wouldn't be a completely new thing, because processes might hammer
> > on memory even without memcg... It is just that this would be kind of
> > easier with memcg.
> > If that is the concern, then the buffer should be pre-charged at the
> > time it is created.
>
> The attack I had in mind was that the nasty process with a small memcg
> creates one or many of these and doesn't pre-fault them. Then a sender
> (systemd?) sends messages and they get charged, possibly once for each
> copy sent, to the root memcg.

Dunno, but I suspect that systemd will not talk to random endpoints. Or
can those endpoints be registered on a system bus by an untrusted task?
The bus vs. endpoint relation is still not entirely clear to me.

But even if that were possible, I fail to see how the small memcg plays
any role when the task doesn't control the shmem buffer directly and
only has a read-only mapping of it.
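
From the receiver's point of view it looks roughly like this (a
userspace sketch; 'ep' and pool_size are illustrative, and the exact
errno for the refused writable mapping is my assumption):

	/* 'ep' is an open kdbus endpoint fd */
	void *buf = mmap(NULL, pool_size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, ep, 0);
	/* fails: the endpoint's mmap handler refuses VM_WRITE */

	buf = mmap(NULL, pool_size, PROT_READ, MAP_SHARED, ep, 0);
	/* succeeds, but since VM_MAYWRITE was cleared, a later
	 * mprotect(buf, pool_size, PROT_WRITE) fails as well, so the
	 * receiver can only ever read what the sender wrote */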

> So kdbus should probably pre-charge the creator of the pool.

Yes, as I've said above.
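
I.e. something along these lines at pool creation time, so that all
the backing pages get allocated, and therefore charged, in the
creator's context (a sketch; shmem_read_mapping_page is real, the
function around it is made up):

	static int example_pool_precharge(struct file *shmem_file,
					  size_t pool_size)
	{
		pgoff_t idx;
		struct page *page;

		for (idx = 0; idx < pool_size >> PAGE_SHIFT; idx++) {
			/* allocate the page now so the memcg charge
			 * lands on the pool creator rather than on a
			 * later sender doing vfs_iter_write */
			page = shmem_read_mapping_page(
					shmem_file->f_mapping, idx);
			if (IS_ERR(page))
				return PTR_ERR(page);
			page_cache_release(page);
		}
		return 0;
	}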
--
Michal Hocko
SUSE Labs