Re: [RFC][PATCH] Cross Memory Attach

From: Ingo Molnar
Date: Thu Sep 16 2010 - 04:09:17 EST



* KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:

> > On Wed, Sep 15, 2010 at 19:58, Avi Kivity <avi@xxxxxxxxxx> wrote:
> >
> > > Instead of those two syscalls, how about a vmfd(pid_t pid, ulong start,
> > > ulong len) system call which returns a file descriptor that represents a
> > > portion of the process address space.  You can then use preadv() and
> > > pwritev() to copy memory, and io_submit(IO_CMD_PREADV) and
> > > io_submit(IO_CMD_PWRITEV) for asynchronous variants (especially useful with
> > > a dma engine, since that adds latency).
> > >
> > > With some care (and use of mmu_notifiers) you can even mmap() your vmfd and
> > > access remote process memory directly.
> >
> > Rather than introducing a new vmfd() API for this, why not just add
> > implementations for these more efficient operations to the existing
> > /proc/$pid/mem interface?
>
> As far as I heard from a friend, an old HP MPI implementation used
> /proc/$pid/mem for this purpose. (I don't know its current status.)
> However, almost no implementation does that, because /proc/$pid/mem
> requires that the process be ptraced. As far as I understand, very old
> /proc/$pid/mem did not require it, but that was changed over security
> concerns. Since then, nobody has wanted to change this interface,
> because they worry about breaking security.
>
> But I don't know what exactly the "process is ptraced" check
> protects. If anyone can explain the reason and we can remove it, I'm
> not against that at all.

I did some Git digging - that ptrace check for /proc/$pid/mem read/write
goes all the way back to the beginning of written human history, aka
Linux v2.6.12-rc2.

I researched the fragmented history of the stone ages as well, I checked
out numerous cave paintings, and while much was lost, I was able to
recover this old fragment of a clue in the cave called 'patch-2.3.27',
carbon-dated back as far as the previous millennium (!):

mem_read() in fs/proc/base.c:

+ * 1999, Al Viro. Rewritten. Now it covers the whole per-process part.
+ * Instead of using magical inumbers to determine the kind of object
+ * we allocate and fill in-core inodes upon lookup. They don't even
+ * go into icache. We cache the reference to task_struct upon lookup too.
+ * Eventually it should become a filesystem in its own. We don't use the
+ * rest of procfs anymore.

In such a long timespan language has changed much, so not all of this
scribbling can be interpreted - but one thing appears to be sure: this
is where the MAY_PTRACE() restriction was introduced to /proc/$pid/mem -
as part of a massive rewrite.

Alas, the reason for the restriction was not documented, and is feared
to be lost forever.
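
For readers unfamiliar with the interface under discussion, here is a
minimal sketch of copying memory through a /proc mem file with pread().
It is an illustration only, not code from the patch or from the kernel.
It assumes a Linux system with procfs mounted at /proc, and it reads the
calling process's own memory via /proc/self/mem, on the assumption that
the kernel lets a task read its own mem file without being ptraced. The
helper name read_own_memory() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes starting at virtual address addr
 * of the calling process into out, going through /proc/self/mem.
 * The file offset passed to pread() is the virtual address itself.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_own_memory(const void *addr, void *out, size_t len)
{
    int fd = open("/proc/self/mem", O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, out, len, (off_t)(uintptr_t)addr);

    close(fd);
    return n;
}
```

Pointing the same pread() call at /proc/$pid/mem of another process is
where the ptrace restriction described in this thread applies.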

Thanks,

Ingo