process_vm_readv/writev: partial transfer of a single iov?

From: Josh Gao
Date: Wed Apr 19 2017 - 22:57:52 EST


The manpage for process_vm_readv/writev says that "Partial transfers apply at
the granularity of iovec elements. These system calls won't perform a partial
transfer that splits a single iovec element."

However, this doesn't seem to be true in current kernels (tested on Ubuntu
4.4.0-66, but current TOT looks like it should behave the same).

The following code will return 1 on new kernels (anything newer than
3.15-rc1?), and -1 with errno set to EFAULT on "old" ones:

/* Map two pages and make the second one inaccessible. */
char *p = (char*)mmap(0, PAGE_SIZE * 2, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(p + PAGE_SIZE, PAGE_SIZE, PROT_NONE);
char x;
struct iovec dst, src;
dst.iov_base = &x;
dst.iov_len = 1;
/* Start at the last readable byte and run into the PROT_NONE page. */
src.iov_base = p + PAGE_SIZE - 1;
src.iov_len = PAGE_SIZE;
return process_vm_readv(getpid(), &dst, 1, &src, 1, 0);
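
For anyone who wants to try this on their own kernel, here is a rough
self-contained version of the snippet above. The helper name
probe_partial_readv() is made up for this mail, and sysconf(_SC_PAGESIZE)
stands in for PAGE_SIZE, which userspace headers don't reliably define. It
only reports what the syscall returned; it doesn't decide which behavior is
correct.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (the name is not from any library): issues a
 * one-element read whose source range starts at the last byte of a
 * readable page and continues into a PROT_NONE page.
 *
 * Returns whatever process_vm_readv() returned. *err is set to the
 * errno of the failing call, or 0 if nothing failed.
 */
ssize_t probe_partial_readv(int *err)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page * 2, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return -1;
    }
    if (mprotect(p + page, page, PROT_NONE) != 0) {
        *err = errno;
        munmap(p, page * 2);
        return -1;
    }

    char x = 0;
    struct iovec dst = { .iov_base = &x, .iov_len = 1 };
    struct iovec src = { .iov_base = p + page - 1, .iov_len = page };

    ssize_t n = process_vm_readv(getpid(), &dst, 1, &src, 1, 0);
    *err = (n < 0) ? errno : 0;   /* capture errno before munmap() */

    munmap(p, page * 2);
    return n;
}
```

Calling it from main() and printing the return value together with
strerror(*err) is enough to tell the two behaviors apart. Note that an
EPERM or ENOSYS result means the syscall was blocked or is unavailable
(e.g. by a seccomp filter), not that the kernel is "old".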

It looks like this change in behavior was introduced by commit 240f390
(process_vm_access: switch to copy_page_to_iter/iov_iter_copy_from_user), which
advances the iov_iter after each successful page copy rather than once at the
end.

IMO, the current behavior seems more sensible than the one that's documented.
(I ran into this by not reading the manpage carefully and assuming the current
behavior, and then running code on an old kernel.) It's been in the kernel for
a pretty long time with no one noticing AFAICT. Should this just be documented
with a mention of the previous behavior in BUGS?
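
In the meantime, one way to avoid depending on either behavior might be to
never hand the kernel a remote iovec element that crosses a page boundary.
Going by the manpage wording, a partial transfer at element granularity
should then be the same thing as a partial transfer at page granularity.
This is only a sketch under that assumption; split_by_page(),
readv_by_page() and MAX_SEGS are names I made up, and I have not verified
it on a pre-3.15 kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_SEGS 64   /* arbitrary limit for this sketch */

/*
 * Hypothetical helper: split [base, base + len) into iovec elements
 * that never cross a page boundary. Fills at most max entries of out
 * and returns the number of elements the whole range needs, which may
 * be larger than max.
 */
size_t split_by_page(void *base, size_t len, size_t page,
                     struct iovec *out, size_t max)
{
    uintptr_t addr = (uintptr_t)base;
    size_t n = 0;

    while (len > 0) {
        /* Bytes remaining in the page that contains addr. */
        size_t chunk = page - (size_t)(addr % page);
        if (chunk > len)
            chunk = len;
        if (n < max) {
            out[n].iov_base = (void *)addr;
            out[n].iov_len = chunk;
        }
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}

/*
 * Hypothetical wrapper: read up to len bytes at remote in process pid
 * into buf, using one remote iovec element per page touched. Ranges
 * that need more than MAX_SEGS elements fail with EINVAL here.
 */
ssize_t readv_by_page(pid_t pid, void *buf, void *remote, size_t len)
{
    struct iovec src[MAX_SEGS];
    struct iovec dst = { .iov_base = buf, .iov_len = len };
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = split_by_page(remote, len, page, src, MAX_SEGS);

    if (n > MAX_SEGS) {
        errno = EINVAL;
        return -1;
    }
    return process_vm_readv(pid, &dst, 1, src, n, 0);
}
```

With the reproducer's layout, readv_by_page(getpid(), buf, p + PAGE_SIZE - 1,
PAGE_SIZE) would pass two remote elements, one byte in the readable page and
the rest in the PROT_NONE page, so the documented per-element rule should
allow a short read of 1 instead of requiring EFAULT.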

Thanks,
Josh