Re: [RFC] writev() semantics with invalid iovec in the middle

From: Mike Marshall
Date: Thu Sep 15 2016 - 06:23:33 EST


If you squeeze out every byte, won't you still have a short
write? And the written data wouldn't be cut at the bad
place, but it would have a weird hole or discontinuity there.

-Mike

On Wed, Sep 14, 2016 at 5:34 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> Right now writev() with a 3-iovec array that has an unmapped address in
> the second element and a total length less than PAGE_SIZE will write the
> first segment and stop there. Among other things, this guarantees a
> short copy, and I would rather have it yield a 0-byte write (and -EFAULT as
> the return value).
>
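A minimal userspace sketch of the three-iovec case described above, assuming Linux with glibc. The helper name writev_with_hole(), the 100-byte segment sizes and the output path are illustrative choices, not anything from the thread. It only reports what writev() returns: per the description above, the kernels under discussion return 100 (the first segment), while the proposal would make it -1 with errno set to EFAULT.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: writev() three 100-byte segments (300 bytes, well
 * under PAGE_SIZE) to `path`, with the middle segment pointing at an
 * unmapped page. Returns writev()'s result; *err gets errno on failure,
 * 0 otherwise.
 */
ssize_t writev_with_hole(const char *path, int *err)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);

    char *p = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return -1;
    }
    memset(p, 'a', pg);            /* populate the first page */
    munmap(p + pg, pg);            /* the second page becomes the bad address */

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0) {
        *err = errno;
        munmap(p, pg);
        return -1;
    }

    struct iovec iov[3] = {
        { .iov_base = p,       .iov_len = 100 },  /* readable */
        { .iov_base = p + pg,  .iov_len = 100 },  /* unmapped */
        { .iov_base = p + 200, .iov_len = 100 },  /* readable */
    };
    ssize_t n = writev(fd, iov, 3);
    *err = (n < 0) ? errno : 0;    /* save errno before close()/munmap() */

    close(fd);
    munmap(p, pg);
    return n;
}
```

Call it from a small main(), e.g. `int e; ssize_t n = writev_with_hole("/tmp/foo", &e);`. Whatever the kernel does, n can never exceed 100, because writev() does not skip over the faulting segment to write the third one.
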
> All POSIX has to say about that is this (in 2.3 Error Numbers):
>
> [EFAULT]
> Bad address. The system detected an invalid address in attempting to use
> an argument of a call. The reliable detection of this error cannot be
> guaranteed, and when not detected may result in the generation of a signal,
> indicating an address violation, which is sent to the process.
>
> Note that an unmapped page in the middle of a single buffer can already lead to
> the same kind of short write - i.e. if we have
> p = mmap(0, 3*4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> munmap(p + 4096, 4096);
> fd = open("/tmp/foo", O_CREAT|O_TRUNC|O_RDWR, 0777);
> write(fd, p + 2048, 8192);
>
> write() will yield -EFAULT rather than storing 2KB. The same will happen with
> writev(fd, &(struct iovec){p + 2048, 8192}, 1);
> BTW, adding lseek(fd, 2049, SEEK_SET); before that write (or writev) will
> result in 2047 bytes being written by the latter.
>
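The example above, fleshed out into a compilable sketch under the same assumptions (Linux, glibc). The helper write_across_hole() and its parameters are hypothetical. The page size comes from sysconf() instead of being hard-coded, so p + 2048, 8192 and 2049 become p + pg/2, 2*pg and pg/2 + 1, and the file mode is 0644 rather than 0777.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the mmap/munmap/write example: three
 * read-only anonymous pages (reads return zeroes) with the middle one
 * unmapped, then a 2-page write starting half a page into the mapping,
 * so the source buffer runs into the hole after pg/2 bytes.
 *   start      - file offset to lseek() to before writing (0 = none)
 *   use_writev - nonzero: issue it as a one-element writev() instead
 * Returns the write()/writev() result; *err gets errno on failure, else 0.
 */
ssize_t write_across_hole(const char *path, off_t start, int use_writev,
                          int *err)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    ssize_t n = -1;
    int fd = -1;

    char *p = mmap(NULL, 3 * pg, PROT_READ,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return -1;
    }
    munmap(p + pg, pg);            /* punch the hole in the middle */

    char *buf = p + pg / 2;        /* p + 2048 with 4 KiB pages */
    size_t len = 2 * pg;           /* 8192 with 4 KiB pages */

    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0 || lseek(fd, start, SEEK_SET) == (off_t)-1) {
        *err = errno;
        goto out;
    }

    if (use_writev) {
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        n = writev(fd, &iov, 1);
    } else {
        n = write(fd, buf, len);
    }
    *err = (n < 0) ? errno : 0;
out:
    if (fd >= 0)
        close(fd);
    munmap(p, pg);                 /* unmap the two pages that remain */
    munmap(p + 2 * pg, pg);
    return n;
}
```

With 4 KiB pages, the message above reports write_across_hole(path, 0, 0, &e) failing with EFAULT and write_across_hole(path, 2049, 1, &e) returning 2047. Other kernel versions may behave differently, so treat those numbers as the thread's observation rather than a guarantee; the only hard bound is that at most pg/2 bytes can be written, since that is all the readable data before the hole.
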
> IOW, we do not try to squeeze every byte that can be squeezed out of the
> buffer; generally, an unmapped address anywhere in the PAGE_SIZE worth of data
> that would go into the same page-aligned chunk of the destination can result in
> a short write cut at the beginning of that chunk. iovec boundaries act
> as barriers to short writes, mostly by accident.
>
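A toy model of the rule in the paragraph above, written as plain arithmetic rather than kernel code. model_short_write() is hypothetical; it deliberately ignores iovec boundaries (which, per the paragraph, currently act as barriers) and any fault-in retries, and it assumes the fault lies inside the requested range.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model (not kernel code): data lands in the
 * destination in page-aligned chunks, and a fault anywhere inside a chunk
 * throws away that whole chunk.
 *   pos - file position at which the write starts
 *   bad - offset of the first unreadable byte within the user buffer
 *         (assumed to be inside the requested count)
 *   pg  - page size
 * Returns the number of bytes written; 0 means the call fails with -EFAULT.
 */
size_t model_short_write(size_t pos, size_t bad, size_t pg)
{
    assert(pg > 0);
    size_t cut = (pos + bad) / pg * pg;  /* start of the chunk holding the fault */
    return cut > pos ? cut - pos : 0;
}
```

It reproduces the numbers quoted earlier: model_short_write(0, 2048, 4096) is 0 (the -EFAULT case) and model_short_write(2049, 2048, 4096) is 2047. Applied to the three-iovec example at file offset 0 (fault 100 bytes in), it gives 0, which is the outcome the proposal asks for once iovec boundaries no longer act as barriers.
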
> Do we need to preserve that special treatment of iovec boundaries? I would
> really like to get rid of that - the current behaviour is an easy and reliable
> way to trigger a short copy case in ->write_end() and those are fairly
> brittle. Sure, we still need to cope with them, and I think I've got all
> instances in the current mainline fixed, but they are often suboptimal.
>
> Objections?