Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7
From: Jan Stancek
Date: Tue Sep 20 2016 - 13:11:49 EST
----- Original Message -----
> From: "Al Viro" <viro@xxxxxxxxxxxxxxxxxx>
> To: "Jan Stancek" <jstancek@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Sent: Tuesday, 20 September, 2016 5:06:57 PM
> Subject: Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7
>
> On Tue, Sep 20, 2016 at 02:56:06PM +0200, Jan Stancek wrote:
> > Hi,
> >
> > I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7.
> > Specifically the EFAULT case, which is passing an iovec with invalid base
> > address:
> > #define CHUNK 64
> > static struct iovec wr_iovec3[] = {
> > {(char *)-1, CHUNK},
> > };
> > hangs with 100% cpu usage and not very helpful stack trace:
> > # cat /proc/28544/stack
> > [<0000000000001000>] 0x1000
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()".
> >
> > Before this commit fault_in_pages_readable() called __get_user() on start
> > address which presumably failed with -EFAULT immediately.
> >
> > With this commit applied fault_in_multipages_readable() appears to return 0
> > for the case when "start" is invalid but "end" overflows. Then according to
> > my traces we keep spinning inside while loop in iomap_write_actor().
>
> Cute. Let me see if I understand what's going on there: we have a wraparound
> that would've been caught by most of access_ok(), but not on an architectures
> where access_ok() is a no-op; in that case the loop is skipped and we
> just check the last address, which passes and we get a false positive.
> Bug is real and it's definitely -stable fodder.
>
> I'm not sure that the fix you propose is right, though. Note that ERR_PTR()
> is not a valid address on any architecture, so any wraparound automatically
> means -EFAULT and we can simply check unlikely(uaddr > end) and bugger off
> if it holds. while() turns into do-while(), of course, and the same is
> needed for the read side.
>
> Could you check if the following works for you?
This fixes pwritev02 hang.
I ran all syscalls tests from LTP, and I see a change in behaviour
of couple other tests (writev01, writev03 and writev04 [1]) in 4.8.0-rc7.
These call writev() with partially invalid iovecs, and now fail with
EFAULT, while with previous -rc6 kernel they returned number of bytes
written before they encountered invalid iovec record.
This should be reproducible also on x86.
Regards,
Jan
[1] https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/syscalls/writev
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 66a1260..7e3d537 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char
> __user *uaddr, int size)
> */
> static inline int fault_in_multipages_writeable(char __user *uaddr, int
> size)
> {
> - int ret = 0;
> char __user *end = uaddr + size - 1;
>
> if (unlikely(size == 0))
> - return ret;
> + return 0;
>
> + if (unlikely(uaddr > end))
> + return -EFAULT;
> /*
> * Writing zeroes into userspace here is OK, because we know that if
> * the zero gets there, we'll be overwriting it.
> */
> - while (uaddr <= end) {
> - ret = __put_user(0, uaddr);
> - if (ret != 0)
> - return ret;
> + do {
> + if (unlikely(__put_user(0, uaddr) != 0))
> + return -EFAULT;
> uaddr += PAGE_SIZE;
> - }
> + } while (uaddr <= end);
>
> /* Check whether the range spilled into the next page. */
> if (((unsigned long)uaddr & PAGE_MASK) ==
> ((unsigned long)end & PAGE_MASK))
> - ret = __put_user(0, end);
> + return __put_user(0, end);
>
> - return ret;
> + return 0;
> }
>
> static inline int fault_in_multipages_readable(const char __user *uaddr,
> int size)
> {
> volatile char c;
> - int ret = 0;
> const char __user *end = uaddr + size - 1;
>
> if (unlikely(size == 0))
> - return ret;
> + return 0;
>
> - while (uaddr <= end) {
> - ret = __get_user(c, uaddr);
> - if (ret != 0)
> - return ret;
> + if (unlikely(uaddr > end))
> + return -EFAULT;
> +
> + do {
> + if (unlikely(__get_user(c, uaddr) != 0))
> + return -EFAULT;
> uaddr += PAGE_SIZE;
> - }
> + } while (uaddr <= end);
>
> /* Check whether the range spilled into the next page. */
> if (((unsigned long)uaddr & PAGE_MASK) ==
> ((unsigned long)end & PAGE_MASK)) {
> - ret = __get_user(c, end);
> - (void)c;
> + return __get_user(c, end);
> }
>
> - return ret;
> + return 0;
> }
>
> int add_to_page_cache_locked(struct page *page, struct address_space
> *mapping,
>