Re: [PATCH -next] uaccess: fix __access_ok limit setup in compat mode

From: Arnd Bergmann
Date: Tue Mar 22 2022 - 10:42:53 EST


On Tue, Mar 22, 2022 at 1:55 PM chenjiahao (C) <chenjiahao16@xxxxxxxxxx> wrote:
> 在 2022/3/18 15:44, Arnd Bergmann 写道:
> >
> > This should not result in any user visible difference, in both cases
> > user process will see a -EFAULT return code from its system call.
> > Are you able to come up with a test case that shows an observable
> > difference in behavior?
>
> Actually, this patch do comes from a testcase failure, the code is
> pasted below:

Thank you for the test case!

> #define TMPFILE "__1234567890"
> #define BUF_SIZE 1024
>
> int main()
> {
> char buf[BUF_SIZE] = {0};
> int fd;
> int ret;
> int err;
>
> fd = open(TMPFILE, O_CREAT | O_RDWR);
> if(-1 == fd)
> {
> perror("open");
> return 1;
> }
>
> ret = pread64(fd, buf, -1, 1);
> if((-1 == ret) && (EFAULT == errno))
> {
> close(fd);
> unlink(TMPFILE);
> printf("PASS\n");
> return 0;
> }
> err = errno;
> perror("pread64");
> printf("err = %d\n", err);
> close(fd);
> unlink(TMPFILE);
> printf("FAIL\n");
>
> return 1;
> }
>
> The expected result is:
>
> PASS
>
> but the result of 32-bit testcase running in x86-64 kernel with compat
> mode is:
>
> pread64: Success
> err = 0
> FAIL
>
>
> In my explanation, pread64 is called with count '0xffffffffull' and
> offset '1', which might still not trigger
>
> page fault in 64-bit kernel.
>
>
> This patch uses TASK_SIZE as the addr_limit to performance a stricter
> address check and intercepts

I see. So while the kernel behavior was not meant to change from
my patch, it clearly did, which may cause problems. However, I'm
not sure if the changed behavior is actually wrong.

> the illegal pointer address from 32-bit userspace at a very early time.
> Which is roughly the same
>
> address limit check as __access_ok in arch/ia64.
>
>
> This is why this fixes my testcase failure above, or have I missed
> anything else?

My interpretation of what is going on here is that the pread64() call
still behaves according to the documented behavior, returning a small
number of bytes from the file, up to the first faulting address.

As the manual page for pread64() describes,

On success, pread() returns the number of bytes read
(a return of zero indicates end of file) and pwrite() returns
the number of bytes written.
Note that it is not an error for a successful call to transfer
fewer bytes than requested (see read(2)
and write(2)).

The difference after my patch is that originally it returned
-EFAULT because part of the buffer is outside of the
addressable memory, but now it returns success because
part of the buffer is inside the addressable memory ;-)

I'm also not sure about which patch caused the change in
behavior, can you double-check that? The one you cited,
967747bbc084 ("uaccess: remove CONFIG_SET_FS"), does
not actually touch the x86 implementation, and commit
36903abedfe8 ("x86: remove __range_not_ok()") does touch
x86 but the limit was already TASK_SIZE_MAX since commit
47058bb54b57 ("x86: remove address space overrides using
set_fs()") back in linux-5.10.

Arnd