Re: [git pull] uaccess-related bits of vfs.git
From: Linus Torvalds
Date: Fri May 12 2017 - 21:01:03 EST
So I should have asked earlier, but I was feeling rather busy during
the early merge window..
On Sun, Apr 30, 2017 at 8:45 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> uaccess unification pile. It's _not_ the end of uaccess work, but
> the next batch of that will go into the next cycle. This one mostly takes
> copy_from_user() and friends out of arch/* and gets the zero-padding behaviour
> in sync for all architectures.
> Dealing with the nocache/writethrough mess is for the next cycle;
> fortunately, that's x86-only.
So I'm actually more interested to hear if you have any pending work
on cleaning up the __get/put_user() mess we have. Is that on your
radar at all?
In particular, right now, both __get_user() and __put_user() are nasty
and broken interfaces.
It *used* to be that they were designed to just generate a single
instruction. That's not what they currently do at all, due to the
whole SMAP/PAN handling on x86 and arm.
For example, on x86, right now a single __put_user() call ends up
generating something like
#APP
# 58 "./arch/x86/include/asm/smap.h" 1
661:
662:
.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90
663:
.pushsection .altinstructions,"a"
.long 661b - .
.long 6641f - .
.word ( 9*32+20)
.byte 663b-661b
.byte 6651f-6641f
.byte 663b-662b
.popsection
.pushsection .altinstr_replacement, "ax"
6641:
.byte 0x0f,0x01,0xcb
6651:
.popsection
# 0 "" 2
#NO_APP
xorl %eax, %eax # __pu_err
#APP
# 269 "fs/readdir.c" 1
1: movq %rcx,8(%rdi) # offset, MEM[(struct __large_struct *)_16]
2:
.section .fixup,"ax"
3: mov $-14,%eax #, __pu_err
jmp 2b
.previous
.pushsection "__ex_table","a"
.balign 4
.long (1b) - .
.long (3b) - .
.long (ex_handler_default) - .
.popsection
# 0 "" 2
# 52 "./arch/x86/include/asm/smap.h" 1
661:
662:
.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90
663:
.pushsection .altinstructions,"a"
.long 661b - .
.long 6641f - .
.word ( 9*32+20)
.byte 663b-661b
.byte 6651f-6641f
.byte 663b-662b
.popsection
.pushsection .altinstr_replacement, "ax"
6641:
.byte 0x0f,0x01,0xca
6651:
.popsection
# 0 "" 2
#NO_APP
and while much of that is out-of-line and in different sections, it
basically means that it's insane how we inline these things.
Yes, yes, the inlined part of the above horror-story ends up being just
clac
xor %eax,%eax
mov %rcx,0x8(%rdi)
stac
and everything else is just boiler-plate for the alt-instruction
handling and the exception handling.
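All that alternatives boiler-plate comes from the stac()/clac()
helpers in arch/x86/include/asm/smap.h - quoting them roughly from
memory, so don't hold me to the exact details - and the "( 9*32+20)"
word in the dump above is just X86_FEATURE_SMAP:

        static __always_inline void stac(void)
        {
                /* Note: a barrier is implicit in alternative() */
                alternative("", __stringify(__ASM_STAC), X86_FEATURE_SMAP);
        }

        static __always_inline void clac(void)
        {
                /* Note: a barrier is implicit in alternative() */
                alternative("", __stringify(__ASM_CLAC), X86_FEATURE_SMAP);
        }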
But even that small thing is rather debatable from a performance
angle: the actual cost of __put_user() these days is not the function
call, but the clac/stac (on modern x86) and the PAN bit games (on
arm).
So I'm actually inclined to just say "We should make
__get_user/__put_user() just be aliases for the normal
get_user/put_user() model".
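In its dumbest form that would literally just be something like this
(purely illustrative - the real macros would still want to keep their
sparse annotations and type checking):

        #define __get_user(x, ptr)      get_user(x, ptr)
        #define __put_user(x, ptr)      put_user(x, ptr)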
The *correct* way to do fast user space accesses when you hoist error
checking outside is to use
        if (!access_ok(..))
                goto efault;
        user_access_begin();
        ..
        ... loop over or otherwise do multiple "raw" accesses ...
        unsafe_get/put_user(c, ptr, got_fault);
        unsafe_get/put_user(c, ptr, got_fault);
        ...
        user_access_end();
        return 0;

got_fault:
        user_access_end();
efault:
        return -EFAULT;
which is more boilerplate, but ends up generating much better code.
And for "unsafe_put_user()" in particular, we actually could teach gcc
to use "asm goto" to really improve code generation. I have patches
for that if you want to look.
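Roughly - and this is hand-waving the details rather than quoting the
actual patch, plus ignoring the usual sparse/__user games and the size
dispatch the real macro needs - the 8-byte store case ends up looking
something like this, with a fault jumping straight to the caller's
error label, so there's no .fixup stub and no error value to test
afterwards:

        #define unsafe_put_user_u64(x, ptr, err_label)                         \
                asm_volatile_goto("1:   movq %0,%1\n"                          \
                        _ASM_EXTABLE(1b, %l2)                                  \
                        : /* no outputs - "asm goto" can't have any */         \
                        : "r" ((u64)(x)), "m" (*(u64 __user *)(ptr))           \
                        : /* no clobbers */                                    \
                        : err_label)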
I'm attaching an example patch for "filldir()" that does that modern
thing. It almost had the right form as-is anyway, and under some loads
(eg "find") filldir actually shows up in profiles too.\
And just from a code generation standpoint, look at what the attached
patch does:
[torvalds@i7 linux]$ diff -u fs/readdir.s ~/readdir.s | diffstat
readdir.s | 548 ++++++++++++++++++++++----------------------------------------
1 file changed, 201 insertions(+), 347 deletions(-)
just from getting rid of that crazy repeated SMAP overhead.
Yeah, yeah, doing diffstat on generated assembly files is all kinds of
crazy, but it's an example of what kind of odd garbage we currently
generate.
But the *first* thing I'd like to do would be to just get rid of
__get_user/__put_user as a thing. It really does generate nasty code,
and we might as well just make it do that function call.
Because what we do now isn't right. If we care about performance, the
"__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And
if you don't care about performance, you should use the regular
xyz_user() functions that do an ok job by just calling the right-sized
helper function instead of inlining the crud.
Hmm?
Linus
fs/readdir.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/readdir.c b/fs/readdir.c
index 9d0212c374d6..03324f54c0e9 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -184,25 +184,27 @@ static int filldir(struct dir_context *ctx, const char *name, int namlen,
         if (dirent) {
                 if (signal_pending(current))
                         return -EINTR;
-                if (__put_user(offset, &dirent->d_off))
-                        goto efault;
         }
+
+        user_access_begin();
+        if (dirent)
+                unsafe_put_user(offset, &dirent->d_off, efault_end);
         dirent = buf->current_dir;
-        if (__put_user(d_ino, &dirent->d_ino))
-                goto efault;
-        if (__put_user(reclen, &dirent->d_reclen))
-                goto efault;
+        unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
+        unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
+        unsafe_put_user(0, dirent->d_name + namlen, efault_end);
+        unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
+        user_access_end();
+
         if (copy_to_user(dirent->d_name, name, namlen))
                 goto efault;
-        if (__put_user(0, dirent->d_name + namlen))
-                goto efault;
-        if (__put_user(d_type, (char __user *) dirent + reclen - 1))
-                goto efault;
         buf->previous = dirent;
         dirent = (void __user *)dirent + reclen;
         buf->current_dir = dirent;
         buf->count -= reclen;
         return 0;
+efault_end:
+        user_access_end();
 efault:
         buf->error = -EFAULT;
         return -EFAULT;