Re: ia32_sysenter_target does not preserve EFLAGS

From: Denys Vlasenko
Date: Fri Mar 27 2015 - 20:35:12 EST


On Fri, Mar 27, 2015 at 9:00 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Mar 27, 2015 at 7:25 AM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>>
>> Apparently, users *don't* depend on arithmetic flags
>> to survive over syscall. They are also okay with DF flag
>> being cleared.
>
> Generally, users probably don't care about many registers at all being
> saved, but it's worth noting that the reason system calls save/restore
> even caller-saved registers is at least partly in order to avoid any
> kernel information leaks.
>
> I don't believe that user mode will ever reasonably care about the
> arithmetic flags being changed, but at the same time I also don't think it
> is something we should ever consider a "feature" we should try to take
> advantage of. Generally we should try to not mess with the flag state,
> and I'd *much* rather make the rule be that all the system call return
> paths restore flags as much as possible.

A "we don't clobber anything" ABI has its appeal.
OTOH, fulfilling the ABI's promises has a cost which has to be paid
on every syscall, regardless of whether userspace needs it or not.

Example. This is the uClibc implementation of write():

00000000004acfc4 <__libc_write>:
  4acfc4:  53                    push   %rbx
  4acfc5:  48 63 ff              movslq %edi,%rdi
  4acfc8:  b8 01 00 00 00        mov    $0x1,%eax
  4acfcd:  0f 05                 syscall
  4acfcf:  48 89 c3              mov    %rax,%rbx
  4acfd2:  48 81 fb 00 f0 ff ff  cmp    $0xfffffffffffff000,%rbx
  4acfd9:  76 0f                 jbe    4acfea <__libc_write+0x26>
  4acfdb:  e8 64 15 00 00        callq  4ae544 <__GI___errno_location>
  4acfe0:  89 da                 mov    %ebx,%edx
  4acfe2:  f7 da                 neg    %edx
  4acfe4:  89 10                 mov    %edx,(%rax)
  4acfe6:  48 83 c8 ff           or     $0xffffffffffffffff,%rax
  4acfea:  5b                    pop    %rbx
  4acfeb:  c3                    retq
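
For readers who prefer C, the tail of the listing above (cmp/jbe/neg/or)
corresponds roughly to the sketch below. The name syscall_result is made
up for illustration and is not taken from uClibc; the sketch also assumes
a 64-bit unsigned long, as on x86-64 Linux.

```c
#include <errno.h>

/* Hypothetical helper: turn a raw kernel return value into the usual
 * libc convention (-1 plus errno on failure). */
static long syscall_result(unsigned long raw)
{
    /* Raw values in [-4095, -1], viewed as unsigned, lie strictly above
     * 0xfffffffffffff000; this is the "jbe" test in the listing. */
    if (raw > (unsigned long)-4096L) {
        errno = (int)(0UL - raw);   /* the "neg %edx" step */
        return -1;                  /* the "or $-1,%rax" step */
    }
    return (long)raw;               /* success: pass the value through */
}
```

Note that the wrapper only looks at the return value in rax; it never
relies on any other register surviving the syscall.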

This is a C function. Therefore any caller assumes that the C-clobbered
(caller-saved) registers may indeed be clobbered here, so if the caller
keeps a live value in any of them, it saves and restores it itself.

All efforts by kernel code to save/restore C-clobbered registers,
eight of them, are in vain. It's just useless work. Userspace
does not benefit from that effort.

If our syscall ABI said that those regs are not preserved,
we could have slightly faster syscalls. Any userspace code which
really needed those registers preserved across a particular
syscall could push/pop them itself.
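
As a rough illustration of how userspace can take on that job, here is a
sketch of a raw syscall wrapper using GCC/Clang inline asm on x86-64
Linux. The name raw_syscall3 is hypothetical. The clobber list reflects
today's ABI, where only rcx and r11 are lost (the SYSCALL instruction
itself overwrites them); under a weaker ABI the argument registers would
presumably be declared as read-write operands instead of plain inputs,
and the compiler would save whatever live values it needed.

```c
#include <errno.h>

/* Hypothetical helper: 3-argument raw syscall, x86-64 Linux only.
 * Returns the kernel's raw value (negative errno on failure). */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result in rax */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");  /* what is clobbered */
    return ret;
}
```

The compiler, not the kernel, then decides what must be spilled around
the call, and it spills nothing when no live value is at risk.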