Re: [PATCH] sh: Implement __get_user_u64() required for 64-bit get_user()
From: Michael Karcher
Date: Mon Jun 01 2020 - 16:26:15 EST
Rich Felker schrieb:
>> >> Can I propose a different solution? For archs where there isn't
>> >> actually any 64-bit load or store instruction, does it make sense to
>> >> be writing asm just to do two 32-bit loads/stores, especially when
>> >> this code is not in a hot path?
>> > Yes, that's an option, too.
>> That's the solution that Michael Karcher suggested to me as an
>> alternative when I talked to him off-list.
There is a functional argument agains using get_user_32 twice, which I
overlooked in my private reply to Adrian. If any of the loads fail, we do
not only want err to be set to -EFAULT (which will happen), but we also
want a 64-bit zero as result. If one 32-bit read faults, but the other one
works, we would get -EFAULT together with 32 valid data bits, and 32 zero
I continue to elaborate on my performance remark, ignoring this functional
difference (which is a good reason to not just use two 32-bit accesses,
much more so than the performance "issue").
> I don't have an objection to doing it the way you've proposed, but I
> don't think there's any performance distinction or issue with the two
Assuming we don't need two exception table entries (put_user_64 currently
uses only one, maybe it's wrong), using put_user_32 twice creates an extra
unneeded exception table entry, which will "bloat" the exception table.
That table is most likely accessed by a binary search algorithm, so the
performance loss is marginal, though. Also a bigger table size is
cache-unfriendly. (Again, this is likely marginal again, as binary search
is already extremely cache-unfriendly).
A similar argument can be made for the exception handler. Even if we need
two entries in the exception table, so the first paragraph does not apply,
the two entries in the exception table can share the same exception
handler (clear the whole 64-bit destination to zero, set -EFAULT, jump
past both load instructions), so that part of (admittedly cold) kernel
code can get some instructios shorter.