Re: [PATCH] sh: Implement __get_user_u64() required for 64-bit get_user()

From: Rich Felker
Date: Mon Jun 01 2020 - 16:50:33 EST


On Mon, Jun 01, 2020 at 10:26:09PM +0200, Michael Karcher wrote:
> Rich Felker wrote:
> >> >> Can I propose a different solution? For archs where there isn't
> >> >> actually any 64-bit load or store instruction, does it make sense to
> >> >> be writing asm just to do two 32-bit loads/stores, especially when
> >> >> this code is not in a hot path?
> >> > Yes, that's an option, too.
> >> That's the solution that Michael Karcher suggested to me as an
> >> alternative when I talked to him off-list.
>
> There is a functional argument against using get_user_32 twice, which I
> overlooked in my private reply to Adrian. If either of the loads fails, we do
> not only want err to be set to -EFAULT (which will happen), but we also
> want a 64-bit zero as result. If one 32-bit read faults, but the other one
> works, we would get -EFAULT together with 32 valid data bits, and 32 zero
> bits.

Indeed, if you do it that way you want to check the return value and
set the value to 0 if either faults.
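
A minimal user-space sketch of that check, for illustration only. The
names mock_get_user_u32 and mock_get_user_u64 are hypothetical stand-ins,
not the kernel API, and a NULL source pointer is assumed to model a
faulting access:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit user read. A NULL source models a
 * faulting access; this is NOT the kernel's get_user(). */
static int mock_get_user_u32(uint32_t *dst, const uint32_t *src)
{
    if (src == NULL) {
        *dst = 0;
        return -EFAULT;
    }
    *dst = *src;
    return 0;
}

/* 64-bit read built from two 32-bit reads. The halves are passed as
 * separate pointers only so that either one can be made to "fault". */
static int mock_get_user_u64(uint64_t *dst, const uint32_t *lo_src,
                             const uint32_t *hi_src)
{
    uint32_t lo, hi;
    int err_lo = mock_get_user_u32(&lo, lo_src);
    int err_hi = mock_get_user_u32(&hi, hi_src);

    if (err_lo || err_hi) {
        /* Either half faulted: return a full 64-bit zero, never a mix
         * of 32 valid bits and 32 zero bits. */
        *dst = 0;
        return -EFAULT;
    }
    *dst = ((uint64_t)hi << 32) | lo;
    return 0;
}
```

With both halves readable the caller gets the combined value and 0; if
either pointer is NULL it gets -EFAULT and *dst == 0.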

BTW I'm not sure what's supposed to happen on write if half faults
after the other half already succeeded... Either a C approach or an
asm approach has to consider that.
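
To make the question concrete, here is a user-space sketch of the
situation, not an answer to it. mock_put_user_u32 and mock_put_user_u64
are hypothetical names, and a NULL destination is assumed to model a
faulting store:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit user write. A NULL destination
 * models a faulting access; this is NOT the kernel's put_user(). */
static int mock_put_user_u32(uint32_t *dst, uint32_t val)
{
    if (dst == NULL)
        return -EFAULT;
    *dst = val;
    return 0;
}

/* 64-bit write built from two 32-bit writes, low word first. If the
 * second store faults, the first one has already landed. */
static int mock_put_user_u64(uint32_t *lo_dst, uint32_t *hi_dst,
                             uint64_t val)
{
    int err = mock_put_user_u32(lo_dst, (uint32_t)val);

    if (err)
        return err;
    return mock_put_user_u32(hi_dst, (uint32_t)(val >> 32));
}
```

In this sketch a fault on the high word returns -EFAULT but leaves the
low word overwritten, which is exactly the half-completed state in
question.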

> > I don't have an objection to doing it the way you've proposed, but I
> > don't think there's any performance distinction or issue with the two
> > invocations.
>
> Assuming we don't need two exception table entries (put_user_64 currently
> uses only one, maybe it's wrong), using put_user_32 twice creates an extra
> unneeded exception table entry, which will "bloat" the exception table.
> That table is most likely accessed by a binary search algorithm, so the
> performance loss is marginal, though. Also a bigger table size is
> cache-unfriendly. (Again, this is likely marginal, as binary search
> is already extremely cache-unfriendly.)
>
> A similar argument can be made for the exception handler. Even if we need
> two entries in the exception table, so the first paragraph does not apply,
> the two entries in the exception table can share the same exception
> handler (clear the whole 64-bit destination to zero, set -EFAULT, jump
> past both load instructions), so that part of (admittedly cold) kernel
> code can get a few instructions shorter.

Indeed. I don't think it's a significant difference, but if kernel
folks do, that's fine. In cases like this my personal preference is to
err on the side of less arch-specific asm.
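
For reference, the shared-handler arrangement Michael describes could be
modelled roughly as below. This is a user-space sketch only: the struct
layout, the addresses, and the names mock_ex_entry, mock_ex_table and
mock_search_extable are all made up for illustration and are not the
kernel's actual exception table code:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: address of an instruction that may fault,
 * and the address of the fixup code to jump to if it does. */
struct mock_ex_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Sorted by insn so it can be binary-searched. The two loads of one
 * 64-bit read have separate entries but share a single fixup. */
static const struct mock_ex_entry mock_ex_table[] = {
    { 0x1000, 0x2000 }, /* low-word load */
    { 0x1004, 0x2000 }, /* high-word load, same fixup */
    { 0x1100, 0x2040 }, /* unrelated access */
};

/* bsearch() comparator: key is a faulting address, elem a table entry. */
static int mock_cmp_ex(const void *key, const void *elem)
{
    uintptr_t addr = *(const uintptr_t *)key;
    const struct mock_ex_entry *e = elem;

    if (addr < e->insn)
        return -1;
    if (addr > e->insn)
        return 1;
    return 0;
}

/* Returns the matching entry, or NULL if the address has no fixup. */
static const struct mock_ex_entry *mock_search_extable(uintptr_t addr)
{
    return bsearch(&addr, mock_ex_table,
                   sizeof mock_ex_table / sizeof mock_ex_table[0],
                   sizeof mock_ex_table[0], mock_cmp_ex);
}
```

Looking up either 0x1000 or 0x1004 lands on the same fixup address, so
the extra cost of the second access is one table entry, not a second
handler.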

Rich