Re: [PATCH RESEND v3 6/6] powerpc/signal: Use unsafe_copy_siginfo_to_user()
From: Eric W. Biederman
Date: Mon Sep 13 2021 - 15:11:37 EST
Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:
> On 13/09/2021 at 18:21, Eric W. Biederman wrote:
>> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>>
>>> Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:
>>>
>>>> Use unsafe_copy_siginfo_to_user() in order to do the copy
>>>> within the user access block.
>>>>
>>>> On an mpc 8321 (book3s/32) the improvement is about 5% on a process
>>>> sending a signal to itself.
>>
>> If you can't make function calls from an unsafe macro there is another
>> way to handle this that doesn't require everything to be inline.
>>
>> From a safety perspective it is probably even a better approach.
>
> Yes but that's exactly what I wanted to avoid for the native ppc32 case: this
> double hop means useless pressure on the cache. The siginfo_t structure is 128
> bytes large, that means 8 lines of cache on powerpc 8xx.
>
> But maybe it is acceptable to do that only for the compat case. Let me think
> about it, it might be quite easy.

The places get_signal is called tend to be well known.  So I think we
are safe from a capacity standpoint.

I am not certain it makes a difference in capacity, as there is a high
probability that the stack was deeper recently than it is now, which
suggests the cache blocks might already be in the cache.

My sense is that it is worth benchmarking before optimizing out the extra
copy like that.

On the extreme side, there is simply building the entire sigframe on the
kernel stack and then calling copy_to_user on it.  As the stack cache lines
are likely to be hot, and copy_to_user is quite well optimized, there is a
real possibility that it is faster to build everything on the kernel
stack and then copy it to the user space stack.
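
To make the "build it all on the stack, then copy once" idea concrete,
here is a rough userspace sketch.  It is not kernel code: struct
fake_siginfo, struct fake_sigframe, fake_copy_to_user() and
build_frame_then_copy() are all hypothetical names invented for
illustration, and the frame layout is made up (a real sigframe is
architecture specific).  Only the 128-byte siginfo size comes from the
discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for siginfo_t, sized to 128 bytes. */
struct fake_siginfo {
	int32_t si_signo;
	int32_t si_errno;
	int32_t si_code;
	uint8_t pad[128 - 3 * sizeof(int32_t)];
};

static_assert(sizeof(struct fake_siginfo) == 128,
	      "sketch assumes a 128-byte siginfo");

/* Hypothetical signal frame; the layout is invented for this sketch. */
struct fake_sigframe {
	struct fake_siginfo info;
	uint32_t regs[32];
	uint32_t handler;
};

/*
 * Userspace stand-in for copy_to_user(): returns the number of bytes
 * NOT copied (0 on success), mirroring the kernel convention.  The
 * "user" destination here is ordinary memory of a known size.
 */
static size_t fake_copy_to_user(void *dst, size_t dst_size,
				const void *src, size_t len)
{
	if (len > dst_size)
		return len;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Build the whole frame in a local (stack) variable, then publish it
 * with a single bulk copy.  Returns 0 on success, -1 on failure.
 */
int build_frame_then_copy(void *user_buf, size_t user_size,
			  int signo, int code, uint32_t handler,
			  const uint32_t regs[32])
{
	struct fake_sigframe frame;

	/* Zero first so no uninitialized stack bytes reach "user" memory. */
	memset(&frame, 0, sizeof(frame));
	frame.info.si_signo = signo;
	frame.info.si_code = code;
	memcpy(frame.regs, regs, sizeof(frame.regs));
	frame.handler = handler;

	return fake_copy_to_user(user_buf, user_size,
				 &frame, sizeof(frame)) ? -1 : 0;
}
```

The point of the shape is that every store before the final copy hits
the local frame, and there is exactly one place where the copy to the
"user" buffer can fail.  Whether that is actually faster than writing
each field inside a user access block is the thing that needs
benchmarking.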

It is also possible that I am wrong, and we may want to figure out how
far up we can push the conversion to the 32-bit siginfo format.

If we could move the work into collect_signal we could guarantee there
would be no extra work.  That would require adjusting the sigframe
generation code on all of the architectures.

There is a lot we can do, but we need benchmarking to tell if it is
worth it.

Eric