Re: [RESEND][PATCH v3 14/17] static_call: Add static_cond_call()
From: Rasmus Villemoes
Date: Fri Mar 27 2020 - 09:26:05 EST
On 27/03/2020 11.08, Peter Zijlstra wrote:
> On Fri, Mar 27, 2020 at 12:37:35AM +0100, Rasmus Villemoes wrote:
>> On 24/03/2020 14.56, Peter Zijlstra wrote:
>>> Extend the static_call infrastructure to optimize the following common
>>> pattern:
>>>
>>> if (func_ptr)
>>> func_ptr(args...)
>>>
>>
>>> +#define DEFINE_STATIC_COND_CALL(name, _func) \
>>> + DECLARE_STATIC_CALL(name, _func); \
>>> + struct static_call_key STATIC_CALL_NAME(name) = { \
>>> + .func = NULL, \
>>> + }
>>> +
>>> #define static_call(name) \
>>> ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_NAME(name).func))
>>>
>>> +#define static_cond_call(name) \
>>> + if (STATIC_CALL_NAME(name).func) \
>>> + ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_NAME(name).func))
>>> +
>>
>> What, apart from fear of being ridiculed by kernel folks, prevents the
>> compiler from reloading STATIC_CALL_NAME(name).func ? IOW, doesn't this
>> want a READ_ONCE somewhere?
>
> Hurmph.. I suspect you're quite right, but at the same time I can't seem
> to write a macro that does that :/ Let me try harder.
Hm, yeah, essentially one wants some macro magic that turns

	foo(a)(b, c, d)

into

	bar(a, b, c, d)

and then bar() can do the right thing.

One option is to give up on the nice syntax and just make it

	static_cond_call(func, ...)
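
Completely untested against the real static_call headers, but in plain
userspace C the variadic form could look roughly like the sketch below.
READ_ONCE_SKETCH, struct cond_key and static_cond_call_sketch are names
made up for illustration (the real struct static_call_key and READ_ONCE()
differ); the only point is that the pointer gets loaded once into a local,
and that same local is both tested and called.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(): force a single load. */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical key type, only for this sketch. */
struct cond_key {
	void (*func)(int *, int);
};

/*
 * The key and the arguments arrive in one macro invocation, so the
 * macro can load the pointer once and then test and call that copy.
 */
#define static_cond_call_sketch(key, ...)				\
	do {								\
		__typeof__((key).func) f_ = READ_ONCE_SKETCH((key).func); \
		if (f_)							\
			f_(__VA_ARGS__);				\
	} while (0)

static void add_to(int *acc, int v)
{
	*acc += v;
}

static int demo(void)
{
	struct cond_key key = { .func = NULL };
	int acc = 0;

	static_cond_call_sketch(key, &acc, 5);	/* NULL target: no call */
	assert(acc == 0);

	key.func = add_to;
	static_cond_call_sketch(key, &acc, 7);	/* calls add_to(&acc, 7) */
	return acc;
}
```

With the foo(a)(b, c, d) syntax there is no single place to declare that
local, which is why the two-step form is hard to get right.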

But here are a few other things that make me wonder whether the cond_call
variant is worth it, at least in its current form. In the
!ARCH_HAVE_STATIC_CALL case, static_cond_call(foo)(a, b, c) is just
syntactic sugar for

	if (foo)
		foo(a, b, c)

so gcc can choose to defer computing the argument expressions a, b, c
until after the test. They may be somewhat expensive, but at the very
least there's some register shuffling to do to prepare for the call, and
probably also some restoring afterwards. In the ARCH_HAVE_STATIC_CALL
case, whether inline or not, it becomes an unconditional call from gcc's
perspective, so all the arguments must be computed and stuffed in the
right registers. That price may be higher than the load+test. Not to
mention the fact that side effects in the arguments happen
unconditionally for ARCH_HAVE_STATIC_CALL but only if func is non-NULL
for !ARCH_HAVE_STATIC_CALL.
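
To make the side-effect difference concrete, here is a small userspace
model. Nothing in it is kernel API: hook_t, nop_stub and the two evals_*()
helpers are invented for the example, and "patching the call site to a
no-op" is modelled simply by calling an empty function.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_t)(int);

static int evaluations;	/* how many times the argument expression ran */
static int last_value;	/* what the real hook last received */

static int expensive_arg(void)
{
	evaluations++;
	return 42;
}

static void real_hook(int v)
{
	last_value = v;
}

/* Models a call site whose target has been patched to do nothing. */
static void nop_stub(int v)
{
	(void)v;
}

/* !ARCH_HAVE_STATIC_CALL model: test first, evaluate arguments only if set. */
static int evals_with_test(hook_t hook)
{
	evaluations = 0;
	if (hook)
		hook(expensive_arg());
	return evaluations;
}

/*
 * ARCH_HAVE_STATIC_CALL model: the compiler sees an unconditional call,
 * so the argument is evaluated even when the target is only the stub.
 * The ?: below stands in for the text patching, not for a runtime test.
 */
static int evals_unconditional(hook_t hook)
{
	hook_t target = hook ? hook : nop_stub;

	evaluations = 0;
	target(expensive_arg());
	return evaluations;
}
```

With a NULL hook, evals_with_test() never runs expensive_arg(), while
evals_unconditional() runs it once; with a real hook the two agree.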
Perhaps associating a static_key with each STATIC_COND_CALL could solve
these. But that of course makes the update procedure somewhat more
complicated.
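
Very roughly, and again only as a userspace model, I am thinking of
something like the following. A plain bool stands in for the static_key
(in the kernel it would be a patched branch), and struct keyed_cond_call,
keyed_update() and keyed_cond_call() are hypothetical names. The ordering
inside keyed_update() is my assumption about what a real implementation
would need, and it shows where the extra complexity comes from: there are
now two things to update instead of one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*hook_t)(int *);

/* Hypothetical pairing of a key with a call target. */
struct keyed_cond_call {
	bool enabled;	/* stand-in for a static_key */
	hook_t func;
};

/* Two-step update; the order of the steps matters. */
static void keyed_update(struct keyed_cond_call *c, hook_t func)
{
	if (func) {
		c->func = func;		/* install the target first ... */
		c->enabled = true;	/* ... then open the branch */
	} else {
		c->enabled = false;	/* close the branch first ... */
		c->func = NULL;		/* ... then drop the target */
	}
}

/* The arguments sit behind the key, so they are skipped when disabled. */
#define keyed_cond_call(c, ...)				\
	do {						\
		if ((c)->enabled)			\
			(c)->func(__VA_ARGS__);		\
	} while (0)

static int arg_evals;
static int sink;

static int *arg(void)
{
	arg_evals++;
	return &sink;
}

static void store_one(int *p)
{
	*p = 1;
}

static int demo_keyed(void)
{
	struct keyed_cond_call c = { false, NULL };

	keyed_cond_call(&c, arg());	/* disabled: arg() is not evaluated */
	assert(arg_evals == 0);

	keyed_update(&c, store_one);
	keyed_cond_call(&c, arg());	/* enabled: one evaluation, one call */

	keyed_update(&c, NULL);
	keyed_cond_call(&c, arg());	/* disabled again: skipped */
	return arg_evals;
}
```

That keeps the argument evaluation behind the branch in both
configurations, at the cost of the two-step update.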
Rasmus