Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
From: Shaokun Zhang
Date: Fri Jan 17 2020 - 01:54:17 EST
+Cc Will Deacon,
On 2020/1/16 23:19, Eric Dumazet wrote:
>
>
> On 1/16/20 7:12 AM, Eric Dumazet wrote:
>>
>>
>> On 1/16/20 4:27 AM, David Miller wrote:
>>> From: Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx>
>>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>>
>>>> From: Yuqi Jin <jinyuqi@xxxxxxxxxx>
>>>>
>>>> Call atomic_try_cmpxchg() instead of atomic_cmpxchg() to reduce the
>>>> number of accesses to the global variable @p_id in the loop. Let's
>>>> optimize it for performance.
>>>>
>>>> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
>>>> Cc: Alexey Kuznetsov <kuznet@xxxxxxxxxxxxx>
>>>> Cc: Hideaki YOSHIFUJI <yoshfuji@xxxxxxxxxxxxxx>
>>>> Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
>>>> Cc: Yang Guo <guoyang2@xxxxxxxxxx>
>>>> Signed-off-by: Yuqi Jin <jinyuqi@xxxxxxxxxx>
>>>> Signed-off-by: Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx>
>>>
>>> I doubt this makes any measurable improvement in performance.
>>>
>>> If you can document a specific measurable improvement under
>>> a useful set of circumstances for real usage, then put those
>>> details into the commit message and resubmit.
>>>
>>> Otherwise, I'm not applying this, sorry.
>>>
>>
>>
>> The real difference that could be made here is to use this cmpxchg()
>> dance only when CONFIG_UBSAN is enabled.
>>
>> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
>>
>> (Supposedly UBSAN should not warn about that either, but this depends on the compiler version.)
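For context, as I understand the history: atomic_t holds a plain int, and
the IP ID counter is meant to wrap, so the addition overflows a signed
integer, which is exactly what UBSAN's signed-overflow checker flags; the
same wrap on u32 is well defined. A minimal user-space sketch of that
class of warning, purely illustrative and not the kernel code:

/* Build: gcc -fsanitize=signed-integer-overflow overflow.c && ./a.out */
#include <limits.h>
#include <stdio.h>

int main(void)
{
	int id = INT_MAX;	/* atomic_t holds a plain int */
	unsigned int uid = (unsigned int)INT_MAX;

	id += 1;	/* UBSAN: signed integer overflow */
	uid += 1;	/* well defined: wraps to 0x80000000 */

	printf("%d %u\n", id, uid);
	return 0;
}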
>
> I will test something like:
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
> if (old != now && cmpxchg(p_tstamp, old, now) == old)
> delta = prandom_u32_max(now - old);
>
> - /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +#ifdef CONFIG_UBSAN
I appreciate Eric's idea.
> + /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
> do {
> old = (u32)atomic_read(p_id);
> new = old + delta + segs;
> } while (atomic_cmpxchg(p_id, old, new) != old);
But about this, I still think we can use atomic_try_cmpxchg() instead of
atomic_cmpxchg(), as refcount_add_not_zero() does in include/linux/refcount.h.
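The win comes from atomic_try_cmpxchg()'s contract: on failure it writes
the value it actually observed back into *old, so the caller does not need
a fresh atomic_read() on every retry. Roughly (a simplified sketch of the
generic form, for illustration only, not the exact kernel implementation):

/* Sketch: on failure, refresh the caller's expected value in place */
static inline bool try_cmpxchg_sketch(atomic_t *v, int *old, int new)
{
	int seen = atomic_cmpxchg(v, *old, new);

	if (seen == *old)
		return true;	/* swap happened */
	*old = seen;		/* loop can retry without re-reading */
	return false;
}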
I abstracted the model as follows:
/* Variant 1: read the value inside the retry loop (atomic_cmpxchg style) */
while (get_cycles() <= (time_temp + t_cnt)) {
	do {
		old_temp = atomic64_read(&wk_in->num);
		oldt = atomic64_cmpxchg(&wk_in->num, old_temp, old_temp + 1);
	} while (oldt != old_temp);
	myndelay(delay_o);
	w_cnt += 1;
}

/* Variant 2: read the value once up front; a failed try_cmpxchg
 * refreshes @val by itself (atomic_try_cmpxchg style) */
val = atomic64_read(&wk_in->num);
while (get_cycles() <= (time_temp + t_cnt)) {
	do {
		new_val = val + 1;
	} while (!atomic64_try_cmpxchg(&wk_in->num, &val, new_val));
	myndelay(delay_o);
	w_cnt += 1;
}
If we move the read of the global variable out of the loop, performance
improves on both x86 and arm64, as follows:
Successful operations per second:

                         atomic64_read      atomic64_read
                         in the loop        out of the loop
Kunpeng920 (24 cores)    6291034            8751962
Intel 6248 (20 cores)    8934792            9978998
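For anyone who wants to reproduce the shape of the test without the
kernel rig above (get_cycles()/myndelay()/wk_in are from our test
harness), here is a rough user-space approximation with C11 atomics,
where atomic_compare_exchange_strong() plays the role of
atomic_try_cmpxchg(); the absolute numbers will of course differ:

/* Build: gcc -O2 -pthread bench.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8
#define ITERS    1000000L

static _Atomic uint64_t counter;

/* Variant 1: re-read the value on every retry (atomic_cmpxchg style).
 * C11's CAS also refreshes @old on failure, but we deliberately
 * overwrite it with a fresh load to mimic the extra read. */
static void *worker_read_in_loop(void *unused)
{
	(void)unused;
	for (long i = 0; i < ITERS; i++) {
		uint64_t old, new;

		do {
			old = atomic_load(&counter);
			new = old + 1;
		} while (!atomic_compare_exchange_strong(&counter, &old, new));
	}
	return NULL;
}

/* Variant 2: read once up front; a failed CAS refreshes @old by itself
 * (atomic_try_cmpxchg style) */
static void *worker_read_once(void *unused)
{
	uint64_t old = atomic_load(&counter);

	(void)unused;
	for (long i = 0; i < ITERS; i++) {
		uint64_t new;

		do {
			new = old + 1;
		} while (!atomic_compare_exchange_strong(&counter, &old, new));
		old = new;	/* we just installed @new; start from it */
	}
	return NULL;
}

static double run(void *(*fn)(void *))
{
	pthread_t tid[NTHREADS];
	struct timespec t0, t1;
	int i;

	atomic_store(&counter, 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, fn, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
	printf("read in loop : %.3fs\n", run(worker_read_in_loop));
	printf("read once    : %.3fs\n", run(worker_read_once));
	return 0;
}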
So how about this? ;-)
 		delta = prandom_u32_max(now - old);
+#ifdef CONFIG_UBSAN
 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+	old = (u32)atomic_read(p_id);
 	do {
-		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
+	} while (!atomic_try_cmpxchg(p_id, &old, new));
+#else
+	new = atomic_add_return(segs + delta, p_id);
+#endif
Thanks,
Shaokun
> +#else
> + new = atomic_add_return(segs + delta, p_id);
> +#endif
>
> return new - segs;
> }
>
>
>