Re: [External] Re: [PATCH] bpf: use count for prealloc hashtab too

From: Chengming Zhou
Date: Mon Oct 18 2021 - 23:14:39 EST


在 2021/10/19 上午9:57, Alexei Starovoitov 写道:
> On Sun, Oct 17, 2021 at 10:49 PM Chengming Zhou
> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>
>> 在 2021/10/16 上午3:58, Alexei Starovoitov 写道:
>>> On Fri, Oct 15, 2021 at 11:04 AM Chengming Zhou
>>> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>>>
>>>> We only use count for kmalloc hashtab not for prealloc hashtab, because
>>>> __pcpu_freelist_pop() return NULL when no more elem in pcpu freelist.
>>>>
>>>> But the problem is that __pcpu_freelist_pop() will traverse all CPUs and
>>>> spin_lock for all CPUs to find there is no more elem at last.
>>>>
>>>> We encountered bad case on big system with 96 CPUs that alloc_htab_elem()
>>>> would last for 1ms. This patch use count for prealloc hashtab too,
>>>> avoid traverse and spin_lock for all CPUs in this case.
>>>>
>>>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>>>
>>> It's not clear from the commit log what you're solving.
>>> The atomic inc/dec in critical path of prealloc maps hurts performance.
>>> That's why it's not used.
>>>
>> Thanks for the explanation, what I'm solving is when hash table hasn't free
>> elements, we don't need to call __pcpu_freelist_pop() to traverse and
>> spin_lock all CPUs. The ftrace output of this bad case is below:
>>
>> 50) | htab_map_update_elem() {
>> 50) 0.329 us | _raw_spin_lock_irqsave();
>> 50) 0.063 us | lookup_elem_raw();
>> 50) | alloc_htab_elem() {
>> 50) | pcpu_freelist_pop() {
>> 50) 0.209 us | _raw_spin_lock();
>> 50) 0.264 us | _raw_spin_lock();
>
> This is LRU map. Not hash map.
> It will grab spin_locks of other cpus
> only if all previous cpus don't have free elements.
> Most likely your map is actually full and doesn't have any free elems.
> Since it's an lru it will force free an elem eventually.
>

Maybe I missed something, the map_update_elem function of LRU map is
htab_lru_map_update_elem() and the htab_map_update_elem() above is the
map_update_elem function of hash map.
Because of the implementation of percpu freelist used in hash map, it
will spin_lock all other CPUs when there is no free elements.

Thanks.