Re: [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty

From: Brian Vazquez
Date: Sat Aug 01 2020 - 14:11:53 EST


On Sat, Aug 1, 2020 at 9:59 AM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 7/31/20 9:57 PM, Brian Vazquez wrote:
> > While running some experiments it was observed that map_lookup_batch was much
> > slower than get_next_key + lookup when the syscall overhead is minimal.
> > This was because the map_lookup_batch implementation was more expensive
> > traversing empty buckets, this can be really costly when the pre-allocated
> > map is too big.
> >
> > This patch optimizes the case when the bucket is empty so we can move quickly
> > to next bucket.
> >
> > The benchmark to exercise this is as follows:
> >
> > -The map was populate with a single entry to make sure that the syscall overhead
> > is not helping the map_batch_lookup.
> > -The size of the preallocated map was increased to show the effect of
> > traversing empty buckets.
> >
> > Results:
> >
> > Using get_next_key + lookup:
> >
> > Benchmark Time(ns) CPU(ns) Iteration
> > ---------------------------------------------------------------
> > BM_DumpHashMap/1/1k 3593 3586 192680
> > BM_DumpHashMap/1/4k 6004 5972 100000
> > BM_DumpHashMap/1/16k 15755 15710 44341
> > BM_DumpHashMap/1/64k 59525 59376 10000
>
> I think "BM_DumpHashMap/1/64k" means the program "BM_DumpHashMap",
> the map having only "1" entry, and the map preallocated size is "64k"?
> What is the "Iteration" here? The number of runs with the same dump?
> The CPU(ns) is the system cpu consumption, right? The Time/CPU is for
> all iterations, not just one, right? It would be good
> if the above results can be described better, so people can
> understand the results better.
>

Hi Yonghong, thanks for reviewing it!

I'll fix it in next iteration.
> >
> > Using htab_lookup_batch before this patch:
> > Benchmark Time(ns) CPU(ns) Iterations
> > ---------------------------------------------------------------
> > BM_DumpHashMap/1/1k 3933 3927 177978
> > BM_DumpHashMap/1/4k 9192 9177 73951
> > BM_DumpHashMap/1/16k 42011 41970 16789
> > BM_DumpHashMap/1/64k 117895 117661 6135
> >
> > Using htab_lookup_batch with this patch:
> > Benchmark Time(ns) CPU(ns) Iterations
> > ---------------------------------------------------------------
> > BM_DumpHashMap/1/1k 2809 2803 249212
> > BM_DumpHashMap/1/4k 5318 5316 100000
> > BM_DumpHashMap/1/16k 14925 14895 47448
> > BM_DumpHashMap/1/64k 58870 58674 10000
> >
> > Suggested-by: Luigi Rizzo <lrizzo@xxxxxxxxxx>
> > Cc: Yonghong Song <yhs@xxxxxx>
> > Signed-off-by: Brian Vazquez <brianvv@xxxxxxxxxx>
> > ---
> > kernel/bpf/hashtab.c | 23 ++++++++---------------
> > 1 file changed, 8 insertions(+), 15 deletions(-)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 2137e2200d95..150015ea6737 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > struct hlist_nulls_head *head;
> > struct hlist_nulls_node *n;
> > unsigned long flags = 0;
> > - bool locked = false;
> > struct htab_elem *l;
> > struct bucket *b;
> > int ret = 0;
> > @@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > dst_val = values;
> > b = &htab->buckets[batch];
> > head = &b->head;
> > - /* do not grab the lock unless need it (bucket_cnt > 0). */
> > - if (locked)
> > - flags = htab_lock_bucket(htab, b);
> >
> > + l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
> > + struct htab_elem, hash_node);
> > + if (!l && (batch + 1 < htab->n_buckets)) {
> > + batch++;
> > + goto again_nocopy;
> > + }
> > +
> > + flags = htab_lock_bucket(htab, b);
> [...]