Re: [RFC bpf-next 0/3] bpf: adding map batch processing support

From: Yonghong Song
Date: Wed Nov 13 2019 - 17:26:31 EST




On 11/13/19 2:07 PM, Brian Vazquez wrote:
> Hi Yonghong,
>
> Thanks for reviewing it! I'm preparing the changes and will submit them
> sooner.
>
> As for the right way to manage author rights, does anyone knowÂwhat the
> correct approach is? Should I use Yonghong's patch and apply the
> extended support in different patches (i.e. support per_cpu maps, change
> batch from u64 to __aligned_u64, etc) or it is fine to apply the changes
> in place and write both sign-offs?

The logic flow of the patch set is most important.
You can add me as co-signoff if you reuse significant portion of my code.

>
> Thanks,
> Brian
>
> On Tue, Nov 12, 2019 at 6:34 PM Yonghong Song <yhs@xxxxxx
> <mailto:yhs@xxxxxx>> wrote:
>
>
>
> On 11/7/19 1:20 PM, Brian Vazquez wrote:
> > This is a follow up in the effort to batch bpf map operations to
> reduce
> > the syscall overhead with the map_ops. I initially proposed the
> idea and
> > the discussion is here:
> >
> https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@xxxxxxxxxx/
> >
> > Yonghong talked at the LPC about this and also proposed and idea that
> > handles the special weird case of hashtables by doing traversing
> using
> > the buckets as a reference instead of a key. Discussion is here:
> > https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@xxxxxx/
> >
> > This RFC proposes a way to extend batch operations for more data
> > structures by creating generic batch functions that can be used
> instead
> > of implementing the operations for each individual data structure,
> > reducing the code that needs to be maintained. The series
> contains the
> > patches used in Yonghong's RFC and the patch that adds the generic
> > implementation of the operations plus some testing with pcpu hashmaps
> > and arrays. Note that pcpu hashmap shouldn't use the generic
> > implementation and it either should have its own implementation
> or share
> > the one introduced by Yonghong, I added that just as an example
> to show
> > that the generic implementation can be easily added to a data
> structure.
> >
> > What I want to achieve with this RFC is to collect early feedback
> and see if
> > there's any major concern about this before I move forward. I do plan
> > to better separate this into different patches and explain them
> properly
> > in the commit messages.
>
> Thanks Brian for working on this. The general approach described here
> is good to me. Having a generic implementation for batch operations
> looks good for maps (not hash table, queue/stack, etc.)
>
> >
> > Current known issues where I would like to discuss are the
> followings:
> >
> > - Because Yonghong's UAPI definition was done specifically for
> >Â Â iterating buckets, the batch field is u64 and is treated as an u64
> >Â Â instead of an opaque pointer, this won't work for other data
> structures
> >Â Â that are going to use a key as a batch token with a size
> greater than
> >Â Â 64. Although I think at this point the only key that couldn't be
> >Â Â treated as a u64 is the key of a hashmap, and the hashmap
> won't use
> >Â Â the generic interface.
>
> The u64 can be changed with a __aligned_u64 opaque value. This way,
> it can represent a pointer or a 64bit value.
>
> > - Not all the data structures use delete (because it's not a valid
> >Â Â operation) i.e. arrays. So maybe lookup_and_delete_batch
> command is
> >Â Â not needed and we can handle that operation with a
> lookup_batch and a
> >Â Â flag.
>
> This make sense.
>
> > - For delete_batch (not just the lookup_and_delete_batch). Is this
> >Â Â operation really needed? If so, shouldn't it be better if the
> >Â Â behaviour is delete the keys provided? I did that with my generic
> >Â Â implementation but Yonghong's delete_batch for a hashmap deletes
> >Â Â buckets.
>
> We need batched delete in bcc. lookup_and_delete_batch is better as
> it can preserves more new map entries. Alternatively, deleting
> all entries after lookup is another option. But this may remove
> more new map entries. Statistically this may or may not matter though.
>
> bcc does have a clear_table (clear_map) API, but not clear who is
> using it.
>
> So, I did not have a concrete use case for delete_batch yet.
> I tend to think we should have delete_batch for API completeness,
> but maybe other people can comment on this as well.
>
> Maybe initial patch, we can skip it. But we should still ensure
> user interface data structure can handle batch delete if it is
> needed later. The current data structure should handle this
> as far as I know.
>
> >
> > Brian Vazquez (1):
> >Â Â bpf: add generic batch support
> >
> > Yonghong Song (2):
> >Â Â bpf: adding map batch processing support
> >Â Â tools/bpf: test bpf_map_lookup_and_delete_batch()
> >
> > Âinclude/linux/bpf.h             Â| 21 +
> > Âinclude/uapi/linux/bpf.h           | 22 +
> > Âkernel/bpf/arraymap.c            Â| Â4 +
> > Âkernel/bpf/hashtab.c             | 331 ++++++++++
> > Âkernel/bpf/syscall.c             | 573
> ++++++++++++++----
> > Âtools/include/uapi/linux/bpf.h        | 22 +
> > Âtools/lib/bpf/bpf.c             Â| 59 ++
> > Âtools/lib/bpf/bpf.h             Â| 13 +
> > Âtools/lib/bpf/libbpf.map           | Â4 +
> > Â.../map_tests/map_lookup_and_delete_batch.c Â| 245 ++++++++
> > Â.../map_lookup_and_delete_batch_array.c   Â| 118 ++++
> >Â Â11 files changed, 1292 insertions(+), 120 deletions(-)
> >Â Âcreate mode 100644
> tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch.c
> >Â Âcreate mode 100644
> tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> >
>