Re: [PATCH bpf-next v3] docs/bpf: Add LRU internals description and graph

From: Joe Stringer
Date: Wed Mar 15 2023 - 21:20:00 EST


On Mon, Mar 13, 2023 at 10:21 PM John Fastabend
<john.fastabend@xxxxxxxxx> wrote:
>
> Joe Stringer wrote:
> > Extend the bpf hashmap docs to include a brief description of the
> > internals of the LRU map type (setting appropriate API expectations),
> > including the original commit message from Martin and a variant on the
> > graph that I had presented during my Linux Plumbers Conference 2022 talk
> > on "Pressure feedback for LRU map types"[0].
> >
> > The node names in the dot file correspond roughly to the functions where
> > the logic for those decisions or steps is defined, to help curious
> > developers to cross-reference and update this logic if the details of
> > the LRU implementation ever differ from this description.
> >
> > [0]: https://lpc.events/event/16/contributions/1368/
> >
> > Signed-off-by: Joe Stringer <joe@xxxxxxxxxxxxx>
>
>
> Thanks couple nits below
>
> > ---
> > v3: Use standard table syntax
> > Replace inline commit message with reference to commit
> > Fix incorrect Y/N label for common LRU check
> > Rename some dotfile variables to reduce confusion between cases
> > Minor wording touchups
> > v2: Fix issue that caused initial email submission to fail
> > ---
> > Documentation/bpf/map_hash.rst | 62 ++++++++
> > Documentation/bpf/map_lru_hash_update.dot | 166 ++++++++++++++++++++++
> > 2 files changed, 228 insertions(+)
> > create mode 100644 Documentation/bpf/map_lru_hash_update.dot
> > diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst
> > index 8669426264c6..61602ce26561 100644
> > --- a/Documentation/bpf/map_hash.rst
> > +++ b/Documentation/bpf/map_hash.rst
> > @@ -1,5 +1,6 @@
> > .. SPDX-License-Identifier: GPL-2.0-only
> > .. Copyright (C) 2022 Red Hat, Inc.
> > +.. Copyright (C) 2022-2023 Isovalent, Inc.
> >
> > ===============================================
> > BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
> > @@ -206,3 +207,64 @@ Userspace walking the map elements from the map declared above:
> > cur_key = &next_key;
> > }
> > }
> > +
> > +Internals
> > +=========
> > +
> > +This section of the document is targeted at Linux developers and describes
> > +aspects of the map implementations that are not considered stable ABI. The
> > +following details are subject to change in future versions of the kernel.
> > +
> > +``BPF_MAP_TYPE_LRU_HASH`` and variants
> > +--------------------------------------
> > +
> > +An LRU hashmap type consists of two properties: Firstly, it is a hash map and
> > +hence is indexable by key for constant time lookups. Secondly, when at map
> > +capacity, map updates will trigger eviction of old entries based on the age of
> > +the elements in a set of lists. Each of these properties may be either global
> > +or per-CPU, depending on the map type and flags used to create the map:
> > +
> > ++------------------------+---------------------------+----------------------------------+
> > +| | ``BPF_MAP_TYPE_LRU_HASH`` | ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` |
> > ++========================+===========================+==================================+
> > +| ``BPF_NO_COMMON_LRU`` | Per-CPU LRU, global map | Per-CPU LRU, per-cpu map |
> > ++------------------------+---------------------------+----------------------------------+
> > +| ``!BPF_NO_COMMON_LRU`` | Global LRU, global map | Global LRU, per-cpu map |
> > ++------------------------+---------------------------+----------------------------------+
>
> Above all seems API to me. Maybe move the statement about not considered stable
> ABI down here? Something like,
>
> "
> The internal details of which entry is evicted and acquiring a new entry
> are not considered stable and may change in the future. But the current
> impelementation is as follows.
> "
>
> Or something like that?

Yep sounds good to me, I'll fix that up.

> > +
> > +Notably, there are various steps that the update algorithm attempts in order to
> > +enforce the LRU property which have increasing impacts on other CPUs involved
> > +in the following operation attempts:
> > +
> > +- Attempt to use CPU-local state to batch operations
> > +- Attempt to fetch free nodes from global lists
> > +- Attempt to pull any node from a global list and remove it from the hashmap
> > +- Attempt to pull any node from any CPU's list and remove it from the hashmap
> > +
> > +Even if an LRU node may be acquired, maps of type ``BPF_MAP_TYPE_LRU_HASH``
> > +may fail to insert the entry into the map if other CPUs are heavily contending
> > +on the global hashmap lock.
>
> Similarly this is ABI correct? Probably we can also specify the error code?
> Assuming it is just EBUSY or EAGAIN?

Hmm. It's EBUSY right now. Also looks like we're missing corresponding
docs in the uapi header for this case:

https://github.com/torvalds/linux/blob/9c1bec9c0b08abeac72ed6214b723adc224013bf/include/uapi/linux/bpf.h#L163

I think that "this function may fail due to contention on map usage,
in which case it returns EBUSY" (or similar wording) is reasonable to
document as ABI. IMO that should go in the header linked above (which
gets turned into UAPI docs elsewhere). This particular phrasing is
going into more detail around things like the hashmap lock which
should not be described as part of the uAPI/ABI.