Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Willem de Bruijn
Date: Wed Jul 24 2019 - 16:14:16 EST
On Wed, Jul 24, 2019 at 1:10 PM Brian Vazquez <brianvv@xxxxxxxxxx> wrote:
>
> This introduces a new command to retrieve multiple number of entries
> from a bpf map, wrapping the existing bpf methods:
> map_get_next_key and map_lookup_elem
>
> To start dumping the map from the beginning you must specify NULL as
> the prev_key.
>
> The new API returns 0 when it successfully copied all the elements
> requested or it copied less because there weren't more elements to
> retrieved (i.e err == -ENOENT). In last scenario err will be masked to 0.
I think I understand this, but perhaps it can be explained a bit more
concisely without reference to ENOENT and error masking. The function
returns the min of the number of requested elements and the number of
remaining elements in the map from the given starting point
(prev_key).
> On a successful call buf and buf_len will contain correct data and in
> case prev_key was provided (not for the first walk, since prev_key is
> NULL) it will contain the last_key copied into the prev_key which will
> simplify next call.
>
> Only when it can't find a single element it will return -ENOENT meaning
> that the map has been entirely walked. When an error is return buf,
> buf_len and prev_key shouldn't be read nor used.
That's common for error handling. No need to state explicitly.
> Because maps can be called from userspace and kernel code, this function
> can have a scenario where the next_key was found but by the time we
> try to retrieve the value the element is not there, in this case the
> function continues and tries to get a new next_key value, skipping the
> deleted key. If at some point the function find itself trap in a loop,
> it will return -EINTR.
Good to point this out! I don't think that unbounded continue;
statements until an interrupt happens is sufficient. Please bound the
number of retries to a low number.
> The function will try to fit as much as possible in the buf provided and
> will return -EINVAL if buf_len is smaller than elem_size.
>
> QUEUE and STACK maps are not supported.
>
> Note that map_dump doesn't guarantee that reading the entire table is
> consistent since this function is always racing with kernel and user code
> but the same behaviour is found when the entire table is walked using
> the current interfaces: map_get_next_key + map_lookup_elem.
> It is also important to note that with a locked map, the lock is grabbed
> for 1 entry at the time, meaning that the returned buf might or might not
> be consistent.
Would it be informative to signal to the caller if the read was
complete and consistent (because the entire table was read while the
lock was held)?
>
> Suggested-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> Signed-off-by: Brian Vazquez <brianvv@xxxxxxxxxx>