Re: [PATCH net-next v6 06/15] ethtool: netlink bitset handling

From: Jiri Pirko
Date: Wed Jul 03 2019 - 07:49:38 EST


Tue, Jul 02, 2019 at 01:50:09PM CEST, mkubecek@xxxxxxx wrote:
>The ethtool netlink code uses common framework for passing arbitrary
>length bit sets to allow future extensions. A bitset can be a list (only
>one bitmap) or can consist of value and mask pair (used e.g. when client
>want to modify only some bits). A bitset can use one of two formats:
>verbose (bit by bit) or compact.
>
>Verbose format consists of bitset size (number of bits), list flag and
>an array of bit nests, telling which bits are part of the list or which
>bits are in the mask and which of them are to be set. In requests, bits
>can be identified by index (position) or by name. In replies, kernel
>provides both index and name. Verbose format is suitable for "one shot"
>applications like standard ethtool command as it avoids the need to
>either keep bit names (e.g. link modes) in sync with kernel or having to
>add an extra roundtrip for string set request (e.g. for private flags).
>
>Compact format uses one (list) or two (value/mask) arrays of 32-bit
>words to store the bitmap(s). It is more suitable for long running
>applications (ethtool in monitor mode or network management daemons)
>which can retrieve the names once and then pass only compact bitmaps to
>save space.
>
>Userspace requests can use either format and ETHTOOL_RF_COMPACT flag in
>request header tells kernel which format to use in reply. Notifications
>always use compact format.
>
>Signed-off-by: Michal Kubecek <mkubecek@xxxxxxx>
>---
> Documentation/networking/ethtool-netlink.txt | 61 ++
> include/uapi/linux/ethtool_netlink.h | 35 ++
> net/ethtool/Makefile | 2 +-
> net/ethtool/bitset.c | 606 +++++++++++++++++++
> net/ethtool/bitset.h | 40 ++
> net/ethtool/netlink.h | 9 +
> 6 files changed, 752 insertions(+), 1 deletion(-)
> create mode 100644 net/ethtool/bitset.c
> create mode 100644 net/ethtool/bitset.h
>
>diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
>index 97c369aa290b..4636682c551f 100644
>--- a/Documentation/networking/ethtool-netlink.txt
>+++ b/Documentation/networking/ethtool-netlink.txt
>@@ -73,6 +73,67 @@ set, the behaviour is the same as (or closer to) the behaviour before it was
> introduced.
>
>
>+Bit sets
>+--------
>+
>+For short bitmaps of (reasonably) fixed length, standard NLA_BITFIELD32 type
>+is used. For arbitrary length bitmaps, ethtool netlink uses a nested attribute
>+with contents of one of two forms: compact (two binary bitmaps representing
>+bit values and mask of affected bits) and bit-by-bit (list of bits identified
>+by either index or name).
>+
>+Compact form: nested (bitset) atrribute contents:
>+
>+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
>+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
>+
>+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
>+to a multiple of 32 bits. They consist of 32-bit words in host byte order,

Looks like the blocks are similar to NLA_BITFIELD32. Why don't you user
nested array of NLA_BITFIELD32 instead?


>+words ordered from least significant to most significant (i.e. the same way as
>+bitmaps are passed with ioctl interface).
>+
>+For compact form, ETHTOOL_A_BITSET_SIZE and ETHTOOL_A_BITSET_VALUE are
>+mandatory. Similar to BITFIELD32, a compact form bit set requests to set bits

Double space^^


>+in the mask to 1 (if the bit is set in value) or 0 (if not) and preserve the
>+rest. If ETHTOOL_A_BITSET_LIST is present, there is no mask and bitset
>+represents a simple list of bits.

Okay, that is a bit confusing. Why not to rename to something like:
ETHTOOL_A_BITSET_NO_MASK (flag)
?


>+
>+Kernel bit set length may differ from userspace length if older application is
>+used on newer kernel or vice versa. If userspace bitmap is longer, an error is
>+issued only if the request actually tries to set values of some bits not
>+recognized by kernel.
>+
>+Bit-by-bit form: nested (bitset) attribute contents:
>+
>+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>+ ETHTOOL_A_BITSET_BIT (nested) array of bits
>+ ETHTOOL_A_BITSET_BIT+ (nested) one bit
>+ ETHTOOL_A_BIT_INDEX (u32) bit index (0 for LSB)
>+ ETHTOOL_A_BIT_NAME (string) bit name
>+ ETHTOOL_A_BIT_VALUE (flag) present if bit is set
>+
>+Bit size is optional for bit-by-bit form. ETHTOOL_A_BITSET_BITS nest can only
>+contain ETHTOOL_A_BITS_BIT attributes but there can be an arbitrary number of
>+them. A bit may be identified by its index or by its name. When used in
>+requests, listed bits are set to 0 or 1 according to ETHTOOL_A_BIT_VALUE, the
>+rest is preserved. A request fails if index exceeds kernel bit length or if
>+name is not recognized.
>+
>+When ETHTOOL_A_BITSET_LIST flag is present, bitset is interpreted as a simple
>+bit list. ETHTOOL_A_BIT_VALUE attributes are not used in such case. Bit list
>+represents a bitmap with listed bits set and the rest zero.
>+
>+In requests, application can use either form. Form used by kernel in reply is
>+determined by a flag in flags field of request header. Semantics of value and
>+mask depends on the attribute. General idea is that flags control request
>+processing, info_mask control which parts of the information are returned in
>+"get" request and index identifies a particular subcommand or an object to
>+which the request applies.

This is quite complex and confusing. Having the same API for 2 APIs is
odd. The API should be crystal clear, easy to use.

Why can't you have 2 commands, one working with bit arrays only, one
working with strings? Something like:
X_GET
ETHTOOL_A_BITS (nested)
ETHTOOL_A_BIT_ARRAY (BITFIELD32)
X_NAMES_GET
ETHTOOL_A_BIT_NAMES (nested)
ETHTOOL_A_BIT_INDEX
ETHTOOL_A_BIT_NAME

For set, you can also have multiple cmds:
X_SET - to set many at once, by bit index
ETHTOOL_A_BITS (nested)
ETHTOOL_A_BIT_ARRAY (BITFIELD32)
X_ONE_SET - to set one, by bit index
ETHTOOL_A_BIT_INDEX
ETHTOOL_A_BIT_VALUE
X_ONE_SET - to set one, by name
ETHTOOL_A_BIT_NAME
ETHTOOL_A_BIT_VALUE


[...]