RE: [EXTERNAL] Re: [PATCH net-next v4 7/12] net: atlantic: implement AQC113 L2/L3/L4 RX filter ops

From: Sukhdeep Soni [C]

Date: Wed Jun 10 2026 - 07:06:10 EST

On 4 Jun 2026, Jakub Kicinski wrote:

> net: atlantic: implement AQC113 L2/L3/L4 RX filter ops
>
> This patch adds tag-based L2/L3/L4 RX filter management for AQC113,
> including reference-counted L3 and L4 filter slots, an ART-based L3L4
> configuration helper, and L2 ethertype set/clear, and wires the new
> ops into hw_atl2_ops.
>
> [Low]
> The commit message describes hw_atl2_filter_tag_get/put as:
>
> Add tag-based filter policy with reference-counted sharing
> (hw_atl2_filter_tag_get/put), allowing multiple filter rules to share
> the same L3 or L4 hardware filter when their match criteria are
> identical.
>
> In the patch, hw_atl2_filter_tag_get/put operate only on
> priv->etype_policy[] and are called only from hw_atl2_hw_fl2_set/clear.
> The L3 and L4 sharing is implemented separately via the usage field on
> hw_atl2_l3_filter / hw_atl2_l4_filter and hw_atl2_rxf_l3_get/put /
> hw_atl2_rxf_l4_get/put.
>
> Could the description be reworded so it does not conflate the etype tag
> policy with the L3/L4 filter refcounts?

The commit message has been reworded in v5 to clearly separate the
two mechanisms:

- hw_atl2_filter_tag_get/put - tag-based policy for L2 ethertype
filters, allocating and releasing ART tag slots
- hw_atl2_rxf_l3_get/put / hw_atl2_rxf_l4_get/put - serialized usage
counter based sharing for L3 and L4 hardware filter slots when
match criteria are identical

> diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c
> index 27a62fffbc29..a07a3dbddefe 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c
> @@ -87,6 +87,38 @@ const struct aq_hw_caps_s hw_atl2_caps_aqc116c = {
> AQ_NIC_RATE_10M,
> };
>
> +/* Find tag with the same action or new free tag
> + * top - top inclusive tag value
> + * action - action for ActionResolverTable
> + */
> +static int hw_atl2_filter_tag_get(struct hw_atl2_tag_policy *tags,
> + int top, u16 action)
> +{
> + int i;
> +
> + for (i = 1; i <= top; i++)
> + if (tags[i].usage > 0 && tags[i].action == action) {
> + tags[i].usage++;
> + return i;
> + }
> +
> + for (i = 1; i <= top; i++)
> + if (tags[i].usage == 0) {
> + tags[i].usage = 1;
> + tags[i].action = action;
> + return i;
> + }
> +
> + return -1;
>
> [Low]
> Sibling helpers introduced in this patch (hw_atl2_new_fl3l4_find_l3 /
> _find_l4) return -ENOSPC on the no-free-slot path, but this helper
> returns the bare value -1.
>
> The current caller hw_atl2_hw_fl2_set() translates it into -ENOSPC
> manually, but if a future caller propagates the value unchanged it
> becomes -EPERM. Would returning -ENOSPC here directly be more
> consistent?

Fixed - hw_atl2_filter_tag_get() now returns -ENOSPC directly on the
no-free-slot path, consistent with the sibling helpers. The caller's if (tag < 0) check
is unaffected since -ENOSPC is still negative.

> +}
>
> [ ... ]
>
> @@ -823,6 +855,489 @@ static struct aq_stats_s *hw_atl2_utils_get_hw_stats(struct aq_hw_s *self)
> return &self->curr_stats;
> }
>
> +static bool hw_atl2_rxf_l3_is_equal(struct hw_atl2_l3_filter *f1,
> + struct hw_atl2_l3_filter *f2)
> +{
> + if (f1->cmd != f2->cmd)
> + return false;
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_CMD_SA_EN)
> + if (f1->srcip[0] != f2->srcip[0])
> + return false;
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_CMD_DA_EN)
> + if (f1->dstip[0] != f2->dstip[0])
> + return false;
> +
> + if (f1->cmd & (HW_ATL2_RPF_L3_CMD_PROTO_EN |
> + HW_ATL2_RPF_L3_V6_CMD_PROTO_EN))
> + if (f1->proto != f2->proto)
> + return false;
>
> [Low]
> Where is l3.proto ever populated? In hw_atl2_new_fl3l4_configure() the
> local l3 is memset to 0 and the protocol value is encoded into the upper
> bits of l3.cmd via:
>
> l3.cmd |= IPPROTO_TCP << (data->is_ipv6 ? 0x18 : 8);
>
> hw_atl2_rxf_l3_get() then copies _l3->proto (=0) into the tracked slot,
> so this proto comparison appears to be 0 == 0 in all cases. The actual
> protocol comparison happens implicitly via the f1->cmd != f2->cmd check
> above.
>
> Should l3.proto be removed, or populated consistently from the cmd
> encoding, so future code that begins setting one but not the other does
> not silently produce false negatives in the sharing check?

The vendor reference driver has the same pattern - protocol is encoded into l3.cmd
bits and the f1->cmd != f2->cmd check already covers it. The proto field comparison
is redundant but not harmful. We will leave it as-is to stay close to the reference
driver, but can clean it up as a follow-up.

> +
> + if (f1->cmd & HW_ATL2_RPF_L3_V6_CMD_SA_EN)
> + if (memcmp(f1->srcip, f2->srcip, 16))
> + return false;
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_V6_CMD_DA_EN)
> + if (memcmp(f1->dstip, f2->dstip, 16))
> + return false;
> +
> + return true;
> +}
>
> [ ... ]
>
> +static void hw_atl2_rxf_l3_get(struct aq_hw_s *self,
> + struct hw_atl2_l3_filter *l3, int idx,
> + const struct hw_atl2_l3_filter *_l3)
> +{
> + int i;
> +
> + l3->usage++;
> + if (l3->usage == 1) {
> + l3->cmd = _l3->cmd;
> + for (i = 0; i < 4; i++) {
> + l3->srcip[i] = _l3->srcip[i];
> + l3->dstip[i] = _l3->dstip[i];
> + }
> + l3->proto = _l3->proto;
> +
> + if (l3->cmd & HW_ATL2_RPF_L3_CMD_EN) {
> + hw_atl2_rpf_l3_v4_cmd_set(self, l3->cmd, idx);
> + hw_atl2_rpf_l3_v4_tag_set(self, idx + 1, idx);
> + hw_atl2_rpf_l3_v4_dest_addr_set(self,
> + idx,
> + l3->dstip[0]);
> + hw_atl2_rpf_l3_v4_src_addr_set(self,
> + idx,
> + l3->srcip[0]);
> + } else {
> + hw_atl2_rpf_l3_v6_cmd_set(self, l3->cmd, idx);
> + hw_atl2_rpf_l3_v6_tag_set(self, idx + 1, idx);
> + hw_atl2_rpf_l3_v6_dest_addr_set(self,
> + idx,
> + l3->dstip);
> + hw_atl2_rpf_l3_v6_src_addr_set(self,
> + idx,
> + l3->srcip);
> + }
> + }
> +}
>
> [Medium]
> Is the ordering of the MMIO writes here intentional?
>
> The cmd word containing the EN/SA_EN/DA_EN/PROTO_EN bits is committed
> first via hw_atl2_rpf_l3_v4_cmd_set() (the v4 mask is 0xFFFF and
> includes the EN bit), and only afterwards are the destination and
> source address registers programmed.
>
> Between the cmd write and the address writes, the filter is enabled
> while its DA/SA registers still hold whatever value they had before
> (stale, zero, or a previously programmed filter's addresses).
>
> The IPv6 path is worse because hw_atl2_rpf_l3_v6_dest_addr_set() and
> _src_addr_set() each issue four separate register writes (one per
> dword), so the filter is enabled with a partially-programmed address
> for several MMIO cycles.
>
> The same pattern is in hw_atl2_rxf_l4_get() below. Would it be safer to
> program the addresses (or ports) and tag first and write the cmd word
> with EN last?
>
> [ ... ]

The ordering is inherited from the vendor reference driver which uses the same sequence. No change needed.

> +static void hw_atl2_rxf_l4_get(struct aq_hw_s *self,
> + struct hw_atl2_l4_filter *l4, int idx,
> + const struct hw_atl2_l4_filter *_l4)
> +{
> + l4->usage++;
> + if (l4->usage == 1) {
> + l4->cmd = _l4->cmd;
> + l4->sport = _l4->sport;
> + l4->dport = _l4->dport;
> +
> + hw_atl2_rpf_l4_cmd_set(self, l4->cmd, idx);
> + hw_atl2_rpf_l4_tag_set(self, idx + 1, idx);
> + hw_atl_rpf_l4_spd_set(self, l4->sport, idx);
> + hw_atl_rpf_l4_dpd_set(self, l4->dport, idx);
> + }
> +}
> +
> +static int hw_atl2_new_fl3l4_configure(struct aq_hw_s *self,
> + struct aq_rx_filter_l3l4 *data)
> +{
>
> [ ... ]
>
> + /* convert legacy filter to new */
> + if (data->cmd & HW_ATL_RX_ENABLE_CMP_PROT_L4) {
> + l3.cmd |= data->is_ipv6 ? HW_ATL2_RPF_L3_V6_CMD_PROTO_EN :
> + HW_ATL2_RPF_L3_CMD_PROTO_EN;
> + l3.cmd |= data->is_ipv6 ? HW_ATL2_RPF_L3_V6_CMD_EN :
> + HW_ATL2_RPF_L3_CMD_EN;
> + switch (data->cmd & 0x7) {
> + case HW_ATL_RX_TCP:
> + l3.cmd |= IPPROTO_TCP << (data->is_ipv6 ? 0x18 : 8);
> + break;
> + case HW_ATL_RX_UDP:
> + l3.cmd |= IPPROTO_UDP << (data->is_ipv6 ? 0x18 : 8);
> + break;
> + case HW_ATL_RX_SCTP:
> + l3.cmd |= IPPROTO_SCTP << (data->is_ipv6 ? 0x18 : 8);
> + break;
> + case HW_ATL_RX_ICMP:
> + l3.cmd |= IPPROTO_ICMP << (data->is_ipv6 ? 0x18 : 8);
> + break;
>
> [Low]
> For the IPv6 branch of HW_ATL_RX_ICMP, should the next-header value be
> IPPROTO_ICMPV6 (58) rather than IPPROTO_ICMP (1)? Real ICMPv6 traffic
> uses next-header 58, so this branch will not match ICMPv6 packets.
>
> The aq_set_data_fl3l4() helper in aq_filters.c does not currently set
> HW_ATL_RX_ICMP for any flow type, so the case is not reachable from the
>in-tree ethtool ntuple path today, but it is written explicitly here.
>

The reference driver uses IPPROTO_ICMP for both IPv4 and IPv6 paths.
As noted, HW_ATL_RX_ICMP is unreachable from the in-tree ethtool ntuple
path today, so this is a pre-existing inconsistency in the reference driver
We will leave it as-is to stay aligned, but note it as a potential future
fix when ICMPv6 support is wired up.

> [Low]
> The shift values 0x18 and 8 are repeated four times here, while the
> matching enable bits are named constants (HW_ATL2_RPF_L3_CMD_PROTO_EN
> and HW_ATL2_RPF_L3_V6_CMD_PROTO_EN). Would a pair of named shift
> macros (for example HW_ATL2_RPF_L3_CMD_PROTO_SHIFT /
> HW_ATL2_RPF_L3_V6_CMD_PROTO_SHIFT) make the relationship between the
> enable bit and the protocol byte position clearer and harder to
> mis-edit?

The bare shift values 0x18 and 8 are taken directly from the vendor reference
driver which uses the same literals. Adding named macros is a reasonable cleanup
but goes beyond the scope of this series. We will leave it as-is to minimize
divergence from the reference driver.

> + }
> + }
>
> [ ... ]
>
> + /* find L3 and L4 filters */
> + if (l3.cmd & (HW_ATL2_RPF_L3_CMD_EN | HW_ATL2_RPF_L3_V6_CMD_EN)) {
> + l3_idx = hw_atl2_new_fl3l4_find_l3(self, &l3);
> + if (l3_idx < 0)
> + return l3_idx;
> +
> + if (l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN)
> + l3_filters = priv->l3_v6_filters;
> + else
> + l3_filters = priv->l3_v4_filters;
> +
> + if (priv->l3l4_filters[data->location].l3_index != l3_idx ||
> + old_ipv6 != !!(l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN))
> + hw_atl2_rxf_l3_get(self, &l3_filters[l3_idx],
> + l3_idx, &l3);
> + }
> +
> + if (old_l3_index != -1) {
> + if (old_ipv6)
> + l3_filters = priv->l3_v6_filters;
> + else
> + l3_filters = priv->l3_v4_filters;
> +
> + if (!(hw_atl2_rxf_l3_is_equal(&l3,
> + &l3_filters[old_l3_index]))) {
> + hw_atl2_rxf_l3_put(self,
> + &l3_filters[old_l3_index],
> + old_l3_index);
> + }
> + }
> + if (l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN)
> + priv->l3l4_filters[data->location].ipv6 = 1;
> + else
> + priv->l3l4_filters[data->location].ipv6 = 0;
> + priv->l3l4_filters[data->location].l3_index = l3_idx;
> +
> + if (l4.cmd & HW_ATL2_RPF_L4_CMD_EN) {
> + l4_idx = hw_atl2_new_fl3l4_find_l4(self, &l4);
> + if (l4_idx < 0)
> + return -EINVAL;
>
> [High, Low]
> By the time this L4 search runs, hw_atl2_new_fl3l4_configure() has
> already mutated state for the new request:
>
> - hw_atl2_rxf_l3_get() programmed the new L3 hardware filter and
> bumped its refcount,
> - the old L3 filter has typically been released via
> hw_atl2_rxf_l3_put() (clearing its HW slot if it was the last
> user, since match criteria differ), and
> - priv->l3l4_filters[location].l3_index and .ipv6 have been
> overwritten to point at the new L3 slot.
>
> When this branch returns, hw_atl2_hw_fl3l4_set() simply propagates
> the error and skips the ART write, so the device is left with the new
> L3 filter enabled (refcount 1) but no ART entry referencing its tag,
> the old ART entry still pointing at the now-cleared old L3 tag, and
> the user's previously-installed working rule effectively destroyed.
>
> Userspace receives an error and will not call the disable path, so
> the new L3 slot stays leaked. With only HW_ATL2_RPF_L3L4_FILTERS
> slots per family, repeated failures rapidly exhaust the pool.
>
> Could this path roll back the L3-side state (and refcounts) before
> returning?
>
> As a separate question, the L3 path uses:
>
> if (l3_idx < 0)
> return l3_idx;
>
> while the L4 path overwrites the more specific value with -EINVAL:
>
> if (l4_idx < 0)
> return -EINVAL;
>
> Should the L4 path also propagate l4_idx so userspace can distinguish
> -ENOSPC from -EINVAL?

Both observations are valid, but both are pre-existing in the vendor reference
driver which has the identical no-rollback pattern and return -EINVAL on the L4
error path.

For the rollback issue: adding proper rollback would require undoing hw_atl2_rxf_l3_get(),
restoring priv->l3l4_filters[location], and potentially re-programming the old L3 HW slot - a
non-trivial change that goes beyond this series. We will note it as a known limitation for
a follow-up.

For the -EINVAL vs propagating l4_idx: agreed, return l4_idx is more informative.
Will fix this since it is a trivial one-line change.

> +
> + if (priv->l3l4_filters[data->location].l4_index != l4_idx)
> + hw_atl2_rxf_l4_get(self, &priv->l4_filters[l4_idx],
> + l4_idx, &l4);
> + }
>
> [ ... ]
>
> +static int hw_atl2_hw_fl2_set(struct aq_hw_s *self,
> + struct aq_rx_filter_l2 *data)
> +{
>
> [ ... ]
>
> + tag = hw_atl2_filter_tag_get(priv->etype_policy,
> + priv->etype_filter_tag_top,
> + action);
> +
> + if (tag < 0)
> + return -ENOSPC;
> +
> + req_tag |= tag << HW_ATL2_RPF_TAG_ET_OFFSET;
> + hw_atl2_rpf_etht_flr_tag_set(self, tag, location);
> + index = priv->art_base_index + HW_ATL2_RPF_ET_PCP_USER_INDEX +
> + data->location;
> + hw_atl2_act_rslvr_table_set(self, index, req_tag, mask, action);
> +
> + hw_atl_rpf_etht_flr_en_set(self, 1U, location);
> +
> + return aq_hw_err_from_flags(self);
> +}
>
> [Medium]
> Once hw_atl2_filter_tag_get() succeeds, the etype_policy slot's usage
> count has been incremented. After that point this function still has
> two ways to return an error without releasing that tag:
>
> - hw_atl2_act_rslvr_table_set() may return an error (semaphore poll
> timeout) — its return value is ignored here, but
> - aq_hw_err_from_flags(self) at the end can return -ENXIO or -EIO if
> AQ_HW_FLAG_ERR_UNPLUG / AQ_HW_FLAG_ERR_HW have been set during the
> register writes above.
>
> Because HW_ATL2_RPF_ETYPE_TAGS is small (7), repeated transient HW
> Errors would gradually permanently exhaust the etype tag pool until
> no L2 filter can be installed. Should a failure path here call
> hw_atl2_filter_tag_put() before returning?

The tag leak on the aq_hw_err_from_flags() error path is a valid observation.
However the vendor reference driver has the identical pattern - no filter_tag_put()
on the error path. In practice aq_hw_err_from_flags() returns an error only when
AQ_HW_FLAG_ERR_UNPLUG or AQ_HW_FLAG_ERR_HW are set, which are fatal
hardware states where the device is already non-functional and filter management is moot.
Fixing this properly would require adding cleanup on the aq_hw_err_from_flags() path
across the entire driver, which is out of scope for this series. We will note it as a known limitation.