RE: [Intel-wired-lan] [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set
From: Loktionov, Aleksandr
Date: Thu Jun 18 2026 - 12:02:11 EST
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@xxxxxxxxxx> On Behalf
> Of Petr Oros
> Sent: Thursday, June 18, 2026 5:09 PM
> To: netdev@xxxxxxxxxxxxxxx
> Cc: Vecera, Ivan <ivecera@xxxxxxxxxx>; Alice Michael
> <alice.michael@xxxxxxxxx>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@xxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; Andrew Lunn <andrew+netdev@xxxxxxx>;
> Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>; Michal Swiatkowski
> <michal.swiatkowski@xxxxxxxxxxxxxxx>; Keller, Jacob E
> <jacob.e.keller@xxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo
> Abeni <pabeni@xxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>;
> intel-wired-lan@xxxxxxxxxxxxxxxx
> Subject: [Intel-wired-lan] [PATCH iwl-net 1/2] ice: skip per-VLAN
> promisc rules when default VSI Rx rule is set
>
> When an ice port is part of a vlan-filtering bridge with a wide VLAN
> trunk and the netdev is in IFF_PROMISC (typical for bond slaves
> attached to a bridge), the driver installs per-VLAN
> ICE_SW_LKUP_PROMISC_VLAN entries (recipe 9) in addition to the broad
> ICE_SW_LKUP_DFLT VSI Rx rule (recipe 5). Each per-VLAN rule consumes
> one Flow Lookup Unit (FLU) entry from a fixed hardware pool of "up to
> 32K FLU entries" per device, documented in the E810 datasheet
> (613875-009 section 7.8.10, Table 7-18, page 1015).
>
> With three active PFs sharing one switch context and a bridge trunk of
> vid 2-4094, the configuration would require roughly
>
> 3 PFs * 4093 VLANs * 3 rules per VLAN per PF ~= 36,800 rules
>
> which exceeds the 32K FLU budget. Firmware then responds to further
> Add Switch Rules requests with AQ retval 0x10 (LIBIE_AQ_RC_ENOSPC) and
> the user-visible failure surfaces as
>
> ice 0000:5c:00.1: Failed to set VSI 14 as the default forwarding
> VSI, error -5
> ice 0000:5c:00.1 ens1f1: Error -5 setting default VSI 14 Rx rule
>
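For anyone re-checking the budget arithmetic above, here is a minimal sketch. It assumes "up to 32K" means 32 * 1024 entries; the macro and helper names are hypothetical and do not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "up to 32K FLU entries" is read as 32 * 1024. */
#define FLU_POOL_ENTRIES (32u * 1024u)

/* Switch rules demanded when every PF expands its promisc rules per VLAN. */
static uint32_t flu_rules_needed(uint32_t pfs, uint32_t vlans,
				 uint32_t rules_per_vlan)
{
	return pfs * vlans * rules_per_vlan;
}

/* True when the demand no longer fits the shared hardware pool. */
static bool flu_pool_overrun(uint32_t needed)
{
	return needed > FLU_POOL_ENTRIES;
}
```

With 3 PFs, 4093 VLANs (vid 2-4094) and 3 rules per VLAN the exact product is 36,837, which is over the pool; with 1 rule per VLAN the same setup needs 12,279 and fits.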
> After a switch context has been driven into overrun, subsequent
> retries can come back as AQ retval 0x2 (LIBIE_AQ_RC_ENOENT), which has
> misled triage attempts toward a perceived recipe binding defect rather
> than a capacity issue.
>
> When the DFLT VSI Rx rule is in place it catches every packet on the
> lport regardless of VLAN tag, so the per-VLAN PROMISC_VLAN expansion
> is redundant. The recipe 4 VLAN prune entries are still installed per
> VLAN and continue to track the allowed VID set, but the IFF_PROMISC
> sync path disables their enforcement on the VSI via
> vlan_ops->dis_rx_filtering() before ice_set_promisc() runs.
> ena_rx_filtering() is restored when IFF_PROMISC is cleared.
>
> Skip the per-VLAN expansion at the two call sites that drive it:
> ice_set_promisc() falls through to ice_fltr_set_vsi_promisc() and
> ice_vlan_rx_add_vid() omits the per-VLAN ICE_MCAST_VLAN_PROMISC_BITS
> add. Plain IFF_ALLMULTI without an installed DFLT VSI rule is
> unchanged and still installs per-VLAN multicast promisc rules.
>
> Both checks use ice_is_vsi_dflt_vsi() which inspects the recipe filter
> list for an installed DFLT rule on this VSI, not
> netdev->flags & IFF_PROMISC. The HW-state predicate avoids two
> regression vectors that a user-intent predicate would introduce:
>
> 1. ice_lag_is_switchdev_running() short-circuits ice_set_dflt_vsi()
> to return 0 without installing the DFLT rule for a PF in
> switchdev LAG mode. An IFF_PROMISC-only check would also
> suppress the per-VLAN fallback, leaving the PF with no rule.
>
> 2. When ice_set_dflt_vsi() returns a non-EEXIST error (FLU
> exhausted, switch context divergence), the driver clears
> IFF_PROMISC from vsi->current_netdev_flags but the netdev's own
> flags retain IFF_PROMISC. The user-intent predicate would still
> suppress the per-VLAN fallback even though DFLT failed to
> install.
>
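The two cases above can be sketched as a toy model that compares the predicates. This is an illustration only, not driver code: struct pf_model and all three helpers are hypothetical, and the two booleans stand in for netdev->flags & IFF_PROMISC and for the result of ice_is_vsi_dflt_vsi().

```c
#include <stdbool.h>

/* Hypothetical model of the state each predicate looks at. */
struct pf_model {
	bool netdev_promisc;      /* stands in for netdev->flags & IFF_PROMISC */
	bool dflt_rule_installed; /* stands in for ice_is_vsi_dflt_vsi() */
};

/* User-intent predicate: skip per-VLAN rules whenever promisc was requested. */
static bool skip_by_intent(const struct pf_model *pf)
{
	return pf->netdev_promisc;
}

/* HW-state predicate: skip per-VLAN rules only if the DFLT rule exists. */
static bool skip_by_hw_state(const struct pf_model *pf)
{
	return pf->dflt_rule_installed;
}

/* The PF has no promisc coverage if per-VLAN rules are skipped
 * while no DFLT rule is installed either.
 */
static bool left_uncovered(bool skip_per_vlan, const struct pf_model *pf)
{
	return skip_per_vlan && !pf->dflt_rule_installed;
}
```

In both cases the state is "promisc requested, DFLT rule absent": the user-intent predicate leaves the PF uncovered and the HW-state predicate does not.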
> The predicate is install-time only. The IFF_PROMISC off path closes
> the lifecycle gap in ice_vsi_exit_dflt_promisc(): for an IFF_ALLMULTI
> VSI with VLANs it reinstates the per-VID rules before clearing the
> default rule, so multicast coverage never lapses. If that AQ call
> fails the default rule is left in place, ice_vsi_exit_dflt_promisc()
> returns the error, and the sync_fltr pass bails with
> vsi->current_netdev_flags |= IFF_PROMISC; the current/netdev flag
> mismatch re-fires the IFF_PROMISC off path on the next sync. Clearing
> the default rule first would instead expose a window where neither the
> default rule nor the per-VID rules carry multicast.
>
> If ice_clear_dflt_vsi() fails after the per-VID rules were reinstated
> they are deliberately not rolled back. Clearing the default rule is a
> removal that frees an FLU entry rather than allocating one, so it
> cannot fail for lack of space; a failure is a transient AdminQ error.
> The per-VID rules are the steady state for an IFF_ALLMULTI VLAN VSI,
> so the only redundant entry left behind is the single un-removed
> default rule, not the per-VID set. The retry re-enters this path,
> ice_fltr_set_vlan_vsi_promisc() returns -EEXIST for the rules that
> already exist so nothing is reallocated, and the default rule is
> removed on the next attempt. Rolling the per-VID rules back here would
> instead churn thousands of removes and re-adds on every retry.
>
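The ordering and retry behaviour described in the two paragraphs above can be modelled roughly as follows. This is a sketch under assumptions: the model_* names and struct vsi_model are hypothetical, the whole per-VID set is collapsed into one flag, and -EIO stands in for a transient AdminQ failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: one flag per rule class on the VSI. */
struct vsi_model {
	bool dflt_rule;     /* default VSI Rx rule installed */
	bool per_vid_rules; /* per-VID multicast promisc rules installed */
};

/* Re-adding existing rules reports -EEXIST and allocates nothing. */
static int model_add_per_vid(struct vsi_model *v)
{
	if (v->per_vid_rules)
		return -EEXIST;
	v->per_vid_rules = true;
	return 0;
}

/* clear_fails simulates a transient AdminQ error on the removal. */
static int model_clear_dflt(struct vsi_model *v, bool clear_fails)
{
	if (clear_fails)
		return -EIO;
	v->dflt_rule = false;
	return 0;
}

/* Reinstate per-VID rules first, then drop the default rule.
 * No rollback of the per-VID rules if the clear fails.
 */
static int model_exit_dflt_promisc(struct vsi_model *v, bool clear_fails)
{
	int err = model_add_per_vid(v);

	if (err && err != -EEXIST)
		return err;
	return model_clear_dflt(v, clear_fails);
}

/* Multicast is carried by either rule class. */
static bool multicast_covered(const struct vsi_model *v)
{
	return v->dflt_rule || v->per_vid_rules;
}
```

A failed first attempt leaves both rule classes installed, so coverage holds; the retry sees -EEXIST on the add and only removes the default rule.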
> After the default rule is gone the vid=0 PROMISC rule that paired with
> it is redundant and is dropped, but only to reclaim a filter entry, so
> a failure there is logged and does not abort the transition.
>
> ice_set_vsi_promisc() and ice_clear_vsi_promisc() dispatch the recipe
> based on whether ICE_PROMISC_VLAN_RX/TX bits are present in the mask:
> with the bits set, recipe ICE_SW_LKUP_PROMISC_VLAN is used; otherwise
> ICE_SW_LKUP_PROMISC. The else branch in
> ice_set_promisc() installs the vid=0 rule in ICE_SW_LKUP_PROMISC.
> Because ice_clear_promisc() with VLANs present adds the VLAN bits and
> would search ICE_SW_LKUP_PROMISC_VLAN, the recipe mismatch would leave
> the vid=0 ICE_SW_LKUP_PROMISC rule orphaned when VLANs are configured.
> This is a single stale rule, not a per-cycle leak: re-adding it the
> next time promisc is enabled returns -EEXIST rather than
> allocating a new entry. The set-time recipe is not recorded, so
> ice_clear_promisc() clears both recipes; clearing a rule that is not
> present succeeds, both clears run unconditionally, and the first error
> is returned.
>
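The "clear both recipes, return the first error" behaviour described above could look roughly like this in isolation. All names are hypothetical; the two stub callbacks only count calls and stand in for the per-recipe clears, with -EIO as an assumed failure code.

```c
#include <errno.h>

/* Hypothetical call counter so the sketch can show both clears ran. */
struct clear_log {
	int calls;
};

typedef int (*clear_fn)(struct clear_log *log);

/* Stub: clear succeeded, or the rule was not present to begin with. */
static int model_clear_ok(struct clear_log *log)
{
	log->calls++;
	return 0;
}

/* Stub: clear failed with an AdminQ-style error. */
static int model_clear_fail(struct clear_log *log)
{
	log->calls++;
	return -EIO;
}

/* Run both clears unconditionally and report the first error seen. */
static int model_clear_both(clear_fn first, clear_fn second,
			    struct clear_log *log)
{
	int err_first = first(log);
	int err_second = second(log); /* runs even if the first failed */

	return err_first ? err_first : err_second;
}
```

The point of not short-circuiting is that a failure on one recipe does not leave the other recipe's rule behind.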
> The two VLAN-0 recipe transition blocks in ice_vlan_rx_add_vid() and
> ice_vlan_rx_kill_vid() that promote / demote the vid=0 rule between
> ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN are likewise guarded
> by !ice_is_vsi_dflt_vsi(). With DFLT in place the
> vid=0 rule already covers every VID and a recipe swap would only
> install a redundant rule.
>
> Lab reproduction on an E810-C with the same firmware family (4.80, NVM
> 1.3805.0, DDP 1.3.43.0) using four PFs in vlan-filtering bridges with
> vid 2-4094 and the slaves brought to IFF_PROMISC before the bridge
> VLAN bulk add:
>
> before fix: ~12,279 AQ Add Switch Rules per PF, ENOSPC and ENOENT
> responses in dmesg, DFLT VSI Rx rule install fails on
> the affected PF
> after fix: ~4,093 AQ Add Switch Rules per PF, no AQ errors, DFLT
> VSI Rx rule installs on every PF
>
> The 66.7% reduction in installed switch rules per PF matches the
> expected per-VLAN saving: a single DFLT rule replaces the per-VID
> PROMISC_VLAN expansion.
>
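A quick check of the quoted reduction, using only the two per-PF counts reported above. The helper name is hypothetical.

```c
#include <assert.h>

/* Percentage drop from `before` to `after` installed rules. */
static double reduction_pct(unsigned int before, unsigned int after)
{
	assert(before > 0 && after <= before);
	return 100.0 * (double)(before - after) / (double)before;
}
```

Since 12,279 is 3 * 4,093, the drop to 4,093 is exactly two thirds, which rounds to the 66.7% stated.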
> Functional regression test with vid 2-100 trunk between two ice ports
> through the lab switch (40/40 PASS, 0 AQ errors, 0 ENOSPC at 4093-VID
> customer scale):
>
> vid 50 unicast, vid 100 unicast, vid 50 broadcast ARP,
> vid 100 multicast IPv6 ND
> vid 200/500/1500/4000 isolation (out-of-trunk) and untagged not
> leaked: 0 packets reach any bridge endpoint
> IGMP/MLD snooping, Jumbo MTU 9000, reserved-multicast STP BPDU
> IFF_PROMISC + IFF_ALLMULTI transition (off while allmulti stays)
> Regression reproducer for commit 1273f89578f2 ("ice: Fix broken
> IFF_ALLMULTI handling"): allmulti on -> add vid -> allmulti off
> -> allmulti on plus the orphan-rule Scenario 2; both converge
> with no stale rules
> 100-VID, 1000-VID, 4093-VID stress cycles (5/3/2 iterations each)
> switchdev mode toggle preserves IFF_PROMISC pruning state across
> the session (vid 999 multicast received before and after the
> legacy -> switchdev -> legacy cycle)
> SR-IOV: VFs unaffected because ice_set_promisc() early-returns
> for non-PF VSI and VF representors do not register
> ndo_vlan_rx_add_vid
>
> Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
> Signed-off-by: Petr Oros <poros@xxxxxxxxxx>
> ---
> drivers/net/ethernet/intel/ice/ice_main.c | 90 ++++++++++++++++++-----
> 1 file changed, 70 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 6d24056c247cf4..af8df81fc45623 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -274,7 +274,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
> if (vsi->type != ICE_VSI_PF)
> return 0;
>
> - if (ice_vsi_has_non_zero_vlans(vsi)) {
> + /* skip per-VID expansion; the DFLT Rx rule already covers every VID */
> + if (ice_vsi_has_non_zero_vlans(vsi) && !ice_is_vsi_dflt_vsi(vsi)) {
> promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
> status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi,
> promisc_m);
> @@ -304,9 +305,19 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
> return 0;
>
> if (ice_vsi_has_non_zero_vlans(vsi)) {
...
> ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
> ICE_MCAST_VLAN_PROMISC_BITS,
> 0);
> --
> 2.53.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@xxxxxxxxx>