Re: [PATCH] flow_dissector: Fix vlan header offset in __skb_flow_dissect

From: Stanislav Fomichev
Date: Thu Jun 20 2019 - 20:33:25 EST


On 06/20, Yuehaibing wrote:
> On 2019/6/20 2:39, Stanislav Fomichev wrote:
> > On 06/20, YueHaibing wrote:
> >> We build a vlan on top of a bonding interface whose vlan offload
> >> is off; the bond mode is 802.3ad (LACP) and xmit_hash_policy is
> >> BOND_XMIT_POLICY_ENCAP34.
> >>
> >> __skb_flow_dissect() fails to get information from the protocol headers
> >> encapsulated within the vlan, because 'nhoff' points to the IP header,
> >> so bond hashing is based on layer 2 info, which fails to distribute
> >> packets across slaves.
> >>
> >> Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
> >> Signed-off-by: YueHaibing <yuehaibing@xxxxxxxxxx>
> >> ---
> >> net/core/flow_dissector.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >>
> >> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> >> index 415b95f..2a52abb 100644
> >> --- a/net/core/flow_dissector.c
> >> +++ b/net/core/flow_dissector.c
> >> @@ -785,6 +785,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> >> skb && skb_vlan_tag_present(skb)) {
> >> proto = skb->protocol;
> >> } else {
> >> + if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX)
> >> + nhoff -= sizeof(*vlan);
> >> +
> > Should we instead fix the place where the skb is allocated to properly
> > pull vlan (skb_vlan_untag)? I'm not sure this particular place is
> > supposed to work with an skb. Having an skb with nhoff pointing to
> > the IP header but without skb_vlan_tag_present() while
> > proto==ETH_P_8021xx seems weird.
>
> The skb is a forwarded vxlan packet; it is sent through the vlan interface like this:
>
> vlan_dev_hard_start_xmit
> --> __vlan_hwaccel_put_tag //vlan_tci and VLAN_TAG_PRESENT is set
> --> dev_queue_xmit
> --> validate_xmit_skb
> --> validate_xmit_vlan // vlan_hw_offload_capable is false
> --> __vlan_hwaccel_push_inside //here skb_push vlan_hlen, then clear skb->tci
>
> --> bond_start_xmit
> --> bond_xmit_hash
> --> __skb_flow_dissect // nhoff points to the IP header
> --> case htons(ETH_P_8021Q)
> // skb_vlan_tag_present is false, so
> vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), // vlan wrongly points to the IP header
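To make the trace above concrete for myself, here is a rough userspace
sketch of the misread. It is not kernel code: the sketch_* names, the
hand-built frame and the byte values are all made up for illustration.
It assumes a plain Ethernet frame with one in-band 802.1Q tag, so the
TCI and encapsulated protocol sit at offset 14 and the IP header starts
at offset 18.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HLEN   14u  /* dst MAC + src MAC + outer ethertype */
#define SKETCH_VLAN_HLEN   4u  /* TCI + encapsulated protocol */
#define SKETCH_FRAME_LEN  (SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN + 20u)

/* Hypothetical stand-in for the kernel's vlan header layout. */
struct sketch_vlan_hdr {
	uint8_t tci[2];
	uint8_t encap_proto[2];
};

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t sketch_be16(const uint8_t *p)
{
	return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Build an illustrative frame: Ethernet + in-band 802.1Q tag + IPv4 header. */
static void sketch_build_frame(uint8_t *frame, size_t cap)
{
	assert(cap >= SKETCH_FRAME_LEN);
	memset(frame, 0, SKETCH_FRAME_LEN);
	frame[12] = 0x81; frame[13] = 0x00;  /* outer ethertype: 802.1Q */
	frame[14] = 0x00; frame[15] = 0x64;  /* TCI: VLAN ID 100 */
	frame[16] = 0x08; frame[17] = 0x00;  /* encapsulated proto: IPv4 */
	frame[18] = 0x45;                    /* IPv4, header length 5 words */
	frame[21] = 0x14;                    /* IPv4 total length: 20 bytes */
}

/*
 * Interpret the bytes at nhoff as a vlan header and return the
 * encapsulated protocol, or 0 if the header does not fit in the frame.
 */
static uint16_t sketch_vlan_encap_proto(const uint8_t *frame, size_t len,
					size_t nhoff)
{
	struct sketch_vlan_hdr vlan;

	if (nhoff > len || len - nhoff < sizeof(vlan))
		return 0;
	memcpy(&vlan, frame + nhoff, sizeof(vlan));
	return sketch_be16(vlan.encap_proto);
}
```

With nhoff at 18 (the IP header) the "encapsulated protocol" is really
the IPv4 total length field; with nhoff at 18 - 4 it is the IPv4
ethertype, which is what the proposed 'nhoff -= sizeof(*vlan)' is after.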
I see, so the bonding device propagates hw VLAN support from the slaves.
If one of the slaves doesn't have it, it's disabled for the bond device.
Any idea why we do that? Why not pass skbs to the slave devices
instead and let them handle the hw/sw vlan implementation?
I see the propagation was added in 278339a42a1b 10 years ago and
I don't see any rationale in the commit description.
Somebody with more context should probably chime in :-)
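
For reference, this is the toy model I have in mind when I say
"propagates": the bond only advertises a feature if every slave has it.
The flag names and the helper below are hypothetical and only
illustrate the intersection idea; they are not the kernel's feature
bits or the bonding driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define SKETCH_F_HW_VLAN_TX  (1u << 0)
#define SKETCH_F_SG          (1u << 1)

/*
 * Return the features common to all slaves. A single slave without
 * SKETCH_F_HW_VLAN_TX clears that bit for the whole bond.
 */
static uint32_t sketch_bond_features(const uint32_t *slave_features, size_t n)
{
	uint32_t features = ~0u;
	size_t i;

	if (n == 0)
		return 0;
	for (i = 0; i < n; i++)
		features &= slave_features[i];
	return features;
}
```

In this model, one slave without the hw VLAN bit is enough to push the
tag insertion into software before the skb ever reaches the bond's xmit
path, which matches the trace quoted above.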