Re: [PATCH v5] af_packet: Handle outgoing VLAN packets without hardware offloading

From: Willem de Bruijn
Date: Wed Jun 05 2024 - 15:24:53 EST


Chengen Du wrote:
> Hi Willem,
>
> My apologies, but I still have some questions I would like to discuss with you.
>
> On Wed, Jun 5, 2024 at 6:57 AM Willem de Bruijn
> <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >
> > Chengen Du wrote:
> > > The issue initially stems from libpcap. The ethertype will be overwritten
> > > as the VLAN TPID if the network interface lacks hardware VLAN offloading.
> > > In the outbound packet path, if hardware VLAN offloading is unavailable,
> > > the VLAN tag is inserted into the payload but then cleared from the sk_buff
> > > struct. Consequently, this can lead to a false negative when checking for
> > > the presence of a VLAN tag, causing the packet sniffing outcome to lack
> > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing
> > > tool may be unable to parse packets as expected.
> > >
> > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not
> > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the
> > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function
> > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header
> > > is stripped, preventing the packet capturing tool from determining the
> > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect,
> > > which means the packet capturing tool cannot parse the L3 header correctly.
> > >
> > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105
> > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@xxxxxxxxxxxxx/T/#u
> > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Chengen Du <chengen.du@xxxxxxxxxxxxx>
> > > ---
> > > net/packet/af_packet.c | 64 ++++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 62 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > index ea3ebc160e25..53d51ac87ac6 100644
> > > --- a/net/packet/af_packet.c
> > > +++ b/net/packet/af_packet.c
> > > @@ -538,6 +538,52 @@ static void *packet_current_frame(struct packet_sock *po,
> > > return packet_lookup_frame(po, rb, rb->head, status);
> > > }
> > >
> > > +static u16 vlan_get_tci(struct sk_buff *skb)
> > > +{
> > > + unsigned int vlan_depth = skb->mac_len;
> > > + struct vlan_hdr vhdr, *vh;
> > > + u8 *skb_head = skb->data;
> > > + int skb_len = skb->len;
> > > +
> > > + if (vlan_depth) {
> > > + if (WARN_ON(vlan_depth < VLAN_HLEN))
> > > + return 0;
> > > + vlan_depth -= VLAN_HLEN;
> > > + } else {
> > > + vlan_depth = ETH_HLEN;
> > > + }
> > > +
> > > + skb_push(skb, skb->data - skb_mac_header(skb));
> > > + vh = skb_header_pointer(skb, vlan_depth, sizeof(vhdr), &vhdr);
> > > + if (skb_head != skb->data) {
> > > + skb->data = skb_head;
> > > + skb->len = skb_len;
> > > + }
> > > + if (unlikely(!vh))
> > > + return 0;
> > > +
> > > + return ntohs(vh->h_vlan_TCI);
> >
> > As stated in the conversation: no need for the vlan_depth code.
> >
> > skb->data is either at the link layer header or immediately beyond it
> > (i.e., at the outer vlan tag).
>
> I'm confused about this part and feel there may be some
> misunderstanding on my end. From what I understand, skb->data will be
> at different positions depending on the scenario. For example, in
> tpacket_rcv(), in SOCK_RAW, it will be at the link layer header, but
> for SOCK_DGRAM with PACKET_OUTGOING, it will be at the network layer
> header.

Right, but that is a binary option. Either at L2 or right after.

skb_header_pointer(skb, skb_mac_offset(skb) - ETH_VLEN, ...) will do.
>
> Given this situation, it seems necessary to adjust skb->data to point
> to the link layer header and then seek the VLAN tag based on the MAC
> length, rather than parsing directly from the skb->data head. Could
> you please clarify this in more detail?
>
> >
> > > +}
> > > +
> > > +static __be16 vlan_get_protocol_dgram(struct sk_buff *skb)
> > > +{
> > > + __be16 proto = skb->protocol;
> > > +
> > > + if (unlikely(eth_type_vlan(proto))) {
> > > + u8 *skb_head = skb->data;
> >
> > Since skb->head is a different thing from skb->data, please call this
> > orig_data or so.
> > > + int skb_len = skb->len;
> > > +
> > > + skb_push(skb, skb->data - skb_mac_header(skb));
> > > + proto = __vlan_get_protocol(skb, proto, NULL);
> > > + if (skb_head != skb->data) {
> > > + skb->data = skb_head;
> > > + skb->len = skb_len;
> > > + }
> > > + }
> >
> > I don't see a way around this temporary skb->data mangling, so +1
> > unless someone else suggests a cleaner way.
> >
> > On second thought, one option:
> >
> > This adds some parsing overhead in the datapath. SOCK_RAW does not
> > need it, as it can see the whole VLAN tag. Perhaps limit the new
> > branches to SOCK_DGRAM cases? Then the above can also be simplified.
>
> I considered this approach before, but it would result in different
> metadata for SOCK_DGRAM and SOCK_RAW scenarios. This difference makes
> me hesitate because it might be better to provide consistent metadata
> to describe the same packet, regardless of the receiver's approach.
> These are just my thoughts and I'm open to further discussion.

See Alexandre's response and my follow-up.