Re: [PATCH v3] af_packet: Handle outgoing VLAN packets without hardware offloading
From: Willem de Bruijn
Date: Mon May 27 2024 - 11:05:47 EST
Please include the target tree in the subject prefix (net vs net-next):
[PATCH net v3]
Chengen Du wrote:
> The issue initially stems from libpcap [1]. In the outbound packet path,
> if hardware VLAN offloading is unavailable, the VLAN tag is inserted into
> the payload but then cleared from the sk_buff struct. Consequently, this
> can lead to a false negative when checking for the presence of a VLAN tag,
> causing the packet sniffing outcome to lack VLAN tag information (i.e.,
> TCI-TPID). As a result, the packet capturing tool may be unable to parse
> packets as expected.
>
> The TCI-TPID is missing because the prb_fill_vlan_info() function does not
> modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the
> payload and not in the sk_buff struct. The skb_vlan_tag_present() function
> only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header
> is stripped, preventing the packet capturing tool from determining the
> correct TCI-TPID value. Additionally, the protocol in SLL is incorrect,
> which means the packet capturing tool cannot parse the L3 header correctly.
>
This does not add much context over v1 of the patch, but it does at
least give a pointer to context.
> [1] https://github.com/the-tcpdump-group/libpcap/issues/1105
Prefer Link: $URL
Please also add a Link to the conversation on patch 1:
Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@xxxxxxxxxxxxx/T/#u
> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
The referenced commit only introduces TPACKET_V3. The code changes to
tpacket_rcv and packet_recvmsg indicate that this goes back further.
Let's say to the introduction of explicitly passing VLAN information:
Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Chengen Du <chengen.du@xxxxxxxxxxxxx>
> ---
> net/packet/af_packet.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index ea3ebc160e25..82b36e90d73b 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1011,6 +1011,10 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
> ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb);
> ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto);
> ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(pkc->skb->protocol)) {
> + ppd->hv1.tp_vlan_tci = ntohs(vlan_eth_hdr(pkc->skb)->h_vlan_TCI);
Careful about packet length. A malicious packet can be inserted that
is an Ethernet header with zero payload, but ETH_P_8021Q as h_proto.
See how __vlan_get_protocol carefully reads the headers.
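To illustrate the length concern, here is a minimal userspace sketch, not
kernel code. The helper name frame_vlan_tci() and the locally defined
constants are assumptions made for this example; in the kernel the same
check would have to be expressed against the skb. The point is only that
the EtherType alone does not prove a VLAN header follows, so the length
must be verified before the TCI is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local definitions for this sketch; values match the usual Ethernet ones. */
#define ETH_HLEN      14      /* dst MAC + src MAC + EtherType */
#define VLAN_HLEN     4       /* TCI + encapsulated EtherType */
#define ETH_P_8021Q   0x8100
#define ETH_P_8021AD  0x88A8

/* Read a big-endian 16-bit value from a possibly unaligned buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: returns true and fills *tci (host byte order) only
 * if the frame carries a VLAN EtherType AND is long enough to hold the
 * complete VLAN header. *tci is left untouched on failure.
 */
static bool frame_vlan_tci(const uint8_t *frame, size_t len, uint16_t *tci)
{
	uint16_t proto;

	assert(frame != NULL && tci != NULL);

	if (len < ETH_HLEN)
		return false;

	proto = read_be16(frame + 12);
	if (proto != ETH_P_8021Q && proto != ETH_P_8021AD)
		return false;

	/* An Ethernet header with zero payload must be rejected here. */
	if (len < ETH_HLEN + VLAN_HLEN)
		return false;

	*tci = read_be16(frame + ETH_HLEN);
	return true;
}
```

With a 14-byte frame whose EtherType is ETH_P_8021Q, the helper returns
false instead of reading past the end of the buffer.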
> + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol);
> + ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> } else {
> ppd->hv1.tp_vlan_tci = 0;
> ppd->hv1.tp_vlan_tpid = 0;
> @@ -2428,6 +2432,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
> h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
> status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + h.h2->tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + h.h2->tp_vlan_tpid = ntohs(skb->protocol);
> + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> } else {
> h.h2->tp_vlan_tci = 0;
> h.h2->tp_vlan_tpid = 0;
> @@ -2457,7 +2465,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> sll->sll_family = AF_PACKET;
> sll->sll_hatype = dev->type;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
In SOCK_RAW mode, the VLAN tag will be present in the payload, so the
VLAN protocol should be returned.
I'm concerned about returning a different value between SOCK_RAW and
SOCK_DGRAM. But don't immediately see a better option. And for
SOCK_DGRAM this approach is indistinguishable from the result on a
device with hardware offload, so is acceptable.
This test for ETH_P_8021Q ignores the QinQ stacked VLAN case. When
fixing VLAN encap, both variants should be addressed at the same time.
Note that ETH_P_8021AD is included in the eth_type_vlan test you call
above.
All these extra branches also make the common case slower. Let's try
to mitigate that as much as possible.
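As a rough illustration of handling both 802.1Q and 802.1AD while keeping
the untagged path cheap, here is a userspace sketch, not kernel code. The
names is_vlan_ethertype() and frame_inner_protocol(), the MAX_VLAN_DEPTH
cap, and the choice to walk past every stacked tag are assumptions made
for this example; is_vlan_ethertype() only mirrors what eth_type_vlan is
described as testing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local definitions for this sketch; values match the usual Ethernet ones. */
#define ETH_HLEN       14
#define VLAN_HLEN      4
#define ETH_P_8021Q    0x8100
#define ETH_P_8021AD   0x88A8
#define MAX_VLAN_DEPTH 8      /* assumption: arbitrary cap on stacked tags */

/* Read a big-endian 16-bit value from a possibly unaligned buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Covers both the single-tag and the QinQ outer-tag EtherType. */
static int is_vlan_ethertype(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

/*
 * Hypothetical helper: returns the first non-VLAN EtherType (host byte
 * order), or 0 if the frame is truncated or nests deeper than the cap.
 */
static uint16_t frame_inner_protocol(const uint8_t *frame, size_t len)
{
	size_t off = 12;          /* offset of the outer EtherType */
	uint16_t proto;
	int depth;

	assert(frame != NULL);

	if (len < ETH_HLEN)
		return 0;

	proto = read_be16(frame + off);

	/* Common case first: untagged frames leave after one test. */
	if (!is_vlan_ethertype(proto))
		return proto;

	for (depth = 0; depth < MAX_VLAN_DEPTH; depth++) {
		off += VLAN_HLEN; /* encapsulated EtherType of this tag */
		if (len < off + 2)
			return 0;
		proto = read_be16(frame + off);
		if (!is_vlan_ethertype(proto))
			return proto;
	}
	return 0;
}
```

Untagged frames return after a single EtherType test, so the extra work
is confined to tagged traffic.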
> sll->sll_pkttype = skb->pkt_type;
> if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV)))
> sll->sll_ifindex = orig_dev->ifindex;
> @@ -3482,7 +3491,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> /* Original length was stored in sockaddr_ll fields */
> origlen = PACKET_SKB_CB(skb)->sa.origlen;
> sll->sll_family = AF_PACKET;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> }
>
> sock_recv_cmsgs(msg, sk, skb);
> @@ -3539,6 +3549,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> aux.tp_vlan_tci = skb_vlan_tag_get(skb);
> aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
> aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + aux.tp_vlan_tpid = ntohs(skb->protocol);
> + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> } else {
> aux.tp_vlan_tci = 0;
> aux.tp_vlan_tpid = 0;
> --
> 2.40.1
>