Re: [PATCH RFC net-next 05/19] net: dsa: tag_ar9331: add GRO callbacks

From: Alexander Lobakin
Date: Mon Jan 13 2020 - 04:21:37 EST


Florian Fainelli wrote 30.12.2019 21:20:
On 12/30/19 6:30 AM, Alexander Lobakin wrote:
Add GRO callbacks to the AR9331 tagger so GRO layer can now process
such frames.

Signed-off-by: Alexander Lobakin <alobakin@xxxxxxxx>

This is a good example and we should probably build a tagger abstraction
that is much simpler to fill in callbacks for (although indirect
function calls may end-up killing performance with retpoline and
friends), but let's consider this idea.

Hey all,
Sorry for the late replies, I was away on a long trip.

The performance issue was the main reason why I chose to write a full
.gro_receive() for every single tagger instead of providing a bunch
of abstraction callbacks. It really isn't a problem on MIPS, which is
what I'm working on, but it can kill any advantage that we could get
from GRO support on e.g. x86.

---
net/dsa/tag_ar9331.c | 77 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)

diff --git a/net/dsa/tag_ar9331.c b/net/dsa/tag_ar9331.c
index c22c1b515e02..99cc7fd92d8e 100644
--- a/net/dsa/tag_ar9331.c
+++ b/net/dsa/tag_ar9331.c
@@ -100,12 +100,89 @@ static void ar9331_tag_flow_dissect(const struct sk_buff *skb, __be16 *proto,
*proto = ar9331_tag_encap_proto(skb->data);
}

+static struct sk_buff *ar9331_tag_gro_receive(struct list_head *head,
+ struct sk_buff *skb)
+{
+ const struct packet_offload *ptype;
+ struct sk_buff *p, *pp = NULL;
+ u32 data_off, data_end;
+ const u8 *data;
+ int flush = 1;
+
+ data_off = skb_gro_offset(skb);
+ data_end = data_off + AR9331_HDR_LEN;

AR9331_HDR_LEN is a parameter here which is incidentally
dsa_device_ops::overhead.

Or we can split .overhead into .rx_len and .tx_len, using the first
to help the GRO layer and flow dissector and the second to determine
the total overhead when correcting the MTU value. Something like:

mtu = max(tag_ops->rx_len, tag_ops->tx_len);
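
To make the split a bit more concrete, here is a rough userspace
sketch. The struct and function names are hypothetical and only for
illustration: the real dsa_device_ops has just a single .overhead
field, and nothing below is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dsa_device_ops with .overhead split in two.
 * The field names are assumptions made for this sketch only.
 */
struct tag_ops_sketch {
	unsigned int rx_len;	/* tag bytes present on received frames */
	unsigned int tx_len;	/* tag bytes added to transmitted frames */
};

/* Overhead to account for in the MTU: the larger of the two lengths,
 * i.e. the max(rx_len, tx_len) expression above.
 */
static unsigned int tag_mtu_overhead(const struct tag_ops_sketch *ops)
{
	assert(ops != NULL);

	return ops->rx_len > ops->tx_len ? ops->rx_len : ops->tx_len;
}
```

GRO and the flow dissector would then only ever look at rx_len, while
the MTU correction uses tag_mtu_overhead(). For a tagger with equal
lengths in both directions nothing changes compared to plain .overhead.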

+
+ data = skb_gro_header_fast(skb, data_off);
+ if (skb_gro_header_hard(skb, data_end)) {
+ data = skb_gro_header_slow(skb, data_end, data_off);
+ if (unlikely(!data))
+ goto out;
+ }
+
+ /* Data that is to the left from the current position is already
+ * pulled to the head
+ */
+ if (unlikely(!ar9331_tag_sanity_check(skb->data + data_off)))
+ goto out;

This is applicable to all taggers, they need to verify the sanity of the
header they are being handed.

+
+ rcu_read_lock();
+
+ ptype = gro_find_receive_by_type(ar9331_tag_encap_proto(data));

If there is no encapsulation a tagger can return the frame's protocol
directly, so similarly the tagger can be interrogated for returning that.

+ if (!ptype)
+ goto out_unlock;
+
+ flush = 0;
+
+ list_for_each_entry(p, head, list) {
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ if (ar9331_tag_source_port(skb->data + data_off) ^
+ ar9331_tag_source_port(p->data + data_off))

Similarly here, the tagger could provide a function whose job is to
return the port number from within its own tag.

So with that being said, what do you think about building a tagger
abstraction which is comprised of:

- header length which is dsa_device_ops::overhead
- validate_tag()
- get_tag_encap_proto()
- get_port_number()

and the rest is just wrapping the general GRO list manipulation?

get_tag_encap_proto() and get_port_number() would be called more
than once for every single frame in that case. I'm not sure that's
a good idea given the retpoline issues mentioned above.
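
To illustrate the concern, here is a small userspace model of such an
abstraction that just counts the indirect calls. Everything in it is
hypothetical: the ops struct follows the callback list proposed above,
the 2-byte toy tag is NOT the AR9331 layout, and the generic loop only
mimics the same-flow check from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback set following the list proposed above */
struct tagger_ops_sketch {
	size_t hdr_len;
	int (*validate_tag)(const uint8_t *tag);
	uint16_t (*get_tag_encap_proto)(const uint8_t *tag);
	unsigned int (*get_port_number)(const uint8_t *tag);
};

/* Toy tag, made up for this sketch: byte 0 is a magic value, byte 1
 * carries the port, and the encapsulated protocol follows the tag.
 */
#define TOY_MAGIC	0xa5
#define TOY_HDR_LEN	2

static unsigned int indirect_calls;	/* bumped by every callback */

static int toy_validate(const uint8_t *tag)
{
	indirect_calls++;
	return tag[0] == TOY_MAGIC;
}

static uint16_t toy_proto(const uint8_t *tag)
{
	indirect_calls++;
	return (uint16_t)((tag[TOY_HDR_LEN] << 8) | tag[TOY_HDR_LEN + 1]);
}

static unsigned int toy_port(const uint8_t *tag)
{
	indirect_calls++;
	return tag[1] & 0x0f;
}

static const struct tagger_ops_sketch toy_ops = {
	.hdr_len		= TOY_HDR_LEN,
	.validate_tag		= toy_validate,
	.get_tag_encap_proto	= toy_proto,
	.get_port_number	= toy_port,
};

/* Generic part: returns how many held frames stay in the same flow as
 * the new one, or -1 if the tag is invalid.
 */
static int generic_gro_match(const struct tagger_ops_sketch *ops,
			     const uint8_t *frame,
			     const uint8_t *const *held, size_t n_held)
{
	int same = 0;
	size_t i;

	if (!ops->validate_tag(frame))
		return -1;

	/* The real code would feed this to gro_find_receive_by_type() */
	(void)ops->get_tag_encap_proto(frame);

	for (i = 0; i < n_held; i++) {
		/* Two indirect calls per held frame, as in the patch loop */
		if (ops->get_port_number(frame) ^
		    ops->get_port_number(held[i]))
			continue;
		same++;
	}

	return same;
}
```

In this model one incoming frame costs 2 + 2 * n_held indirect calls.
Caching the port of the new frame before the loop would bring that
down to 3 + n_held, but it still grows with the length of the GRO
list, while the open-coded version pays for no indirect calls at all.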

Also, I am wondering should we somehow expose the DSA master
net_device's napi_struct such that we could have the DSA slave
net_devices call napi_gro_receive() themselves directly such that they
could also perform additional GRO on top of Ethernet frames?

There's no reason to pass frames to the GRO layer more than once.

The most correct way to handle frames is to pass them to the networking
stack only after DSA tag extraction and removal. That's kinda how the
mac80211 infra works. But this is rather problematic for DSA, as it
keeps Ethernet controller drivers and taggers completely independent
of each other.

I also had an idea to use net_device::rx_handler for tag processing
instead of dsa_pack_type. CPU ports can't be bridged anyway, so this
should not be a problem at first glance.

Regards,