Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets
From: Alexander Duyck
Date: Tue May 21 2019 - 12:54:22 EST
On Tue, May 21, 2019 at 8:15 AM Lennart Sorensen
<lsorense@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, May 17, 2019 at 03:20:02PM -0700, Alexander Duyck wrote:
> > I was hoping it would work too. It seemed like it should have been the
> > answer since it definitely didn't seem right. Now it has me wondering
> > about some of the other code in the driver.
> >
> > By any chance have you run anything like DPDK on any of the X722
> > interfaces on this system recently? I ask because it occurs to me that
> > if you had and it loaded something like a custom parsing profile it
> > could cause issues similar to this.
>
> I have never used DPDK on anything. I was hoping never to do so. :)
>
> This system has so far booted Debian (with a 4.19 kernel) and our own OS
> (which has a 4.9 kernel).
>
> > A debugging step you might try would be to revert back to my earlier
> > patch that only displayed the input mask instead of changing it. Once
> > you have done that you could look at doing a full power cycle on the
> > system by either physically disconnecting the power, or using the
> > power switch on the power supply itself if one is available. It is
> > necessary to disconnect the motherboard/NIC from power in order to
> > fully clear the global state stored in the device as it is retained
> > when the system is in standby.
> >
> > What I want to verify is if the input mask that we have ran into is
> > the natural power-on input mask of if that is something that was
> > overridden by something else. The mask change I made should be reset
> > if the system loses power, and then it will either default back to the
> > value with the 6's if that is it's natural state, or it will match
> > what I had if it was not.
> >
> > Other than that I really can't think up too much else. I suppose there
> > is the possibility of the NVM either setting up a DCB setting or
> > HREGION register causing an override that is limiting the queues to 1.
> > However, the likelihood of that should be really low.
>
> Here is the register dump after a full power off:
>
> 40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.7-k
> i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> i40e 0000:3d:00.0: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e 0000:3d:00.0: The driver for the device detected a newer version of the NVM image than expected. Please install the most recent version of the network driver.
> i40e 0000:3d:00.0: MAC address: a4:bf:01:4e:0c:87
> i40e 0000:3d:00.0: flow_type: 63 input_mask:0x0000000000004000
> i40e 0000:3d:00.0: flow_type: 46 input_mask:0x0007fff800000000
> i40e 0000:3d:00.0: flow_type: 45 input_mask:0x0007fff800000000
> i40e 0000:3d:00.0: flow_type: 44 input_mask:0x0007ffff80000000
> i40e 0000:3d:00.0: flow_type: 43 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 42 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 41 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 40 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 39 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 36 input_mask:0x0006060000000000
> i40e 0000:3d:00.0: flow_type: 35 input_mask:0x0006060000000000
> i40e 0000:3d:00.0: flow_type: 34 input_mask:0x0006060780000000
> i40e 0000:3d:00.0: flow_type: 33 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 32 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 31 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 30 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 29 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: Features: PF-id[0] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN Geneve VEPA
> i40e 0000:3d:00.1: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e 0000:3d:00.1: The driver for the device detected a newer version of the NVM image than expected. Please install the most recent version of the network driver.
> i40e 0000:3d:00.1: MAC address: a4:bf:01:4e:0c:88
> i40e 0000:3d:00.1: flow_type: 63 input_mask:0x0000000000004000
> i40e 0000:3d:00.1: flow_type: 46 input_mask:0x0007fff800000000
> i40e 0000:3d:00.1: flow_type: 45 input_mask:0x0007fff800000000
> i40e 0000:3d:00.1: flow_type: 44 input_mask:0x0007ffff80000000
> i40e 0000:3d:00.1: flow_type: 43 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.1: flow_type: 42 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.1: flow_type: 41 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.1: flow_type: 40 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.1: flow_type: 39 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.1: flow_type: 36 input_mask:0x0006060000000000
> i40e 0000:3d:00.1: flow_type: 35 input_mask:0x0006060000000000
> i40e 0000:3d:00.1: flow_type: 34 input_mask:0x0006060780000000
> i40e 0000:3d:00.1: flow_type: 33 input_mask:0x0006060600000000
> i40e 0000:3d:00.1: flow_type: 32 input_mask:0x0006060600000000
> i40e 0000:3d:00.1: flow_type: 31 input_mask:0x0006060600000000
> i40e 0000:3d:00.1: flow_type: 30 input_mask:0x0006060600000000
> i40e 0000:3d:00.1: flow_type: 29 input_mask:0x0006060600000000
> i40e 0000:3d:00.1: Features: PF-id[1] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN Geneve VEPA
> i40e 0000:3d:00.1 eth2: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
>
> Pretty sure that is identical to before.
>
> If I dump the registers after doing the update I see this (just did a
> reboot this time, not a power cycle):
>
> i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.7-k
> i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> i40e 0000:3d:00.0: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e 0000:3d:00.0: The driver for the device detected a newer version of the NVM image than expected. Please install the most recent version of the network driver.
> i40e 0000:3d:00.0: MAC address: a4:bf:01:4e:0c:87
> i40e 0000:3d:00.0: flow_type: 63 input_mask:0x0000000000004000
> i40e 0000:3d:00.0: flow_type: 46 input_mask:0x0007fff800000000
> i40e 0000:3d:00.0: flow_type: 45 input_mask:0x0007fff800000000
> i40e 0000:3d:00.0: flow_type: 44 input_mask:0x0007ffff80000000
> i40e 0000:3d:00.0: flow_type: 43 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 42 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 41 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 40 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 39 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type: 36 input_mask:0x0006060000000000
> i40e 0000:3d:00.0: flow_type: 35 input_mask:0x0006060000000000
> i40e 0000:3d:00.0: flow_type: 34 input_mask:0x0006060780000000
> i40e 0000:3d:00.0: flow_type: 33 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 32 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 31 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 30 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow_type: 29 input_mask:0x0006060600000000
> i40e 0000:3d:00.0: flow type: 36 update input mask from:0x0006060000000000, to:0x0001801800000000
> i40e 0000:3d:00.0: flow type: 35 update input mask from:0x0006060000000000, to:0x0001801800000000
> i40e 0000:3d:00.0: flow type: 34 update input mask from:0x0006060780000000, to:0x0001801f80000000
> i40e 0000:3d:00.0: flow type: 33 update input mask from:0x0006060600000000, to:0x0001801e00000000
> i40e 0000:3d:00.0: flow type: 32 update input mask from:0x0006060600000000, to:0x0001801e00000000
> i40e 0000:3d:00.0: flow type: 31 update input mask from:0x0006060600000000, to:0x0001801e00000000
> i40e 0000:3d:00.0: flow type: 30 update input mask from:0x0006060600000000, to:0x0001801e00000000
> i40e 0000:3d:00.0: flow type: 29 update input mask from:0x0006060600000000, to:0x0001801e00000000
> i40e 0000:3d:00.0: flow_type after update: 63 input_mask:0x0000000000004000
> i40e 0000:3d:00.0: flow_type after update: 46 input_mask:0x0007fff800000000
> i40e 0000:3d:00.0: flow_type after update: 45 input_mask:0x0007fff800000000
> i40e 0000:3d:00.0: flow_type after update: 44 input_mask:0x0007ffff80000000
> i40e 0000:3d:00.0: flow_type after update: 43 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type after update: 42 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type after update: 41 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type after update: 40 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type after update: 39 input_mask:0x0007fffe00000000
> i40e 0000:3d:00.0: flow_type after update: 36 input_mask:0x0001801800000000
> i40e 0000:3d:00.0: flow_type after update: 35 input_mask:0x0001801800000000
> i40e 0000:3d:00.0: flow_type after update: 34 input_mask:0x0001801f80000000
> i40e 0000:3d:00.0: flow_type after update: 33 input_mask:0x0001801e00000000
> i40e 0000:3d:00.0: flow_type after update: 32 input_mask:0x0001801e00000000
> i40e 0000:3d:00.0: flow_type after update: 31 input_mask:0x0001801e00000000
> i40e 0000:3d:00.0: flow_type after update: 30 input_mask:0x0001801e00000000
> i40e 0000:3d:00.0: flow_type after update: 29 input_mask:0x0001801e00000000
> i40e 0000:3d:00.0: Features: PF-id[0] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN Geneve VEPA
>
> So at least it appears the update did apply.
>
> --
> Len Sorensen
I think we need to narrow this down a bit more. Let's try forcing the
lookup table all to one value and see if traffic is still going to
queue 0.
Specifically what we need to is run the following command to try and
force all RSS traffic to queue 8, you can verify the result with
"ethtool -x":
ethtool -X <iface> weight 0 0 0 0 0 0 0 0 1
If that works and the IPSec traffic goes to queue 8 then we are likely
looking at some sort of input issue, either in the parsing or the
population of things like the input mask that we can then debug
further.
If traffic still goes to queue 0 then that tells us the output of the
RSS hash and lookup table are being ignored, this would imply either
some other filter is rerouting the traffic or is directing us to limit
the queue index to 0 bits.
Thanks.
- Alex