On 25/11/2019 09:09, Nicholas Johnson wrote:
The default value of /proc/sys/net/core/gro_normal_batch was 8.That sounds like the GRO batch isn't getting flushed at the endof the
Setting it to 1 allowed it to connect to Wi-Fi network.
Setting it back to 8 did not kill the connection.
But when I disconnected and tried to reconnect, it did not re-connect.
Hence, it appears that the problem only affects the initial handshake
when associating with a network, and not normal packet flow.
ÂNAPI â maybe the driver isn't calling napi_complete_done() at the
Âappropriate time?
Indeed, from digging through the layers of iwlwifi I eventually get to
Âiwl_pcie_rx_handle() which doesn't really have a NAPI poll (the
Ânapi->poll function is iwl_pcie_dummy_napi_poll() { WARN_ON(1);
Âreturn 0; }) and instead calls napi_gro_flush() at the end of its RX
Âhandling. Unfortunately, napi_gro_flush() is no longer enough,
Âbecause it doesn't call gro_normal_list() so the packets on the
ÂGRO_NORMAL list just sit there indefinitely.
It was seeing drivers calling napi_gro_flush() directly that had me
Âworried in the first place about whether listifying napi_gro_receive()
Âwas safe and where the gro_normal_list() should go.
I wondered if other drivers that show up in [1] needed fixing with a
Âgro_normal_list() next to their napi_gro_flush() call. From a cursory
Âcheck:
brocade/bna: has a real poller, calls napi_complete_done() so is OK.
cortina/gemini: calls napi_complete_done() straight after
Ânapi_gro_flush(), so is OK.
hisilicon/hns3: calls napi_complete(), so is _probably_ OK.
But it's far from clear to me why *any* of those drivers are calling
Ânapi_gro_flush() themselves...
-Ed
[1]: https://elixir.bootlin.com/linux/latest/ident/napi_gro_flush