Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

From: Jarod Wilson
Date: Tue Oct 06 2015 - 17:50:38 EST


Alexander Duyck wrote:
On 09/21/2015 09:14 PM, Jarod Wilson wrote:
Alexander Duyck wrote:
On 09/21/2015 10:11 AM, Jarod Wilson wrote:
Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues
such as
this, where the disconnect happened just before
igb_configure_tx_ring():

...
Note: this is a follow-up patch in addition to the previously submitted
"igb: don't unmap NULL hw_addr"

drivers/net/ethernet/intel/igb/igb_main.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
b/drivers/net/ethernet/intel/igb/igb_main.c
index 6369f9e..7060edf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -952,6 +952,11 @@ static int igb_request_msix(struct igb_adapter
*adapter)
if (err)
goto err_out;

+ if (E1000_REMOVED(hw->hw_addr)) {
+ err = -EIO;
+ goto err_free;
+ }
+
for (i = 0; i < adapter->num_q_vectors; i++) {
struct igb_q_vector *q_vector = adapter->q_vector[i];



Instead of using E1000_REMOVED we should just replace the
adapter->hw.hw_addr in the setup of the itr_register with
adapter->io_addr like you did for Rx/Tx below.

I just tried that, and it reliably blows up horrifically, wedging the
machine to the point where all I could get was a screen shot with my
phone thus far, when we jump from igb_request_msix() to
igb_configure_msix() to igb_assign_vector() and finally to
igb_write_ivar(), at least as best as I can tell from what I was able to
see in the trace remnants still on-screen.

Take a look at array_rd32, that is buggy and doesn't match up with the
rd32 implementation. If you fix that then the blow-up should go away.

You shouldn't need to worry about itr_register since it will only get
triggered if the interrupt can be enabled and that shouldn't be able to
happen if the device is not present.
...
Just switching to adapter->io_addr everywhere seems to not work as noted
above. :\ Note that I'm also chasing this from the other end with the
author of the pci patches that seem to have triggered this, so the real
bug might be over in pci-land, but hardening against explosions in igb
still seems like a worthwhile effort here.

I am pretty sure array_rd32 is the problem. If you fix that then I
suspect you should quit seeing any further issues.

Okay, I've reworked array_rd32 a bit to call igb_rd32 as well, and with that and some other minor tweaks, nothing blowing up. The hardware still doesn't work when hotplugged, due to disappearing off the pci bus temporarily, but I think that's a pci hotplug issue, not igb's fault. Things seem to behave just fine if the device is connected at boot. Updated patch forthcoming for dissection...

--
Jarod Wilson
jarod@xxxxxxxxxx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/