Re: [bug] e100 bug: checksum mismatch on 82551ER rev10

From: Auke Kok
Date: Mon Jul 10 2006 - 13:19:41 EST


Molle Bestefich wrote:
Auke Kok wrote:
If you have received a motherboard or card with a broken EEPROM then your card
is in a limbo state - it might work but results are unreliable and may cause
your entire system to break (and even data corruption).

You should contact the hardware vendor and have the board replaced or upgraded
with a proper EEPROM. Continuing to work with the corrupted EEPROM image that
you have now can seriously hurt you later on.

Every single IP130 I've had my hands on has had an EEPROM that the
Linux driver declared bad.

that means that whoever is selling you the IP130's is consistently putting on bad EEPROMs, which is *very* bad. Which vendor is that? They can fix this problem for you and for *everyone* else they have sold and will sell IP130's to in the future.

I'm afraid that it's not the board that's at fault, it's the driver.

No it is not. The NIC is supported (you can even call Intel for first line support) but if your vendor put a bad EEPROM image on it then all bets are off. Intel provides the vendors with the proper tools to make valid EEPROMs, the driver checks them for a very good reason.

The NICs are working perfectly.

How can you tell? Do you know if jumbo frames work correctly? Is the device properly checksumming? is flow control working properly? These and many, many more settings are determined by the EEPROM. Seemingly it may work correctly, but there is no guarantee whatsoever that it will work correctly at all if the checksum is bad. Again, you can lose data, or worse, you could corrupt memory in the system causing massive failure (DMA timings, etc). Unlikely? sure, but not impossible.

(Also, it seems mighty odd to refuse to drive the hardware based on an
EEPROM checksum failure, when the e100 driver will happily load for a
device where for example IRQ routing is broken. Just another
indication that erroring out in this situation is overkill.)

That is another discussion. All wifi drivers bail out if the firmware is corrupted, why shouldn't e1000 be allowed to do so either? Are you willing to risk your data?

Cheers,

Auke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/