Re: Workaround for Intel MPS errata

From: Jon Mason
Date: Tue Oct 04 2011 - 23:46:52 EST

Hey Avi,
I believe this will fix the issue (assuming the errata is the issue in
the first place). You'll need to apply the patch on top of Linus'
latest code and re-enable the MPS tuning (as it is now off by
default). This can be done by adding "pci=pcie_bus_safe" to your boot

After thinking about it some more, a PCI quark is the correct way of
doing things. We must always disable read completion coalescing due
to the possibility of hotplugging a device with a MPS of 256B. Also,
I believe everyone will think this is much cleaner.

Let me know how it goes and thanks again for testing this for me.


commit ada901c643c7d0978a762fbadc2a6526e4ddf860
Author: Jon Mason <mason@xxxxxxxx>
Date: Thu Sep 29 16:56:37 2011 -0500

PCI: Workaround for Intel MPS errata

Intel 5000 and 5100 series memory controllers have a known issue if read
completion coalescing is enabled and the PCI-E Maximum Payload Size is
set to 256B. To work around this issue, disable read completion
coalescing in the memory controller and root complexes. Unfortunately,
it must always be disabled, even if no 256B MPS devices are precent, due
to the possiblity of one being hotplugged.

Links to erratas:

Thanks to Jesse Brandeburg and Ben Hutchings for providing insight into
the problem.

Tested-and-Reported-by: Avi Kivity <avi@xxxxxxxxxx>
Signed-off-by: Jon Mason <mason@xxxxxxxx>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6ab6bd3..c223d95 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1452,7 +1452,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
return 0;

-/* pcie_bus_configure_mps requires that pci_walk_bus work in a top-down,
+/* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
* parents then children fashion. If this changes, then this code will not
* work as designed.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1196f61..70da3f5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2822,6 +2822,75 @@ static void __devinit fixup_ti816x_class(struct pci_dev* dev)
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_TI, 0xb800, fixup_ti816x_class);

+/* Intel 5000 and 5100 Memory controllers have an errata with read completion
+ * coalescing (which is enabled by default on some BIOSes) and MPS of 256B.
+ * Since there is no way of knowing what the PCIE MPS on each fabric will be
+ * until all of the devices are discovered and buses walked, read completion
+ * coalescing must be disabled. Unfortunately, it cannot be re-enabled because
+ * it is possible to hotplug a device with MPS of 256B.
+ */
+static void __devinit quirk_intel_mc_errata(struct pci_dev *dev)
+ int err;
+ u16 rcc;
+ if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
+ return;
+ /* Intel errata specifies bits to change but does not say what they are.
+ * Keeping them magical until such time as the registers and values can
+ * be explained.
+ */
+ err = pci_read_config_word(dev, 0x48, &rcc);
+ if (err) {
+ dev_err(&dev->dev, "Error attempting to read the read "
+ "completion coalescing register.\n");
+ return;
+ }
+ if (!(rcc & (1 << 10)))
+ return;
+ rcc &= ~(1 << 10);
+ err = pci_write_config_word(dev, 0x48, rcc);
+ if (err) {
+ dev_err(&dev->dev, "Error attempting to write the read "
+ "completion coalescing register.\n");
+ return;
+ }
+ pr_info_once("Read completion coalescing disabled due to hardware "
+ "errata relating to 256B MPS.\n");
+/* Intel 5000 series memory controllers and ports 2-7 */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25c0, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25d0, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25d4, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25d8, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e2, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e3, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e4, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e5, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e6, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e7, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25f7, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25f8, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25f9, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25fa, quirk_intel_mc_errata);
+/* Intel 5100 series memory controllers and ports 2-7 */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65c0, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e2, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e3, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e4, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e5, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e6, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e7, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f7, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f8, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f9, quirk_intel_mc_errata);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65fa, quirk_intel_mc_errata);
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at