Disabling interrupt remapping seems to cause 50% drop in ethernet speed (v3.10)

From: Linda Walsh
Date: Sun Jul 07 2013 - 17:58:42 EST


There seems to be a new check:


Comments

Neil Horman <mailto:nhorman@xxxxxxxxxxxxx> - April 15, 2013, 4:28 p.m.

A few years back Intel published a spec update:
http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf

For the 5520 and 5500 chipsets which contained an erratum (specifically erratum
53), which noted that these chipsets can't properly do interrupt remapping, and
as a result they recommend that interrupt remapping be disabled in BIOS. While
many vendors have a BIOS update to do exactly that, not all do, and of course
not all users update their BIOS to a level that corrects the problem. As a
result, occasionally interrupts can arrive at a CPU even after affinity for that
interrupt has been moved, leading to lost or spurious interrupts, usually
characterized by the message:
kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)

There have been several incidents recently of people seeing this error, and
investigation has shown that they have systems whose BIOS level is such
that this feature was not properly turned off. As such, it would be good to
give them a reminder that their systems are vulnerable to this problem. For
details of those that reported the problem, please see:
https://bugzilla.redhat.com/show_bug.cgi?id=887006

Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
CC: Prarit Bhargava <prarit@xxxxxxxxxx>
CC: Don Zickus <dzickus@xxxxxxxxxx>
CC: Don Dutile <ddutile@xxxxxxxxxx>
CC: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
CC: Asit Mallick <asit.k.mallick@xxxxxxxxx>
CC: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
CC: linux-pci@xxxxxxxxxxxxxxx
CC: Joerg Roedel <joro@xxxxxxxxxx>
CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
====================

That causes a >=50% drop in receive performance on
ethernet file transfers (with the Linux machine
receiving a file)... Sending doesn't appear to be affected.

Is the above error message "No irq handler for vector" the only
error message I would see if I suffered from this bug?

I looked through message logs going back to 2012-01-27 and found 0 of those messages. I do have the part that is claimed to be affected.
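
For reference, a log search of that kind can be sketched as below. This is
only an illustration, not the procedure actually used: the function name and
the sample log lines are made up, and reading the two numbers in "7.71" as
CPU and vector is an assumption based on how the kernel appears to format
this message.

```python
import re

# Matches the message quoted in the patch description, e.g.
#   kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
# Assumption: the two numbers are the CPU and the vector, in that order.
NO_HANDLER = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def find_no_handler_lines(lines):
    """Return a (cpu, vector) pair for every log line that matches."""
    hits = []
    for line in lines:
        m = NO_HANDLER.search(line)
        if m:
            hits.append((int(m.group(1)), int(m.group(2))))
    return hits


# Hypothetical sample input; real use would iterate over the log files.
sample = [
    "Jul  7 17:00:01 host kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)",
    "Jul  7 17:00:02 host kernel: eth0: link up",
]
print(find_no_handler_lines(sample))
```

An empty result over the whole log history corresponds to the "found 0 of
those messages" observation above.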

I've been using interrupt affinity/steering (not irqbalancing) to put
ethernet interrupts for this interface on a specific cpu, keeping
the file server for that interface on the same cpu as well as keeping
other HW interrupts off of that node.
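
As an illustration of this kind of static steering, here is a minimal sketch
that builds the hex CPU mask and writes it to /proc/irq/<N>/smp_affinity.
The helper names are hypothetical, the IRQ number must be looked up in
/proc/interrupts, and the sketch assumes CPU numbers below 32 so that the
mask fits in a single hex group. It is not the exact setup described above.

```python
def cpu_mask(cpus):
    """Return a hex bitmask string with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        # Assumption: stay within one 32-bit group of the affinity mask.
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Write the mask for `cpus` to <proc_root>/<irq>/smp_affinity.

    Needs root on a real system. `proc_root` is a parameter only so the
    function can be pointed at a scratch directory when trying it out.
    """
    path = f"{proc_root}/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(cpu_mask(cpus))
    return path


# Masks only; nothing is written to /proc here.
print(cpu_mask([2]))     # CPU 2 alone
print(cpu_mask([4, 5]))  # CPUs 4 and 5
```

Calling set_irq_affinity(irq, [2]) as root would pin that IRQ to CPU 2; the
affinity stays fixed afterwards unless something such as an IRQ balancing
daemon rewrites it.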


Without the remapping, I am finding 50% or greater drop in receive speed,
yet with the remapping, I am not finding the error indicated above.

It is possible I don't see the interrupt because I don't dynamically
change affinity after it is initialized -- dunno. According to the
report this shouldn't be the case. If the above error message is the
symptom, I'd think I'd see it in 2 years of logs.

Is there a way to disable this short of reverting the patch?



