Re: e1000e NVM corruption issue status

From: Jesse Barnes
Date: Fri Sep 26 2008 - 14:24:44 EST


On Friday, September 26, 2008 10:52 am Jesse Barnes wrote:
> On Friday, September 26, 2008 4:49 am Arjan van de Ven wrote:
> > Jiri Kosina wrote:
> > > On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
> > >> this is the current set of patches that I have to help us debug
> > >> and/or fix e1000e issues found during this debug effort for
> > >> the corrupt NVM. the "drop stats lock" - "reset swflag" patches allow
> > >> Thomas' patch for a mutex in the SWFLAG acquire function to run
> > >> without any errors.
> > >
> > > Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you
> > > please add it to that lineup?
> > >
> > > http://marc.info/?l=linux-kernel&m=122237193628087&w=2
> >
> > can we (for now) also stick a WARN_ON() into that failure path? that way
> > we can at least catch if/when this happens more visibly..... if it
> > happens consistently in say the new distros we can be more confident that
> > we're down the right path in diagnosing the issue.
>
> I'm spinning a new one now with some debug output, stay tuned (just gotta
> boot my test box).

OK, here's an updated one. Jesse (Br), can you add it to your list? If the X
driver really is mapping past the end of a PCI BAR, this should catch it, as
long as the mapping goes through the sysfs resource files.

Thanks,
Jesse
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c71858..11523a3 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -16,6 +16,7 @@
 
 
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/pci.h>
 #include <linux/stat.h>
 #include <linux/topology.h>
@@ -502,6 +503,8 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	struct resource *res = (struct resource *)attr->private;
 	enum pci_mmap_state mmap_type;
 	resource_size_t start, end;
+	unsigned long map_len = vma->vm_end - vma->vm_start;
+	unsigned long map_offset = vma->vm_pgoff << PAGE_SHIFT;
 	int i;
 
 	for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -510,6 +513,18 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
+	/*
+	 * Make sure the range the user is trying to map falls within
+	 * the resource
+	 */
+	if (map_offset + map_len > pci_resource_len(pdev, i)) {
+		printk(KERN_ERR "process \"%s\" tried to map 0x%08lx-0x%08lx on BAR %d (size 0x%08lx)\n",
+			current->comm, map_offset, map_offset + map_len, i,
+			(unsigned long)pci_resource_len(pdev, i));
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
 	/* pci_mmap_page_range() expects the same kind of entry as coming
 	 * from /proc/bus/pci/ which is a "user visible" value. If this is
 	 * different from the resource itself, arch will do necessary fixup.