Re: 3.0.1: pagevec_lookup+0x1d/0x30, SLAB issues?

From: Justin Piszcz
Date: Wed Sep 14 2011 - 06:53:22 EST




On Wed, 14 Sep 2011, Eric Dumazet wrote:

On Wednesday, 14 September 2011 at 05:47 -0400, Justin Piszcz wrote:

On Wed, 14 Sep 2011, Lin Ming wrote:

On Mon, Sep 12, 2011 at 6:44 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
Hi, Justin

There is a similar bug report at:
http://marc.info/?t=131594190600005&r=1&w=2

The attached patch from Shaohua fixed the bug.

Could you give it a try?


Hi Lin/LKML,

Can you please provide text patch files for what you want me to apply?
I did read that e-mail thread, and that could be the culprit. I will apply
the patch as soon as someone points me to its location :)

diff --git a/mm/filemap.c b/mm/filemap.c
index 645a080..7771871 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 {
 	unsigned int i;
 	unsigned int ret;
-	unsigned int nr_found;
+	unsigned int nr_found, nr_skip;
 
 	rcu_read_lock();
 restart:
 	nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
 				(void ***)pages, NULL, start, nr_pages);
 	ret = 0;
+	nr_skip = 0;
 	for (i = 0; i < nr_found; i++) {
 		struct page *page;
 repeat:
@@ -856,6 +857,7 @@ repeat:
 			 * here as an exceptional entry: so skip over it -
 			 * we only reach this from invalidate_mapping_pages().
 			 */
+			nr_skip++;
 			continue;
 		}
 
@@ -876,7 +878,7 @@ repeat:
 	 * If all entries were removed before we could secure them,
 	 * try again, because callers stop trying once 0 is returned.
 	 */
-	if (unlikely(!ret && nr_found))
+	if (unlikely(!ret && nr_found > nr_skip))
 		goto restart;
 	rcu_read_unlock();
 	return ret;
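
For anyone following along, the effect of the nr_skip change can be sketched in userspace. This is only a toy model, not kernel code: toy_find_get_pages(), enum slot_kind and MAX_RESTARTS are names invented for illustration, and the lookup is assumed to return every slot on each pass. It shows why a lookup that finds only exceptional entries kept taking "goto restart" under the old condition, and why counting the skipped entries lets the function return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a radix tree slot; not a kernel type. */
enum slot_kind { SLOT_PAGE, SLOT_EXCEPTIONAL };

/* Safety cap (illustration only) so the pre-patch logic terminates here. */
#define MAX_RESTARTS 1000

/*
 * Toy model of the loop in find_get_pages().
 * Returns the number of pages collected; *restarts reports how many
 * times "goto restart" was taken.
 * count_skips == false models the pre-patch test (!ret && nr_found),
 * count_skips == true models the patched test (!ret && nr_found > nr_skip).
 */
static unsigned int toy_find_get_pages(const enum slot_kind *slots,
                                       unsigned int nr_slots,
                                       bool count_skips,
                                       unsigned int *restarts)
{
    unsigned int i, ret, nr_found, nr_skip;

    assert(slots != NULL && restarts != NULL);
    *restarts = 0;
restart:
    nr_found = nr_slots;    /* assumption: the lookup finds every slot */
    ret = 0;
    nr_skip = 0;
    for (i = 0; i < nr_found; i++) {
        if (slots[i] == SLOT_EXCEPTIONAL) {
            nr_skip++;      /* skipped on purpose, not lost to a race */
            continue;
        }
        ret++;              /* a real page was secured */
    }
    if (!ret && nr_found > (count_skips ? nr_skip : 0)) {
        /* Give up after MAX_RESTARTS passes instead of spinning forever. */
        if (++*restarts < MAX_RESTARTS)
            goto restart;
    }
    return ret;
}
```

Called on an array holding only SLOT_EXCEPTIONAL entries, the model with count_skips set to false restarts until it hits the cap, while with count_skips set to true it returns 0 without restarting.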



Hi,

Thanks. The patch is applied and I have booted the new kernel. So that everyone knows, I am running three patches on top of 3.1-rc4:

(for the igb problem/memory allocation issue)
0001-Fix-pointer-dereference-before-call-to-pcie_bus_conf.patch
0002-PCI-Remove-MRRS-modification-from-MPS-setting-code.patch

(for the RCU/memory errors)
0003-filemap.patch

I will try to run more load tests on the NIC and CPU/MEM/I/O and see
whether any of my problems return!

Justin.