Re: Top kernel oopses/warnings for the week of May 30th 2008

From: Hugh Dickins
Date: Mon Jun 02 2008 - 19:45:29 EST


On Fri, 30 May 2008, Hugh Dickins wrote:
> On Fri, 30 May 2008, Arjan van de Ven wrote:
> >
> > Rank 7: set_page_address (oops)
> > Reported 53 times (65 total reports)
> > crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
> > other day?
>
> No, not at all. But I'll have a little ponder over it.
>
> > This oops was last seen in version 2.6.25.3, and first seen in 2.6.25.
> > More info:
> > http://www.kerneloops.org/searchweek.php?search=set_page_address

Though I've spent quite a while poring over it, I regret to say I
haven't got much beyond the obvious with this BUG_ON(!PageHighMem)
in set_page_address() called from flush_all_zero_pkmaps().

It appears to be a corruption of the start of the pkmap_page_table,
but not a random corruption: entries of the form 0x378xxxxx through
0x37Bxxxxx, where they need to be 0x38xxxxxx or more to be highmem.
(I say "appears" because the compiler reuses %eax a lot, so there's
no trace on the stack or in the registers of which pte was actually read.)
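
To make the 0x38xxxxxx threshold concrete, here is a user-space sketch. It assumes a default i386 3G/1G split with at least 896 MiB of RAM, so lowmem ends at 896 MiB (max_low_pfn 0x38000), along with 4 KiB pages and non-PAE 32-bit ptes. The macro and function names are made up for the sketch and are not kernel code. It shows why ptes in the reported range fail the PageHighMem test: they point just below that boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions (illustrative only): 4 KiB pages, non-PAE 32-bit ptes,
 * lowmem ending at 896 MiB as on a default i386 3G/1G split. */
#define PTE_SKETCH_PAGE_SHIFT  12
#define PTE_SKETCH_PAGE_MASK   (~((UINT32_C(1) << PTE_SKETCH_PAGE_SHIFT) - 1))
#define PTE_SKETCH_MAX_LOW_PFN (UINT32_C(896) << (20 - PTE_SKETCH_PAGE_SHIFT)) /* 0x38000 */

/* Page frame number encoded in a 32-bit pte: drop the low flag bits. */
static uint32_t pte_sketch_pfn(uint32_t pte)
{
    return (pte & PTE_SKETCH_PAGE_MASK) >> PTE_SKETCH_PAGE_SHIFT;
}

/* In this model a page is highmem when its pfn is at or above max_low_pfn. */
static bool pte_sketch_is_highmem(uint32_t pte)
{
    return pte_sketch_pfn(pte) >= PTE_SKETCH_MAX_LOW_PFN;
}
```

For example, 0x37812163 (the low bits are arbitrary flag values chosen for illustration) gives pfn 0x37812. That is below 0x38000, so the page is lowmem and would trip BUG_ON(!PageHighMem) in set_page_address(). 0x38000163 would pass.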

In every case except the 17141 nfsd one, it's found at the start of
the table, when flush_all_zero_pkmaps() is called for the very first
time. (I'm inferring that because they all fail on the second entry,
which is the first one used, since the index is preincremented.)
The 17141 nfsd report, by contrast, finds a 0x00000xxx some way into
the page table, quite possibly on a later call, so it may have a very
different cause.
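
The "second entry" reasoning can be checked with a simplified user-space model of the index arithmetic. It loosely follows map_new_virtual() in mm/highmem.c of this era, where last_pkmap_nr is incremented before use and a wrap to 0 triggers flush_all_zero_pkmaps(). Everything else here is an assumption for the sketch: LAST_PKMAP = 1024 as on non-PAE i386, every mapping is released before the next one is taken, and all the names are hypothetical.

```c
#include <stdbool.h>

/* Assumption: LAST_PKMAP = 1024 (non-PAE i386). Only index handling is modelled. */
#define SKETCH_LAST_PKMAP      1024
#define SKETCH_LAST_PKMAP_MASK (SKETCH_LAST_PKMAP - 1)

struct sketch_pkmap {
    unsigned int last_pkmap_nr;    /* starts at 0, like the kernel's static */
    unsigned int flushes;          /* times a flush would have run */
    int first_flush_hit;           /* first used slot seen by the first flush */
    bool used[SKETCH_LAST_PKMAP];  /* slots ever handed out */
};

/* Lowest slot index that has been handed out, or -1 if none. */
static int sketch_first_used_slot(const struct sketch_pkmap *p)
{
    for (int i = 0; i < SKETCH_LAST_PKMAP; i++)
        if (p->used[i])
            return i;
    return -1;
}

/* Preincrement the index, so slot 0 is skipped on the first pass;
 * wrapping back to 0 is where flush_all_zero_pkmaps() would scan
 * the table from index 0 upward. */
static unsigned int sketch_next_slot(struct sketch_pkmap *p)
{
    p->last_pkmap_nr = (p->last_pkmap_nr + 1) & SKETCH_LAST_PKMAP_MASK;
    if (p->last_pkmap_nr == 0) {
        if (p->flushes == 0)
            p->first_flush_hit = sketch_first_used_slot(p);
        p->flushes++;
    }
    p->used[p->last_pkmap_nr] = true;
    return p->last_pkmap_nr;
}
```

In this model the first slot handed out is 1, and the first flush meets slot 1 before any other used entry, because slot 0 has not been used yet at that point.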

Do we have any idea whether all or most of these come from a single
machine? That would of course be a very different (less interesting)
story than if they were spread out over lots of machines.

I didn't notice anything suspicious in the Fedora patches to 2.6.25,
but I haven't heard (and Google hasn't shown) of any such problem outside
these kerneloops.org reports from Fedora 9. Is it showing up on Rawhide at
all? If so, we could devise some debugging code to include in coming
kernels to help shed more light on it.

Veering off at a tangent from the oops: I was rather sobered
to see all those traces of execve using kmap; I thought we were
avoiding kmap, like the BKL, in common paths these days (though it is
convenient for symlinks). Would a patch something like the one
below, copying the filemap.c trick, be welcome?
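
The atomic-first pattern can be modelled outside the kernel. Every name in this user-space sketch is hypothetical. sketch_copy_inatomic() stands in for __copy_from_user_inatomic() and returns the number of bytes not copied, as the real helper does. A non-resident source models a fault that cannot be serviced under kmap_atomic(). sketch_copy_may_sleep() stands in for copy_from_user() after switching to kmap().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __copy_from_user_inatomic(): returns bytes
 * NOT copied. A non-resident source would fault, so nothing is copied. */
static size_t sketch_copy_inatomic(char *dst, const char *src, size_t n,
                                   bool src_resident)
{
    if (!src_resident)
        return n;
    memcpy(dst, src, n);
    return 0;
}

/* Hypothetical stand-in for copy_from_user(): it may sleep to fault the
 * page in, so in this model it always succeeds. */
static size_t sketch_copy_may_sleep(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

enum sketch_path { SKETCH_FAST, SKETCH_SLOW, SKETCH_EFAULT };

/* Try the copy under the (modelled) atomic mapping first. If that would
 * fault, or a reschedule is pending, fall back to the sleeping path and
 * redo the whole copy, as the exec.c patch does with kmap(). */
static enum sketch_path sketch_copy(char *dst, const char *src, size_t n,
                                    bool src_resident, bool need_resched)
{
    if (!need_resched && sketch_copy_inatomic(dst, src, n, src_resident) == 0)
        return SKETCH_FAST;
    /* kernel: kunmap_atomic() then kmap() would happen here */
    if (sketch_copy_may_sleep(dst, src, n) != 0)
        return SKETCH_EFAULT;  /* unreachable in this model; mirrors -EFAULT */
    return SKETCH_SLOW;
}
```

Redoing the whole copy on the slow path is safe because the copy is idempotent. A partial atomic copy is simply overwritten.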

Hugh

--- 2.6.26-rc4/fs/exec.c 2008-05-26 20:00:39.000000000 +0100
+++ linux/fs/exec.c 2008-06-02 11:18:32.000000000 +0100
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/init.h>
#include <linux/pagemap.h>
+#include <linux/hardirq.h>
#include <linux/highmem.h>
#include <linux/spinlock.h>
#include <linux/key.h>
@@ -396,7 +397,7 @@ static int copy_strings(int argc, char _
{
struct page *kmapped_page = NULL;
char *kaddr = NULL;
- unsigned long kpos = 0;
+ unsigned long kpos = ~PAGE_MASK;
int ret;

while (argc-- > 0) {
@@ -436,28 +437,38 @@ static int copy_strings(int argc, char _
str -= bytes_to_copy;
len -= bytes_to_copy;

- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
+ if (kpos != (pos & PAGE_MASK)) {
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
+ if (in_atomic())
+ kunmap_atomic(kaddr, KM_USER0);
+ else
+ kunmap(kmapped_page);
put_arg_page(kmapped_page);
}
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
+ kmapped_page = get_arg_page(bprm, pos, 1);
+ if (!kmapped_page) {
+ ret = -E2BIG;
+ goto out;
+ }
+ kaddr = kmap_atomic(kmapped_page, KM_USER0);
kpos = pos & PAGE_MASK;
flush_arg_page(bprm, kpos, kmapped_page);
}
- if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
+ if (in_atomic()) {
+ if (need_resched() ||
+ __copy_from_user_inatomic(kaddr + offset,
+ str, bytes_to_copy)) {
+ kunmap_atomic(kaddr, KM_USER0);
+ kaddr = kmap(kmapped_page);
+ }
+ }
+ if (!in_atomic()) {
+ if (copy_from_user(kaddr + offset,
+ str, bytes_to_copy)) {
+ ret = -EFAULT;
+ goto out;
+ }
}
}
}
@@ -465,7 +476,10 @@ static int copy_strings(int argc, char _
out:
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
+ if (in_atomic())
+ kunmap_atomic(kaddr, KM_USER0);
+ else
+ kunmap(kmapped_page);
put_arg_page(kmapped_page);
}
return ret;
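
Initializing kpos to ~PAGE_MASK lets the patch drop the "!kmapped_page ||" test. ~PAGE_MASK has all the in-page offset bits set, so it can never equal the page-aligned value pos & PAGE_MASK, and the first pass through the loop always maps a page. The standalone check below assumes 4 KiB pages. Its macro and function names are local to the sketch and only mirror the kernel's PAGE_MASK.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages; these mirror, but are not, the kernel macros. */
#define KPOS_SKETCH_PAGE_SIZE 4096UL
#define KPOS_SKETCH_PAGE_MASK (~(KPOS_SKETCH_PAGE_SIZE - 1))

/* Sentinel before any page is mapped: only the offset bits are set. */
static const unsigned long kpos_sketch_unset = ~KPOS_SKETCH_PAGE_MASK;

/* The patch's test: remap whenever pos lies outside the mapped page.
 * With kpos at the sentinel this is always true, since pos & PAGE_MASK
 * has its low bits clear while the sentinel has them set. */
static bool kpos_sketch_needs_remap(unsigned long kpos, unsigned long pos)
{
    return kpos != (pos & KPOS_SKETCH_PAGE_MASK);
}
```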
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/