Re: frequent lockups in 3.18rc4

From: Linus Torvalds
Date: Fri Nov 21 2014 - 14:06:47 EST


On Fri, Nov 21, 2014 at 10:22 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> (d) the non-PAE case would look something like this:
>
> static noinline int vmalloc_fault(unsigned long address)
> {
> unsigned index;
> pgd_t *pgd_dst, pgd_entry;
>
> /* Make sure we are in vmalloc area: */
> if (!(address >= VMALLOC_START && address < VMALLOC_END))
> return -1;

Side note: I think this is just unnecessary confusion, and generates
big constants for no good reason.

The thing is, the kernel PGD's should always be in sync. In fact, at
PGD allocation time, we just do

clone_pgd_range(.. KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS);

and it might actually be better to structure this to be that exact same thing.

So instead of checking the address, we could just do

index = pgd_index(address);
if (index < KERNEL_PGD_BOUNDARY)
return -1;

which actually matches our initialization sequence much better anyway.
And avoids those random big constants.

Also, it turns out that this:

if (pgd_present(pgd_dst[index]))

generates a crazy big constant because of bad compiler issues (the
"pgd_present()" thing only checks the low bit, but it does so on
pgd_flags(), which does "native_pgd_val(pgd) & PTE_FLAGS_MASK", so you
have an insane extra "and" with the constant 0xffffc00000000fff, just
to then "and" it again with "1". It doesn't do that with the first
pgd_present() check, oddly enough.

WTF, gcc?

Anyway, even more importantly, because of the whole issue with nesting
page tables, it's probably best to actually avoid all the
"pgd_present()" etc helpers, because those might be hardcoded to 1
etc. So avoid the whole issue by just accessign the raw data.

Simplify, simplify, simplify. The actual code generation for this all
should be maybe 20 instructions.

Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
compiled it and verified that the code generation looks like what I'd
have expected, but that's literally it.

static noinline int vmalloc_fault(unsigned long address)
{
pgd_t *pgd_dst;
pgdval_t pgd_entry;
unsigned index = pgd_index(address);

if (index < KERNEL_PGD_BOUNDARY)
return -1;

pgd_entry = init_mm.pgd[index].pgd;
if (!pgd_entry)
return -1;

pgd_dst = __va(PAGE_MASK & read_cr3());
pgd_dst += index;

if (pgd_dst->pgd)
return -1;

ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
return 0;
}
NOKPROBE_SYMBOL(vmalloc_fault);

Hmm? Does anybody see anything fundamentally wrong with this?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/