Re: frequent lockups in 3.18rc4
From: Linus Torvalds
Date: Fri Nov 21 2014 - 13:22:14 EST
On Fri, Nov 21, 2014 at 9:22 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> Both mystify me. Why does the 32-bit version walk down the hierarchy
> at all instead of just touching the top level?
Quite frankly, I think it's just due to historical reasons, and should
be removed.
But the historical reasons are that with the aliasing of the PUD and
PMD entries in the PGD, it's all fairly confusing. So I think we only
used to do the top level, but then when we expanded from two levels to
three, that "top level" became the pmd, and then when we expanded from
three to four, the pmd was actually two levels down. So it's all
basically mindless work.
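To make the aliasing concrete, here is a hypothetical userspace model (not kernel code) of a two-level, non-PAE hierarchy in which the PUD and PMD levels are folded into the PGD. All `model_*` names are invented for illustration. With a folded level, each "offset" helper just reinterprets the pointer from the level above, roughly as the kernel's asm-generic pgtable-nopud.h and pgtable-nopmd.h helpers do. Walking "down" the hierarchy therefore always lands back on the same top-level entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 32-bit non-PAE x86 paging: 1024 top-level
 * entries, each covering 4 MiB (address bits 31..22). */
#define MODEL_PGDIR_SHIFT  22
#define MODEL_PTRS_PER_PGD 1024

typedef struct { uint32_t pgd; } model_pgd_t;
typedef struct { model_pgd_t pgd; } model_pud_t; /* folded: wraps a pgd */
typedef struct { model_pud_t pud; } model_pmd_t; /* folded: wraps a pud */

static unsigned model_pgd_index(uint32_t address)
{
	unsigned index = address >> MODEL_PGDIR_SHIFT;

	assert(index < MODEL_PTRS_PER_PGD);
	return index;
}

/* A folded level occupies the same memory as the level above it,
 * so the offset helpers are plain pointer casts. */
static model_pud_t *model_pud_offset(model_pgd_t *pgd, uint32_t address)
{
	(void)address;
	return (model_pud_t *)pgd;
}

static model_pmd_t *model_pmd_offset(model_pud_t *pud, uint32_t address)
{
	(void)address;
	return (model_pmd_t *)pud;
}
```

The "pmd" reached by walking pgd -> pud -> pmd is the top-level slot itself, which is why the walk adds nothing in the non-PAE case.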
So I do think we could simplify and unify things.
In 32-bit mode, we actually have two different cases:
- in PAE, there's the magic top-level 4-entry PGD that always *has*
to be present (the P bit isn't actually checked by hardware)
As a result, in PAE mode, the top PGD entries always exist, and
are always prepopulated, and for the kernel area (including obviously
the vmalloc space) always points to the init_pgd[] entry.
Ergo, in PAE mode, I don't think we should ever hit this case in
the first place.
- in non-PAE mode, we should just copy the top-level entry, and return.
And in 64-bit mode, we only have the "copy the top-level entry" case.
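A small sketch of the top-level index arithmetic behind the cases above. It assumes the default 3G/1G split (kernel space from 0xC0000000) for 32-bit PAE, and the 3.18-era x86-64 layout in which vmalloc space starts at 0xffffc90000000000. Both values are assumptions chosen for illustration, and the function names are hypothetical. Under PAE, every kernel address, vmalloc space included, lands in the last of the four PGD slots, which is the slot described above as always prepopulated.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry, for illustration only:
 *  - 32-bit PAE: 4 top-level entries, 1 GiB each (bits 31..30)
 *  - x86-64 4-level paging: 512 top-level entries, 512 GiB each (bits 47..39)
 *  - 3G/1G split, so 32-bit kernel space starts at 0xC0000000 */
#define PAE_PGDIR_SHIFT        30
#define PAE_PTRS_PER_PGD       4
#define X86_64_PGDIR_SHIFT     39
#define X86_64_PTRS_PER_PGD    512
#define ASSUMED_PAGE_OFFSET_32 0xC0000000u

static unsigned pae_pgd_index(uint32_t address)
{
	return (address >> PAE_PGDIR_SHIFT) & (PAE_PTRS_PER_PGD - 1);
}

static unsigned x86_64_pgd_index(uint64_t address)
{
	return (unsigned)((address >> X86_64_PGDIR_SHIFT) &
			  (X86_64_PTRS_PER_PGD - 1));
}

/* Under PAE with a 3G/1G split, any kernel address falls in the last
 * top-level slot, the one that always points at the init tables. */
static int pae_kernel_slot_is_last(uint32_t address)
{
	assert(address >= ASSUMED_PAGE_OFFSET_32);
	return pae_pgd_index(address) == PAE_PTRS_PER_PGD - 1;
}
```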
So I think we should
(a) remove the 32-bit vs 64-bit difference, because that's not actually valid
(b) make it a PAE vs non-PAE difference
(c) the PAE case is a no-op
(d) the non-PAE case would look something like this:
static noinline int vmalloc_fault(unsigned long address)
{
	unsigned index;
	pgd_t *pgd_dst, pgd_entry;

	/* Make sure we are in vmalloc area: */
	if (!(address >= VMALLOC_START && address < VMALLOC_END))
		return -1;

	/* The reference kernel page tables must map this slot */
	index = pgd_index(address);
	pgd_entry = init_mm.pgd[index];
	if (!pgd_present(pgd_entry))
		return -1;

	/* Already present here: a missing top-level entry is not the cause */
	pgd_dst = __va(PAGE_MASK & read_cr3());
	if (pgd_present(pgd_dst[index]))
		return -1;

	/* Copy the single top-level entry; the faulting access is retried */
	ACCESS_ONCE(pgd_dst[index]) = pgd_entry;
	return 0;
}
NOKPROBE_SYMBOL(vmalloc_fault);
and it's done.
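For readers who want to follow the control flow, here is a hypothetical userspace model of the non-PAE logic above. It is not the kernel function and does not touch real page tables. Entries are plain `uint32_t` values with bit 0 standing in for the present bit, the two arrays stand in for `init_mm.pgd` and the page directory found through CR3, and the `SIM_VMALLOC_START`/`SIM_VMALLOC_END` bounds are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: non-PAE geometry, bit 0 = present,
 * made-up vmalloc bounds. */
#define SIM_PGDIR_SHIFT   22
#define SIM_PTRS_PER_PGD  1024
#define SIM_PRESENT       0x1u
#define SIM_VMALLOC_START 0xF8000000u
#define SIM_VMALLOC_END   0xFF000000u

static uint32_t sim_init_pgd[SIM_PTRS_PER_PGD]; /* stands in for init_mm.pgd */
static uint32_t sim_task_pgd[SIM_PTRS_PER_PGD]; /* stands in for the CR3 directory */

static int sim_present(uint32_t entry)
{
	return (entry & SIM_PRESENT) != 0;
}

static int sim_vmalloc_fault(uint32_t address)
{
	unsigned index;
	uint32_t entry;

	/* Only faults inside the vmalloc area are candidates */
	if (!(address >= SIM_VMALLOC_START && address < SIM_VMALLOC_END))
		return -1;

	/* The reference tables must have a mapping for this slot */
	index = address >> SIM_PGDIR_SHIFT;
	assert(index < SIM_PTRS_PER_PGD);
	entry = sim_init_pgd[index];
	if (!sim_present(entry))
		return -1;

	/* Slot already populated: a missing top-level entry is not the cause */
	if (sim_present(sim_task_pgd[index]))
		return -1;

	/* Copy the single top-level entry and report the fault as handled */
	sim_task_pgd[index] = entry;
	return 0;
}
```

A second fault on the same slot returns -1, because the entry is already present and copying it again cannot help.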
Would anybody be willing to actually *test* something like the above?
The above may compile, but that's all the "testing" it got.
Linus