Re: [PATCH] ARM: mm: report both sections from PMD

From: Kees Cook
Date: Tue Feb 11 2014 - 13:22:50 EST


On Tue, Feb 11, 2014 at 3:17 AM, Catalin Marinas
<catalin.marinas@xxxxxxx> wrote:
> On Mon, Feb 10, 2014 at 05:26:28PM +0000, Kees Cook wrote:
>> On Mon, Feb 10, 2014 at 2:41 AM, Catalin Marinas
>> <catalin.marinas@xxxxxxx> wrote:
>> > On Mon, Feb 10, 2014 at 10:29:35AM +0000, Catalin Marinas wrote:
>> >> On Sun, Feb 09, 2014 at 10:18:26PM +0000, Kees Cook wrote:
>> >> > diff --git a/arch/arm/mm/dump.c b/arch/arm/mm/dump.c
>> >> > index 1f7b1e13d945..ff1559f9200c 100644
>> >> > --- a/arch/arm/mm/dump.c
>> >> > +++ b/arch/arm/mm/dump.c
>> >> > @@ -264,6 +264,9 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
>> >> > note_page(st, addr, 3, pmd_val(*pmd));
>> >> > else
>> >> > walk_pte(st, pmd, addr);
>> >> > +
>> >> > + if (SECTION_SIZE < PMD_SIZE && pmd_sect(*pmd))
>> >> > + note_page(st, addr + SECTION_SIZE, 3, pmd_val(pmd[1]));
>> >>
>> >> You can use pmd_large() here as well.
>> >>
>> >> But I think this function is broken (the "for" statement not shown
>> >> here). The pmd_t is 32-bit with classic MMU and it uses pmd++ while the
>> >> address grows by PMD_SIZE (two pmd_t entries).
>> >
>> > Actually it's ok since PTRS_PER_PMD is 1, so it only goes through this
>> > loop once.
>> >
>> > But in your patch shouldn't you check for pmd_large(*(pmd+1))? The first
>> > pmd is already caught by the 'if' statement.
>>
>> It wasn't clear to me what the logic should be here. If PTRS_PER_PMD
>> is 1, then why is there this second pmd after the first? Shouldn't
>> PTRS_PER_PMD be 2 if that's the case?
>
> The reason is that a hardware pte has only 256 entries (classic MMU),
> this is 1KB. We put two pte tables together and it gives us 2KB. The
> other 2KB in the page are used for Linux pte bits. Because we have two
> hw pte tables in a page, we need two corresponding pmd entries.
>
> A side effect is that even though we don't actually have a pmd (normally
> we should have included pgtable-nopmd.h), we still pretend we have one
> and __pmd_populate takes care of writing two consecutive entries. If we
> set PTRS_PER_PMD to 2, we should modify pte_alloc_one() to allocate a
> single hw pte (1KB + 1KB for software bits). I don't think this would be
> more efficient (there may have been other kernel restrictions in the
> past to require a full pte table page).
>
>> If that's not the case, then I figured the state of needing to report
>> the 2nd pmd depended on the type of the first one, so that's what I
>> wrote instead of trying to figure out why PTRS_PER_PMD wasn't 2.
>
> I don't remember whether we can have the first pmd being a table and the
> second one being a section. I don't think we restrict this but Russell
> should know more.

It sounds like my logic is still okay, then? Perhaps move it into the
first "if" for readability?

if (pmd_none(*pmd) || pmd_large(*pmd) || !pmd_present(*pmd)) {
note_page(st, addr, 3, pmd_val(*pmd));
if (SECTION_SIZE < PMD_SIZE && pmd_sect(*pmd))
note_page(st, addr + SECTION_SIZE, 3,
pmd_val(pmd[1]));
} else
walk_pte(st, pmd, addr);

Or should be be explicitly separated (to allow for the very unlikely
future case of pmd_large != pmd_sect)? In the LPAE case, SECTION_SIZE
== PMD_SIZE, so IIUC, we have to continue testing for that:

if (pmd_sect(*pmd)) {
note_page(st, addr, 3, pmd_val(*pmd));
if (SECTION_SIZE < PMD_SIZE)
note_page(st, addr + SECTION_SIZE, 3,
pmd_val(pmd[1]));
} else if (pmd_none(*pmd) || pmd_large(*pmd) ||
!pmd_present(*pmd))
note_page(st, addr, 3, pmd_val(*pmd));
else
walk_pte(st, pmd, addr);

>> There's clearly something I'm not understanding in here. :)
>
> I happen to understand it from time to time but it doesn't last ;).

Heh, understood. Thanks for looking at this. :)

-Kees

--
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/