pmd_modify() semantics

From: Vineet Gupta
Date: Tue Oct 13 2015 - 09:58:47 EST


Hi Kirill,

I'm running LTP tests on the new ARC THP code and thp03 seems to be triggering mm
spew.

--------------->8---------------------
[ARCLinux]# ./ltp-thp03-extract
PID 60
bad pmd bf1c4600 be600231
../mm/pgtable-generic.c:34: bad pgd be600231.
bad pmd bf1c4604 bd800231
../mm/pgtable-generic.c:34: bad pgd bd800231.
BUG: Bad rss-counter state mm:bf12e900 idx:1 val:512
BUG: non-zero nr_ptes on freeing mm: 2
--------------->8---------------------

I know exactly what is happening and what the likely fix is, but I would like to
get your thoughts if possible.

Background: ARC page tables are walked in software, with PGD -> PTE -> page for
the normal case and PMD -> page for the THP case. A vanilla PGD entry doesn't
have any flags, only a pointer to the PTE table.

A reduced version of thp03 allocates a THP, dirties it, and then calls
mprotect(PROT_NONE). In the mprotect() -> change_huge_pmd() -> pmd_modify()
path, pmd_modify() needs to change some of the bits.

The issue is that the ARC implementation of pmd_modify() is based on the pte
variant, which retains the soft pte bits (dirty and accessed).

static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{
	return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
}

The obvious fix is to rewrite pmd_modify() so that it clears all the pte-type
flags, but that assumes the PMD is becoming a PGD (a vanilla PGD on ARC doesn't
have any flags). Can pmd_modify() ever be called when the pmd is NOT being
split, e.g. an mprotect from write to read that doesn't split the THP as it does
now and simply changes the prot flags? In that case my proposed version of
pmd_modify() would lose the dirty bit.
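
To make the trade-off concrete, here is a small userspace model of the two
variants. It is only a sketch: the flag bit positions, the 4K flag mask and the
helper names (modify_keep_soft, modify_clear_all, looks_like_table_pointer) are
all made up for illustration and are not the real ARC definitions.

```c
#include <stdint.h>

/* Hypothetical flag layout, for illustration only (NOT the real ARC bits). */
#define F_PRESENT   (1u << 0)
#define F_READ      (1u << 1)
#define F_WRITE     (1u << 2)
#define F_ACCESSED  (1u << 3)
#define F_DIRTY     (1u << 4)

#define FLAG_MASK   0xfffu        /* assumed: flags live in the low 12 bits */
#define ADDR_MASK   (~FLAG_MASK)  /* assumed: the rest is the address */

/* Bits kept across a protection change, modelled on the pte variant. */
#define CHG_MASK    (ADDR_MASK | F_ACCESSED | F_DIRTY)

/* Current behaviour: the soft bits (accessed, dirty) survive. */
static uint32_t modify_keep_soft(uint32_t pmd, uint32_t newprot)
{
	return (pmd & CHG_MASK) | newprot;
}

/* Proposed behaviour: only the address survives, all flags are replaced. */
static uint32_t modify_clear_all(uint32_t pmd, uint32_t newprot)
{
	return (pmd & ADDR_MASK) | newprot;
}

/* A vanilla table pointer carries no flag bits at all. */
static int looks_like_table_pointer(uint32_t entry)
{
	return (entry & FLAG_MASK) == 0;
}
```

With a dirty THP entry and an empty newprot (standing in for PROT_NONE in this
model), modify_keep_soft() leaves accessed and dirty set, so the result no longer
looks like a plain table pointer, while modify_clear_all() does. With a
read-only newprot, modify_clear_all() drops the dirty bit, which is the concern
raised above.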

In short, what are the semantics of pmd_modify()? Essentially, does it imply
that the pmd is being split, so that we are free to make it look like a PGD?

TIA,
-Vineet