Re: testing pmdval/pteval page presence bit

From: Pekka Paalanen
Date: Wed Feb 11 2009 - 13:08:52 EST


(Oliver and David, I added you to CC, since I recall you were planning
for user space tracing in mmiotrace.)

On Tue, 10 Feb 2009 14:42:56 -0800
Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:

> Pekka Paalanen wrote:
> > Hi all,
> >
> > This question is related to mmiotrace which toggles the page presence
> > bit to trigger page faults on ioremapped regions. Page faults are used
> > to trace MMIO reads and writes of proprietary drivers.
> >
> > I understood that large pages use pmd's instead of pte's. If there is a
> > union like this:
> >
> > + union {
> > + pmdval_t pmdval;
> > + pteval_t pteval;
> > + } saved; /* stored value prior to arming */
> >
> > and it is being assigned the proper content, as in the following:
> >
> > +static int clear_page_present(struct kmmio_fault_page *f, bool clear)
> > {
> > pteval_t pteval;
> > pmdval_t pmdval;
> > unsigned int level;
> > pmd_t *pmd;
> > + pte_t *pte = lookup_address(f->page, &level);
> >
> > if (!pte) {
> > + pr_err("kmmio: no pte for page 0x%08lx\n", f->page);
> > return -1;
> > }
> >
> > switch (level) {
> > case PG_LEVEL_2M:
> > pmd = (pmd_t *)pte;
> > + if (clear) {
> > + f->saved.pmdval = pmd_val(*pmd);
> > + pmdval = f->saved.pmdval & ~_PAGE_PRESENT;
> > + } else
> > + pmdval = f->saved.pmdval;
> > set_pmd(pmd, __pmd(pmdval));
> > break;
> >
> > case PG_LEVEL_4K:
> > + if (clear) {
> > + f->saved.pteval = pte_val(*pte);
> > + pteval = f->saved.pteval & ~_PAGE_PRESENT;
> > + } else
> > + pteval = f->saved.pteval;
> > set_pte_atomic(pte, __pte(pteval));
> > break;
> >
> >
> > Then, regardless of whether pmdval or pteval was set, the test
> >
> > if (!(faultpage->saved.pteval & _PAGE_PRESENT))
> >
> > should be ok. But is it?
> > Can large page (pmd) presence be handled just like a normal page (pte)?
> >
>
> _PAGE_PRESENT is meaningful for both ptes and pmds; you can use
> pmd_present() to test for it rather than open-coding it.

Okay, I will look into that, but it also means I need to record whether
I am dealing with a pte or a pmd.
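
Something along these lines is what I have in mind for the bookkeeping.
This is only a rough userspace sketch, not a patch: the typedefs, the
_PAGE_PRESENT value, the enum, and the names fault_page_sketch,
save_entry() and saved_entry_present() are stand-ins I made up so it
compiles outside the kernel. In the real code the test would go through
pmd_present() and its pte counterpart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions (assumptions). */
typedef uint64_t pteval_t;
typedef uint64_t pmdval_t;
#define _PAGE_PRESENT ((uint64_t)1 << 0)	/* bit 0 on x86 */
enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M };

/* Hypothetical: the saved value plus a tag saying which member is valid. */
struct fault_page_sketch {
	enum pg_level level;
	union {
		pmdval_t pmdval;
		pteval_t pteval;
	} saved;
};

/* Record the entry value together with the level it was found at. */
static void save_entry(struct fault_page_sketch *f, enum pg_level level,
		       uint64_t val)
{
	assert(level == PG_LEVEL_4K || level == PG_LEVEL_2M);
	f->level = level;
	if (level == PG_LEVEL_2M)
		f->saved.pmdval = val;
	else
		f->saved.pteval = val;
}

/* Test the present bit of the saved value via the matching member. */
static bool saved_entry_present(const struct fault_page_sketch *f)
{
	switch (f->level) {
	case PG_LEVEL_2M:
		return (f->saved.pmdval & _PAGE_PRESENT) != 0;
	case PG_LEVEL_4K:
		return (f->saved.pteval & _PAGE_PRESENT) != 0;
	default:
		return false;
	}
}
```

With the level stored next to the union, the caller no longer has to
guess which member was written when arming.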

> But there's one other theoretical problem with this code. In general it
> isn't safe to just toggle the _PAGE_PRESENT bit on its own, because the
> rest of the non-present pte could get interpreted as a swap entry. If
> you're guaranteed that these are kernel mappings then there's no problem
> in practice.

This is good to know. So far these are kernel mappings, as they are all
created by ioremap*(), but there are plans to extend mmiotrace to
trace IO-mappings accessed from user space. Do you have hints for that?

OTOH, we are always dealing with PCI IO-mem-mappings, so would those ever
be not present, excluding the mmiotrace case?

Well, Stuart already found out that kernel ioremap*()'ed pages might
not really be present; fixes for mmiotrace to cope with that are coming
up. The plan is to restore the pte to the state it had before mmiotrace
cleared the _PAGE_PRESENT flag and, if the same instruction and address
fault again, to fall through to the normal page fault handling. How
can/will this fail? And what if it is a user page?
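
To make the "same instruction and address faults again" part concrete,
here is a rough userspace sketch of the check. It is not the actual
mmiotrace code: fault_ctx_sketch, tracer_should_handle() and
tracer_step_done() are hypothetical names, and the real thing would
keep this state per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the fault the tracer is currently handling. */
struct fault_ctx_sketch {
	bool valid;		/* a traced fault is in flight */
	uintptr_t ip;		/* faulting instruction address */
	uintptr_t addr;		/* faulting data address */
};

/*
 * Returns true if the tracer should handle this fault, false if it
 * should fall through to the normal page fault handling.
 */
static bool tracer_should_handle(struct fault_ctx_sketch *ctx,
				 uintptr_t ip, uintptr_t addr)
{
	assert(ctx != NULL);
	if (ctx->valid && ctx->ip == ip && ctx->addr == addr) {
		/*
		 * The saved pte was already restored and the same access
		 * faulted again, so the page was not present to begin
		 * with. Give up and let the normal handler deal with it.
		 */
		ctx->valid = false;
		return false;
	}
	ctx->valid = true;
	ctx->ip = ip;
	ctx->addr = addr;
	return true;
}

/* Called once the traced instruction has completed successfully. */
static void tracer_step_done(struct fault_ctx_sketch *ctx)
{
	ctx->valid = false;
}
```

The sketch deliberately forgets the recorded fault once the instruction
completes, so a later access from the same instruction to the same
address is traced again instead of being mistaken for a repeat.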

Oh, we are on x86/x86_64 only.


Thanks.

--
Pekka Paalanen
http://www.iki.fi/pq/