ASPLOV miss ratio porting to planet labs kernel

From: Sizhao Yang
Date: Thu Jul 07 2005 - 12:44:54 EST


Hi all,

I was wondering if someone could help me with this. I'm porting an
ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to
Planet Labs kernel. It's a novel idea for memory management. In
porting I at run time I'm consistently hitting kernel bugs at four
different places bad_page, bad_range, in rmap.c
BUG(page_mapcount(page)< 0), and failing at apm_do_idle. All of these
functions except apm_do_idle seem to be new functions from 2.4.20 to
2.6.11.6. I'm pretty sure I'm forgetting to account for certain
things when modifying the pages, but I'm not sure where.

What I'm doing in the port is resetting protection bits so that when
it page faults. It will calculate a miss ratio based on the number of
accessed bits and other information. After I gather the information I
will reset the accessed bits. Then based on previous miss ratios and
current miss ratio it will give out memory to different processes
based on that. That's the general idea. For more specifics:

http://carmen.cs.uiuc.edu/paper/ASPLOS04-Zhou.pdf

I've narrowed it down to primarily when I call the following functions:
ptep_test_and_clear_young,
static inline pte_t pte_mknominor(pte_t pte) { (pte).pte_low &=
~_PAGE_PROTNONE; return pte; }
static inline pte_t pte_mkminor(pte_t pte) { (pte).pte_low |=
_PAGE_PROTNONE; return pte; }
static inline pte_t pte_mkpresent(pte_t pte) { (pte).pte_low |=
_PAGE_PRESENT; return pte; }
static inline pte_t pte_mkabsent(pte_t pte) { (pte).pte_low &=
~_PAGE_PRESENT; return pte; }

When I don't have those functions in my code the kernel doesn't crash,
but when I do they crash. So, my question is am I to page accounting
aspects? I looked at rmap functions for incrementing the _mapcount but
they seem to be only for when a pte is copied. Should I be
incrementing the pagecount at any point? rmap.c
BUG(page_mapcount(page)< 0) is invoked when the accessed bits are
cleared in zap_pte, but I don't know how the page is being corrupted.
So, my main issue is if anyone can help point me to the direction
where a page could be corrupted I would greatly appreciate.

I'd appreciate any general direction anyone can point me at. Thanks
in advance. I look forward to hearing from anyone.

Zao
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/