On Thu, 2009-06-18 at 15:04 -0400, Lee Schermerhorn wrote:
On Thu, 2009-06-18 at 00:37 -0400, Lee Schermerhorn wrote:
On Wed, 2009-06-17 at 09:45 +0200, Stefan Lankes wrote:
I've placed the last rebased version in:
http://free.linux.hp.com/~lts/Patches/PageMigration/2.6.28-rc4-mmotm-081110/
OK! I will try to reconstruct the problem.
Stefan:
Today I rebased the migrate on fault patches to 2.6.30-mmotm-090612...
[along with my shared policy series atop which they sit in my tree].
Patches reside in:
http://free.linux.hp.com/~lts/Patches/PageMigration/2.6.30-mmotm-090612-1220/
I have updated the migrate-on-fault tarball in the above location to fix
part of the problems I was seeing. See below.
<snip>
I did a quick test. I'm afraid the patches have suffered some "bit rot"
vis a vis mainline/mmotm over the past several months. Two possibly
related issues:
1) lazy migration doesn't seem to work. Looks like
mbind(<some-policy>+MPOL_MF_MOVE+MPOL_MF_LAZY) is not unmapping the
pages so, of course, migrate on fault won't work. I suspect the
reference count handling has changed since I last tried this. [Note one
of the patch conflicts was in the MPOL_MF_LAZY addition to the mbind
flag definitions in mempolicy.h and I may have botched the resolution
thereof.]
2) When the pages get freed on exit/unmap, they are still PageLocked()
and free_pages_check()/bad_page() bugs out with bad page state.
Note: This is independent of memcg--i.e., it happens whether or not memcg
is configured.
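For readers who have not seen the lazy variant, the call in issue 1 might look roughly like the sketch below. This is only an illustration under assumptions: MPOL_MF_LAZY is introduced by the patch series and its value in that series is not stated in this thread, so the (1 << 3) used here is a placeholder; SKETCH_* names, lazy_move_flags() and mbind_lazy() are hypothetical helpers, not part of the patches. On a kernel that does not accept the flag the call is expected to fail.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Values as in the mainline mempolicy header. */
#define SKETCH_MPOL_BIND     2
#define SKETCH_MPOL_MF_MOVE  (1u << 1)
/* ASSUMPTION: placeholder value; the patch series defines the real one. */
#define SKETCH_MPOL_MF_LAZY  (1u << 3)

/* Hypothetical helper: flags for "move, but lazily, on next fault". */
unsigned int lazy_move_flags(void)
{
    return SKETCH_MPOL_MF_MOVE | SKETCH_MPOL_MF_LAZY;
}

/*
 * Hypothetical wrapper: bind [addr, addr+len) to the nodes set in
 * nodemask_bits and request lazy migration. Returns the raw syscall
 * result (0 on success, -1 with errno set on failure).
 */
long mbind_lazy(void *addr, unsigned long len, unsigned long nodemask_bits)
{
    unsigned long nodemask = nodemask_bits;

    return syscall(SYS_mbind, addr, len, (long)SKETCH_MPOL_BIND,
                   &nodemask, (unsigned long)(sizeof(nodemask) * 8),
                   (unsigned long)lazy_move_flags());
}
```

With the patches working as intended, such a call would only unmap the pages in the range; the actual migration would then happen when each page is next touched.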
OK. Found time to look at this. Turns out I hadn't tested since
trylock_page() was introduced. I did a one-for-one replacement of the
old API [TestSetPageLocked()], not noticing that the sense of the return
was inverted. Thus, I was bailing out of the migrate_pages_unmap_only()
loop with the page locked, thinking someone else had locked it and would
take care of it. Since the page wasn't unmapped from the page table[s],
of course it wouldn't migrate on fault--wouldn't even fault!
Fixed this.
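To make the inverted return sense concrete, here is a small user-space model of the mistake. It is a sketch, not the patch code: struct mock_page, the mock_* helpers and the two unmap_only_* loops are hypothetical stand-ins, and skipping a page with "continue" is an assumption about how the real loop bails out. The only facts it relies on are that TestSetPageLocked() returned nonzero when the page was already locked, while trylock_page() returns nonzero when the lock was acquired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: lock bit and mapped state only. */
struct mock_page {
    bool locked;
    bool mapped;
};

/* Old-style sense: returns the previous lock bit (nonzero = NOT acquired). */
static int mock_test_set_page_locked(struct mock_page *p)
{
    int was_locked = p->locked;

    p->locked = true;
    return was_locked;
}

/* New-style sense: returns nonzero when the lock WAS acquired. */
static int mock_trylock_page(struct mock_page *p)
{
    return !mock_test_set_page_locked(p);
}

static void mock_unlock_page(struct mock_page *p)
{
    assert(p->locked);
    p->locked = false;
}

/* One-for-one replacement that keeps the old test: the bug. */
static size_t unmap_only_buggy(struct mock_page *pages, size_t n)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (mock_trylock_page(&pages[i]))
            continue;               /* we hold the lock, yet skip the page */
        pages[i].mapped = false;
        unmapped++;
        mock_unlock_page(&pages[i]);
    }
    return unmapped;
}

/* Corrected test: skip only when the lock was not acquired. */
static size_t unmap_only_fixed(struct mock_page *pages, size_t n)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!mock_trylock_page(&pages[i]))
            continue;               /* someone else holds the lock */
        pages[i].mapped = false;
        unmapped++;
        mock_unlock_page(&pages[i]);
    }
    return unmapped;
}
```

Run over initially unlocked pages, the buggy loop unmaps nothing and leaves every page locked, which matches both symptoms: no fault-time migration, and pages still PageLocked() when they are freed.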
Now: lazy migration works w/ or w/o memcg configured, but NOT with the
swap resource controller configured. I'll look at that as time permits.
Update: I now can't reproduce the lazy migration failure with the swap
resource controller configured. Perhaps I had booted the wrong kernel
for the test reported above. Now the updated patch series mentioned
above seems to be working with both memory and swap resource controllers
configured for simple memtoy-driven lazy migration.