Re: [RFC] respect the referenced bit of KVM guest pages?

From: Minchan Kim
Date: Wed Aug 19 2009 - 09:28:43 EST


On Wed, Aug 19, 2009 at 10:19 PM, KOSAKI
Motohiro<kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> 2009/8/19 Minchan Kim <minchan.kim@xxxxxxxxx>:
>> On Wed, Aug 19, 2009 at 9:10 PM, Wu Fengguang<fengguang.wu@xxxxxxxxx> wrote:
>>> On Wed, Aug 19, 2009 at 08:05:19PM +0800, KOSAKI Motohiro wrote:
>>>> >> page_referenced_file?
>>>> >> I think we should change page_referenced().
>>>> >
>>>> > Yeah, good catch.
>>>> >
>>>> >>
>>>> >> Instead, How about this?
>>>> >> ==============================================
>>>> >>
>>>> >> Subject: [PATCH] mm: stop circulating of referenced mlocked pages
>>>> >>
>>>> >> Currently, mlock() systemcall doesn't gurantee to mark the page PG_Mlocked
>>>> >
>>>> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âmark PG_mlocked
>>>> >
>>>> >> because some race prevent page grabbing.
>>>> >> In that case, instead vmscan move the page to unevictable lru.
>>>> >>
>>>> >> However, Recently Wu Fengguang pointed out current vmscan logic isn't so
>>>> >> efficient.
>>>> >> mlocked page can move circulatly active and inactive list because
>>>> >> vmscan check the page is referenced _before_ cull mlocked page.
>>>> >>
>>>> >> Plus, vmscan should mark PG_Mlocked when cull mlocked page.
>>>> >
>>>> > Â Â Â Â Â Â Â Â Â Â Â Â Â PG_mlocked
>>>> >
>>>> >> Otherwise vm stastics show strange number.
>>>> >>
>>>> >> This patch does that.
>>>> >
>>>> > Reviewed-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> >> Index: b/mm/rmap.c
>>>> >> ===================================================================
>>>> >> --- a/mm/rmap.c    2009-08-18 19:48:14.000000000 +0900
>>>> >> +++ b/mm/rmap.c    2009-08-18 23:47:34.000000000 +0900
>>>> >> @@ -362,7 +362,9 @@ static int page_referenced_one(struct pa
>>>> >> Â Â Â Â* unevictable list.
>>>> >> Â Â Â Â*/
>>>> >> Â Â Â if (vma->vm_flags & VM_LOCKED) {
>>>> >> - Â Â Â Â Â Â *mapcount = 1; Â/* break early from loop */
>>>> >> + Â Â Â Â Â Â *mapcount = 1; Â Â Â Â Â/* break early from loop */
>>>> >> + Â Â Â Â Â Â *vm_flags |= VM_LOCKED; /* for prevent to move active list */
>>>> >
>>>> >> + Â Â Â Â Â Â try_set_page_mlocked(vma, page);
>>>> >
>>>> > That call is not absolutely necessary?
>>>>
>>>> Why? I haven't catch your point.
>>>
>>> Because we'll eventually hit another try_set_page_mlocked() when
>>> trying to unmap the page. Ie. duplicated with another call you added
>>> in this patch.
>
> Correct.
>
>
>> Yes. we don't have to call it and we can make patch simple.
>> I already sent patch on yesterday.
>>
>> http://marc.info/?l=linux-mm&m=125059325722370&w=2
>>
>> I think It's more simple than KOSAKI's idea.
>> Is any problem in my patch ?
>
> Hmm, I think
>
> 1. Anyway, we need turn on PG_mlock.

I add my patch again to explain.

diff --git a/mm/rmap.c b/mm/rmap.c
index ed63894..283266c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -358,6 +358,7 @@ static int page_referenced_one(struct page *page,
*/
if (vma->vm_flags & VM_LOCKED) {
*mapcount = 1; /* break early from loop */
+ *vm_flags |= VM_LOCKED;
goto out_unmap;
}

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d224b28..d156e1d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -632,7 +632,8 @@ static unsigned long shrink_page_list(struct
list_head *page_list,
sc->mem_cgroup, &vm_flags);
/* In active use or really unfreeable? Activate it. */
if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
- referenced && page_mapping_inuse(page))
+ referenced && page_mapping_inuse(page)
+ && !(vm_flags & VM_LOCKED))
goto activate_locked;

By this check, the page can be reached at try_to_unmap after
page_referenced in shrink_page_list. At that time PG_mlocked will be
set.

> 2. PG_mlock prevent livelock because page_evictable() check is called
> at very early in shrink_page_list().






--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/