Re: [PATCH v3 1/2] mm/page_idle: Add per-pid idle page tracking using virtual indexing
From: Minchan Kim
Date: Mon Aug 05 2019 - 03:55:59 EST
Hi Joel,
On Wed, Jul 31, 2019 at 01:19:37PM -0400, Joel Fernandes wrote:
> > > -static struct page *page_idle_get_page(unsigned long pfn)
> > > +static struct page *page_idle_get_page(struct page *page_in)
> >
> > Looks weird function name after you changed the argument.
> > Maybe "bool check_valid_page(struct page *page)"?
>
>
> I don't think so, this function does a get_page_unless_zero() on the page as well.
>
> > > {
> > > struct page *page;
> > > pg_data_t *pgdat;
> > >
> > > - if (!pfn_valid(pfn))
> > > - return NULL;
> > > -
> > > - page = pfn_to_page(pfn);
> > > + page = page_in;
> > > if (!page || !PageLRU(page) ||
> > > !get_page_unless_zero(page))
> > > return NULL;
> > > @@ -51,6 +49,18 @@ static struct page *page_idle_get_page(unsigned long pfn)
> > > return page;
> > > }
> > >
> > > +/*
> > > + * This function tries to get a user memory page by pfn as described above.
> > > + */
> > > +static struct page *page_idle_get_page_pfn(unsigned long pfn)
> >
> > So we could use page_idle_get_page name here.
>
>
> Based on above comment, I prefer to keep same name. Do you agree?
Yes, I agree. Just please add a note about the refcount to the comment
describing page_idle_get_page().
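
To illustrate the kind of refcount note I mean, here is a minimal userspace
sketch. The mock_* names and struct mock_refpage are hypothetical stand-ins
and not kernel code; the sketch also leaves out the locked PageLRU recheck
that the real function does. It only shows the contract the comment should
state: on success the caller holds one extra reference and has to drop it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page. */
struct mock_refpage {
	int refcount;	/* models the page reference count */
	bool on_lru;	/* models PageLRU() */
};

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
static bool mock_get_page_unless_zero(struct mock_refpage *page)
{
	if (page->refcount == 0)
		return false;
	page->refcount++;
	return true;
}

/* Models put_page(): drop one reference. */
static void mock_put_page(struct mock_refpage *page)
{
	page->refcount--;
}

/*
 * Returns @page with its refcount elevated, or NULL if @page is NULL, is
 * not on the LRU, or already has a zero refcount. On success the caller
 * owns one reference and must drop it with mock_put_page() when done.
 */
static struct mock_refpage *mock_page_idle_get_page(struct mock_refpage *page)
{
	if (!page || !page->on_lru || !mock_get_page_unless_zero(page))
		return NULL;
	return page;
}
```

With a comment like that on the function, keeping "get" in the name makes
sense to me.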
>
>
> > > + return page_idle_get_page(pfn_to_page(pfn));
> > > +}
> > > +
> > > static bool page_idle_clear_pte_refs_one(struct page *page,
> > > struct vm_area_struct *vma,
> > > unsigned long addr, void *arg)
> > > @@ -118,6 +128,47 @@ static void page_idle_clear_pte_refs(struct page *page)
> > > unlock_page(page);
> > > }
> > >
> > > +/* Helper to get the start and end frame given a pos and count */
> > > +static int page_idle_get_frames(loff_t pos, size_t count, struct mm_struct *mm,
> > > + unsigned long *start, unsigned long *end)
> > > +{
> > > + unsigned long max_frame;
> > > +
> > > + /* If an mm is not given, assume we want physical frames */
> > > + max_frame = mm ? (mm->task_size >> PAGE_SHIFT) : max_pfn;
> > > +
> > > + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> > > + return -EINVAL;
> > > +
> > > + *start = pos * BITS_PER_BYTE;
> > > + if (*start >= max_frame)
> > > + return -ENXIO;
> > > +
> > > + *end = *start + count * BITS_PER_BYTE;
> > > + if (*end > max_frame)
> > > + *end = max_frame;
> > > + return 0;
> > > +}
> > > +
> > > +static bool page_really_idle(struct page *page)
> >
> > Just minor:
> > Instead of creating new API, could we combine page_is_idle with
> > introducing a further argument, pte_check?
>
>
> I cannot see in the code where pte_check will be false when this is called? I
> could rename the function to page_idle_check_ptes() if that's Ok with you.
What I don't like is the _*really*_ part of the function name.
I see several page_is_idle() calls in huge_memory.c, migrate.c and swap.c.
They only need to check the page flag, so they could pass "false" for
pte_check.
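
A rough userspace sketch of what I have in mind, under my own assumptions:
struct mock_page and mock_page_is_idle() are hypothetical, and the way the
PTE state is folded into the flag here is only a model of what clearing
the PTE references would do, not the actual kernel logic.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page. */
struct mock_page {
	bool idle_flag;		/* models the idle page flag */
	bool pte_accessed;	/* models "a mapping PTE was referenced" */
};

/*
 * One helper instead of two. With pte_check == false only the flag is
 * read, which is all that flag-only callers need. With pte_check == true
 * a referenced PTE is cleared and the page is marked not idle before the
 * flag is read.
 */
static bool mock_page_is_idle(struct mock_page *page, bool pte_check)
{
	if (pte_check && page->pte_accessed) {
		page->pte_accessed = false;
		page->idle_flag = false;
	}
	return page->idle_flag;
}
```

Existing callers would then look like mock_page_is_idle(page, false) and
only the new /proc path would pass true.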
< snip >
> > > +ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
> > > + size_t count, loff_t *pos,
> > > + struct task_struct *tsk, int write)
> > > +{
> > > + int ret;
> > > + char *buffer;
> > > + u64 *out;
> > > + unsigned long start_addr, end_addr, start_frame, end_frame;
> > > + struct mm_struct *mm = file->private_data;
> > > + struct mm_walk walk = { .pmd_entry = pte_page_idle_proc_range, };
> > > + struct page_node *cur;
> > > + struct page_idle_proc_priv priv;
> > > + bool walk_error = false;
> > > + LIST_HEAD(idle_page_list);
> > > +
> > > + if (!mm || !mmget_not_zero(mm))
> > > + return -EINVAL;
> > > +
> > > + if (count > PAGE_SIZE)
> > > + count = PAGE_SIZE;
> > > +
> > > + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> > > + if (!buffer) {
> > > + ret = -ENOMEM;
> > > + goto out_mmput;
> > > + }
> > > + out = (u64 *)buffer;
> > > +
> > > + if (write && copy_from_user(buffer, ubuff, count)) {
> > > + ret = -EFAULT;
> > > + goto out;
> > > + }
> > > +
> > > + ret = page_idle_get_frames(*pos, count, mm, &start_frame, &end_frame);
> > > + if (ret)
> > > + goto out;
> > > +
> > > + start_addr = (start_frame << PAGE_SHIFT);
> > > + end_addr = (end_frame << PAGE_SHIFT);
> > > + priv.buffer = buffer;
> > > + priv.start_addr = start_addr;
> > > + priv.write = write;
> > > +
> > > + priv.idle_page_list = &idle_page_list;
> > > + priv.cur_page_node = 0;
> > > + priv.page_nodes = kzalloc(sizeof(struct page_node) *
> > > + (end_frame - start_frame), GFP_KERNEL);
> > > + if (!priv.page_nodes) {
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > +
> > > + walk.private = &priv;
> > > + walk.mm = mm;
> > > +
> > > + down_read(&mm->mmap_sem);
> > > +
> > > + /*
> > > + * idle_page_list is needed because walk_page_vma() holds ptlock which
> > > + * deadlocks with page_idle_clear_pte_refs(). So we have to collect all
> > > + * pages first, and then call page_idle_clear_pte_refs().
> > > + */
> >
> > Thanks for the comment, I was curious why you want to have
> > idle_page_list and the reason is here.
> >
> > How about making this /proc/<pid>/page_idle per-process granularity,
> > unlike system level /sys/xxx/page_idle? What I meant is not to check
> > rmap to see any reference from random process but just check only
> > access from the target process. It would be more proper as /proc/
> > <pid>/ interface and good for per-process tracking as well as
> > fast.
>
>
> I prefer not to do this for the following reasons:
> (1) It makes a feature lost, now accesses to shared pages will not be
> accounted properly.
Do you really want to check a global attribute through a per-process
interface? That is already doable with the existing idle page tracking
feature, and it is one of the reasons idle page tracking was born (e.g.,
it even covers non-mapped page cache), unlike clear_refs.

Once we create a new per-process interface, checking accesses only at
process granularity sounds more reasonable to me. With that, we could
catch pages that are idle for the target process even if they were
touched by several other processes.

If the user wants to know about accesses at the global level, they can
use the existing interface. (If there is a concern, e.g. security, about
using the existing idle page tracking, let's discuss separately how we
could make the existing feature more useful.)

IOW, my point is that the existing idle page tracking interface already
gives us a global access check (1. from PTEs across several processes,
2. from the page flag for non-mapped pages). Since we are about to create
a new per-process interface, I wanted it to provide something that the
existing interface cannot cover.
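
For reference, this is how I understand the indexing of the existing
bitmap (an array of 8-byte words, one bit per frame, native byte order),
which your page_idle_get_frames() follows as well. It is a minimal
userspace sketch; the idle_bitmap_* helper names are made up for
illustration.

```c
#include <stdint.h>

#define BITMAP_CHUNK_SIZE 8u	/* bytes per bitmap word, i.e. sizeof(u64) */
#define BITS_PER_CHUNK    64u	/* frames covered by one bitmap word */

/* Byte offset in the bitmap file of the 64-bit word covering @frame. */
static uint64_t idle_bitmap_offset(uint64_t frame)
{
	return (frame / BITS_PER_CHUNK) * BITMAP_CHUNK_SIZE;
}

/* Mask selecting the bit for @frame within its 64-bit word. */
static uint64_t idle_bitmap_mask(uint64_t frame)
{
	return UINT64_C(1) << (frame % BITS_PER_CHUNK);
}

/* Inverse mapping: first frame covered by an access at byte @pos. */
static uint64_t idle_bitmap_first_frame(uint64_t pos)
{
	return pos * 8;
}
```

The index is a PFN for the existing global file and a virtual frame number
in your patch; only the meaning of the index changes, not the layout.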
>
> (2) It makes it inconsistent with other idle page tracking mechanism. I
That is exactly my point: create a different kind of idle page tracking
that we couldn't do with the existing interface.
> prefer if post per-process. At the heart of it, the tracking is always at the
What does "post per-process" mean?
> physical page level -- I feel that is how it should be. Other drawback, is
> also we have to document this subtlety.
Sorry, could you elaborate on that a bit?