Re: [PATCH v2 2/2] proc: Add getting pages info of ZONE_DEVICE support

From: Xiongwei Song
Date: Tue Jan 11 2022 - 02:52:00 EST


Hi David,

On Mon, Jan 10, 2022 at 10:34 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 10.01.22 15:19, sxwjean@xxxxxx wrote:
> > From: Xiongwei Song <sxwjean@xxxxxxxxx>
> >
> > When requesting pages info by /proc/kpage*, the pages in ZONE_DEVICE were
> > missed.
> >
>
> The "missed" part makes it sound like this was done by accident. On the
> contrary, for now we decided to not expose these pages that way, for
> example, because determining if the memmap was already properly
> initialized isn't quite easy.
>

Understood. Thank you for the explanation.

>
> > The pfn_to_devmap_page() function can help to get page that belongs to
> > ZONE_DEVICE.
>
> What's the main motivation for this?

There is no special case. My customer wanted to check page flags in system wide.
I tried to find the way and found there is no capability for pages of
ZONE_DEVICE,
so tried to make the patch and see if upstream needs it.

>
> >
> > Signed-off-by: Xiongwei Song <sxwjean@xxxxxxxxx>
> > ---
> > fs/proc/page.c | 35 ++++++++++++++++++++++-------------
> > 1 file changed, 22 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > index 9f1077d94cde..2cdc2b315ff8 100644
> > --- a/fs/proc/page.c
> > +++ b/fs/proc/page.c
> > @@ -15,6 +15,7 @@
> > #include <linux/page_idle.h>
> > #include <linux/kernel-page-flags.h>
> > #include <linux/uaccess.h>
> > +#include <linux/memremap.h>
> > #include "internal.h"
> >
> > #define KPMSIZE sizeof(u64)
> > @@ -46,6 +47,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
> > {
> > const unsigned long max_dump_pfn = get_max_dump_pfn();
> > u64 __user *out = (u64 __user *)buf;
> > + struct dev_pagemap *pgmap = NULL;
> > struct page *ppage;
> > unsigned long src = *ppos;
> > unsigned long pfn;
> > @@ -60,17 +62,18 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
> > count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
> >
> > while (count > 0) {
> > - /*
> > - * TODO: ZONE_DEVICE support requires to identify
> > - * memmaps that were actually initialized.
> > - */
> > ppage = pfn_to_online_page(pfn);
> > + if (!ppage)
> > + ppage = pfn_to_devmap_page(pfn, &pgmap);
> >
> > if (!ppage || PageSlab(ppage) || page_has_type(ppage))
> > pcount = 0;
> > else
> > pcount = page_mapcount(ppage);
> >
> > + if (pgmap)
> > + put_dev_pagemap(pgmap);
>
> Ehm, don't you have to reset pgmap back to NULL? Otherwise during the
> next iteration, you'll see pgmap != NULL again.

Oops. I totally agree. Will do this in the next version.

>
> > +
> > if (put_user(pcount, out)) {
> > ret = -EFAULT;
> > break;
> > @@ -229,10 +232,12 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
> > {
> > const unsigned long max_dump_pfn = get_max_dump_pfn();
> > u64 __user *out = (u64 __user *)buf;
> > + struct dev_pagemap *pgmap = NULL;
> > struct page *ppage;
> > unsigned long src = *ppos;
> > unsigned long pfn;
> > ssize_t ret = 0;
> > + u64 flags;
> >
> > pfn = src / KPMSIZE;
> > if (src & KPMMASK || count & KPMMASK)
> > @@ -242,13 +247,15 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
> > count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
> >
> > while (count > 0) {
> > - /*
> > - * TODO: ZONE_DEVICE support requires to identify
> > - * memmaps that were actually initialized.
> > - */
> > ppage = pfn_to_online_page(pfn);
> > + if (!ppage)
> > + ppage = pfn_to_devmap_page(pfn, &pgmap);
> > +
> > + flags = stable_page_flags(ppage);
> > + if (pgmap)
> > + put_dev_pagemap(pgmap);
>
> Similar comment.

Okay.

>
> >
> > - if (put_user(stable_page_flags(ppage), out)) {
> > + if (put_user(flags, out)) {
> > ret = -EFAULT;
> > break;
> > }
> > @@ -277,6 +284,7 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
> > {
> > const unsigned long max_dump_pfn = get_max_dump_pfn();
> > u64 __user *out = (u64 __user *)buf;
> > + struct dev_pagemap *pgmap = NULL;
> > struct page *ppage;
> > unsigned long src = *ppos;
> > unsigned long pfn;
> > @@ -291,17 +299,18 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
> > count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
> >
> > while (count > 0) {
> > - /*
> > - * TODO: ZONE_DEVICE support requires to identify
> > - * memmaps that were actually initialized.
> > - */
> > ppage = pfn_to_online_page(pfn);
> > + if (!ppage)
> > + ppage = pfn_to_devmap_page(pfn, &pgmap);
> >
> > if (ppage)
> > ino = page_cgroup_ino(ppage);
> > else
> > ino = 0;
> >
> > + if (pgmap)
> > + put_dev_pagemap(pgmap);
>
> Similar comment.

Okay.

>
>
> IIRC, we might still stumble over uninitialized devmap memmaps that
> essentially contain garbage -- I recall it might be the device metadata.
> I wonder if we at least have to check pgmap_pfn_valid().

Oh, ok. But how about putting pgmap_pfn_valid into pfn_to_devmap_page()?

Appreciated your review.

Regards,
Xiongwei