Re: [PATCH bugfix] proc/pagemap: correctly report non-present ptes and holes between vmas

From: Konstantin Khlebnikov
Date: Mon Apr 30 2012 - 15:19:42 EST


Naoya Horiguchi wrote:
Hi,

On Sat, Apr 28, 2012 at 08:22:30PM +0400, Konstantin Khlebnikov wrote:
This patch resets the current pagemap entry if the current pte isn't present,
or if the current vma has ended. Otherwise pagemap reports the last entry again and again.

Non-present pte reporting was broken by commit v3.3-3738-g092b50b
("pagemap: introduce data structure for pagemap entry").

Reporting for holes was broken by commit v3.3-3734-g5aaabe8
("pagemap: avoid splitting thp when reading /proc/pid/pagemap").

Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx>
Reported-by: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>

Thanks for your efforts.
I confirmed that this patch fixes the problem on v3.4-rc4.
But originally (before the commits you pointed to above), initialization of
the pagemap entries (confusingly labelled 'pfn' at the time) was done inside
the for-loops in pagemap_pte_range(), so I think it's better to go back to
something like that.

How about the following?

I don't like this. A function that returns void should always initialize its "output"
argument; that is much clearer than relying on a preinitialized value.

---
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2b9a760..538f8d8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -779,13 +779,14 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
struct pagemapread *pm = walk->private;
pte_t *pte;
int err = 0;
- pagemap_entry_t pme = make_pme(PM_NOT_PRESENT);
+ pagemap_entry_t pme;

/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
if (pmd_trans_huge_lock(pmd, vma) == 1) {
for (; addr != end; addr += PAGE_SIZE) {
unsigned long offset;
+ pme = make_pme(PM_NOT_PRESENT);

offset = (addr & ~PAGEMAP_WALK_MASK) >>
		PAGE_SHIFT;
@@ -801,6 +802,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
if (pmd_trans_unstable(pmd))
return 0;
for (; addr != end; addr += PAGE_SIZE) {
+ pme = make_pme(PM_NOT_PRESENT);

/* check to see if we've left 'vma' behind
* and need a new, higher one */
@@ -842,10 +844,10 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
{
struct pagemapread *pm = walk->private;
int err = 0;
- pagemap_entry_t pme = make_pme(PM_NOT_PRESENT);

for (; addr != end; addr += PAGE_SIZE) {
int offset = (addr & ~hmask) >> PAGE_SHIFT;
+ pagemap_entry_t pme = make_pme(PM_NOT_PRESENT);
huge_pte_to_pagemap_entry(&pme, *pte, offset);
err = add_to_pagemap(addr, &pme, pm);
if (err)

---
Thanks,
Naoya
