Re: [bug, 4.8] /proc/meminfo: counter values are very wrong

From: Dave Chinner
Date: Fri Aug 05 2016 - 08:07:44 EST


On Fri, Aug 05, 2016 at 09:59:35PM +1000, Dave Chinner wrote:
> On Fri, Aug 05, 2016 at 11:54:17AM +0100, Mel Gorman wrote:
> > On Fri, Aug 05, 2016 at 09:11:10AM +1000, Dave Chinner wrote:
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index fb975cec3518..baa97da3687d 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -4064,7 +4064,7 @@ long si_mem_available(void)
> > > >  	int lru;
> > > >
> > > >  	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
> > > > -		pages[lru] = global_page_state(NR_LRU_BASE + lru);
> > > > +		pages[lru] = global_node_page_state(NR_LRU_BASE + lru);
> > > >
> > > >  	for_each_zone(zone)
> > > >  		wmark_low += zone->watermark[WMARK_LOW];
> > >
> > > OK, that makes the /proc accounting match the /sys per-node
> > > accounting, but the output still looks wrong. I remove files with
> > > cached pages from the filesystem (i.e. invalidate and free them),
> > > yet they are apparently still accounted as being on the
> > > active/inactive LRU.
> > >
> > > Reboot, then run dbench for a minute:
> > >
> > > $ sudo mkfs.xfs -f /dev/pmem1
> > > meta-data=/dev/pmem1             isize=512    agcount=4, agsize=524288 blks
> > >          =                       sectsz=4096  attr=2, projid32bit=1
> > >          =                       crc=1        finobt=1, sparse=0
> > > data     =                       bsize=4096   blocks=2097152, imaxpct=25
> > >          =                       sunit=0      swidth=0 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > log      =internal log           bsize=4096   blocks=2560, version=2
> > >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > $ sudo mount /dev/pmem1 /mnt/scratch
> > > $ sudo dbench -t 60 -D /mnt/scratch/ 16
> > > dbench version 4.00 - Copyright Andrew Tridgell 1999-2004
> > >
> >
> > Is there any chance this is related to pmem1?
>
> Nope, just reproduced it on /dev/vdc to confirm.

And same thing on a different machine, using iscsi storage.

Also, I just noticed that unmounting the filesystem doesn't clean it
up - the inactive/active LRU usage is still there.
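
For anyone wanting to watch these counters around the mount, dbench,
remove and unmount steps, here is a minimal sketch rather than a
definitive tool. It assumes Python 3 and the usual "Name:   value kB"
layout of /proc/meminfo; the helper names (parse_meminfo, lru_usage_kb,
lru_delta_kb, read_meminfo) are made up for illustration and are not
part of any kernel or library interface.

```python
import re

# LRU-related fields as they appear in /proc/meminfo.
LRU_FIELDS = (
    "Active(anon)",
    "Inactive(anon)",
    "Active(file)",
    "Inactive(file)",
    "Unevictable",
)


def parse_meminfo(text):
    """Parse 'Name:   value [kB]' lines into a dict of name -> int."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def lru_usage_kb(counters):
    """Pick out the LRU fields; a missing field is reported as 0."""
    return {name: counters.get(name, 0) for name in LRU_FIELDS}


def lru_delta_kb(before, after):
    """Per-field change in LRU usage between two parsed snapshots."""
    old = lru_usage_kb(before)
    new = lru_usage_kb(after)
    return {name: new[name] - old[name] for name in LRU_FIELDS}


def read_meminfo(path="/proc/meminfo"):
    """Read and parse a meminfo-style file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Take one read_meminfo() snapshot before the run and another after
removing the files or unmounting, then pass both to lru_delta_kb(). If
the file LRU counters were being decremented when the cached pages are
freed, the Active(file) and Inactive(file) deltas should fall back
towards zero.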

Ah, maybe this is a result of all the block layer request
flag changes and that's screwing up the page accounting. I'll get
back to you on that - not going to try to test that now as it's
almost bed time.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx