Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

From: Fengguang Wu
Date: Fri Aug 17 2007 - 22:49:15 EST


Matt,

On Fri, Aug 17, 2007 at 11:58:08AM -0500, Matt Mackall wrote:
> On Fri, Aug 17, 2007 at 02:47:27PM +0800, Fengguang Wu wrote:
> > It's not easy to do direct performance comparisons between pmaps and
> > pagemap/kpagemap. However, some close analyses are still possible :)
> >
> > 1) code size
> > pmaps ~200 LOC
> > pagemap/kpagemap ~300 LOC
> >
> > 2) dataset size
> > take for example my running firefox on Intel Core 2:
> > VSZ 400 MB
> > RSS 64 MB, or 16k pages
> > pmaps 64 KB, wc shows 2k lines, i.e. that many page ranges
> > pagemap 800 KB, could be heavily optimized by returning partial data
>
> I take it you're in 64-bit mode?

Yes. That will be the common case.
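
As a sanity check on the 800 KB pagemap figure: it is what you get with one
8-byte entry per 4 KB virtual page. Both sizes are assumptions here (neither
is stated above), so take this as a rough sketch rather than a description of
the actual format:

```python
# Assumptions (not stated in the thread): 4 KB pages and one 8-byte
# pagemap entry per virtual page, as one would expect in 64-bit mode.
PAGE_SIZE = 4 * 1024
ENTRY_SIZE = 8


def pagemap_bytes(vsz_bytes, page_size=PAGE_SIZE, entry_size=ENTRY_SIZE):
    """Size in bytes of a full pagemap dump covering vsz_bytes of address space."""
    pages = vsz_bytes // page_size
    return pages * entry_size


vsz = 400 * 1024 * 1024              # firefox VSZ from the numbers above
print(pagemap_bytes(vsz) // 1024, "KB")
```

With 4-byte entries the same dump would be half that size, which is why the
64-bit question matters.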

> You're right, this data compresses well in many circumstances. I
> suspect it will suffer under memory pressure though. That will
> fragment the ranges in-memory and also fragment the active bits. The
> worst case here is huge, of course, but realistically I'd expect
> something like 2x-4x.

Not likely to degrade even under memory pressure ;)

The compress_ratio = (VSZ:RSS) * (RSS:page_ranges).
- On fresh startup and no memory pressure,
    - the VSZ:RSS ratio of ALL processes is 4516796KB:457048KB ~= 10:1.
    - the firefox case shows a (RSS:page_ranges) of 16k:2k ~= 8:1.
- On memory pressure,
    - as VSZ goes up, RSS will be bounded by physical memory.
      So the VSZ:RSS ratio actually goes up with memory pressure.
    - a page range is a good unit of locality. Its pages are more likely
      to be reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade
      as much.
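
The ratio estimate above can be worked through numerically. This is only a
sketch of the arithmetic: it mixes the system-wide VSZ:RSS figure with the
firefox RSS:page_ranges figure, exactly as the estimate does, and the function
name is made up for illustration:

```python
def compress_ratio(vsz, rss, rss_pages, page_ranges):
    """compress_ratio = (VSZ:RSS) * (RSS:page_ranges)."""
    return (vsz / rss) * (rss_pages / page_ranges)


# System-wide VSZ and RSS in KB, taken from the numbers quoted above.
vsz_to_rss = 4516796 / 457048            # a little under 10:1

# firefox: 16k resident pages described by 2k page ranges.
rss_to_ranges = (16 * 1024) / (2 * 1024)  # 8:1

ratio = compress_ratio(4516796, 457048, 16 * 1024, 2 * 1024)
print(f"VSZ:RSS ~= {vsz_to_rss:.1f}:1")
print(f"RSS:page_ranges = {rss_to_ranges:.0f}:1")
print(f"compress_ratio ~= {ratio:.0f}:1 (virtual pages per pmaps line)")
```

Under memory pressure the first factor grows while the second shrinks, so the
product moves less than either factor alone.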

> But there are still the downsides I have mentioned:
>
> - you don't get page frame numbers

True. I guess PFNs are meaningless to a normal user?

> - you can't do random access

Not for now.

It would be trivial to support seek-by-address semantics: the seqfile
operations already iterate by address. It is only that we cannot do it
via the regular read/pread/seek interfaces, which have different
semantics for fpos. However, tricks like ioctl(begin_addr, end_addr)
can be employed if necessary.

> And how long does it take to pull the data out? My benchmarks show
> greater than 50MB/s (and that's with the version in -mm that's doing
> double buffering), so that 800K would take < .016s.

You are right :)
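
The arithmetic behind that bound, as a quick sketch (taking 50 MB/s as the
sustained rate and binary KB/MB units, which is my assumption):

```python
def transfer_seconds(size_bytes, rate_bytes_per_sec):
    """Time to read size_bytes at a sustained rate of rate_bytes_per_sec."""
    return size_bytes / rate_bytes_per_sec


pagemap_size = 800 * 1024          # 800 KB of pagemap data
rate = 50 * 1024 * 1024            # 50 MB/s, the benchmark figure quoted above

t = transfer_seconds(pagemap_size, rate)
print(f"{t:.4f} s")
```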

> > kpagemap 256 KB
> >
> > 3) runtime overheads
> > pmaps 2k lines of string processing(encode/decode)
> > kpagemap 16k seek()/read()s, and context switches (could be
> > optimized somehow by doing a PFN sort first, but
> > that's also non-trivial overheads)
>
> You can do anywhere between 16k small reads or 1 large read. Depends

No way to avoid the seeks if PFNs are discontinuous. Too bad that
memory gets fragmented with uptime, at least for the current kernel.

But sure, sequential reads are viable when doing whole system memory
analysis, or for memory hog processes.
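
To make the "PFN sort first" idea concrete, here is a minimal sketch of
grouping PFNs into contiguous runs, so that each run costs one seek plus one
read instead of one per page. The helper name is hypothetical, and the
kpagemap record size and file layout are deliberately left out, since they
are not specified here:

```python
def coalesce_pfns(pfns):
    """Sort PFNs and merge consecutive ones into (first_pfn, count) runs."""
    runs = []
    for pfn in sorted(set(pfns)):
        if runs and pfn == runs[-1][0] + runs[-1][1]:
            # pfn directly follows the last run: extend it by one page.
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # Gap in the PFN sequence: start a new run.
            runs.append((pfn, 1))
    return runs


# Six pages, but only three seek+read pairs are needed.
print(coalesce_pfns([7, 3, 4, 5, 100, 8]))  # [(3, 3), (7, 2), (100, 1)]
```

How much this helps depends entirely on fragmentation: with fully scattered
PFNs every run has length 1 and nothing is saved beyond the sort order.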

> what data you're trying to get. Right now, kpagemap is fast enough
> that I can do realtime displays of the whole of memory in my desktop
> in a GUI written in Python. And Python is fairly horrible for drawing
> bitmaps and such.
>
> http://www.selenic.com/Screenshot-kpagemap.png
>
> > So pmaps seems to be a clear winner :)
>
> Except that it's only providing a subset of the data.

Yes, and it's a nice graph :)
