Re: per-process shared information
From: Hugh Dickins
Date: Fri Oct 15 2004 - 07:00:23 EST
On Fri, 15 Oct 2004, Andrea Arcangeli wrote:
> On Thu, Oct 14, 2004 at 10:49:28PM +0100, Hugh Dickins wrote:
> > Is "shared" generally expected to pair with rss? Would it make
>
> shared is expected to work like in linux 2.4 (and apparently solaris),
> which means _physical_ pages mapped in more than one task.
Well, it didn't quite mean that in 2.4, since any pagecache (including
swapcache) page mapped into a single task would have page_count 2 and
so be counted as shared.
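
To make the 2.4 oddity concrete, here is a minimal userspace sketch. It assumes the 2.4 statm walk treated a page as shared whenever page_count was above 1, which is what the paragraph above implies. The struct toy_page type and every function name are invented for illustration and are not kernel code.

```c
#include <assert.h>

/* Toy stand-in for a page: where its references come from. */
struct toy_page {
    int cache_refs; /* 1 if held by pagecache or swapcache, else 0 */
    int pte_refs;   /* number of page table entries mapping the page */
};

/* Toy page_count: every reference counts, whoever holds it. */
static int toy_page_count(const struct toy_page *p)
{
    assert(p->cache_refs >= 0 && p->pte_refs >= 0);
    return p->cache_refs + p->pte_refs;
}

/* Assumed 2.4-style test: "shared" means page_count above 1. */
static int counted_shared_24(const struct toy_page *p)
{
    return toy_page_count(p) > 1;
}

/* The meaning quoted from Andrea: physically mapped by more than one pte. */
static int mapped_more_than_once(const struct toy_page *p)
{
    return p->pte_refs > 1;
}
```

A pagecache page mapped by one task has one cache reference and one pte reference, so counted_shared_24() reports it as shared while mapped_more_than_once() does not.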
I think 2.4 was already trying to come up with a plausible simulacrum
of numbers that made sense to gather in 2.2, but the numbers had lost
their point, and it only had page_count to play with. Or maybe 2.2
was already trying to make up numbers to fit with 2.0...
> > Sounds horrid to me! I'm not inclined to volunteer for that: plus this
>
> what's horrid? would you add an O(log(N)) slowdown in the fast paths to
> provide the stat in O(1)? I much prefer an O(N) loop in the stats as long
> as it catches signals and reschedules as soon as need_resched is set.
Of course I prefer to keep significant slowdown out of the fast paths
("significant" inserted there because I don't mind the fastish path
incrementing just one count in an already dirty cacheline).
But I don't want to give myself unnecessary work, and I don't want to
give the cpu unnecessary work, particularly if the stats gathering is
in danger of dominating some profiled load. Bill had good reason to
remove even the vma walk; but I accept you're being careful to propose
that we keep the overhead out of existing /proc/pid uses - right if
we have to go that way, but I just prefer to avoid the work myself.
> if you can suggest a not-horrid approach to avoid breaking binary
> compatibility to 2.4 you're welcome ;)
I hope that's what my patch would be sufficient to achieve.
It would be unfair to say 2.4's numbers were actually a bug, but
certainly peculiar: I'm about as interested in exactly reproducing
their oddities as in building a replica of some antique furniture.
> > One, support anon_rss as a subset of rss, "shared" being (rss - anon_rss).
> > Yes, that's a slight change in meaning of "shared" from in 2.4, but easy
> > to support and I think very reasonable. On the one hand, yes, of course
>
> that's certainly much better than what we have right now, it's much
> closer to the old semantics, but I'm not sure if it's enough to be
> compliant with the other OS (including 2.4). I will ask.
Thanks, please do.
> I also guess the app will stop breaking since rss - shared will not wrap
> anymore.
Oh, if that's all we need, I can do a simpler patch ;)
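
The wrap in question can be illustrated with a small sketch. It assumes the counts are unsigned longs and that anon_rss never exceeds rss, as "a subset of rss" implies. struct toy_statm and the helper names are hypothetical, not taken from the patch.

```c
#include <assert.h>

/* Toy view of the two statm values under discussion, in pages. */
struct toy_statm {
    unsigned long resident;
    unsigned long shared;
};

/* Proposed definition: shared is whatever part of rss is not anonymous. */
static struct toy_statm statm_from_counts(unsigned long rss,
                                          unsigned long anon_rss)
{
    struct toy_statm s;

    assert(anon_rss <= rss);      /* anon_rss is a subset of rss */
    s.resident = rss;
    s.shared = rss - anon_rss;    /* so shared can never exceed resident */
    return s;
}

/* What a userspace tool might compute as "private" pages. */
static unsigned long private_pages(struct toy_statm s)
{
    return s.resident - s.shared; /* wraps if shared exceeds resident */
}
```

With rss 100 and anon_rss 30, shared is 70 and private_pages() returns 30. If shared were instead an extent larger than resident, say 250 against 100, the unsigned subtraction would wrap to a huge value.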
> > we know an anon page may actually be shared between several mms of the
> > fork group, whereas it won't be counted in "shared" with this patch. But
> > the old definition of "shared" was considerably more stupid, wasn't it?
> > for example, a private page in pte and swap cache got counted as shared.
>
> just checking mapcount > 1 would do it right in 2.6.
Interesting idea, and now (well, 2.6.9-mm heading to 2.6.10) we have
atomic_inc_return and atomic_dec_return supported on all architectures,
it should be possible to adjust an mm->shared_rss each time mapcount
goes up from 1 or down to 1, as well as adjusting nr_mapped count
as we do when it goes up from 0 or down to 0.
Though I think I prefer the anon_rss count in yesterday's patch,
which is at least well-defined. And will usually give you numbers
much closer to 2.4's than shared_rss (since, as noted above, 2.4
counted a page shared between pagetable and pagecache as shared,
which mapcount 1 would not).
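
A rough userspace sketch of the shared_rss idea, using C11 atomics in place of the kernel's atomic_inc_return and atomic_dec_return. The toy mapcount starts at 0 for an unmapped page, which is a simplification. struct toy_mm, struct toy_page, toy_map and toy_unmap are all invented names, and the logic follows the description above literally rather than any real patch.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_page { atomic_int mapcount; };      /* ptes mapping this page */
struct toy_mm   { long rss; long shared_rss; };

static atomic_long nr_mapped;                   /* pages with mapcount > 0 */

/* Userspace stand-ins returning the new value, like the kernel helpers. */
static int inc_return(atomic_int *v) { return atomic_fetch_add(v, 1) + 1; }
static int dec_return(atomic_int *v) { return atomic_fetch_sub(v, 1) - 1; }

static void toy_map(struct toy_mm *mm, struct toy_page *page)
{
    int now = inc_return(&page->mapcount);

    mm->rss++;
    if (now == 1)
        atomic_fetch_add(&nr_mapped, 1);        /* went up from 0 */
    else if (now == 2)
        mm->shared_rss++;                       /* went up from 1 */
}

static void toy_unmap(struct toy_mm *mm, struct toy_page *page)
{
    int now = dec_return(&page->mapcount);

    assert(now >= 0);                           /* never unmap an unmapped page */
    mm->rss--;
    if (now == 0)
        atomic_fetch_sub(&nr_mapped, 1);        /* went down to 0 */
    else if (now == 1)
        mm->shared_rss--;                       /* went down to 1 */
}
```

In this toy the adjustment lands on whichever mm happens to make the transition: if A maps a page, B maps it, and then A unmaps it, A's shared_rss is decremented although only B's was ever incremented.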
> > Would this new meaning of "shared" suit your purposes well enough?
>
> It'd be fine for me, but I'm not the one who's having troubles.
Let's wait and see how (rss - anon_rss) works out for your customer.
> > shouldn't change that now, but add your statm_phys_shared; whatever,
>
> the only reason to add statm_phys_shared was to keep ps xav fast, if you
> don't slow down ps xav you can add another field at the end of statm.
We should ask Albert which he prefers: should the /proc/pid/statm "shared"
field revert to an rss-like count as in 2.4, a subset of "resident", while
the size, text and data fields remain extents; or should we leave that third
field as in earlier 2.6 and add a shared-rss field on the end?
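
From a tool's point of view the second option might look like the sketch below. It assumes the usual seven whitespace-separated statm fields (size, resident, shared, text, lib, data, dt) and treats an eighth field as the proposed shared-rss count. That eighth field is hypothetical, as are struct toy_statm_line and parse_statm.

```c
#include <assert.h>
#include <stdio.h>

struct toy_statm_line {
    unsigned long size, resident, shared, text, lib, data, dt;
    unsigned long shared_rss;   /* hypothetical trailing field */
    int has_shared_rss;         /* 1 if the trailing field was present */
};

/* Parse one statm line; returns 0 on success, -1 if fields are missing. */
static int parse_statm(const char *line, struct toy_statm_line *out)
{
    int n;

    assert(line != NULL && out != NULL);
    out->shared_rss = 0;
    n = sscanf(line, "%lu %lu %lu %lu %lu %lu %lu %lu",
               &out->size, &out->resident, &out->shared, &out->text,
               &out->lib, &out->data, &out->dt, &out->shared_rss);
    if (n < 7)
        return -1;
    out->has_shared_rss = (n >= 8);
    if (!out->has_shared_rss)
        out->shared_rss = 0;    /* older layout: no extra field */
    return 0;
}
```

A parser written this way keeps working on the existing seven-field layout and picks up the extra count only when it is there, which is the compatibility argument for appending rather than redefining the third field.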
Hugh