Re: [RFC mm][PATCH 2/5] percpu cached mm counter

From: KAMEZAWA Hiroyuki
Date: Thu Dec 10 2009 - 03:23:48 EST


On Thu, 10 Dec 2009 08:54:54 +0100
Ingo Molnar <mingo@xxxxxxx> wrote:

>
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> >
> > Now, mm's counter information is updated by atomic_long_xxx()
> > functions if USE_SPLIT_PTLOCKS is defined. This causes cache misses when
> > page faults happen simultaneously on multiple CPUs. (Almost all
> > process-shared objects are...)
> >
> > If we want to account more per-mm page usage, one of the problems is
> > the cost of this counter.
>
> I'd really like these kinds of stats available via the tool you used to
> develop this patchset:
>
> > After:
> > Performance counter stats for './multi-fault 2' (5 runs):
> >
> > 46997471 page-faults ( +- 0.720% )
> > 1004100076 cache-references ( +- 0.734% )
> > 180959964 cache-misses ( +- 0.374% )
> > 29263437363580464 bus-cycles ( +- 0.002% )
> >
> > 60.003315683 seconds time elapsed ( +- 0.004% )
> >
> > cache misses per page fault are reduced from 4.55 to 3.85
>
> I.e. why not expose these stats via perf events and counts as well,
> beyond the current (rather minimal) set of MM stats perf supports
> currently?
>
> That way we'd get a _lot_ of interesting per task mm stats available via
> perf stat (and maybe they can be profiled as well via perf record), and
> we could perhaps avoid uglies like having to hack hooks into sched.c:
>

As I wrote in 0/5, this is ultimately for the oom-killer, i.e. for
"kernel internal use", not for user-visible perf events.

- http://marc.info/?l=linux-mm&m=125714672531121&w=2

And Christoph has concerns about cache misses on this counter.

- http://archives.free.net.ph/message/20091104.191441.1098b93c.ja.html

This patch replaces atomic_long_add() with a per-cpu counter.
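
To illustrate the idea only, here is a rough user-space sketch. It is not
the code from the patch: struct mm_sketch, add_mm_counter_cached(),
sync_cached_counters() and read_mm_counter() are hypothetical names, and a
thread-local cache stands in for the per-cpu one. Updates go to a private
delta, and the shared atomic is touched only at an explicit sync point
(in the patch, that point is the sync before taking rq->lock).

```c
/* User-space analogy only; every name here is hypothetical. */
#include <stdatomic.h>

#define NR_COUNTERS 2   /* assumption: e.g. file pages and anon pages */

struct mm_sketch {
	/* Shared by all threads: each update can bounce the cache line. */
	atomic_long counters[NR_COUNTERS];
};

/* Thread-local cache, standing in for a per-cpu cache. */
static _Thread_local struct {
	struct mm_sketch *mm;
	long delta[NR_COUNTERS];
} cached;

/* Fold the private deltas into the shared counters and reset them. */
static void sync_cached_counters(void)
{
	if (!cached.mm)
		return;
	for (int i = 0; i < NR_COUNTERS; i++) {
		if (cached.delta[i]) {
			atomic_fetch_add(&cached.mm->counters[i],
					 cached.delta[i]);
			cached.delta[i] = 0;
		}
	}
}

/* Fast path: no atomic operation, no write to shared memory. */
static void add_mm_counter_cached(struct mm_sketch *mm, int idx, long val)
{
	if (cached.mm != mm) {
		/* Switching to another mm: flush what belongs to the old one. */
		sync_cached_counters();
		cached.mm = mm;
	}
	cached.delta[idx] += val;
}

/* Readers see only what has been synced so far. */
static long read_mm_counter(struct mm_sketch *mm, int idx)
{
	return atomic_load(&mm->counters[idx]);
}
```

The trade-off is that a reader can see a slightly stale value until the
next sync, which is why the sync point matters.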


> > + /*
> > + * sync/invaldidate per-cpu cached mm related information
> > + * before taling rq->lock. (see include/linux/mm.h)
>
> (minor typo: s/taling/taking )
>
Oh, thanks.

> > + */
> > + sync_mm_counters_atomic();
> >
> > spin_lock_irq(&rq->lock);
> > update_rq_clock(rq);
>
> It's not a simple task i guess since this per mm counting business has
> grown its own variant which takes time to rearchitect, plus i'm sure
> there's performance issues to solve if such a model is exposed via perf,
> but users and developers would be _very_ well served by such
> capabilities:
>
> - clean, syscall based API available to monitor tasks, workloads and
> CPUs. (or the whole system)
>
> - sampling (profiling)
>
> - tracing, post-process scripting via Perl plugins
>

I'm sorry if I'm missing your point... are you suggesting that we remove all
the mm_counters completely and reimplement them under perf? If so, wouldn't
some proc files (/proc/<pid>/statm etc.) break?

Thanks,
-Kame

