Re: [profile] amortize atomic hit count increments

From: David S. Miller
Date: Tue Sep 14 2004 - 00:52:57 EST


On Mon, 13 Sep 2004 22:32:18 -0700
William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:

> This was my original approach (modulo eliminating the global buffer
> and the atomic operations), but space concerns stymied it, as the
> profile buffer can be several megabytes large. It would likely perform
> better in general if admissible, for whatever value performance is
> considered to have.
>
> There is also an unusual facet to this; the TLB overhead of a loop like:
> for (i = 0; i < prof_len; ++i) {
> 	for_each_online_cpu(cpu)
> 		global_buf[i] += per_cpu(cpu_prof_buffer, cpu)[i];
> }
> is very large and caused "effective nontermination", otherwise known as
> "exhausting the user's patience", on SGI's systems after about half an
> hour. So some TLB overhead amortization is necessary for this to be
> feasible. I suspect iterating over pages of the profile buffer and
> storing intermediate results for a page full of profile buffer hits
> in a buffer page may suffice though I've not tried it.
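
The page-at-a-time idea in the quoted paragraph might look roughly like the following user-space C sketch. It is an illustration under stated assumptions, not code from this thread. The names sum_profile_paged, SKETCH_PAGE_SIZE and HITS_PER_PAGE are hypothetical. The per-cpu buffers are modeled as plain arrays rather than per_cpu() data, and a 4096-byte page is assumed. The loop order is swapped relative to the quoted loop: for each page-sized chunk, every CPU's slice is folded into one scratch page. Each inner pass therefore touches only the scratch page and a single CPU's page, instead of a different CPU's buffer on every index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for illustration; real systems vary. */
#define SKETCH_PAGE_SIZE 4096
/* Number of unsigned int hit counters that fit in one page. */
#define HITS_PER_PAGE (SKETCH_PAGE_SIZE / sizeof(unsigned int))

/*
 * Hypothetical helper: add nr_cpus per-cpu profile buffers into
 * global_buf one page-sized chunk at a time.  Each chunk is staged in
 * a page-sized scratch buffer, and the inner loop streams through one
 * CPU's page before moving on to the next CPU.
 */
static void sum_profile_paged(unsigned int *global_buf,
                              unsigned int *const cpu_buf[],
                              int nr_cpus, size_t prof_len)
{
    unsigned int scratch[HITS_PER_PAGE];   /* one page of partial sums */

    assert(cpu_buf != NULL && nr_cpus > 0);
    for (size_t base = 0; base < prof_len; base += HITS_PER_PAGE) {
        size_t n = prof_len - base;

        if (n > HITS_PER_PAGE)
            n = HITS_PER_PAGE;             /* last chunk may be partial */

        /* Start from the existing totals, matching "global_buf[i] +=". */
        memcpy(scratch, global_buf + base, n * sizeof(scratch[0]));
        for (int cpu = 0; cpu < nr_cpus; cpu++) {
            const unsigned int *src = cpu_buf[cpu] + base;

            for (size_t i = 0; i < n; i++)
                scratch[i] += src[i];
        }
        /* Write this chunk's totals back in one sequential pass. */
        memcpy(global_buf + base, scratch, n * sizeof(scratch[0]));
    }
}
```

The scratch page mirrors the "buffer page" mentioned above. The reduction in TLB pressure comes mainly from the loop order, though. A 4KB on-stack array is fine in user space, but kernel code would presumably need a preallocated page instead.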

I bet that, as we found out about page tables on 64-bit, these
profile buffers are sparsely populated with hits. So perhaps we
could keep a per-cpu bitmap indicating which regions might have
any hits at all, allowing large amounts of skipping and thus
amortizing the scan cost.