Re: [PATCH] percpu_counter : add percpu_counter_add_fast()
From: Nick Piggin
Date: Thu Oct 21 2010 - 21:59:01 EST
On Thu, Oct 21, 2010 at 06:55:16PM -0700, Andrew Morton wrote:
> On Thu, 21 Oct 2010 17:45:36 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > this_cpu_add_return() isn't really needed in this application.
> >
> > {
> > this_cpu_add(*fbc->counters, amount);
> > if (unlikely(abs(this_cpu_read(*fbc->counters)) > fbc->batch))
> > out_of_line_stuff();
> > }
> >
> > will work just fine.
>
> Did that. Got alarmed at a few things.
>
> The compiler cannot CSE the above code - it has to reload the percpu
> base each time. Doing it by hand:
>
>
> {
> long *p;
>
> p = this_cpu_ptr(fbc->counters);
> *p += amount;
> if (unlikely(abs(*p) > fbc->batch))
> out_of_line_stuff();
> }
>
> generates better code.
>
> So this:
>
> static __always_inline void percpu_counter_add_batch(struct percpu_counter *fbc,
> s64 amount, long batch)
> {
> long *pcounter;
>
> preempt_disable();
> pcounter = this_cpu_ptr(fbc->counters);
> *pcounter += amount;
> if (unlikely(abs(*pcounter) >= batch))
> percpu_counter_handle_overflow(fbc);
> preempt_enable();
> }
>
> when compiling this:
>
> --- a/lib/proportions.c~b
> +++ a/lib/proportions.c
> @@ -263,6 +263,11 @@ void __prop_inc_percpu(struct prop_descr
> prop_put_global(pd, pg);
> }
>
> +void foo(struct prop_local_percpu *pl)
> +{
> + percpu_counter_add(&pl->events, 1);
> +}
> +
> /*
> * identical to __prop_inc_percpu, except that it limits this pl's fraction to
> * @frac/PROP_FRAC_BASE by ignoring events when this limit has been exceeded.
>
> comes down to
>
> .globl foo
> .type foo, @function
> foo:
> pushq %rbp #
> movslq percpu_counter_batch(%rip),%rcx # percpu_counter_batch, batch
> movq 96(%rdi), %rdx # <variable>.counters, tcp_ptr__
> movq %rsp, %rbp #,
> #APP
> add %gs:this_cpu_off, %rdx # this_cpu_off, tcp_ptr__
> #NO_APP
> movq (%rdx), %rax #* tcp_ptr__, D.11817
> incq %rax # D.11817
> movq %rax, (%rdx) # D.11817,* tcp_ptr__
> cqto
> xorq %rdx, %rax # tmp67, D.11817
> subq %rdx, %rax # tmp67, D.11817
> cmpq %rcx, %rax # batch, D.11817
> jl .L33 #,
> call percpu_counter_handle_overflow #
> .L33:
> leave
> ret
>
>
> But what's really alarming is that the compiler (4.0.2) is cheerily
> ignoring the inline directives and was generating out-of-line versions
> of most of the percpu_counter.h functions into lib/proportions.s.
> That's rather a worry.
You you have the "ignore inlining" and "compile for size" turned
on? They often suck.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/