Re: [PATCH 0/7] preempt_count rework -v2

From: Peter Zijlstra
Date: Thu Sep 12 2013 - 07:52:24 EST


On Wed, Sep 11, 2013 at 07:43:42PM -0700, Linus Torvalds wrote:
> On Wed, Sep 11, 2013 at 7:20 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > I split the thing up into two macros GEN_UNARY_RMWcc and
> > GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> > code overall.
>
> Yes, that looks like the right abstraction level. Powerful without
> being complicated.

> That said, the very same memory clobber may be what makes this whole
> approach a loss, if it causes gcc to do tons of reloads or other
> random things.

Random things do indeed happen; see the objdump output below, though I'm not sure why.

> For other cases? Who knows.. A lot of the "change and test atomically"
> things have the same containment need, so it might not be a problem.

Right, all atomic ops already contain a memory clobber to go along with
their memory barrier semantics.

So I did a defconfig build without the patch and with the patch and got
a significant size increase:

text data bss filename
11173443 1423736 1183744 defconfig-build/vmlinux.before
11174516 1423736 1183744 defconfig-build/vmlinux.after

I then undid all the bitop conversions that added the extra memory
clobber and got:

11174204 1423736 1183744 defconfig-build/vmlinux.after

Which is still a significant increase, so I had a look at what GCC
generates for mm/rmap.c, which uses quite a few of the
atomic_*_and_test() functions. I got:

text data bss dec hex filename
8675 1 16 8692 21f4 defconfig-build/mm/rmap.o (before)
8739 1 16 8756 2234 defconfig-build/mm/rmap.o (after)

So the increase is there too. Doing an objdump -D on both objects, the
first difference is:

0000000000000660 <do_page_add_anon_rmap>:
660: 55 push %rbp
661: 48 89 e5 mov %rsp,%rbp
664: 48 83 ec 20 sub $0x20,%rsp
668: 48 89 5d f0 mov %rbx,-0x10(%rbp)
66c: 4c 89 65 f8 mov %r12,-0x8(%rbp)
670: 48 89 fb mov %rdi,%rbx
673: f0 ff 47 18 lock incl 0x18(%rdi)
677: 0f 94 c0 sete %al
67a: 84 c0 test %al,%al
67c: 75 12 jne 690 <do_page_add_anon_rmap+0x30>
67e: 48 8b 5d f0 mov -0x10(%rbp),%rbx
682: 4c 8b 65 f8 mov -0x8(%rbp),%r12
686: c9 leaveq

vs.:

0000000000000660 <do_page_add_anon_rmap>:
660: 55 push %rbp
661: 48 89 e5 mov %rsp,%rbp
664: 48 83 ec 20 sub $0x20,%rsp
668: 48 89 5d e0 mov %rbx,-0x20(%rbp)
66c: 4c 89 65 e8 mov %r12,-0x18(%rbp)
670: 48 89 fb mov %rdi,%rbx
673: 4c 89 6d f0 mov %r13,-0x10(%rbp)
677: 4c 89 75 f8 mov %r14,-0x8(%rbp)
67b: f0 ff 47 18 lock incl 0x18(%rdi)
67f: 74 17 je 698 <do_page_add_anon_rmap+0x38>
681: 48 8b 5d e0 mov -0x20(%rbp),%rbx
685: 4c 8b 65 e8 mov -0x18(%rbp),%r12
689: 4c 8b 6d f0 mov -0x10(%rbp),%r13
68d: 4c 8b 75 f8 mov -0x8(%rbp),%r14
691: c9 leaveq

For some reason that is obscure to me, the newfangled asm goto
construct generates a bunch of extra MOVs: it additionally saves and
restores the callee-saved registers %r13 and %r14.

OTOH the good news is that a kernel with that patch applied does indeed
boot properly on real hardware.