Re: locking/atomic: Introduce atomic_try_cmpxchg()

From: Linus Torvalds
Date: Fri Mar 24 2017 - 15:17:50 EST


On Fri, Mar 24, 2017 at 11:45 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> Is there some hack like if __builtin_is_unescaped(*val) *val = old;
> that would work?

See my recent email suggesting a completely different interface, which
avoids this problem.
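
As a rough illustration (not the actual patch; names like
__txchg_success are stand-ins, and the portable __atomic builtin here
substitutes for what would really be x86 inline asm), the shape of
that interface is something like:

#include <stdbool.h>

/*
 * Sketch only: try to replace *ptr (expected to hold *oldp) with new,
 * and jump to success_label if that worked.  On failure the builtin
 * writes the freshly loaded value back through oldp, so the caller
 * retries without an explicit re-read.
 */
#define __try_cmpxchg_goto(ptr, oldp, new, success_label)		\
do {									\
	bool __txchg_success = __atomic_compare_exchange_n(		\
			(ptr), (oldp), (new), false,			\
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);		\
	if (__txchg_success)						\
		goto success_label;					\
} while (0)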

My interface generates:

0000000000000000 <T_refcount_inc>:
0: 8b 07 mov (%rdi),%eax
2: 83 f8 ff cmp $0xffffffff,%eax
5: 74 12 je 19 <T_refcount_inc+0x19>
7: 85 c0 test %eax,%eax
9: 74 0a je 15 <T_refcount_inc+0x15>
b: 8d 50 01 lea 0x1(%rax),%edx
e: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
12: 75 ee jne 2 <T_refcount_inc+0x2>
14: c3 retq
15: 31 c0 xor %eax,%eax
17: 0f 0b ud2
19: c3 retq

for PeterZ's test-case, which seems optimal.
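
For reference, a plausible reconstruction of the test case that the
above disassembly corresponds to, written against the macro sketched
earlier (illustrative only, not PeterZ's actual source):

#include <limits.h>

void T_refcount_inc(unsigned int *r)
{
	/* plain load; the kernel would use atomic_read() */
	unsigned int val = __atomic_load_n(r, __ATOMIC_RELAXED);

	for (;;) {
		if (val == UINT_MAX)	/* saturated: silently ignore */
			return;
		if (!val)		/* use-after-free: the ud2 path */
			__builtin_trap();
		/* try to install val + 1; on failure val is refreshed */
		__try_cmpxchg_goto(r, &val, val + 1, done);
	}
done:
	return;
}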

Of course, PeterZ used -Os, which isn't actually very natural for the
kernel. Using -O2 I get something else. It turns out that my macro
should use

if (likely(__txchg_success)) goto success_label;

(that "likely()" is criticial) to make gcc not try to optimize for the
looping case.
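
In terms of the sketch above, that is just annotating the final
branch, with likely() being the usual __builtin_expect() wrapper:

#define likely(x)	__builtin_expect(!!(x), 1)

#define __try_cmpxchg_goto(ptr, oldp, new, success_label)		\
do {									\
	bool __txchg_success = __atomic_compare_exchange_n(		\
			(ptr), (oldp), (new), false,			\
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);		\
	if (likely(__txchg_success))					\
		goto success_label;					\
} while (0)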

So with that "likely()" fixed, with -O2 I get:

0000000000000000 <T_refcount_inc>:
0: 8b 07 mov (%rdi),%eax
2: 83 f8 ff cmp $0xffffffff,%eax
5: 74 0d je 14 <T_refcount_inc+0x14>
7: 85 c0 test %eax,%eax
9: 74 12 je 1d <T_refcount_inc+0x1d>
b: 8d 50 01 lea 0x1(%rax),%edx
e: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
12: 75 02 jne 16 <T_refcount_inc+0x16>
14: f3 c3 repz retq
16: 83 f8 ff cmp $0xffffffff,%eax
19: 75 ec jne 7 <T_refcount_inc+0x7>
1b: f3 c3 repz retq
1d: 31 c0 xor %eax,%eax
1f: 0f 0b ud2
21: c3 retq

which again looks pretty optimal (it did indeed generate bigger but
potentially higher-performance code by making the good case the
fallthrough, and the unlikely case a _forward_ jump that will be
predicted not-taken in the absence of other prediction information).

(Of course, this also depends on the exact behavior PeterZ's code
had, namely an exception on use-after-free but silent saturation.)

Linus