Re: [PATCH] x86: Implement __WARN using UD0

From: Peter Zijlstra
Date: Thu Feb 23 2017 - 09:58:21 EST


On Thu, Feb 23, 2017 at 06:14:45AM -0800, Arjan van de Ven wrote:
> On 2/23/2017 5:28 AM, Peter Zijlstra wrote:
> >
> >By using "UD0" for WARNs we remove the function call and its possible
> >__FILE__ and __LINE__ immediate arguments from the instruction stream.
> >
> >Total image size will not change much, what we win in the instruction
> >stream we'll loose because of the __bug_table entries. Still, saves on
> >I$ footprint and the total image size does go down a bit.
>
> well I am a little sceptical; WARNs are rare so the code (other than the test)
> should be waaay out of line already (unlikely() and co).

There's only so much you can do in small functions. Sure it tries to
move the crud to the end, but at the end of the day, its still in the
same function.

> And I assume you're not removing the __FILE__ and __LINE__ info, since that info
> is actually high value for us developers... so what are you actually saving?

OK, so going by my own numbers the total image size does not in fact go
down (it did earlier when I initially wrote this patch) :/

I think back when I wrote this I had refcount_t generate inline WARNs
and that generates a huge amount of junk all over the place. This very
much did clear much of that up.

And as said; there's only so much you can do in small functions.

Look at the below for example (frobbed the code to do WARN_ON instead of
WARN), depending on alignment the WARN code will be in the same
cacheline as 'normal' code.


0000000000000016 <refcount_add_not_zero>:
16: 55 push %rbp
17: 8b 16 mov (%rsi),%edx
19: 41 83 c8 ff or $0xffffffff,%r8d
1d: 48 89 e5 mov %rsp,%rbp
20: 85 d2 test %edx,%edx
22: 74 25 je 49 <refcount_add_not_zero+0x33>
24: 83 fa ff cmp $0xffffffff,%edx
27: 74 1c je 45 <refcount_add_not_zero+0x2f>
29: 89 d1 mov %edx,%ecx
2b: 89 d0 mov %edx,%eax
2d: 01 f9 add %edi,%ecx
2f: 41 0f 42 c8 cmovb %r8d,%ecx
33: f0 0f b1 0e lock cmpxchg %ecx,(%rsi)
37: 39 c2 cmp %eax,%edx
39: 74 04 je 3f <refcount_add_not_zero+0x29>
3b: 89 c2 mov %eax,%edx
3d: eb e1 jmp 20 <refcount_add_not_zero+0xa>
3f: ff c1 inc %ecx
41: 75 02 jne 45 <refcount_add_not_zero+0x2f>
43: 0f ff (bad)
45: b0 01 mov $0x1,%al
47: eb 02 jmp 4b <refcount_add_not_zero+0x35>
49: 31 c0 xor %eax,%eax
4b: 5d pop %rbp
4c: c3 retq



0000000000000016 <refcount_add_not_zero>:
16: 8b 16 mov (%rsi),%edx
18: 41 83 c8 ff or $0xffffffff,%r8d
1c: 85 d2 test %edx,%edx
1e: 74 3b je 5b <refcount_add_not_zero+0x45>
20: 83 fa ff cmp $0xffffffff,%edx
23: 75 03 jne 28 <refcount_add_not_zero+0x12>
25: b0 01 mov $0x1,%al
27: c3 retq
28: 89 d1 mov %edx,%ecx
2a: 89 d0 mov %edx,%eax
2c: 01 f9 add %edi,%ecx
2e: 41 0f 42 c8 cmovb %r8d,%ecx
32: f0 0f b1 0e lock cmpxchg %ecx,(%rsi)
36: 39 c2 cmp %eax,%edx
38: 74 04 je 3e <refcount_add_not_zero+0x28>
3a: 89 c2 mov %eax,%edx
3c: eb de jmp 1c <refcount_add_not_zero+0x6>
3e: ff c1 inc %ecx
40: 75 e3 jne 25 <refcount_add_not_zero+0xf>
42: 55 push %rbp
43: be 3f 00 00 00 mov $0x3f,%esi
48: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
4b: R_X86_64_32S .rodata.str1.1
4f: 48 89 e5 mov %rsp,%rbp
52: e8 00 00 00 00 callq 57 <refcount_add_not_zero+0x41>
53: R_X86_64_PC32 warn_slowpath_null-0x4
57: b0 01 mov $0x1,%al
59: 5d pop %rbp
5a: c3 retq
5b: 31 c0 xor %eax,%eax
5d: c3 retq