Re: [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when possible
From: Matthew Wilcox
Date: Tue Jun 27 2017 - 03:03:10 EST
On Wed, Jun 07, 2017 at 11:16:37PM +0200, Rasmus Villemoes wrote:
> On Wed, Jun 07 2017, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > @@ -319,6 +319,9 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
> > {
> > if (__builtin_constant_p(nbits) && nbits == 1)
> > __set_bit(start, map);
> > + else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> > + __builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> > + memset(map + start / 8, 0xff, nbits / 8);
> > else
>
> Isn't the pointer arithmetic wrong here? I think you need to cast map to
> (char*).
Good catch. I've added a test case to test_bitmap.c and it catches
the problem.
> Do you have an example of how the generated code changes, both in the
> case of actual constants and a case where gcc can see that start and
> nbits are byte-aligned without knowing their actual values?
Here's an example of start == 0 and nbits being clearly a multiple of 8,
courtesy of btrfs's extent-io-tests.c.
Before:
f1: 48 8b 5c 24 10 mov 0x10(%rsp),%rbx
f6: 4c 8b 3c 24 mov (%rsp),%r15
fa: 31 f6 xor %esi,%esi
fc: 44 8d 34 dd 00 00 00 lea 0x0(,%rbx,8),%r14d
103: 00
104: 4c 8d 2c dd 00 00 00 lea 0x0(,%rbx,8),%r13
10b: 00
10c: 4c 89 ff mov %r15,%rdi
10f: 44 89 f2 mov %r14d,%edx
112: e8 00 00 00 00 callq 117 <__test_eb_bitmaps+0x77>
113: R_X86_64_PC32 bitmap_set-0x4
117: 31 d2 xor %edx,%edx
119: 31 f6 xor %esi,%esi
11b: 4c 89 e9 mov %r13,%rcx
11e: 4c 89 e7 mov %r12,%rdi
121: e8 00 00 00 00 callq 126 <__test_eb_bitmaps+0x86>
122: R_X86_64_PC32 extent_buffer_bitmap_set-0x4
After:
f1: 4c 8b 7c 24 10 mov 0x10(%rsp),%r15
f6: be ff 00 00 00 mov $0xff,%esi
fb: 48 89 df mov %rbx,%rdi
fe: 4d 89 fe mov %r15,%r14
101: 4e 8d 2c fd 00 00 00 lea 0x0(,%r15,8),%r13
108: 00
109: 41 81 e6 ff ff ff 1f and $0x1fffffff,%r14d
110: 4c 89 f2 mov %r14,%rdx
113: e8 00 00 00 00 callq 118 <__test_eb_bitmaps+0x78>
114: R_X86_64_PC32 memset-0x4
118: 48 8b 2c 24 mov (%rsp),%rbp
11c: 31 d2 xor %edx,%edx
11e: 31 f6 xor %esi,%esi
120: 4c 89 e9 mov %r13,%rcx
123: 48 89 ef mov %rbp,%rdi
126: e8 00 00 00 00 callq 12b <__test_eb_bitmaps+0x8b>
127: R_X86_64_PC32 extent_buffer_bitmap_set-0x4
It's actually slightly less efficient in the caller (although obviously
memset() is going to execute faster than bitmap_set()). Partly because
x86 made some odd choices about the behaviour of an 8-bit move instruction
(it leaves the upper 24 bits intact, rather than zeroing them, so gcc
has to use a 32-bit move instruction to put 0xff into the second argument
to memset()), and partly because gcc is seeing us do:
unsigned long len;
len = len * 8 / 8;
so it's (very reasonably) transforming that into len &= 0x1fff'ffff'ffff'ffff;
That's partly a quirk of how this particular function is written.