Re: [RFC] Optimising IS_ERR_OR_NULL

From: Alexander Holler
Date: Sat Apr 05 2014 - 11:49:27 EST


Am 05.04.2014 16:43, schrieb Matthew Wilcox:

(4 days too late for April Fools ... oh well :-)

I don't like the look of IS_ERR_OR_NULL. It does two tests when (due to
the bit patterns used to represent errors and NULL pointers) it could
use just one:

#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)

static inline long __must_check IS_ERR_OR_NULL(__force const void *ptr)
{
return !ptr || IS_ERR_VALUE((unsigned long)ptr);
}

It needs some performance testing to be sure that it's a win, but I'm
essentially suggesting this:

+++ b/include/linux/err.h
@@ -34,10 +34,8 @@ static inline long __must_check IS_ERR(__force const void *pt
return IS_ERR_VALUE((unsigned long)ptr);
}

-static inline long __must_check IS_ERR_OR_NULL(__force const void *ptr)
-{
- return !ptr || IS_ERR_VALUE((unsigned long)ptr);
-}
+#define IS_ERR_OR_NULL(ptr) \
+ unlikely(((unsigned long)PTR_ERR(ptr) + MAX_ERRNO) <= MAX_ERRNO)

/**
* ERR_CAST - Explicitly cast an error-valued pointer to another pointer type

(deliberately whitespace-damaged to ensure it doesn't get applied
without thought).

Comments, before I go off and run some perf tests?


(... x86 asm)

Now that's a genuine improvement; we saved one instruction (lea, cmp,
jbe vs test, je, cmp, ja), and due to various alignment / padding / etc.
we ended up saving 4 bytes on the length of the function which turns
into 16 bytes due to function alignment.

A first comment, if that really will be changed, please leave to old function as comment for the lazy reader. The new one is pretty ugly and needs a turned on brain to understand (besides that the new one discards the __must_check, but I would assume that no one uses IS_ERR_OR_NULL() without checking the return value).

As I just was a bit bored, here's what happens on ARM so that others don't have to compile it on ARM:

armv5 old:

000011e0 <kern_unmount>:
11e0: e92d4010 push {r4, lr}
11e4: e2504000 subs r4, r0, #0
11e8: 0a000007 beq 120c <kern_unmount+0x2c>
11ec: e3740a01 cmn r4, #4096 ; 0x1000
11f0: 8a000005 bhi 120c <kern_unmount+0x2c>
11f4: e3a03000 mov r3, #0
11f8: e5843064 str r3, [r4, #100] ; 0x64
11fc: ebfffffe bl 0 <synchronize_rcu>
1200: e1a00004 mov r0, r4
1204: e8bd4010 pop {r4, lr}
1208: eafffffe b c78 <mntput>
120c: e8bd8010 pop {r4, pc}

armv5 new:

00000c98 <kern_unmount>:
c98: e2803eff add r3, r0, #4080 ; 0xff0
c9c: e283300f add r3, r3, #15
ca0: e3530a01 cmp r3, #4096 ; 0x1000
ca4: e92d4010 push {r4, lr}
ca8: e1a04000 mov r4, r0
cac: 3a000005 bcc cc8 <kern_unmount+0x30>
cb0: e3a03000 mov r3, #0
cb4: e5803064 str r3, [r0, #100] ; 0x64
cb8: ebfffffe bl 0 <synchronize_rcu>
cbc: e1a00004 mov r0, r4
cc0: e8bd4010 pop {r4, lr}
cc4: eafffffe b c78 <mntput>
cc8: e8bd8010 pop {r4, pc}

And armv7 old:

00000e60 <kern_unmount>:
e60: e92d4010 push {r4, lr}
e64: e2504000 subs r4, r0, #0
e68: 08bd8010 popeq {r4, pc}
e6c: e3740a01 cmn r4, #4096 ; 0x1000
e70: 88bd8010 pophi {r4, pc}
e74: e3a03000 mov r3, #0
e78: e5843064 str r3, [r4, #100] ; 0x64
e7c: ebfffffe bl 0 <synchronize_sched>
e80: e1a00004 mov r0, r4
e84: e8bd4010 pop {r4, lr}
e88: eafffffe b a3c <mntput>

armv7 new:

00000a5c <kern_unmount>:
a5c: e2803eff add r3, r0, #4080 ; 0xff0
a60: e283300f add r3, r3, #15
a64: e3530a01 cmp r3, #4096 ; 0x1000
a68: e92d4010 push {r4, lr}
a6c: e1a04000 mov r4, r0
a70: 38bd8010 popcc {r4, pc}
a74: e3a03000 mov r3, #0
a78: e5803064 str r3, [r0, #100] ; 0x64
a7c: ebfffffe bl 0 <synchronize_sched>
a80: e1a00004 mov r0, r4
a84: e8bd4010 pop {r4, lr}
a88: eafffffe b a3c <mntput>

Unfortunately I'm bad in interpreting assembler (I prefer higher languages like C++), therefor I don't even try it and leave further comments on the ARM assembler to other people. ;)

Regards,

Alexander Holler
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/