Re: [PATCH] blkdev: use an efficient way to check merge flags

From: Jan Kara
Date: Thu Apr 03 2014 - 08:46:02 EST


On Thu 03-04-14 16:00:44, Zhan Jianyu wrote:
> On Thu, Apr 3, 2014 at 2:13 AM, Jan Kara <jack@xxxxxxx> wrote:
> > OK, but have you checked the generated code is actually any better? This
> > is something I'd expect a compiler might be able to optimize anyway. And the
> > original code looks more readable to me.
>
> Hi, Jan,
>
> I've disassemble the code on my x86_64 box
> (it's inline though, I just look at its call site),
> and found that this patch DOES make it more efficient.
>
> Orig asm snippt with
> patch asm snippt
> ============ ================
>
> mov %edx,%ecx mov %rdx,%r9
> xor %r8d,%ecx xor %r8d,%r8d
> test $0x80,%cl and $0x380,%r9d
> jne 14c5 <blk_rq_merge_ok+0x15> test $0x380,%ecx
> and $0x3,%ch sete %r8b
> jne 14c5 <blk_rq_merge_ok+0x15> cmp %r8,%r9
>
> je 14b5 <blk_rq_merge_ok+0x15>
>
> This saves a branch.
>
> Furthermore, I found that gcc is smart enough to try to optimize the
> code, so if we do
> like this, it will generate the most optimal and smallest code :
>
>
> static inline bool blk_check_merge_flags(unsigned int flags1,
> ¦unsigned int flags2)
> {
> return ((flags1 ^ flags2) &
> (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME))
> == 0;
> }
>
> this gives out :
>
> mov %edx,%r8d
> xor %ecx,%r8d
> and $0x380,%r8d
> jne 14a5 <blk_rq_merge_ok+0x15>
>
> But yes, it compromises readibility.
OK, I'd expect gcc is more clever ;). Thanks for the comparison. Anyway
if that function is performance sensitive, we can use your optimization.
Just add a comment there that we want to check whether the three flags are
the same in both flags and that checking your way generates better code.

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/