RE: [PATCH v2] MIPS: Check __clang__ to avoid performance influence with GCC in csum_tcpudp_nofold()

From: David Laight
Date: Mon Mar 15 2021 - 08:42:44 EST


From: Tiezhu Yang <yangtiezhu@xxxxxxxxxxx>
> Sent: 15 March 2021 12:26
> On 03/15/2021 04:49 AM, Maciej W. Rozycki wrote:
> > On Tue, 9 Mar 2021, Tiezhu Yang wrote:
> >
> >> diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
> >> index 1e6c135..80eddd4 100644
> >> --- a/arch/mips/include/asm/checksum.h
> >> +++ b/arch/mips/include/asm/checksum.h
> >> @@ -128,9 +128,13 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> >>
> >> static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
> >> __u32 len, __u8 proto,
> >> - __wsum sum)
> >> + __wsum sum_in)
> >> {
> >> - unsigned long tmp = (__force unsigned long)sum;
> >> +#ifdef __clang__
> >> + unsigned long sum = (__force unsigned long)sum_in;
> >> +#else
> >> + __wsum sum = sum_in;
> >> +#endif
> > This looks much better to me, but I'd keep the variable names unchanged
> > as `sum_in' isn't used beyond the initial assignment anyway (you'll have
> > to update the references with asm operands accordingly of course).
> >
> > Have you verified that code produced by GCC remains the same with your
> > change in place as it used to be up to commit 198688edbf77? I can see no
> > such information in the commit description whether here or in the said
> > commit.
> >
> > Maciej
>
> Hi Maciej,
>
> Thanks for your reply.
>
> gcc --version
> gcc (Debian 10.2.1-6) 10.2.1 20210110
>
> net/ipv4/tcp_ipv4.c
> tcp_v4_send_reset()
> csum_tcpudp_nofold()
>
> objdump -d vmlinux > vmlinux.dump
>
> (1) Before commit 198688edbf77
> ("MIPS: Fix inline asm input/output type mismatch in checksum.h used
> with Clang"):
>
> ffffffff80aa835c: 00004025 move a4,zero
> ffffffff80aa8360: 92020012 lbu v0,18(s0)
> ffffffff80aa8364: de140030 ld s4,48(s0)
> ffffffff80aa8368: 0064182d daddu v1,v1,a0
> ffffffff80aa836c: 304200ff andi v0,v0,0xff
> ffffffff80aa8370: 9c64000c lwu a0,12(v1)
> ffffffff80aa8374: 9c660010 lwu a2,16(v1)
> ffffffff80aa8378: afa70038 sw a3,56(sp)
> ffffffff80aa837c: 24071a00 li a3,6656
> ffffffff80aa8380: 0086202d daddu a0,a0,a2
> ffffffff80aa8384: 0087202d daddu a0,a0,a3
> ffffffff80aa8388: 0088202d daddu a0,a0,a4
> ffffffff80aa838c: 0004083c dsll32 at,a0,0x0
> ffffffff80aa8390: 0081202d daddu a0,a0,at
> ffffffff80aa8394: 0081082b sltu at,a0,at
> ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0
> ffffffff80aa839c: 00812021 addu a0,a0,at
>
> (2) After commit 198688edbf77
> ("MIPS: Fix inline asm input/output type mismatch in checksum.h used
> with Clang"):
>
> ffffffff80aa836c: 00004025 move a4,zero
> ffffffff80aa8370: 92040012 lbu a0,18(s0)
> ffffffff80aa8374: de140030 ld s4,48(s0)
> ffffffff80aa8378: 0062182d daddu v1,v1,v0
> ffffffff80aa837c: 308400ff andi a0,a0,0xff
> ffffffff80aa8380: 9c62000c lwu v0,12(v1)
> ffffffff80aa8384: 9c660010 lwu a2,16(v1)
> ffffffff80aa8388: afa70038 sw a3,56(sp)
> ffffffff80aa838c: 24071a00 li a3,6656
> ffffffff80aa8390: 0046102d daddu v0,v0,a2
> ffffffff80aa8394: 0047102d daddu v0,v0,a3
> ffffffff80aa8398: 0048102d daddu v0,v0,a4
> ffffffff80aa839c: 0002083c dsll32 at,v0,0x0
> ffffffff80aa83a0: 0041102d daddu v0,v0,at
> ffffffff80aa83a4: 0041082b sltu at,v0,at
> ffffffff80aa83a8: 0002103f dsra32 v0,v0,0x0
> ffffffff80aa83ac: 00411021 addu v0,v0,at
>
> (3) With this patch:
>
> ffffffff80aa835c: 00004025 move a4,zero
> ffffffff80aa8360: 92020012 lbu v0,18(s0)
> ffffffff80aa8364: de140030 ld s4,48(s0)
> ffffffff80aa8368: 0064182d daddu v1,v1,a0
> ffffffff80aa836c: 304200ff andi v0,v0,0xff
> ffffffff80aa8370: 9c64000c lwu a0,12(v1)
> ffffffff80aa8374: 9c660010 lwu a2,16(v1)
> ffffffff80aa8378: afa70038 sw a3,56(sp)
> ffffffff80aa837c: 24071a00 li a3,6656
> ffffffff80aa8380: 0086202d daddu a0,a0,a2
> ffffffff80aa8384: 0087202d daddu a0,a0,a3
> ffffffff80aa8388: 0088202d daddu a0,a0,a4
> ffffffff80aa838c: 0004083c dsll32 at,a0,0x0
> ffffffff80aa8390: 0081202d daddu a0,a0,at
> ffffffff80aa8394: 0081082b sltu at,a0,at
> ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0
> ffffffff80aa839c: 00812021 addu a0,a0,at
>
> (4) With the following changes based on commit 198688edbf77
> ("MIPS: Fix inline asm input/output type mismatch in checksum.h used
> with Clang"):
>
> diff --git a/arch/mips/include/asm/checksum.h
> b/arch/mips/include/asm/checksum.h
> index 1e6c135..e1f80407 100644
> --- a/arch/mips/include/asm/checksum.h
> +++ b/arch/mips/include/asm/checksum.h
> @@ -130,7 +130,11 @@ static inline __wsum csum_tcpudp_nofold(__be32
> saddr, __be32 daddr,
> __u32 len, __u8 proto,
> __wsum sum)
> {
> +#ifdef __clang__
> unsigned long tmp = (__force unsigned long)sum;
> +#else
> + __wsum tmp = sum;
> +#endif
>
> __asm__(
> " .set push # csum_tcpudp_nofold\n"
>
> ffffffff80aa835c: 00004025 move a4,zero
> ffffffff80aa8360: 92020012 lbu v0,18(s0)
> ffffffff80aa8364: de140030 ld s4,48(s0)
> ffffffff80aa8368: 0064182d daddu v1,v1,a0
> ffffffff80aa836c: 304200ff andi v0,v0,0xff
> ffffffff80aa8370: 9c64000c lwu a0,12(v1)
> ffffffff80aa8374: 9c660010 lwu a2,16(v1)
> ffffffff80aa8378: afa70038 sw a3,56(sp)
> ffffffff80aa837c: 24071a00 li a3,6656
> ffffffff80aa8380: 0086202d daddu a0,a0,a2
> ffffffff80aa8384: 0087202d daddu a0,a0,a3
> ffffffff80aa8388: 0088202d daddu a0,a0,a4
> ffffffff80aa838c: 0004083c dsll32 at,a0,0x0
> ffffffff80aa8390: 0081202d daddu a0,a0,at
> ffffffff80aa8394: 0081082b sltu at,a0,at
> ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0
> ffffffff80aa839c: 00812021 addu a0,a0,at
>
> The code produced by GCC remains the same between (1), (3) and (4),
> the last changes looks like better (with less changes based on commit
> 198688edbf77), so I will send v3 later.

Aren't those all the same - apart from register selection.
Not that I grok the mips opcodes.
But that code has horridness on its side.

The only obvious difference is that something else changes the
code offset from xxxx5c to xxxx6c.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)