Re: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation

From: Michal Simek
Date: Mon Feb 28 2022 - 01:38:27 EST

Next message: Srinivasa Rao Mandadapu: "[PATCH v5 0/2] Add support for SoundWire1.6 audio cgcr register control"
Previous message: Muchun Song: "[PATCH v3 6/6] mm: remove range parameter from follow_invalidate_pte()"
In reply to: David Laight: "RE: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation"
Next in thread: Michal Simek: "[PATCH v2 3/3] microblaze: Use simple memmove/memcpy implementation from lib/string.c"

On 2/25/22 22:50, David Laight wrote:

From: Michal Simek

Sent: 25 February 2022 13:56

Align the implementation with memcpy and memmove, where the remaining bytes
are also copied via a final switch case instead of a simple loop. Aligning
the implementations is not the key point here, though; there is a much
stronger reason for this change. It is just good to keep in mind that the
same technique is already used there.

Since GCC 10, the -ftree-loop-distribute-patterns optimization is enabled
at -O2. This optimization causes GCC to convert the while loop in memset.c
into a call to memset.
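
As an illustration only, the kind of loop described above looks roughly like the following sketch. The function name simple_fill is hypothetical and is not taken from the patch; the point is that a byte-wise store loop of this shape is what the pattern recognition may replace with a memset call. If such a loop is itself the body of memset, the replacement would presumably make the function call itself.

```c
#include <stddef.h>

/*
 * Hypothetical byte-wise fill, similar in shape to a simple memset.
 * With -ftree-loop-distribute-patterns enabled, GCC may recognise the
 * loop below as a memset idiom and emit a call to memset instead.
 */
void *simple_fill(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	/* Store one byte per iteration until n bytes are written. */
	while (n--)
		*p++ = (unsigned char)c;

	return dst;
}
```

Building a file like this with -fno-tree-loop-distribute-patterns is one way to keep the loop as written; whether that is preferable to restructuring the code is a separate question.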

Gah...
That is nearly as brain dead as another compiler that would convert
any byte copy loop (on x86) into 'rep movsb'.

If I want to call memcpy() I'll call memcpy.
If I'm copying a few bytes I might write the loop to avoid
the cost of the call and all the conditional tests for
buffer length and alignment.

Don't the compiler writers have better things to do?

Not sure what you want me to say about it. This is current GCC behavior and I can't see a way back. I don't think loop unrolling here is a big deal, because the same technique has been used for years in memcpy and memmove.
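
For readers unfamiliar with the technique, a minimal sketch of a switch-based tail is shown here. It is not the code from the patch: the name fill_tail and the assumption that at most 3 bytes remain after the word-sized stores are made up for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical tail handler: writes the last n bytes (n must be 0..3)
 * without a loop, so there is no loop left for the compiler to turn
 * into a memset call. Values of n above 3 write nothing.
 */
void fill_tail(unsigned char *p, unsigned char b, size_t n)
{
	switch (n) {
	case 3:
		*p++ = b;
		/* fall through */
	case 2:
		*p++ = b;
		/* fall through */
	case 1:
		*p = b;
		break;
	default:
		break;
	}
}
```

Each case falls through to the next, so entering at case 3 stores three bytes, at case 2 two bytes, and at case 1 a single byte.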

Thanks,
Michal
