Re: [PATCH] lib: memcmp optimization
From: Jinpu Wang
Date: Wed Oct 10 2018 - 03:33:21 EST
On Wed, Oct 10, 2018 at 1:13 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 9 Oct 2018 16:28:11 +0200 Jack Wang <xjtuwjp@xxxxxxxxx> wrote:
>
> > From: Florian-Ewald Mueller <florian-ewald.mueller@xxxxxxxxxxxxxxxx>
> >
> > During testing, I have configured 128 md/raid1's and, while under
> > heavy IO, I started a check on each of them
> > (echo check > /sys/block/mdx/md/sync_action).
> >
> > The CPU utilization went through the ceiling, and when looking for
> > the cause (with 'perf top') I discovered that ~50% of the time
> > was spent in memcmp() called from process_checks().
> >
> > With this patch applied, it drops to 4% - 10%.
>
> Which CPU architecture? Most important architectures appear to define
> __HAVE_ARCH_MEMCMP.
x86_64 in our case; I should have documented it more clearly.
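For context, the generic memcmp() in lib/string.c is only built when the
architecture does not define __HAVE_ARCH_MEMCMP, roughly like this:

#ifndef __HAVE_ARCH_MEMCMP
#undef memcmp
__visible int memcmp(const void *cs, const void *ct, size_t count)
{
        const unsigned char *su1, *su2;
        int res = 0;

        /* plain byte-at-a-time comparison */
        for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
                if ((res = *su1 - *su2) != 0)
                        break;
        return res;
}
EXPORT_SYMBOL(memcmp);
#endif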
>
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -852,7 +852,7 @@ EXPORT_SYMBOL(memmove);
> > * @count: The size of the area.
> > */
> > #undef memcmp
> > -__visible int memcmp(const void *cs, const void *ct, size_t count)
> > +static inline int __memcmp(const void *cs, const void *ct, size_t count)
>
> What the heck does __visible do?
>
> > {
> > const unsigned char *su1, *su2;
> > int res = 0;
> > @@ -862,6 +862,20 @@ __visible int memcmp(const void *cs, const void *ct, size_t count)
> > break;
> > return res;
> > }
> > +__visible int memcmp(const void *cs, const void *ct, size_t count)
> > +{
> > + const uint64_t *l1p = cs;
> > + const uint64_t *l2p = ct;
> > +
> > + while (count >= sizeof(*l1p)) {
> > + if (*l1p != *l2p)
> > + return __memcmp(l1p, l2p, sizeof(*l1p));
> > + count -= sizeof(*l1p);
> > + ++l1p;
> > + ++l2p;
> > + }
> > + return __memcmp(l1p, l2p, count);
> > +}
>
> This is going to do bad things if the incoming addresses aren't
> suitably aligned.
>
Right, as you said, this is easy to fix.
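A rough, untested sketch of what I have in mind, reusing the byte-wise
__memcmp() from the patch as the fallback for misaligned input:

__visible int memcmp(const void *cs, const void *ct, size_t count)
{
        const uint64_t *l1p = cs;
        const uint64_t *l2p = ct;

        /* misaligned input: keep the old byte-at-a-time behaviour */
        if (((unsigned long)cs | (unsigned long)ct) & (sizeof(*l1p) - 1))
                return __memcmp(cs, ct, count);

        /* compare 64 bits at a time, drop to byte-wise on mismatch */
        while (count >= sizeof(*l1p)) {
                if (*l1p != *l2p)
                        return __memcmp(l1p, l2p, sizeof(*l1p));
                count -= sizeof(*l1p);
                ++l1p;
                ++l2p;
        }
        return __memcmp(l1p, l2p, count);
}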
> Certainly, byte-at-a-time is a pretty lame implementation when the
> addresses are suitably aligned. A fallback to the lame version when
> there is misalignment will be simple to do. And presumably there will
> be decent benefits to whoever is actually using this code. But I'm
> wondering who is actually using this code!
>
I found a recent discussion about why x86-64 uses the generic string
functions here:
+cc John, x86
https://lkml.org/lkml/2018/10/3/818
--
Jack Wang
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: jinpu.wang@xxxxxxxxxxxxxxxx
URL: https://www.profitbricks.de
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss, Matthias Steinberg, Christoph Steffens