RE: [PATCH] add slice by 8 algorithm to crc32.c

From: Joakim Tjernlund
Date: Mon Aug 08 2011 - 03:15:24 EST

Next message: Amerigo Wang: "[PATCH] drm: add missing header file <linuc/types.h>"
Previous message: Joakim Tjernlund: "RE: [PATCH] add slice by 8 algorithm to crc32.c"
In reply to: Joakim Tjernlund: "RE: [PATCH] add slice by 8 algorithm to crc32.c"
Next in thread: George Spelvin: "[PATCH] add slice by 8 algorithm to crc32.c"

"Bob Pearson" <rpearson@xxxxxxxxxxxxxxxxxxxxx> wrote on 2011/08/05 19:27:26:
> > > > > Modify all 'i' loops from for (i = 0; i < foo; i++) { ... } to
> > > > > for (i = foo - 1; i >= 0; i--) { ... }
> > > >
> > > > That should be for (i = foo; i; --i) { ... }
> > >
> > > Shouldn't make much difference, branch on zero bit or branch on sign bit.
> > > But at the end of the day didn't help on Nehalem.
>
> I figured out why "for (i = 0; i < len; i++) {...}" is faster than
> "for (; len; len--) {...}" on my system.
> The current code is
>
> for (; len; len--) {
> load *++p
> ...
> }
>
> Which turns into (in fake assembly)
>
> top:
> dec len
> inc p
> load p
> ...
> test len
> branch neq top
>
> But when I replace that with
>
> for(i = 0; i < len; i++) {
> load *++p
> ...
> }
>
> Gcc turns it into
>
> top:
> load p[i]
> i++
> ...
> compare i, len
> branch lt top
>
> which is fewer instructions and i++ is well scheduled. Incrementing the
> pointer has been moved out of the loop.

I see. Let's leave the pre- vs. post-increment question for now. That is
something that can be sorted out separately.
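
For anyone following along, the two loop shapes compared above can be
sketched in plain C roughly as below. This is only a toy byte sum, not the
crc32.c code: the names sum_countdown, sum_indexed and check_loops_agree are
made up for illustration, and whether gcc actually emits the instruction
sequences quoted above depends on compiler version, flags and target.

```c
#include <assert.h>
#include <stddef.h>

/* Count-down form: len itself is the loop counter and the pointer is
 * pre-incremented on every iteration ("load *++p").
 * p must point at the element just BEFORE the first data byte. */
static unsigned int sum_countdown(const unsigned char *p, size_t len)
{
	unsigned int sum = 0;

	for (; len; len--)
		sum += *++p;
	return sum;
}

/* Index form: a separate counter i runs from 0 up to len. The body is
 * unchanged, but the compiler is free to rewrite the access as an
 * indexed load and hoist the pointer update out of the loop. */
static unsigned int sum_indexed(const unsigned char *p, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += *++p;
	return sum;
}

/* Both forms must compute the same result; only the generated code may
 * differ. buf[0] is a pad byte so that "one before the data" is still a
 * valid pointer into the array. */
static void check_loops_agree(void)
{
	static const unsigned char buf[] = { 0, 1, 2, 3, 4, 5 };
	size_t len = sizeof(buf) - 1;

	assert(sum_countdown(buf, len) == 15);
	assert(sum_indexed(buf, len) == 15);
}
```

Comparing the output of "gcc -O2 -S" for the two functions is one way to
see whether a given compiler produces the difference described above.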

Jocke
