Re: [PATCH] x86: Run checksumming in parallel across multiple alu's

From: Neil Horman
Date: Thu Oct 31 2013 - 10:33:40 EST


On Thu, Oct 31, 2013 at 11:22:00AM +0100, Ingo Molnar wrote:
>
> * Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
>
> > > etc. For such short runtimes make sure the last column displays
> > > close to 100%, so that the PMU results become trustable.
> > >
> > > A nehalem+ PMU will allow 2-4 events to be measured in parallel,
> > > plus generics like 'cycles', 'instructions' can be added 'for free'
> > > because they get counted in a separate (fixed purpose) PMU register.
> > >
> > > The last column tells you what percentage of the runtime that
> > > particular event was actually active. 100% (or empty last column)
> > > means it was active all the time.
> > >
> > > Thanks,
> > >
> > > Ingo
> > >
> >
> > Hmm,
> >
> > I ran this test:
> >
> > for i in `seq 0 1 3`
> > do
> > echo $i > /sys/module/csum_test/parameters/module_test_mode
> > taskset -c 0 perf stat --repeat 20 -C 0 -e L1-dcache-load-misses -e L1-dcache-prefetches -e cycles -e instructions -ddd ./test.sh
> > done
>
> You need to remove '-ddd' which is a shortcut for a ton of useful
> events, but here you want to use fewer events, to increase the
> precision of the measurement.
>
> Thanks,
>
> Ingo
>

Thank you, Ingo, that fixed it. I'm trying some other variants of the csum
algorithm that Doug and I discussed last night, but FWIW, the relative
performance of the 4 test cases (base/prefetch/parallel/both) remains unchanged.
I'm starting to feel that, at this point, there's very little point in doing
parallel ALU operations (unless we can find a way to break the dependency on the
carry flag, which is what I'm tinkering with now).
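
For illustration only, here is a minimal userspace sketch of one well-known way
to sidestep the carry-flag dependency: add 32-bit words into 64-bit
accumulators, so that carries pile up in the upper bits and no add-with-carry
chain is needed, then fold once at the end. This is not necessarily one of the
variants under test in this thread; the names csum_nocarry and fold64 are
hypothetical, and the two-accumulator layout is an assumption chosen to show
independent dependency chains.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 32 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* at most 16 bits */
	return (uint16_t)sum;
}

/*
 * Hypothetical helper, not kernel code: returns the folded (not
 * complemented) ones' complement sum of buf in host byte order.
 * Assumes len stays below a few GiB so the accumulators cannot wrap.
 */
uint16_t csum_nocarry(const unsigned char *buf, size_t len)
{
	uint64_t s0 = 0, s1 = 0;	/* two independent add chains */
	uint32_t w0, w1;

	while (len >= 8) {
		memcpy(&w0, buf, 4);	/* alignment-safe 32-bit loads */
		memcpy(&w1, buf + 4, 4);
		s0 += w0;		/* plain adds, no adc required */
		s1 += w1;
		buf += 8;
		len -= 8;
	}

	if (len) {
		/* Zero-pad the 1..7 trailing bytes and sum them the same way. */
		unsigned char tail[8] = { 0 };

		memcpy(tail, buf, len);
		memcpy(&w0, tail, 4);
		memcpy(&w1, tail + 4, 4);
		s0 += w0;
		s1 += w1;
	}

	return fold64(s0 + s1);
}
```

The trade-off is that each add consumes only 32 bits of input instead of 64, so
whether this beats a single adc chain would still have to be measured.
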
Neil
