Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

From: Ingo Molnar
Date: Tue Feb 09 2016 - 06:24:11 EST

Next message: Ingo Molnar: "Re: [PATCH v2] x86/lib/copy_user_64.S: Handle 4-byte nocache copy"
Previous message: Julien Grall: "Re: [PATCH 3/5] irqchip/gic-v2: Parse and export virtual GIC information"
Next in thread: Will Deacon: "Re: [RFC][PATCH] mips: Fix arch_spin_unlock()"

* Will Deacon <will.deacon@xxxxxxx> wrote:

> On Wed, Feb 03, 2016 at 01:32:10PM +0000, Will Deacon wrote:
> > On Wed, Feb 03, 2016 at 09:33:39AM +0100, Ingo Molnar wrote:
> > > In fact I'd suggest to test this via a quick runtime hack like this in rcupdate.h:
> > >
> > > extern int panic_timeout;
> > >
> > > ...
> > >
> > >     if (panic_timeout)
> > >             smp_load_acquire(p);
> > >     else
> > >             typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p);
> > >
> > > (or so)
> >
> > So the problem with this is that a LOAD <ctrl> LOAD sequence isn't an
> > ordering hazard on ARM, so you're potentially at the mercy of the branch
> > predictor as to whether you get an acquire. That's not to say it won't
> > be discarded as soon as the conditional is resolved, but it could
> > screw up the benchmarking.
> >
> > I'd be better off doing some runtime patching, but that's not something
> > I can knock up in a couple of minutes (so I'll add it to my list).
>
> ... so I actually got that up and running, believe it or not. Filthy stuff.

Wow!

I tried to implement the simpler solution by hacking rcupdate.h, but got drowned
in nasty circular header file dependencies and gave up...

If you are not overly embarrassed by posting hacky patches, mind posting your
solution?

> The good news is that you're right, and I'm now seeing ~1% difference between
> the runs with ~0.3% noise for either of them. I still think that's significant,
> but it's a lot more reassuring than 4%.

Hm, so for such marginal effects I think we could improve the testing method a
bit: we could extend 'perf bench sched messaging' to allow 'steady state
testing', i.e. to not exit+restart all the processes between test iterations, but
to continuously measure and print out current performance figures.

I.e. every 10 seconds it could print a decaying running average of current
throughput.
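
[ A minimal sketch of what such a decaying average could look like. This is not
existing perf code: the decay_avg names, the 'weight' parameter and the output
format are all hypothetical, and it assumes the benchmark can report an
operation count per measurement interval with a non-zero interval length. ]

```c
#include <stdio.h>

/* Hypothetical helper, not part of perf: exponentially decaying average. */
struct decay_avg {
	double value;	/* current decayed average, in ops/sec */
	double weight;	/* weight of the newest sample, 0 < weight <= 1 */
	int primed;	/* 0 until the first sample has been folded in */
};

static void decay_avg_init(struct decay_avg *avg, double weight)
{
	avg->value = 0.0;
	avg->weight = weight;
	avg->primed = 0;
}

/*
 * Fold one interval's throughput into the average and return the new value.
 * Assumes seconds > 0. The first sample seeds the average directly, so the
 * figure does not have to ramp up from zero.
 */
static double decay_avg_update(struct decay_avg *avg, double ops, double seconds)
{
	double sample = ops / seconds;

	if (!avg->primed) {
		avg->value = sample;
		avg->primed = 1;
	} else {
		/* Move the average towards the new sample by 'weight'. */
		avg->value += avg->weight * (sample - avg->value);
	}

	return avg->value;
}

/* Print the current figure; the format is an assumption for illustration. */
static void decay_avg_print(const struct decay_avg *avg)
{
	printf("throughput: %.1f ops/sec (decaying avg)\n", avg->value);
}
```

The benchmark loop would call decay_avg_update() once per 10 second interval and
then decay_avg_print(). A larger weight makes the figure react faster after an
instruction is patched, at the cost of more visible noise.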

That way you could patch/unpatch the instructions without having to restart the
tasks. If you still see an effect (in the numbers reported every 10 seconds), then
that's a guaranteed result.

[ We have such functionality in 'perf bench numa' (the --show-convergence option),
for similar reasons, to allow runtime monitoring and tweaking of kernel
parameters. ]

Thanks,

Ingo
