Re: [PATCH v2] arm: Added support for getcpu() vDSO using TPIDRURW
From: Will Deacon
Date: Mon Oct 10 2016 - 11:30:48 EST
[adding Mathieu -- background is getcpu() in userspace for arm]
On Thu, Oct 06, 2016 at 12:17:07AM +0200, Fredrik Markström wrote:
> On Wed, Oct 5, 2016 at 9:53 PM, Russell King - ARM Linux <linux@xxxxxxxxxxxxxxx>
> wrote:
> > On Wed, Oct 05, 2016 at 06:48:05PM +0100, Robin Murphy wrote:
> >> On 05/10/16 17:39, Fredrik Markström wrote:
> >> > The approach I suggested below with the vDSO data page will obviously
> >> > not work on smp, so suggestions are welcome.
> >> Well, given that it's user-writeable, is there any reason an application
> >> which cares couldn't simply run some per-cpu threads to call getcpu()
> >> once and cache the result in TPIDRURW themselves? That would appear to
> >> both raise no compatibility issues and work with existing kernels.
> > There is - the content of TPIDRURW is thread specific, and it moves
> > with the thread between CPU cores. So, if a thread was running on CPU0
> > when it cached the getcpu() value in TPIDRURW, and then migrated to CPU1,
> > TPIDRURW would still contain 0.
> > I'm also not in favour of changing the TPIDRURW usage to be a storage
> > repository for the CPU number - it's far too specific a usage and seems
> > like a waste of hardware resources to solve one problem.
> Ok, but right now it's nothing but a (architecture specific) piece of TLS,
> which we have generic mechanisms for. From my point of view that is a waste of
> hardware resources.
> > As Mark says, it's an ABI breaking change too, even if it is under a config
> I can't argue with that. If it's an ABI it's an ABI, even if I can't imagine
> why anyone would use it over normal tls... but then again, people probably do.
> So in conclusion I agree and give up.
Rather than give up, you could take a look at the patches from Mathieu
Desnoyers, that tackle this in a very different way. It's also the reason
we've been holding off implementing an optimised getcpu in the arm64 vdso,
because it might well all be replaced by the new restartable sequences.
He's also got support for arch/arm/ in that series, so you could take
them for a spin. The main thing missing at the moment is justification
for the feature using real-world code, as requested by Linus:
so if your per-cpu buffer use-case is compelling in its own right (as
opposed to a micro-benchmark), then you could chime in over there.
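
For anyone skimming the thread, Russell's objection to caching getcpu() in
TPIDRURW can be sketched in a few lines of C. This is only a toy model under
stated assumptions, not kernel code: `struct thread_model`, `cache_cpu()`,
`migrate()` and `cache_is_stale()` are hypothetical names made up for
illustration, and the `tpidrurw` field merely stands in for the real register.

```c
/* Hypothetical model, not kernel code: a thread owns one user-writable
 * register (standing in for TPIDRURW) that is saved and restored along
 * with the rest of its context. */
struct thread_model {
    unsigned int cpu;       /* CPU the thread is currently running on */
    unsigned int tpidrurw;  /* thread-owned register contents */
};

/* The suggestion from the thread: ask for the CPU number once and keep
 * the answer in the thread's register. */
static void cache_cpu(struct thread_model *t)
{
    t->tpidrurw = t->cpu;
}

/* The scheduler moves the thread to another CPU. The register travels
 * with the thread, so nothing updates the cached value. */
static void migrate(struct thread_model *t, unsigned int new_cpu)
{
    t->cpu = new_cpu;
}

/* Returns 1 if the cached value no longer matches the real CPU. */
static int cache_is_stale(const struct thread_model *t)
{
    return t->tpidrurw != t->cpu;
}
```

In this model, a thread that calls cache_cpu() while on CPU0 and is then
moved with migrate(&t, 1) still holds 0 in `tpidrurw`, so cache_is_stale()
returns 1. That is the case Russell describes above.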