Re: [PATCH] use unfair spinlock when running on hypervisor.

From: Nick Piggin
Date: Thu Jun 03 2010 - 09:45:16 EST


On Thu, Jun 03, 2010 at 06:28:21PM +0530, Srivatsa Vaddagiri wrote:
> On Thu, Jun 03, 2010 at 10:38:32PM +1000, Nick Piggin wrote:
> > Holding a ticket in the queue is effectively the same as holding the
> > lock, from the pov of processes waiting behind.
> >
> > The difference of course is that CPU cycles do not directly reduce
> > latency of ticket holders (only the owner). Spinlock critical sections
> > should tend to be several orders of magnitude shorter than context
> > switch times. So if you preempt the guy waiting at the head of the
> > queue, then it's almost as bad as preempting the lock holder.
>
> Ok got it - although that approach is not advisable in some cases, e.g. when
> the lock holder vcpu and lock acquirer vcpu are scheduled on the same pcpu by
> the hypervisor (which was experimented with in [1], where they found a huge hit
> in perf).

Sure, but if you had adaptive yielding, that would solve that problem.
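
As a rough illustration of what adaptive yielding on a ticket lock could
look like, here is a minimal userspace sketch using C11 atomics. It is not
the kernel's implementation: struct ticket_lock, holder_is_running() and
the use of sched_yield() as a stand-in for a yield-to-hypervisor call are
all assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

/* Hypothetical ticket lock: 'next' is the next ticket to hand out,
 * 'owner' is the ticket currently allowed to hold the lock. */
struct ticket_lock {
    atomic_uint next;
    atomic_uint owner;
};

/* Hypothetical hook: a real guest would ask the hypervisor whether the
 * vcpu holding the lock is currently scheduled. This stub always says yes. */
static bool holder_is_running(const struct ticket_lock *l)
{
    (void)l;
    return true;
}

/* Take a ticket, then wait for our turn. Spin while the holder is
 * running; give up the cpu if the holder appears to be preempted. */
static unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != me) {
        if (!holder_is_running(l))
            sched_yield(); /* stand-in for a yield hypercall */
    }
    return me;
}

/* Pass the lock to the next ticket in the queue. */
static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}
```

Note that the sketch only covers a preempted owner. A preempted waiter at
the head of the queue still stalls everyone behind it, which is the case
preemption deferral is meant to address.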


> I agree that in general we should look at deferring preemption of the lock
> acquirer, especially when it's at the "head" as you suggest - I will consider
> that approach as the next step (want to incrementally progress basically!).
>
> > > > Have you also looked at how s390 checks whether the owning vcpu is
> > > > running, spinning if so and yielding to the hypervisor if not? Something
> > > > like turning it into an adaptive lock. This could be applicable as well.
> > >
> > > I don't think even s390 does adaptive spinlocks. Also afaik s390 zVM does gang
> > > scheduling of vcpus, which reduces the severity of this problem very much -
> > > essentially lock acquirer/holder are run simultaneously on different cpus all
> > > the time. Gang scheduling is on my list of things to look at much later
> > > (although I have been warned that it's a scalability nightmare!).
> >
> > It is effectively an adaptive lock. The spinlock itself doesn't
> > sleep of course, but it yields to the hypervisor if the owner has
> > been preempted. This is closely analogous to Linux adaptive mutexes.
>
> Oops you are right - sorry, should have checked more closely earlier. Given that
> we may not always be able to guarantee that locked critical sections will not be
> preempted (e.g. when a real-time task takes over), we will need a combination of
> both approaches (i.e. request preemption defer on lock hold path + yield on lock
> acquire path if owner !scheduled). The advantage of the former approach is that
> it could reduce job turnaround times in most cases (as the lock is available
> when we want it, or we don't have to wait too long for it).

Both I think would be good. It might be interesting to talk with the
s390 guys and see if they can look at ticket locks and preempt defer
techniques too (considering they already do the other half of the
equation well).
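
For the preempt defer half, one possible shape is a per-vcpu hint that the
guest raises around the lock hold path and the hypervisor consults before
preempting. The sketch below is only an assumption about how such an
interface might look: struct vcpu_hint, its fields and both helper names
are hypothetical, and no existing hypervisor ABI is implied.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vcpu state shared between guest and hypervisor. */
struct vcpu_hint {
    atomic_int  defer_count;     /* > 0: guest asks not to be preempted */
    atomic_bool preempt_pending; /* set by the hypervisor if it held off */
};

/* Call just before taking a spinlock: request deferral. Nesting is
 * handled by counting, so nested locks keep the request active. */
static void preempt_defer_begin(struct vcpu_hint *h)
{
    atomic_fetch_add(&h->defer_count, 1);
}

/* Call just after releasing a spinlock. Returns true when the last
 * deferral was dropped and the hypervisor had a preemption waiting,
 * in which case the caller should yield promptly. */
static bool preempt_defer_end(struct vcpu_hint *h)
{
    if (atomic_fetch_sub(&h->defer_count, 1) == 1)
        return atomic_exchange(&h->preempt_pending, false);
    return false;
}
```

A hypervisor honouring such a hint would still need to bound how long it
defers, since it cannot trust the guest to clear the count quickly.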


> > s390 also has the diag9c instruction which I suppose somehow boosts
> > priority of a preempted contended lock holder. In spite of any other
> > possible optimizations in their hypervisor like gang scheduling,
> > diag9c apparently provides quite a large improvement in some cases.
>
> Ok - thx for that pointer - will have a look at diag9c.
>
> > So I think these things are fairly important to look at.
>
> I agree ..
>
> - vatsa