Re: [PATCH v3 05/46] perf/x86/intel/cmt: add per-package locks

From: David Carrillo-Cisneros
Date: Thu Nov 10 2016 - 21:22:24 EST


>> To circumvent this problem, statically define CMT_MAX_NR_PKGS number of
>> lock_class_key's.
>
> That's a proper circumvention. Define a random number on your own.

A judiciously chosen number ;).

>
>> Additional details in code's comments.
>
> This is
>
> - pointless. Either there are comments or not. I can see that from
> the patch.
>
> - wrong. Because for this crucial detail of it (the lockdep
> restriction) there is no single comment in the code.

Noted. I'll stop adding those notes to the changelogs.

>> +/* Increase as needed as Intel CPUs grow. */
>
> As CPUs grow? CMT is a per L3 cache domain, i.e. package, which is
> equivalent to socket today.
>
>> +#define CMT_MAX_NR_PKGS 8
>
> This already breaks on these shiny new 16 socket machines companies are
> working on. Not to talk about the 4096 core monstrosities which are shipped
> today, which have definitely more than 16 sockets already.
>
> And now the obvious question: How are you going to fix this?

A possible alternative to nested spinlocks is to use a single rwlock to
protect changes to the monr hierarchy.

This works because code that manipulates pmonrs reads the monr hierarchy
(a monr's parent and children) but never changes it (the reader path),
while changes to the monr hierarchy may also modify pmonrs (the writer
path).

It would be implemented roughly like this:

// system-wide lock for monr hierarchy
rwlock_t monr_hrchy_rwlock = __RW_LOCK_UNLOCKED(monr_hrchy_rwlock);
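
Since a reader then holds at most one pkgd->lock, and only ever under
monr_hrchy_rwlock, all per-package locks can share the default lock
class and the static array of lock_class_key's goes away. A minimal
sketch of the init path (pkg_data_init and struct pkg_data are
placeholder names, not code from the patch set):

static void pkg_data_init(struct pkg_data *pkgd)
{
        /*
         * Under the rwlock scheme a pkgd->lock is never nested inside
         * another pkgd->lock, so the default lock class set up by
         * raw_spin_lock_init() suffices: no per-package
         * lock_class_key, no CMT_MAX_NR_PKGS limit.
         */
        raw_spin_lock_init(&pkgd->lock);
}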


Reader path: accesses pmonrs within one pkgd. Used in pmonr state
transitions and when reading cgroups (reads the subtree of pmonrs in the
same package):

v3 (old):

raw_spin_lock(&pkgd->lock);
// read monr data, read/write pmonr data
raw_spin_unlock(&pkgd->lock);

next version (v4):

read_lock(&monr_hrchy_rwlock);
raw_spin_lock(&pkgd->lock);
// read monr data, read/write pmonr data
raw_spin_unlock(&pkgd->lock);
read_unlock(&monr_hrchy_rwlock);


Writer path: modifies the monr hierarchy (adds/removes monrs on
creation/destruction of events and on start/stop of cgroup
monitoring):

v3 (old):

monr_hrchy_acquire_locks(...); // acquires all pkgd->lock's
// read/write monr data, potentially read/write pmonr data
monr_hrchy_release_locks(...); // releases all pkgd->lock's

next version (v4):

write_lock(&monr_hrchy_rwlock);
// read/write monr data, potentially read/write pmonr data
write_unlock(&monr_hrchy_rwlock);


The writer path should be taken infrequently. The reader path becomes
slower due to the additional read_lock(&monr_hrchy_rwlock), but the
number of locks acquired stays constant in the number of packages for
both paths, and there is never a need to nest pkgd->lock's.
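
To make the lock order concrete, here are both paths as helpers (the
function names and struct layout are placeholders, not code from the
patch set):

/* Reader path: a pmonr state transition confined to one package. */
static void pmonr_transition(struct pkg_data *pkgd, struct pmonr *pmonr)
{
        read_lock(&monr_hrchy_rwlock);  /* hierarchy cannot change */
        raw_spin_lock(&pkgd->lock);     /* this package's pmonr state */
        /* read monr data, read/write pmonr data */
        raw_spin_unlock(&pkgd->lock);
        read_unlock(&monr_hrchy_rwlock);
}

/* Writer path: attach a new monr to the hierarchy. */
static void monr_hrchy_attach(struct monr *monr, struct monr *parent)
{
        write_lock(&monr_hrchy_rwlock); /* excludes every reader path */
        /* read/write monr data, potentially read/write pmonr data */
        write_unlock(&monr_hrchy_rwlock);
}

Lockdep then sees a fixed two-level chain, monr_hrchy_rwlock ->
pkgd->lock, regardless of the number of packages.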

For v3 I discarded this idea because the additional
read_lock(&monr_hrchy_rwlock) makes the pmonr state transitions that may
occur during a context switch slower than in the current version. But it
may ultimately be the better option, considering the scalability issues
you mention.


>> + * - Hold pkg->mutex and pkg->lock in _all_ active packages to traverse or
>> + * change the monr hierarchy.
>
> So if you hold both locks in 8 packages then you already have 16 locks held
> nested. Now double that with 16 packages and you are slowly approaching the
> maximum lock chain depth lockdep can cope with. Now think SGI and guess
> how that works.
>
> I certainly can understand why you want to split the locks, but my gut
> feeling tells me that while it's probably not a big issue on a 2/4 socket
> machines it will go down the drain rather fast on anything real big.
>
> Thanks,
>
> tglx

Thanks for reviewing the patches. I really appreciate the feedback.

David