Re: [PATCH v4 00/30] NT synchronization primitive driver

From: Peter Zijlstra
Date: Tue Apr 16 2024 - 11:50:45 EST


On Tue, Apr 16, 2024 at 10:14:21AM +0200, Peter Zijlstra wrote:

> > Some aspects of the implementation may deserve particular comment:
> >
> > * In the interest of performance, each object is governed only by a single
> > spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
> > objects be changed as a single atomic operation. In order to achieve this, we
> > first take a device-wide lock ("wait_all_lock") any time we are going to lock
> > more than one object at a time.
> >
> > The maximum number of objects that can be used in a vectored wait, and
> > therefore the maximum that can be locked simultaneously, is 64. This number is
> > NT's own limit.

AFAICT:

	spin_lock(&dev->wait_all_lock);
	list_for_each_entry(entry, &obj->all_waiters, node)
		for (i = 0; i < count; i++)
			spin_lock_nest_lock(&q->entries[i].obj->lock, &dev->wait_all_lock);

Where @count <= NTSYNC_MAX_WAIT_COUNT.

So while this nests at most 65 spinlocks, there is no actual bound on
the number of nested lock sections in total. That is, the all_waiters
list can grow without limit.

Can we pretty please make wait_all_lock a mutex?
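
Something like the below is what I mean -- a rough sketch only, with the
struct layout and wake path paraphrased from the patch and the actual
signal-and-wake logic elided. With a mutex the outer section may sleep,
so the unbounded walk over all_waiters no longer happens with preemption
disabled:

	struct ntsync_device {
		struct mutex wait_all_lock;	/* was: spinlock_t */
		/* ... */
	};

	mutex_lock(&dev->wait_all_lock);
	list_for_each_entry(entry, &obj->all_waiters, node) {
		for (i = 0; i < count; i++)
			spin_lock_nest_lock(&q->entries[i].obj->lock,
					    &dev->wait_all_lock);

		/* ... test whether all @count objects are signaled,
		 *     and if so consume their state and wake the waiter ... */

		for (i = 0; i < count; i++)
			spin_unlock(&q->entries[i].obj->lock);
	}
	mutex_unlock(&dev->wait_all_lock);

lockdep's nest_lock annotation doesn't care that the serializing lock is
a mutex, so spin_lock_nest_lock() against wait_all_lock keeps working;
each nested section still holds at most 64 spinlocks, but the walk
between sections becomes preemptible.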

> > The acquisition of multiple spinlocks will degrade performance. This is a
> > conscious choice, however. Wait-for-all is known to be a very rare operation
> > in practice, especially with counts that approach the maximum, and it is the
> > intent of the ntsync driver to optimize wait-for-any at the expense of
> > wait-for-all as much as possible.

Typical sane usage is a good guide for performance, but you must not
forget about malicious userspace and what it can do on purpose to mess
you up.
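
For instance (a hypothetical userspace sketch, untested; the wait-args
struct name is from the patch's uapi header, its fields elided here),
nothing stops a process from parking an arbitrary number of threads in
wait-for-all on a single object; every post on that object then walks
the whole all_waiters list under wait_all_lock:

	#include <pthread.h>
	#include <sys/ioctl.h>
	#include <linux/ntsync.h>

	static int dev_fd;	/* open("/dev/ntsync") */

	static void *waiter(void *unused)
	{
		struct ntsync_wait_args args = {
			/* one object, (near-)infinite timeout;
			 * field names per the uapi header */
		};

		ioctl(dev_fd, NTSYNC_IOC_WAIT_ALL, &args);	/* joins all_waiters */
		return NULL;
	}

Each such thread adds an entry to obj->all_waiters, so the work done on
one wakeup -- all of it inside the wait_all_lock critical section --
scales with however many threads userspace cares to create.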


Anyway, let me stare more at all this....