Lock contention around unix_gc_lock

From: Ivan Babrou
Date: Tue Dec 10 2019 - 16:32:40 EST


Hello,

We're seeing very high contention on unix_gc_lock when a bug in an
application makes it stop reading incoming messages that carry inflight
unix sockets. Our system churns through a lot of unix sockets and has
96 logical CPUs, so the spinlock gets very hot.

I was able to halve overall system throughput with 1024 inflight unix
sockets, which is the default RLIMIT_NOFILE. This doesn't sound too
good for isolation: one user should not be able to affect the system
this much. One might even consider this a DoS vector.
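
For anyone who wants to see what "inflight" means here, below is a
minimal userspace sketch of how sockets end up in that state: a unix
socket is sent over another unix socket with SCM_RIGHTS and the
receiving end never reads it. This is not the program I used for the
numbers above; send_fd() and make_inflight() are names made up for
illustration, and a single carrier's send buffer may fill up before
the requested count is reached, in which case the loop stops early.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass fd_to_pass over the connected unix socket
 * `via` as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error
 * (including a full send buffer, since MSG_DONTWAIT is used). */
static int send_fd(int via, int fd_to_pass)
{
	char byte = 'x';	/* stream sockets need at least one data byte */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {			/* union guarantees cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

	return sendmsg(via, &msg, MSG_DONTWAIT) == 1 ? 0 : -1;
}

/* Hypothetical helper: create a carrier socketpair and queue up to
 * `count` unix sockets on it without ever reading them back, which is
 * what the buggy application effectively does. Returns the number of
 * sockets queued, or -1 if the carrier could not be created. */
static int make_inflight(int carrier[2], int count)
{
	int sent = 0;

	assert(count >= 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, carrier) < 0)
		return -1;

	while (sent < count) {
		int pair[2];
		int rc;

		if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
			break;
		rc = send_fd(carrier[0], pair[0]);
		/* Our descriptors can go away; the queued message keeps
		 * pair[0] referenced until somebody receives it. */
		close(pair[0]);
		close(pair[1]);
		if (rc < 0)
			break;
		sent++;
	}
	return sent;
}
```

Calling make_inflight() and then simply keeping both carrier
descriptors open leaves the queued sockets inflight until the carrier
is read or closed.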

A lot of time is spent in _raw_spin_unlock_irqrestore, which is
reached via wait_for_unix_gc, which in turn is called unconditionally
from unix_stream_sendmsg:

ffffffff9f64f3ea _raw_spin_unlock_irqrestore+0xa
ffffffff9eea6ab0 prepare_to_wait_event+0x70
ffffffff9f5a4ac6 wait_for_unix_gc+0x76
ffffffff9f5a182c unix_stream_sendmsg+0x3c
ffffffff9f4bb7f9 sock_sendmsg+0x39

* https://elixir.bootlin.com/linux/v4.19.80/source/net/unix/af_unix.c#L1849

Even more time is spent waiting on the spinlock because of the call to
unix_gc from unix_release_sock, where the only condition is that there
are any inflight sockets whatsoever:

ffffffff9eeb1758 queued_spin_lock_slowpath+0x158
ffffffff9f5a4718 unix_gc+0x38
ffffffff9f5a28f3 unix_release_sock+0x2b3
ffffffff9f5a2929 unix_release+0x19
ffffffff9f4b902d __sock_release+0x3d
ffffffff9f4b90a1 sock_close+0x11

* https://elixir.bootlin.com/linux/v4.19.80/source/net/unix/af_unix.c#L586

Should this condition take the number of inflight sockets into
account, just like unix_stream_sendmsg does via wait_for_unix_gc?
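
To make the question concrete, here is a small userspace model of the
two conditions as I read them in v4.19, plus the thresholded variant I
am asking about. This is a sketch, not kernel code: the function names
are made up, INFLIGHT_TRIGGER_GC is meant to mirror
UNIX_INFLIGHT_TRIGGER_GC from net/unix/garbage.c, and whether it is
safe to defer collection on release is a separate question this does
not answer.

```c
#include <stdbool.h>

/* Assumed to mirror UNIX_INFLIGHT_TRIGGER_GC in v4.19 net/unix/garbage.c. */
#define INFLIGHT_TRIGGER_GC 16000

/* Model of the send path: wait_for_unix_gc only starts a collection
 * when the inflight count exceeds the threshold and no GC is running. */
static bool gc_on_send(unsigned int tot_inflight, bool gc_in_progress)
{
	return tot_inflight > INFLIGHT_TRIGGER_GC && !gc_in_progress;
}

/* Model of the release path today: any inflight socket at all
 * triggers unix_gc on every unix socket close. */
static bool gc_on_release_current(unsigned int tot_inflight)
{
	return tot_inflight != 0;
}

/* Hypothetical variant: apply the same threshold on release. */
static bool gc_on_release_thresholded(unsigned int tot_inflight,
				      bool gc_in_progress)
{
	return gc_on_send(tot_inflight, gc_in_progress);
}
```

With 1024 inflight sockets the current release condition fires on
every close, while the thresholded variant would not.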

The static number of inflight sockets that triggers a GC from
wait_for_unix_gc could also scale with system size, rather than being
a hardcoded value.

I know that our case is a pathological one, but it sounds like the
scalability of garbage collection could be better, especially on
systems with a large number of CPUs.
