Solving M producers N consumers scalability problem

From: Anatol Pomozov
Date: Fri Nov 01 2013 - 16:48:18 EST


Hi,

I am looking at a fuse scalability issue that I hit recently on my
multi-socket server machines:
http://permalink.gmane.org/gmane.comp.file-systems.fuse.devel/13490

The short description is that we have several thousand threads that
access a fuse filesystem. Fuse spawns several threads (usually 4-8)
that read requests from the kernel, process them, and write the
responses back to the kernel.

The problem is that the fuse kernel module uses a global list where it
keeps all active requests. Consumers and producers run on different
CPUs, and taking the lock to access the list is a very expensive
operation. I have a test case that shows ~35% of the system time is
spent in _raw_spin_lock when accessing this global list. I want to
solve this scalability problem.
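
As a rough user-space model of the pattern described above (this is
not the actual fuse kernel code; struct request, struct global_queue
and the gq_* functions are names made up for illustration), every
producer and every consumer goes through the same lock:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request record; not the kernel's real request type. */
struct request {
    int id;
    struct request *next;
};

/* One list and one lock shared by all producers and consumers. */
struct global_queue {
    atomic_flag lock;          /* simple test-and-set spinlock */
    struct request *head;
    struct request *tail;
};

void gq_init(struct global_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
}

void gq_lock(struct global_queue *q)
{
    /* Every CPU that wants the list spins on this single flag. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

void gq_unlock(struct global_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer side: append a request to the tail of the global list. */
void gq_push(struct global_queue *q, struct request *r)
{
    r->next = NULL;
    gq_lock(q);
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    gq_unlock(q);
}

/* Consumer side: take the oldest request, or NULL if the list is empty. */
struct request *gq_pop(struct global_queue *q)
{
    struct request *r;

    gq_lock(q);
    r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head)
            q->tail = NULL;
    }
    gq_unlock(q);
    return r;
}
```

With thousands of producers and a handful of consumers spread over
several sockets, the lock word and the list head bounce between CPU
caches on every push and pop.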

One idea is not to use the spin_lock. It is the 'fair spin_lock' that
has scalability problems:
http://pdos.csail.mit.edu/papers/linux:lock.pdf
Maybe lockless data structures can help here?
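
For reference, a 'fair' spinlock of this kind is usually a ticket
lock. Below is a minimal user-space sketch with C11 atomics, assuming
that is what the kernel spin_lock amounts to here; struct ticket_lock
and the ticket_lock_* functions are my own names, not the kernel's:

```c
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next_ticket;   /* next ticket to hand out */
    atomic_uint now_serving;   /* ticket currently allowed in */
};

void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket; waiters are served strictly in ticket order. */
    unsigned int me = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                                memory_order_relaxed);

    /* All waiters poll the same word until their number comes up. */
    while (atomic_load_explicit(&l->now_serving,
                                memory_order_acquire) != me)
        ;
}

void ticket_lock_release(struct ticket_lock *l)
{
    /* Hand the lock to the next ticket holder. */
    atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
}
```

Every waiter polls the same now_serving word, so each release
invalidates that cache line on all waiting CPUs. The cost of handing
over the lock therefore grows with the number of contenders.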

Another idea is to avoid global data structures, but I have a few
questions here. Let's say we want to use per-CPU lists. The problem is
that producers and consumers are not distributed evenly across all
CPUs. Some CPUs might have too many producers, while others might not
have any consumers at all. So we need some kind of migration from the
hot CPUs to the cold ones. What is the best way to achieve it? Are
there any examples of how to do this? Any other ideas?
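
One possible shape for the per-CPU idea, as a user-space sketch under
my own assumptions: a fixed number of queues standing in for CPUs, one
lock per queue, and a consumer that takes work from other queues when
its own is empty. NR_QUEUES, struct pending_req, struct pcq_set and
the pcq_* functions are hypothetical names, and the caller is assumed
to pass in the index of the CPU it runs on:

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_QUEUES 4            /* stand-in for the number of CPUs */

struct pending_req {
    int id;
    struct pending_req *next;
};

struct cpu_queue {
    atomic_flag lock;          /* contended only by users of this queue */
    struct pending_req *head;
    struct pending_req *tail;
};

struct pcq_set {
    struct cpu_queue q[NR_QUEUES];
};

void pcq_init(struct pcq_set *s)
{
    for (unsigned int i = 0; i < NR_QUEUES; i++) {
        atomic_flag_clear(&s->q[i].lock);
        s->q[i].head = s->q[i].tail = NULL;
    }
}

static void pcq_lock(struct cpu_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void pcq_unlock(struct cpu_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer: append to the queue that belongs to the given CPU index. */
void pcq_push(struct pcq_set *s, unsigned int cpu, struct pending_req *r)
{
    struct cpu_queue *q = &s->q[cpu % NR_QUEUES];

    r->next = NULL;
    pcq_lock(q);
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    pcq_unlock(q);
}

/* Remove the oldest request from one queue, or return NULL if empty. */
static struct pending_req *pcq_take(struct cpu_queue *q)
{
    struct pending_req *r;

    pcq_lock(q);
    r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head)
            q->tail = NULL;
    }
    pcq_unlock(q);
    return r;
}

/* Consumer: try the local queue first, then steal from the others. */
struct pending_req *pcq_pop(struct pcq_set *s, unsigned int cpu)
{
    for (unsigned int i = 0; i < NR_QUEUES; i++) {
        struct pending_req *r = pcq_take(&s->q[(cpu + i) % NR_QUEUES]);

        if (r)
            return r;
    }
    return NULL;
}
```

This only covers consumers pulling work from busy queues. It does not
rebalance producers, and a consumer that finds most queues empty still
touches every queue lock on each scan.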