Re: [PATCH] reorder struct files_struct

From: Eric Dumazet
Date: Wed Sep 14 2005 - 17:42:29 EST


Dipankar Sarma a écrit :
On Wed, Sep 14, 2005 at 11:17:42PM +0200, Eric Dumazet wrote:

In SMP (and NUMA) environnements, each time a thread wants to open or close a file, it has to acquire the spinlock, thus invalidating the cache line containing this spinlock on other CPUS. So other threads doing read()/write()/... calls that use RCU to access the file table are going to ask further memory (possibly NUMA) transactions to read again this memory line.

Please consider applying this patch. It moves the spinlock to another cache line, so that concurrent threads can share the cache line containing 'count' and 'fdt' fields.

--- linux-2.6.14-rc1/include/linux/file.h 2005-09-13 05:12:09.000000000 +0200
+++ linux-2.6.14-rc1-ed/include/linux/file.h 2005-09-15 01:09:13.000000000 +0200
@@ -34,12 +34,12 @@
*/
struct files_struct {
atomic_t count;
- spinlock_t file_lock; /* Protects all the below members. Nests inside tsk->alloc_lock */
struct fdtable *fdt;
struct fdtable fdtab;
fd_set close_on_exec_init;
fd_set open_fds_init;
struct file * fd_array[NR_OPEN_DEFAULT];
+ spinlock_t file_lock; /* Protects concurrent writers. Nests inside tsk->alloc_lock */
};

#define files_fdtable(files) (rcu_dereference((files)->fdt))


For most apps without too many open fds, the embedded fd_sets
are going to be used. Wouldn't that mean that open()/close() will
invalidate the cache line containing fdt, fdtab by updating
the fd_sets ? If so, you optimization really doesn't help.


If the embedded struct fdtable is used, then the only touched field is 'next_fd', so we could also move this field at the end of 'struct fdtable'

But I wonder if 'next_fd' really has to be in 'struct fdtable', maybe it could be moved to 'struct files_struct' close to file_lock ?

If yes, the whole embedded struct fdtable is readonly.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/