Re: [RFC][PATCH] Fix file counting

From: Eric Dumazet
Date: Thu Feb 16 2006 - 11:09:45 EST


Dipankar Sarma a écrit :
Eric,

Going by your patch, I converted my nr-files patch to use
percpu counters - except that I just used the existing
percpu counter code. This patch is untested, just for comments.
If you agree with the approach, we can go with it. I will
get some benchmark numbers measured on a 16-CPU box.

Thanks
Dipankar



The way we do file struct accounting is not very suitable for batched
freeing. For scalability reasons, file accounting was constructor/destructor
based. This meant that nr_files was decremented only when
the object was removed from the slab cache. This is
susceptible to slab fragmentation. With RCU based file structure,
consequent batched freeing and a test program like Serge's,
we just speed this up and end up with a very fragmented slab -

llm22:~ # cat /proc/sys/fs/file-nr
587730 0 758844

At the same time, I see only a 2000+ objects in filp cache.
The following patch I fixes this problem.

This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api. In the sysctl
handler for nr_files, we populate files_stat.nr_files before returning
to user.

Counting files as an when they are created and destroyed (as opposed
to inside slab) allows us to correctly count open files with RCU.

Signed-off-by: Dipankar Sarma <dipankar@xxxxxxxxxx>
---




fs/dcache.c | 2 -
fs/file_table.c | 80 +++++++++++++++++++++++++++++++--------------------
include/linux/file.h | 2 -
include/linux/fs.h | 2 +
kernel/sysctl.c | 5 ++-
net/unix/af_unix.c | 2 -
6 files changed, 57 insertions(+), 36 deletions(-)


Hi Dipankar

I believe your patch is good.

However there is a bug in get_empty_filp() :

if security_file_alloc(f) returns TRUE : The goto fail_sec; is done and fail_sec: does a file_free(f)

This means an extra percpu_counter_mod(&nr_files, -1L); will be done.

So I think you must apply this patch too :

fail_sec:
- file_free(f);
+ kmem_cache_free(filp_cachep, f);
fail:
return NULL;


(Ie no need for a delayed free done via rcu here, just a direct kmem_cache_free() call)

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/