Re: very large directories

From: James R Bruce (bruce+@andrew.cmu.edu)
Date: Wed Mar 01 2000 - 17:24:06 EST


Excerpts from internet.computing.linux-kernel: 1-Mar-100 Re: very large
directories by nbecker@fred.net
> It appears to me that there is a kernel design problem. If all that
> happens when someone creates very large directories is that their
> application runs slow, we could argue that that was just a user
> programming issue. But it appears to me (based on only nonscientific
> observation) that not only does that app slow down. I suspect that we
> are spending a large amount of time in non-preemptable kernel. I say
> this because when I run on my 400MHz i686 I am usually not bothered by
> a second user running some process. In this case the load is very
> noticable.

There are a lot of ways to annoy another user. One of the principal
design characteristics of unices in genereal is that you optimize for
the common case, and assume non-adversarial users with respect to
performance. Another way to annoy users is to memory mapping huge
buffers (clogs up non pagable physical memory, at least in early 2.x
kernels).

> If I am correct that a lot of time is being spent in the kernel (I'm
> only guessing at this, so tell me if I'm wrong) then it would appear
> to me that this is not just a user space issue - but really a weakness
> in the kernel design. If so perhaps higher priority should be given
> to correcting it.

96% system according to top.

I used the program listed below to create and delete a bunch of files in
a single directory. While the time increases linearly, the time per
file is still under 0.2ms for each file created in a directory of 3200
files (P166, 32MB ram, kernel 2.2.14).

Furthermore, I didn't find the system unresponsive. Your problems may
have more to do with the apps writing large buffers to the files and
flushing the page cache than the file lookup itself. If you really
think this is still a problem, try putting your users' home directories
on ReiserFS. You might find that the system isn't any better however,
since now they'd be accessing files at 2500 per/sec rather than 1500,
effectively tying up the same kernel resources during program execution.

Ext2fs works well in the vast majority of what is experienced, and I
think most people want it to be as simple and bug-free as possible in
providing for that common case. If someone wants to add b-trees or
other nifty features, that's fine. You raise an important issue for
certain circumstances, but the solutions represent added complexity with
little gain for the average user, so I don't think its going to be all
that high on people's priority list.

==== test output ====
> ../test 100
Files: 100
  Create: 0.043s total 0.00043s each... Delete: 0.009s total 0.00009s each
  Create: 0.043s total 0.00043s each... Delete: 0.009s total 0.00009s each
  Create: 0.040s total 0.00040s each... Delete: 0.009s total 0.00009s each
  Create: 0.040s total 0.00040s each... Delete: 0.009s total 0.00009s each
> ../test 200
Files: 200
  Create: 0.049s total 0.00024s each... Delete: 0.018s total 0.00009s each
  Create: 0.049s total 0.00024s each... Delete: 0.018s total 0.00009s each
  Create: 0.048s total 0.00024s each... Delete: 0.018s total 0.00009s each
  Create: 0.046s total 0.00023s each... Delete: 0.018s total 0.00009s each
> ../test 800
Files: 800
  Create: 0.323s total 0.00040s each... Delete: 0.095s total 0.00012s each
  Create: 0.325s total 0.00041s each... Delete: 0.095s total 0.00012s each
  Create: 0.323s total 0.00040s each... Delete: 0.095s total 0.00012s each
  Create: 0.323s total 0.00040s each... Delete: 0.095s total 0.00012s each
> ../test 3200
Files: 3200
  Create: 4.405s total 0.00138s each... Delete: 0.765s total 0.00024s each
  Create: 4.415s total 0.00138s each... Delete: 0.766s total 0.00024s each
  Create: 4.444s total 0.00139s each... Delete: 0.869s total 0.00027s each
  Create: 4.427s total 0.00138s each... Delete: 0.763s total 0.00024s each

==== test.c ====
#include <sys/types.h>
#include <sys/time.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(int argc,char **argv)
{
  int files;

  int i,t,fd;
  char name[32];

  struct timeval tv1,tv2;
  double d;

  if(argc != 2) exit(1);
  files = atoi(argv[1]);

  printf("Files: %d\n",files);

  for(t=0; t<4; t++){

    // Create a bunch of files
    gettimeofday(&tv1,NULL);
    for(i=0; i<files; i++){
      sprintf(name,"%05d.tmp",i);
      fd = open(name,O_CREAT|O_WRONLY);
      close(fd);
    }
    gettimeofday(&tv2,NULL);
    d = (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec)/1.0E6;
    printf(" Create: %6.3fs total %7.5fs each... ",d,d/files);

    // Delete a bunch of files
    gettimeofday(&tv1,NULL);
    for(i=0; i<files; i++){
      sprintf(name,"%05d.tmp",i);
      unlink(name);
    }
    gettimeofday(&tv2,NULL);
    d = (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec)/1.0E6;
    printf("Delete: %6.3fs total %7.5fs each\n",d,d/files);
  }

  return(0);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Mar 07 2000 - 21:00:11 EST