Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max

From: Valdis . Kletnieks
Date: Thu Jul 30 2009 - 10:18:15 EST


On Wed, 29 Jul 2009 19:17:00 +0300, Alexander Shishkin said:
> Is there anything dramatically wrong with this one, or could someone please review this?


> +               for_each_process(p) {
> +                       files = get_files_struct(p);
> +                       if (!files)
> +                               continue;
> +
> +                       spin_lock(&files->file_lock);
> +                       fdt = files_fdtable(files);
> +
> +                       /* we have to actually *count* the fds */
> +                       for (count = i = 0; i < fdt->max_fds; i++)
> +                               count += !!fcheck_files(files, i);
> +
> +                       printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
> +                                       p->pid, count);

1) Splatting out 'count' without a hint of what it is isn't very user friendly.
Consider something like "=> %s[%d]: open=%d\n" instead, or add a second line
to the 'VFS: file-max' printk to provide a header.

2) What context does this run in, and what locks/scheduling considerations
are there? On a large system with many processes running, this could conceivably
wrap the logmsg buffer before syslog has a chance to get scheduled and read
the stuff out.

3) This can be used by a miscreant to spam the logs - consider a program
that does open() until it hits the limit, then goes into a close()/open()
loop to repeatedly bang up against the limit. Every 2 syscalls by the
abuser could get them another 5,000+ lines in the log - an incredible
amplification factor.
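
For concreteness, here is a rough userspace sketch of the access pattern in
3). It is deliberately defanged: it lowers its own RLIMIT_NOFILE and bangs
against that per-process limit (EMFILE) rather than the system-wide file-max
(ENFILE), so it is safe to run and will not actually trigger the new printk.
The name hammer_fd_limit() and the MAX_TRACKED cap are made up for this
illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_TRACKED 64	/* hypothetical cap so the sketch stays small */

/*
 * Open fds until the limit is hit, then repeatedly fail an open(),
 * free one slot and retake it. Returns the number of open() calls
 * that failed at the limit, or -1 if the limit could not be set up.
 */
int hammer_fd_limit(int rounds, rlim_t soft_limit)
{
	struct rlimit old, tmp;
	int fds[MAX_TRACKED];
	int nfds = 0, failures = 0, fd, i;

	if (getrlimit(RLIMIT_NOFILE, &old) != 0)
		return -1;

	/* Clamp so every fd we open fits in fds[] and the limit is legal. */
	if (soft_limit > MAX_TRACKED)
		soft_limit = MAX_TRACKED;
	if (soft_limit > old.rlim_max)
		soft_limit = old.rlim_max;
	tmp = old;
	tmp.rlim_cur = soft_limit;
	if (setrlimit(RLIMIT_NOFILE, &tmp) != 0)
		return -1;

	/* Phase 1: open() until the limit is reached. */
	while (nfds < MAX_TRACKED) {
		fd = open("/dev/null", O_RDONLY);
		if (fd < 0)
			break;
		fds[nfds++] = fd;
	}

	/* Phase 2: keep bumping into the limit. */
	for (i = 0; i < rounds; i++) {
		fd = open("/dev/null", O_RDONLY);
		if (fd >= 0)
			close(fd);	/* unexpected: we were not at the limit */
		else if (errno == EMFILE || errno == ENFILE)
			failures++;

		/* Free one slot and retake it so the next round fails again. */
		if (nfds > 0) {
			close(fds[nfds - 1]);
			fds[nfds - 1] = open("/dev/null", O_RDONLY);
			if (fds[nfds - 1] < 0)
				nfds--;
		}
	}

	/* Clean up and restore the original limit. */
	while (nfds > 0)
		close(fds[--nfds]);
	setrlimit(RLIMIT_NOFILE, &old);
	return failures;
}
```

Calling hammer_fd_limit(5, 32) should report 5 failed opens on a typical
system. Each failed open() at the real file-max would be one trip through the
new reporting loop.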

Now, if you fixed it to only print out the top 10 offending processes, it would
make it a lot more useful to the sysadmin, and a lot of those considerations go
away, but it also makes the already N**2 behavior even more expensive...
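
To make the "top 10" idea concrete, here is a minimal userspace sketch of one
way to keep the N largest counts while walking the process list once. It is
not taken from the patch: struct fd_usage, struct top_list, top_offer() and
TOP_N are all hypothetical names, and the sketch says nothing about the cost
of counting the fds in the first place, which is unchanged.

```c
#include <assert.h>
#include <stddef.h>

#define TOP_N 10	/* hypothetical cap on reported processes */

struct fd_usage {
	int pid;
	int open_fds;
	char comm[16];
};

struct top_list {
	struct fd_usage e[TOP_N];	/* sorted, largest open_fds first */
	size_t n;			/* number of valid entries */
};

void top_init(struct top_list *t)
{
	t->n = 0;
}

/*
 * Offer one process to the list. The list stays sorted in descending
 * order of open_fds; once it is full, the smallest entry is dropped.
 * Cost is at most TOP_N entry moves per call.
 */
void top_offer(struct top_list *t, const struct fd_usage *u)
{
	size_t pos;

	assert(t->n <= TOP_N);

	/* Full, and not larger than the current smallest: ignore. */
	if (t->n == TOP_N && u->open_fds <= t->e[TOP_N - 1].open_fds)
		return;

	if (t->n < TOP_N)
		t->n++;

	/* Shift smaller entries down one slot to make room. */
	pos = t->n - 1;
	while (pos > 0 && t->e[pos - 1].open_fds < u->open_fds) {
		t->e[pos] = t->e[pos - 1];
		pos--;
	}
	t->e[pos] = *u;
}
```

The loop over processes would call top_offer() once per process and print
only the entries in e[0..n-1] at the end, so the log output is bounded at
TOP_N lines per event no matter how many processes exist.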

At that point, it would be good to report some CPU numbers by running an abusive
program that repeatedly hits the limit, and be able to say "Even under full
stress, it only used 15% of a CPU on a 2.4GHz Core2" or similar...
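
One possible way to get such a number from userspace is sketched below. It
compares the CPU time charged to the calling process (user + system, via
getrusage()) with the elapsed wall-clock time. This is an assumption-laden
sketch: it only sees time accounted to the caller, so it captures the
reporting cost only to the extent that work runs in the caller's syscall
context. cpu_percent_of() and spin() are hypothetical names, and spin() is
just a placeholder workload standing in for the limit-hitting loop.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

static double tv_seconds(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Run workload() once and return the CPU time charged to this process
 * as a percentage of the elapsed wall-clock time.
 */
double cpu_percent_of(void (*workload)(void))
{
	struct rusage before, after;
	struct timespec t0, t1;
	double cpu, wall;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	getrusage(RUSAGE_SELF, &before);

	workload();

	getrusage(RUSAGE_SELF, &after);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	cpu = (tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime)) +
	      (tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime));
	wall = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

	return wall > 0.0 ? 100.0 * cpu / wall : 0.0;
}

/* Placeholder workload: burn some user-mode cycles. */
void spin(void)
{
	volatile unsigned long x = 0;
	unsigned long i;

	for (i = 0; i < 50000000UL; i++)
		x += i;
}
```

Calling cpu_percent_of() with the abusive loop, with and without the patch
applied, would give a before/after pair to quote alongside the patch.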
