The problem is that we now have a race condition. The target process
could exec a setuid program after get_task() does the checks - we can
sleep in memcpy_tofs() due to a swap-in.
What is the advantage of using /proc/<pid>/mem instead of PTRACE_PEEKDATA
(other than it is cool because it uses /proc :-)? Perhaps strace could be
modified so that it doesn't need /proc at all, and then we can disable
/proc/<pid>/mem entirely, just to be sure that it has no more holes.
One possible fix for this race (suggestions for improvements welcome):
Allocate a temporary buffer (one page). Instead of doing memcpy_tofs()
directly, memcpy() data from the target process to this buffer, then
check if we are still allowed to access memory of this process. If not
- return with a short read, otherwise memcpy_tofs() from the buffer.
What happens to the mmap'd /proc/<pid>/mem areas when the target process
execs a new program?
Also, the problem with holding file descriptors potentially affects any
other /proc/<pid>/xxx files which have different permissions for the
owner and for everyone. Currently this includes /proc/<pid>/environ.
So I think we need a more general solution to this problem.
Another thing (not a bug but a performance issue) - linear search on the
process table may be expensive if there are many processes. Why not add
a new "struct task_struct *" member to "struct inode"? Do the search
only once, on first open() and not on every read(). I think programs
which read from /proc a lot (ps, top, ...) could benefit from this.
Argh, sometimes I miss kmem-ps. The /proc-based programs tend to break
with newer kernels, too, just like kmem-ps would. Someone might want to
allow only certain users to view the whole process table (so that most
users can only view their own processes) - with /proc, you have to set
the policy in the kernel (== even more bloat), and it would be so much
easier to do in user space...
Marek