Non-atomic copy_xxx_user problems

Bill Hawes (whawes@star.net)
Sun, 01 Jun 1997 08:33:09 -0400


I have a theory that some of the kernel problems being reported are due
to critical sections being broken by calls to copy_from_user or
copy_to_user while under heavy paging loads. This is based on a code
review; there seem to be _many_ places in the kernel where delicate
operations are only partially complete at the time a call is made to
copy_xxx_user.

To test this out, I made a simple modification to the copy_*_user macros
to set a flag in the task structure before starting the copy operation.
I then modified schedule() to clear the flag whenever it switches to a
new task. Thus if the flag is still set when the copy is complete, we
got lucky (this time anyway) and the copy was atomic. But if the flag
is cleared, then any number of other tasks may have been run before
control returned. In this case the function calls printk() with a
message, conveniently including the file and line number where it was
invoked.

With these patches in place, I then rebooted with mem=12M and started a
couple of kernel compilations to get some good paging. Sure enough, the
system console (and kernel log file) started reaping warnings of
non-atomic calls. A sample looks like this:
acer kernel: Non-atomic copy_from_user in pipe.c at line 127
acer kernel: Non-atomic copy_from_user in file.c at line 205
acer kernel: Non-atomic copy_to_user in filemap.c at line 673
acer kernel: Non-atomic copy_to_user in iovec.c at line 97
acer kernel: Non-atomic copy_to_user in pipe.c at line 72

I then went and checked some of these out. The messages from pipe.c
are harmless, as pipe was designed knowing that the calls would be
non-atomic, so everything is locked down.

File.c is another matter -- seems like this could be a real problem.
The code (it's fs/ext2/file.c) looks like this:
c -= copy_from_user (bh->b_data + offset, buf, c);
if (!c) {
brelse(bh);
if (!written)
written = -EFAULT;
break;
}
update_vm_cache(inode, pos, bh->b_data + offset, c);
pos += c;
written += c;
buf += c;
count -= c;
mark_buffer_uptodate(bh, 1);
mark_buffer_dirty(bh, 0);

One problem here is that the buffer bh is dirty as soon as the copy
operation starts, but isn't marked as dirty until later. If the user
buffer spans a page boundary, and the first page is in memory but the
second on isn't, then the copy operation will be only partially complete
when the page fault occurs. A reschedule will allow other tasks to
proceed with calls to the file system, and with the internal logic now
out-of-sync, it's not a happy situation.

I think it would be very helpful for other people to run this patch to
see what potential problems are occurring; hopefully we can correlate
the warning messages with subsequent Oops to see what code needs to be
rewritten first. One nice thing is that these messages are just
warnings that something may go wrong, and so should come early enough to
be logged before the system crashes.

The patch is against 2.1.42, and should be relatively be benign if your
system isn't doing any paging, other than adding 10K of kernel bloat.
If you get an Oops, check the kernel log file for any new non-atomic
messages. (Don't bother reporting pipe.c, n_tty.c, or iovec.c messages,
as they're harmless.)

-Bill Hawes

--- ../linux-2.1.40/include/linux/sched.h Wed May 28 17:57:15 1997
+++ include/linux/sched.h Sat May 31 15:07:20 1997
@@ -284,6 +284,9 @@
#define PF_DTRACE 0x00200000 /* delayed trace (used on m68k) */
#define PF_ONSIGSTK 0x00400000 /* works on signal stack (m68k only) */

+#define PF_WANT_ATOMIC 0x01000000 /* want to do something atomically
*/
+#define PF_GOT_ATOMIC 0x02000000 /* did it without rescheduling
*/
+
/*
* Limit the stack by to some sane default: root can always
* increase this limit if needed.. 8MB seems reasonable.
--- kernel/sched.c.old Mon May 19 14:38:12 1997
+++ kernel/sched.c Sat May 31 15:04:11 1997
@@ -466,6 +466,9 @@
if (prev != next) {
struct timer_list timer;

+ /* clear the "atomic operation" flag */
+ prev->flags &= ~PF_GOT_ATOMIC;
+
kstat.context_swtch++;
if (timeout) {
init_timer(&timer);
--- include//asm-i386/uaccess.h.old Fri May 30 08:33:22 1997
+++ include/asm-i386/uaccess.h Sat May 31 23:11:30 1997
@@ -396,15 +396,31 @@
return n;
}

-#define copy_to_user(to,from,n) \
- (__builtin_constant_p(n) ? \
+#define COPY_TO_MSG "Non-atomic copy_to_user in %s at line %d\n"
+#define copy_to_user(to,from,n) ({ \
+ unsigned long temp; \
+ unsigned long flags = (PF_WANT_ATOMIC | PF_GOT_ATOMIC); \
+ current->flags |= flags; \
+ temp = (__builtin_constant_p(n) ? \
__constant_copy_to_user((to),(from),(n)) : \
- __generic_copy_to_user((to),(from),(n)))
+ __generic_copy_to_user((to),(from),(n))); \
+ if ((current->flags & flags) != flags) \
+ printk(&COPY_TO_MSG[0], __FILE__, __LINE__); \
+ temp; \
+})

-#define copy_from_user(to,from,n) \
- (__builtin_constant_p(n) ? \
+#define COPY_FROM_MSG "Non-atomic copy_from_user in %s at line %d\n"
+#define copy_from_user(to,from,n) ({ \
+ unsigned long temp; \
+ unsigned long flags = (PF_WANT_ATOMIC | PF_GOT_ATOMIC); \
+ current->flags |= flags; \
+ temp = (__builtin_constant_p(n) ? \
__constant_copy_from_user((to),(from),(n)) : \
- __generic_copy_from_user((to),(from),(n)))
+ __generic_copy_from_user((to),(from),(n))); \
+ if ((current->flags & flags) != flags) \
+ printk(&COPY_FROM_MSG[0], __FILE__, __LINE__);\
+ temp; \
+})

#define copy_to_user_ret(to,from,n,retval) ({ \
if (copy_to_user(to,from,n)) \