arch/i386/lib/checksum.c bug?

Colin Plumb (colin@nyx.net)
Fri, 11 Dec 1998 05:49:59 -0700 (MST)


The i386 asm code in checksum.c that Alan says is non-trivial is the sort
of stuff I'm used to working in (insanely optimized asm inner loops),
so I'm converting it to work with egcs 1.1.1. ("All bugs are shallow";
I've worked on *much* scarier asm code than this.)

However, in doing so, I've noticed a potential bug.

I'm also including a tutorial on the 2.1 fault-handling code, for
anyone who's interested.

The magic function is
unsigned int csum_partial_copy_generic (const char *src, char *dst,
int len, int sum, int *src_err_ptr, int *dst_err_ptr)

This copies src to dest, for a length of len, and returns the updated
TCP/IP checksum in "sum". (Actually it's a 32-bit checksum, which needs
to be folded to make a 16-bit checksum.)

If any errors occur, they are returned in *src_err_ptr or
*dst_err_ptr, just like errno. (-EFAULT if there's a problem,
otherwise unmodified.)

=== Begin tutorial ===

To do this, it uses a couple of macros
#define SRC(y...) \
" 9999: "#y"; \n \
.section __ex_table, \"a\"; \n \
.long 9999b, 6001f \n \
.previous\n"

#define DST(y...) \
" 9999: "#y"; \n \
.section __ex_table, \"a\"; \n \
.long 9999b, 6002f \n \
.previous\n"

The "9999b" and "6002f" references are to numeric labels.
Nuleric labels in gas may be re-used. Each time, they are
really given a new sub-number, so the references are 9999.1,
9999.2, 9999.3, etc. When you refer to one, you can refer
to "9999b", the *previous* (back) 9999, or 9999f, the
*next* (forward) 9999. gas just looks up 9999 in its internal
list of sub-numbers and translates it into a reference to
the appropriate 9999.n or 9999.(n+1).

People generally use just a few, small numbers, like 1, 2, 3. This
macro uses some really large ones to avoid colliding.

How 2.1 checks for erroneous access to user space is that it doesn't.
It has to check to make sure that the pointer is to the range of
addresses reserved as user space, but after that, the kernel has the
same access permissions as user space does, so if it tries to access a
bad page, it'll trap.

The trap handler then checks for the obvious things (page swapped out,
page unwriteable, copy-on-write), and if none of those apply, it used
to just oops.

Now, before oopsing, it searches through a special segment of memory
named "__ex_table". This consists of pairs of "fault_address,
where_to_jump". If the fault occurs at a listed address, the exception
handler just returns, to the *other* listed address.

There is "fix-up" code which deals with the error as required, usually
by stuffing -EFAULT into an appropriate variable and skipping the
memory access. The fix-up code is also stored in a seprate segment
megabytes away from the faulting code, so that in the common case where
there's no error, it never even gets loaded into cache.

The upshot is that there's nothing in the common-case code that
even has to check for a problem, so it runs really really fast.
The exception case is made more complex, but EFAULT is a
a *rare* exception in code, so it doesn't matter that it's
slowed down. (And I seem to recall that the table of exception
address is stored in sorted order, so the search is fast.)

=== End tutorial ===

Okay, enough of that. Now on to the part I think is
buggy, the exception handlers at 6001 and 6002:

# Exception handler:
################################################
#
.section .fixup, \"ax\" #
#
6000: #
#
movl %7, (%%ebx) #
#
# zero the complete destination - computing the rest
# is too much work
movl %6, %%edi
movl %9, %%ecx
xorl %%eax,%%eax
rep ; stosb
#
jmp 5000b #
#
6001: #
movl %1, %%ebx #
jmp 6000b #
#
6002: #
movl %2, %%ebx #
jmp 6000b #
#
.previous #
#
################################################

The ".section" says to put this code into a separate section.
When it comes time to link all of the .o files together, the linker
puts the code from each .o file into piles by section, then
puts all of the section piles together at the end. Thus,
this code won't be anywhere near the previous code.

%1 is src_err_ptr, and %2 is dst_err_ptr. (This is all GCC inline asm,
so there are %-expansions, and you need to double any % signs that you
want to see in the output.) This code loads the appropriate error
pointer into %ebx and jumps to comon code at 6000.

The code at 6000 stores -EFAULT (that's %7) in the error pointer
and, using some temporaries rmembered in the main copying operation,
zeros the destination before jumping back to the end of the copy.

This works fine for source errors. If the kernel code forgets to check
for errors and uses the buffer anyway, it'll be all zero as opposed to
uninitialized (and thus possibly containing sensitive data that
shouldn't go out over the network). Since this is not speed-critical,
it's a nice belt-and-suspenders.

But the bug: if the destination copy faults, won't the zeroing loop
fault, too? And since there's no exception table pointer, it'll oops.
Which is usually regarded as a bug?

Since we've already initialized the destination buffer with suitable
data up to the point of the fault, it's not uninitialized. So do we
need to clear it at all?

Wouldn't better code be:

# Exception handler:
################################################
#
.section .fixup, \"ax\" #
#
# Source fault #
6001: #
movl %1, %%ebx #
movl %7, (%%ebx) #
# zero the complete destination - #
# computing the rest is too much work #
movl %6, %%edi #
movl %9, %%ecx #
xorl %%eax,%%eax #
rep ; stosb #
#
jmp 5000b #
#
# Destination fault #
6002: #
movl %2, %%ebx #
movl %7, (%%ebx) #
jmp 5000b #
#
.previous #
#
################################################

Is my thinking right here, or am I confused about something? Should I
worry about the case of an unmapped page in a buffer followed by mapped
pages which should get cleared? (It can be done just by re-trying the
clear at page boundaries until there's no more, but it's a bit ugly,
and network packets aren't usually that big.)

-- 
	-Colin

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/